AI Transparency and Explainability — SHAP, LIME, Attention, and When Each Method Works
Explainability method comparison by use case, computational cost, faithfulness to model behavior, and regulatory requirements for AI decision explanation.
Your AI Made a Decision — Can You Explain Why in a Way That’s Both Accurate and Useful?
A lending AI rejects a loan application. The applicant has a legal right to know why. Your customer success team needs to explain the rejection in plain language. Your compliance team needs to verify the explanation is truthful. Your engineering team needs to debug why the model rejected a seemingly qualified applicant. These four audiences need four different types of explanation from the same decision — and most explainability tools serve only one of them well. This guide compares explainability methods by use case, computational cost, and a critical dimension most guides ignore: faithfulness — whether the explanation actually reflects what the model did.
The Explainability Problem Is Four Problems
| Audience | What they need | Explanation type | Faithfulness requirement |
|---|---|---|---|
| End user | Actionable reason for the decision | Natural language, simplified | Low — approximate is acceptable |
| Customer support | Talking points for explaining decisions | Key factors, plain language | Medium — must not contradict model behavior |
| Compliance/audit | Verifiable evidence the decision was lawful | Feature attribution, bias metrics | High — must reflect actual model reasoning |
| Engineering | Debug signal for model behavior | Full feature importance, decision boundary | Highest — must be faithful to model internals |
The core tension: Faithful explanations are complex. Simple explanations are unfaithful. No single method satisfies all four audiences.
Explainability Methods Compared
SHAP (SHapley Additive exPlanations)
| Dimension | Value |
|---|---|
| What it measures | Marginal contribution of each feature to the prediction |
| Theoretical basis | Shapley values from cooperative game theory |
| Faithfulness | High for tree models; approximate for neural networks |
| Computational cost | O(2^n) exact; roughly O(n·k) with k sampled coalitions (sampling approximation) |
| Output format | Per-feature importance values (positive/negative contribution) |
| Best for | Tabular data, tree-based models, compliance documentation |
| Limitation | Slow on high-dimensional data; correlated features produce misleading values |
SHAP provides the most theoretically grounded feature attribution. For tree-based models (XGBoost, LightGBM, Random Forest), TreeSHAP computes exact Shapley values efficiently. For neural networks, KernelSHAP approximates but can be unreliable on correlated features.
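For intuition, Shapley values can be computed exactly by enumerating every feature coalition — the O(2^n) path in the table above, and the quantity TreeSHAP computes efficiently for trees. A minimal pure-Python sketch (the three-feature scoring function is a toy stand-in, not a real model):

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    predict:  the model, as a function of a full feature vector
    instance: the input being explained
    baseline: reference values used for 'absent' features
    """
    n = len(instance)
    features = list(range(n))
    phi = [0.0] * n

    def value(coalition):
        # Features in the coalition take the instance's values;
        # the rest are held at the baseline.
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    for i in features:
        others = [f for f in features if f != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley kernel weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy linear credit score: for a linear model, each Shapley value equals
# the coefficient times the feature's change from the baseline.
predict = lambda x: 2 * x[0] + 3 * x[1] + x[2]
phi = exact_shapley(predict, instance=[1, 1, 1], baseline=[0, 0, 0])
```

The efficiency property holds by construction: the values sum to the difference between the prediction for the instance and the prediction for the baseline, which is what makes Shapley-based attributions attractive for compliance documentation.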
Production cost at scale:
| Model type | Features | SHAP computation time per prediction | Scalability |
|---|---|---|---|
| XGBoost (TreeSHAP) | 50 | 5-20ms | Real-time viable |
| Random Forest (TreeSHAP) | 50 | 10-50ms | Real-time viable |
| Neural network (KernelSHAP) | 100 | 2-30 seconds | Batch only |
| LLM (token-level SHAP) | 1,000+ tokens | 10-60 minutes | Research only |
LIME (Local Interpretable Model-agnostic Explanations)
| Dimension | Value |
|---|---|
| What it measures | Local linear approximation of model behavior around a specific input |
| Theoretical basis | Perturbation-based local surrogate model |
| Faithfulness | Moderate — the linear approximation is inherently unfaithful to non-linear models |
| Computational cost | O(n·k) where n=features, k=perturbation samples (typically 1,000-5,000) |
| Output format | Per-feature weights in a local linear model |
| Best for | Quick explanations, model-agnostic applications, user-facing summaries |
| Limitation | Unstable — different random seeds produce different explanations; unfaithful on complex decision boundaries |
LIME explains a prediction by asking “if I perturb the input slightly, how does the output change?” and fitting a linear surrogate model to the perturbation results. This is intuitive, but the linear approximation can be misleading for highly non-linear models.
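That perturb-and-fit loop can be stripped down to a few lines. The sketch below toggles each feature between the instance and a baseline and scores it by the mean output difference between on and off samples — with an independent Bernoulli mask this approximates the weights of LIME's local linear surrogate (the model here is a stand-in, and this omits LIME's distance kernel):

```python
import random

def lime_style_weights(predict, instance, baseline, n_samples=2000, seed=0):
    """Perturbation-based local attribution in the spirit of LIME."""
    rng = random.Random(seed)
    n = len(instance)
    on_sum = [0.0] * n; on_cnt = [0] * n
    off_sum = [0.0] * n; off_cnt = [0] * n

    for _ in range(n_samples):
        # Random presence mask: each feature is kept or reset to baseline.
        mask = [rng.random() < 0.5 for _ in range(n)]
        x = [instance[i] if mask[i] else baseline[i] for i in range(n)]
        y = predict(x)
        for i in range(n):
            if mask[i]:
                on_sum[i] += y; on_cnt[i] += 1
            else:
                off_sum[i] += y; off_cnt[i] += 1

    # Mean output when feature is present minus mean when absent.
    return [on_sum[i] / max(on_cnt[i], 1) - off_sum[i] / max(off_cnt[i], 1)
            for i in range(n)]

# Non-linear stand-in model: feature 0 dominates, feature 2 has no effect.
predict = lambda x: x[0] ** 2 + 0.5 * x[1]
weights = lime_style_weights(predict, instance=[2.0, 2.0, 2.0],
                             baseline=[0.0, 0.0, 0.0])
```

Note that the weights are sampling estimates: rerun with a different seed and they shift — which is exactly the stability problem described next.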
LIME stability problem: On the same input, running LIME twice with different random seeds can produce different top features. This instability makes LIME problematic for compliance and audit use cases where explanations must be consistent.
| Configuration | Stability (same top-3 features across 10 runs) | Computation time |
|---|---|---|
| 100 perturbations | 55-65% | 0.5-2s |
| 1,000 perturbations | 75-85% | 2-10s |
| 5,000 perturbations | 88-93% | 10-30s |
| 10,000 perturbations | 93-97% | 30-60s |
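Stability numbers like those in the table can be measured directly: run the explainer under several seeds and check how often the top-k feature set matches. A sketch, assuming any explainer exposed as `explain(seed) -> list of (feature, weight)` pairs (the noisy explainer below is a hypothetical stand-in):

```python
import random

def topk_stability(explain, seeds, k=3):
    """Fraction of runs whose top-k feature set matches the first run's."""
    def topk(seed):
        ranked = sorted(explain(seed), key=lambda fw: abs(fw[1]), reverse=True)
        return frozenset(f for f, _ in ranked[:k])

    reference = topk(seeds[0])
    matches = sum(topk(s) == reference for s in seeds[1:])
    return matches / (len(seeds) - 1)

# Stand-in explainer: fixed importances plus seed-dependent noise,
# mimicking a perturbation-based method like LIME.
def noisy_explainer(seed):
    rng = random.Random(seed)
    true_weights = {"income": 0.9, "dti": 0.7, "age": 0.3, "zip": 0.1}
    return [(f, w + rng.gauss(0, 0.05)) for f, w in true_weights.items()]

stability = topk_stability(noisy_explainer, seeds=range(10), k=3)
```

For compliance use, a check like this belongs in CI: if top-3 agreement drops below your documented threshold, the explanation pipeline should fail loudly rather than emit inconsistent evidence.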
Attention-Based Explanations
| Dimension | Value |
|---|---|
| What it measures | Which input tokens the model attended to during generation |
| Theoretical basis | Transformer attention weights |
| Faithfulness | Low — attention weights do not reliably indicate causal influence |
| Computational cost | Near zero (attention weights are a byproduct of inference) |
| Output format | Token-level attention heatmap |
| Best for | Debugging, rough intuition, visualization demos |
| Limitation | Jain & Wallace (2019) showed attention often does not correlate with feature importance |
For practical purposes, the attention faithfulness debate is settled. Multiple studies have demonstrated that:
- Attention can be redistributed across tokens without changing the output
- High attention on a token doesn’t mean that token influenced the prediction
- Alternative attention patterns exist that produce identical outputs
Attention visualizations are useful for building intuition about model behavior. They are not reliable explanations of why a specific decision was made. Do not use attention-based explanations for compliance or audit purposes.
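The redistribution point is easy to demonstrate: whenever two tokens carry identical value vectors, any split of attention mass between them yields the same weighted sum, so the weights alone cannot say which token “mattered.” A toy sketch of a single attention head's output step (illustrative only, not a real transformer):

```python
def attention_output(weights, values):
    """Weighted sum of value vectors, as in one attention head."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Three tokens; tokens 0 and 1 carry identical value vectors.
values = [[1.0, 2.0], [1.0, 2.0], [5.0, -1.0]]

# Two very different attention distributions over the same tokens...
a = attention_output([0.8, 0.1, 0.1], values)
b = attention_output([0.1, 0.8, 0.1], values)
# ...produce the same output (up to floating-point rounding), even though
# a heatmap would highlight token 0 in one case and token 1 in the other.
```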
Counterfactual Explanations
| Dimension | Value |
|---|---|
| What it measures | Minimum input change needed to flip the decision |
| Theoretical basis | Nearest counterfactual in feature space |
| Faithfulness | High — directly tests what the model responds to |
| Computational cost | Varies; optimization-based methods: 1-30 seconds per explanation |
| Output format | “If X were Y instead, the decision would have been different” |
| Best for | End-user explanations (actionable), loan rejections, hiring decisions |
| Limitation | May suggest infeasible changes; multiple valid counterfactuals exist |
Counterfactual explanations are the most user-friendly format because they’re actionable: “Your loan was rejected. If your debt-to-income ratio were 35% instead of 48%, the loan would have been approved.” The user knows exactly what to change.
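A minimal counterfactual search can be sketched as a greedy loop: step each mutable feature toward its plausible bounds until the decision flips, then report the smallest single-feature change found. This is a simplified illustration, not the optimization-based methods cited above, and the underwriting rule is a toy:

```python
def find_counterfactual(predict, instance, feature_ranges, step=0.01):
    """Greedy single-feature counterfactual search.

    predict:        returns True for approval, False for rejection
    instance:       dict of feature -> current value
    feature_ranges: dict of mutable feature -> (min, max) plausible bounds
    Returns (feature, new_value) for the smallest single-feature change
    that flips the decision, or None if none is found.
    """
    assert not predict(instance), "already approved; nothing to flip"
    best = None
    for feature, (lo, hi) in feature_ranges.items():
        current = instance[feature]
        for direction in (+1, -1):
            value = current
            while lo <= value <= hi:
                value += direction * step * (hi - lo)
                candidate = dict(instance, **{feature: value})
                if predict(candidate):
                    delta = abs(value - current)
                    if best is None or delta < best[2]:
                        best = (feature, value, delta)
                    break
    return None if best is None else (best[0], best[1])

# Toy underwriting rule: approve if DTI below 0.40 and score above 650.
predict = lambda x: x["dti"] < 0.40 and x["score"] > 650
applicant = {"dti": 0.48, "score": 700}
cf = find_counterfactual(predict, applicant,
                         feature_ranges={"dti": (0.0, 1.0), "score": (300, 850)})
# cf suggests lowering dti just below 0.40; the score is already sufficient.
```

Restricting `feature_ranges` to features the applicant can actually change is what enforces the actionability and plausibility criteria in the quality table below.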
Counterfactual quality metrics:
| Quality dimension | Definition | Good threshold |
|---|---|---|
| Proximity | How small is the change? | Minimum feature changes (ideally 1-3) |
| Plausibility | Is the changed input realistic? | Must pass domain constraints |
| Diversity | Are multiple paths to a different decision shown? | 2-3 diverse counterfactuals |
| Actionability | Can the user actually make the suggested change? | Only changeable features |
| Sparsity | How many features change? | Fewer is better |
Method Selection Matrix
| Requirement | SHAP | LIME | Attention | Counterfactual |
|---|---|---|---|---|
| Regulatory compliance | Best | Adequate | Inadequate | Good |
| End-user explanation | Poor (too technical) | Moderate | Poor | Best |
| Debugging | Good | Moderate | Good (quick intuition) | Moderate |
| Stability/reproducibility | High (exact SHAP) | Low-moderate | High | Moderate |
| Speed (real-time) | Tree models only | Moderate | Best (free) | Slow |
| LLM applicability | Very limited | Limited | Available but unfaithful | Promising |
| Theoretical soundness | Strongest | Moderate | Weakest | Strong |
Explainability for LLMs — The Hard Problem
Traditional explainability methods were designed for tabular models with interpretable features. LLMs operate on thousands of tokens with complex interactions. Applying traditional methods to LLMs is either computationally infeasible or produces unfaithful explanations.
Current LLM Explainability Approaches
| Approach | What it provides | Faithfulness | Production viability |
|---|---|---|---|
| Extended thinking (Claude) | Model shows its reasoning process | Medium-high (reasoning is real but compressed) | Production-ready |
| Chain-of-thought | Model generates reasoning steps | Medium (reasoning may be post-hoc rationalization) | Production-ready |
| Token-level attribution | Which input tokens influenced output | Low-medium | Research only (very slow) |
| Retrieval attribution (RAG) | Which documents informed the answer | High (for the retrieval step) | Production-ready |
| Natural language explanation | Model explains its own decision | Low (models confabulate explanations) | Available but unreliable |
The honest assessment for LLMs in 2026: There is no production-ready method for faithfully explaining why an LLM generated a specific response. Extended thinking and chain-of-thought show reasoning, but the displayed reasoning may not perfectly match the model’s actual computation. Retrieval attribution (showing which documents were used) is the most reliable form of LLM explainability currently available.
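Retrieval attribution is reliable precisely because it never touches the model's internals: record which documents went into the prompt and attach their identifiers to the answer. A minimal sketch of the wiring, where `retrieve` and `generate` are hypothetical stand-ins for your retriever and LLM client:

```python
from dataclasses import dataclass, field

@dataclass
class AttributedAnswer:
    text: str
    sources: list = field(default_factory=list)  # doc ids that were in the prompt

def answer_with_attribution(question, retrieve, generate, top_k=3):
    """RAG call that keeps the retrieval step auditable.

    retrieve(question, k) -> list of (doc_id, passage)
    generate(prompt)      -> answer text
    """
    docs = retrieve(question, top_k)
    context = "\n\n".join(f"[{doc_id}] {passage}" for doc_id, passage in docs)
    prompt = ("Answer using only the sources below. Cite ids in brackets.\n\n"
              f"{context}\n\nQuestion: {question}")
    answer = generate(prompt)
    # Attribution is exact for the retrieval step: these documents, and
    # only these, were available to the model for this answer.
    return AttributedAnswer(text=answer, sources=[d for d, _ in docs])

# Toy stand-ins to show the wiring end to end:
corpus = {"doc-7": "Loans require DTI below 40%.", "doc-2": "Minimum score is 650."}
retrieve = lambda q, k: list(corpus.items())[:k]
generate = lambda prompt: "DTI must be below 40% [doc-7]."
result = answer_with_attribution("What DTI is required?", retrieve, generate)
```

The guarantee is deliberately narrow: it tells you what the model could have used, not how it weighted those passages internally — which is why the table above rates it high-faithfulness for the retrieval step only.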
Regulatory Requirements for Explainability
| Regulation | What it requires | Minimum acceptable method |
|---|---|---|
| GDPR Art. 22 | “Meaningful information about the logic involved” for automated decisions | SHAP/LIME for tabular; retrieval attribution for RAG; natural language summary |
| EU AI Act Art. 13 | “Sufficient transparency to enable users to interpret the system’s output” | Model card + explanation mechanism appropriate to risk level |
| US ECOA (fair lending) | Specific reasons for adverse action | Counterfactual or SHAP — must identify specific factors |
| NYC LL144 | Bias audit summary published annually | SHAP feature importance for bias audit; counterfactual for individual decisions |
The compliance gap: Regulations require “explainability” but don’t define what constitutes an adequate explanation. Document your explainability method choice, its limitations, and why it’s appropriate for your use case. A well-documented limited explanation is more defensible than an undocumented complex one.
How to Apply This
Use the token-counter tool to estimate token-level explanation costs for LLM applications.
Match your explainability method to your primary audience using the method selection matrix — different audiences need different explanations from the same system.
For regulatory compliance, SHAP on tabular models and counterfactual explanations for user-facing decisions provide the strongest defense.
For LLM applications, retrieval attribution (showing source documents) is the most reliable and production-ready approach — implement it before attempting more sophisticated methods.
Never use attention weights for compliance explanations — the research consensus is clear that attention does not reliably indicate causal importance.
Build explainability into your architecture from the start — retrofitting explanations to opaque systems is 5-10x more expensive than designing for explainability.
Honest Limitations
SHAP values for correlated features can be misleading — they distribute importance across correlated features rather than identifying the causal one. LIME’s instability makes it unsuitable as the sole compliance evidence. Counterfactual explanations may suggest infeasible changes if not constrained. No current method provides faithful explanations for LLM decisions at production scale. Regulatory requirements for “explainability” are ambiguous and evolving. This guide covers post-hoc explainability; inherently interpretable models (linear, decision trees) are an alternative but sacrifice accuracy. Multi-step reasoning explanations (chain-of-thought) may not reflect actual model computation.
Continue reading
AI Bias Detection — Demographic Parity, Equal Opportunity, Calibration, and When Each Metric Applies
Fairness metric decision tree per use case, measurement methodology, regulatory requirements, and practical implementation for production AI systems.
AI Content Filtering — Guardrails That Block Without Breaking User Experience
False positive and negative rate comparison across filtering approaches, latency impact, implementation patterns, and the tradeoff between safety and usability.
Types of AI Hallucinations — Factual, Logical, Attribution, and How to Detect Each
Taxonomy of AI hallucination types with detection methods, failure rates by model and task, and a diagnostic decision tree for production systems.