AI Transparency and Explainability — SHAP, LIME, Attention, and When Each Method Works
Explainability method comparison by use case, computational cost, faithfulness to model behavior, and regulatory requirements for AI decision explanation.
Your AI Made a Decision — Can You Explain Why in a Way That’s Both Accurate and Useful?
A lending AI rejects a loan application. The applicant has a legal right to know why. Your customer success team needs to explain the rejection in plain language. Your compliance team needs to verify the explanation is truthful. Your engineering team needs to debug why the model rejected a seemingly qualified applicant. These four audiences need four different types of explanation from the same decision — and most explainability tools serve only one of them well. This guide compares explainability methods by use case, computational cost, and a critical dimension most guides ignore: faithfulness — whether the explanation actually reflects what the model did.
The Explainability Problem Is Four Problems
| Audience | What they need | Explanation type | Faithfulness requirement |
|---|---|---|---|
| End user | Actionable reason for the decision | Natural language, simplified | Low — approximate is acceptable |
| Customer support | Talking points for explaining decisions | Key factors, plain language | Medium — must not contradict model behavior |
| Compliance/audit | Verifiable evidence the decision was lawful | Feature attribution, bias metrics | High — must reflect actual model reasoning |
| Engineering | Debug signal for model behavior | Full feature importance, decision boundary | Highest — must be faithful to model internals |
The core tension: Faithful explanations are complex. Simple explanations are unfaithful. No single method satisfies all four audiences.
Explainability Methods Compared
SHAP (SHapley Additive exPlanations)
| Dimension | Value |
|---|---|
| What it measures | Marginal contribution of each feature to the prediction |
| Theoretical basis | Shapley values from cooperative game theory |
| Faithfulness | High for tree models; approximate for neural networks |
| Computational cost | O(2^n) exact; roughly O(n·k) with k sampled coalitions (sampling approximation) |
| Output format | Per-feature importance values (positive/negative contribution) |
| Best for | Tabular data, tree-based models, compliance documentation |
| Limitation | Slow on high-dimensional data; correlated features produce misleading values |
SHAP provides the most theoretically grounded feature attribution. For tree-based models (XGBoost, LightGBM, Random Forest), TreeSHAP computes exact Shapley values efficiently. For neural networks, KernelSHAP approximates but can be unreliable on correlated features.
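For intuition, Shapley values can be computed exactly by enumerating every feature coalition — the O(2^n) path in the table above, and the quantity TreeSHAP computes efficiently for trees. A minimal pure-Python sketch (the three-feature scoring function is a toy stand-in, not a real model):

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    predict:  the model, as a function of a full feature vector
    instance: the input being explained
    baseline: reference values used for 'absent' features
    """
    n = len(instance)
    features = list(range(n))
    phi = [0.0] * n

    def value(coalition):
        # Features in the coalition take the instance's values;
        # the rest are held at the baseline.
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    for i in features:
        others = [f for f in features if f != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley kernel weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy linear credit score: for a linear model, each Shapley value equals
# the coefficient times the feature's change from the baseline.
predict = lambda x: 2 * x[0] + 3 * x[1] + x[2]
phi = exact_shapley(predict, instance=[1, 1, 1], baseline=[0, 0, 0])
```

The efficiency property holds by construction: the values sum to the difference between the prediction for the instance and the prediction for the baseline, which is what makes Shapley-based attributions attractive for compliance documentation.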
Production cost at scale:
| Model type | Features | SHAP computation time per prediction | Scalability |
|---|---|---|---|
| XGBoost (TreeSHAP) | 50 | 5-20ms | Real-time viable |
| Random Forest (TreeSHAP) | 50 | 10-50ms | Real-time viable |
| Neural network (KernelSHAP) | 100 | 2-30 seconds | Batch only |
| LLM (token-level SHAP) | 1,000+ tokens | 10-60 minutes | Research only |
LIME (Local Interpretable Model-agnostic Explanations)
| Dimension | Value |
|---|---|
| What it measures | Local linear approximation of model behavior around a specific input |
| Theoretical basis | Perturbation-based local surrogate model |
| Faithfulness | Moderate — the linear approximation is inherently unfaithful to non-linear models |
| Computational cost | O(n·k) where n=features, k=perturbation samples (typically 1,000-5,000) |
| Output format | Per-feature weights in a local linear model |
| Best for | Quick explanations, model-agnostic applications, user-facing summaries |
| Limitation | Unstable — different random seeds produce different explanations; unfaithful on complex decision boundaries |
LIME explains a prediction by asking “if I perturb the input slightly, how does the output change?” and fitting a linear surrogate model to the perturbation results. This is intuitive, but the linear approximation can be misleading for highly non-linear models.
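That perturb-and-fit loop can be stripped down to a few lines. The sketch below toggles each feature between the instance and a baseline and scores it by the mean output difference between on and off samples — with an independent Bernoulli mask this approximates the weights of LIME's local linear surrogate (the model here is a stand-in, and this omits LIME's distance kernel):

```python
import random

def lime_style_weights(predict, instance, baseline, n_samples=2000, seed=0):
    """Perturbation-based local attribution in the spirit of LIME."""
    rng = random.Random(seed)
    n = len(instance)
    on_sum = [0.0] * n; on_cnt = [0] * n
    off_sum = [0.0] * n; off_cnt = [0] * n

    for _ in range(n_samples):
        # Random presence mask: each feature is kept or reset to baseline.
        mask = [rng.random() < 0.5 for _ in range(n)]
        x = [instance[i] if mask[i] else baseline[i] for i in range(n)]
        y = predict(x)
        for i in range(n):
            if mask[i]:
                on_sum[i] += y; on_cnt[i] += 1
            else:
                off_sum[i] += y; off_cnt[i] += 1

    # Mean output when feature is present minus mean when absent.
    return [on_sum[i] / max(on_cnt[i], 1) - off_sum[i] / max(off_cnt[i], 1)
            for i in range(n)]

# Non-linear stand-in model: feature 0 dominates, feature 2 has no effect.
predict = lambda x: x[0] ** 2 + 0.5 * x[1]
weights = lime_style_weights(predict, instance=[2.0, 2.0, 2.0],
                             baseline=[0.0, 0.0, 0.0])
```

Note that the weights are sampling estimates: rerun with a different seed and they shift — which is exactly the stability problem described next.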
LIME stability problem: On the same input, running LIME twice with different random seeds can produce different top features. This instability makes LIME problematic for compliance and audit use cases where explanations must be consistent.
| Configuration | Stability (same top-3 features across 10 runs) | Computation time |
|---|---|---|
| 100 perturbations | 55-65% | 0.5-2s |
| 1,000 perturbations | 75-85% | 2-10s |
| 5,000 perturbations | 88-93% | 10-30s |
| 10,000 perturbations | 93-97% | 30-60s |
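Stability numbers like those in the table can be measured directly: run the explainer under several seeds and check how often the top-k feature set matches. A sketch, assuming any explainer exposed as `explain(seed) -> list of (feature, weight)` pairs (the noisy explainer below is a hypothetical stand-in):

```python
import random

def topk_stability(explain, seeds, k=3):
    """Fraction of runs whose top-k feature set matches the first run's."""
    def topk(seed):
        ranked = sorted(explain(seed), key=lambda fw: abs(fw[1]), reverse=True)
        return frozenset(f for f, _ in ranked[:k])

    reference = topk(seeds[0])
    matches = sum(topk(s) == reference for s in seeds[1:])
    return matches / (len(seeds) - 1)

# Stand-in explainer: fixed importances plus seed-dependent noise,
# mimicking a perturbation-based method like LIME.
def noisy_explainer(seed):
    rng = random.Random(seed)
    true_weights = {"income": 0.9, "dti": 0.7, "age": 0.3, "zip": 0.1}
    return [(f, w + rng.gauss(0, 0.05)) for f, w in true_weights.items()]

stability = topk_stability(noisy_explainer, seeds=range(10), k=3)
```

For compliance use, a check like this belongs in CI: if top-3 agreement drops below your documented threshold, the explanation pipeline should fail loudly rather than emit inconsistent evidence.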
Attention-Based Explanations
| Dimension | Value |
|---|---|
| What it measures | Which input tokens the model attended to during generation |
| Theoretical basis | Transformer attention weights |
| Faithfulness | Low — attention weights do not reliably indicate causal influence |
| Computational cost | Near zero (attention weights are a byproduct of inference) |
| Output format | Token-level attention heatmap |
| Best for | Debugging, rough intuition, visualization demos |
| Limitation | Jain & Wallace (2019) showed attention often does not correlate with feature importance |
For practical purposes, the attention faithfulness debate is settled. Multiple studies have demonstrated that:
- Attention can be redistributed across tokens without changing the output
- High attention on a token doesn’t mean that token influenced the prediction
- Alternative attention patterns exist that produce identical outputs
Attention visualizations are useful for building intuition about model behavior. They are not reliable explanations of why a specific decision was made. Do not use attention-based explanations for compliance or audit purposes.
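The redistribution point is easy to demonstrate: whenever two tokens carry identical value vectors, any split of attention mass between them yields the same weighted sum, so the weights alone cannot say which token “mattered.” A toy sketch of a single attention head's output step (illustrative only, not a real transformer):

```python
def attention_output(weights, values):
    """Weighted sum of value vectors, as in one attention head."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Three tokens; tokens 0 and 1 carry identical value vectors.
values = [[1.0, 2.0], [1.0, 2.0], [5.0, -1.0]]

# Two very different attention distributions over the same tokens...
a = attention_output([0.8, 0.1, 0.1], values)
b = attention_output([0.1, 0.8, 0.1], values)
# ...produce the same output (up to floating-point rounding), even though
# a heatmap would highlight token 0 in one case and token 1 in the other.
```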
Counterfactual Explanations
| Dimension | Value |
|---|---|
| What it measures | Minimum input change needed to flip the decision |
| Theoretical basis | Nearest counterfactual in feature space |
| Faithfulness | High — directly tests what the model responds to |
| Computational cost | Varies; optimization-based methods: 1-30 seconds per explanation |
| Output format | “If X were Y instead, the decision would have been different” |
| Best for | End-user explanations (actionable), loan rejections, hiring decisions |
| Limitation | May suggest infeasible changes; multiple valid counterfactuals exist |
Counterfactual explanations are the most user-friendly format because they’re actionable: “Your loan was rejected. If your debt-to-income ratio were 35% instead of 48%, the loan would have been approved.” The user knows exactly what to change.
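A minimal counterfactual search can be sketched as a greedy loop: step each mutable feature toward its plausible bounds until the decision flips, then report the smallest single-feature change found. This is a simplified illustration, not the optimization-based methods cited above, and the underwriting rule is a toy:

```python
def find_counterfactual(predict, instance, feature_ranges, step=0.01):
    """Greedy single-feature counterfactual search.

    predict:        returns True for approval, False for rejection
    instance:       dict of feature -> current value
    feature_ranges: dict of mutable feature -> (min, max) plausible bounds
    Returns (feature, new_value) for the smallest single-feature change
    that flips the decision, or None if none is found.
    """
    assert not predict(instance), "already approved; nothing to flip"
    best = None
    for feature, (lo, hi) in feature_ranges.items():
        current = instance[feature]
        for direction in (+1, -1):
            value = current
            while lo <= value <= hi:
                value += direction * step * (hi - lo)
                candidate = dict(instance, **{feature: value})
                if predict(candidate):
                    delta = abs(value - current)
                    if best is None or delta < best[2]:
                        best = (feature, value, delta)
                    break
    return None if best is None else (best[0], best[1])

# Toy underwriting rule: approve if DTI below 0.40 and score above 650.
predict = lambda x: x["dti"] < 0.40 and x["score"] > 650
applicant = {"dti": 0.48, "score": 700}
cf = find_counterfactual(predict, applicant,
                         feature_ranges={"dti": (0.0, 1.0), "score": (300, 850)})
# cf suggests lowering dti just below 0.40; the score is already sufficient.
```

Restricting `feature_ranges` to features the applicant can actually change is what enforces the actionability and plausibility criteria in the quality table below.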
Counterfactual quality metrics:
| Quality dimension | Definition | Good threshold |
|---|---|---|
| Proximity | How small is the change? | Minimum feature changes (ideally 1-3) |
| Plausibility | Is the changed input realistic? | Must pass domain constraints |
| Diversity | Are multiple paths to a different decision shown? | 2-3 diverse counterfactuals |
| Actionability | Can the user actually make the suggested change? | Only changeable features |
| Sparsity | How many features change? | Fewer is better |
Method Selection Matrix
| Requirement | SHAP | LIME | Attention | Counterfactual |
|---|---|---|---|---|
| Regulatory compliance | Best | Adequate | Inadequate | Good |
| End-user explanation | Poor (too technical) | Moderate | Poor | Best |
| Debugging | Good | Moderate | Good (quick intuition) | Moderate |
| Stability/reproducibility | High (exact SHAP) | Low-moderate | High | Moderate |
| Speed (real-time) | Tree models only | Moderate | Best (free) | Slow |
| LLM applicability | Very limited | Limited | Available but unfaithful | Promising |
| Theoretical soundness | Strongest | Moderate | Weakest | Strong |
Explainability for LLMs — The Hard Problem
Traditional explainability methods were designed for tabular models with interpretable features. LLMs operate on thousands of tokens with complex interactions. Applying traditional methods to LLMs is either computationally infeasible or produces unfaithful explanations.
Current LLM Explainability Approaches
| Approach | What it provides | Faithfulness | Production viability |
|---|---|---|---|
| Extended thinking (Claude) | Model shows its reasoning process | Medium-high (reasoning is real but compressed) | Production-ready |
| Chain-of-thought | Model generates reasoning steps | Medium (reasoning may be post-hoc rationalization) | Production-ready |
| Token-level attribution | Which input tokens influenced output | Low-medium | Research only (very slow) |
| Retrieval attribution (RAG) | Which documents informed the answer | High (for the retrieval step) | Production-ready |
| Natural language explanation | Model explains its own decision | Low (models confabulate explanations) | Available but unreliable |
The honest assessment for LLMs in 2026: There is no production-ready method for faithfully explaining why an LLM generated a specific response. Extended thinking and chain-of-thought show reasoning, but the displayed reasoning may not perfectly match the model’s actual computation. Retrieval attribution (showing which documents were used) is the most reliable form of LLM explainability currently available.
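Retrieval attribution is reliable precisely because it never touches the model's internals: record which documents went into the prompt and attach their identifiers to the answer. A minimal sketch of the wiring, where `retrieve` and `generate` are hypothetical stand-ins for your retriever and LLM client:

```python
from dataclasses import dataclass, field

@dataclass
class AttributedAnswer:
    text: str
    sources: list = field(default_factory=list)  # doc ids that were in the prompt

def answer_with_attribution(question, retrieve, generate, top_k=3):
    """RAG call that keeps the retrieval step auditable.

    retrieve(question, k) -> list of (doc_id, passage)
    generate(prompt)      -> answer text
    """
    docs = retrieve(question, top_k)
    context = "\n\n".join(f"[{doc_id}] {passage}" for doc_id, passage in docs)
    prompt = ("Answer using only the sources below. Cite ids in brackets.\n\n"
              f"{context}\n\nQuestion: {question}")
    answer = generate(prompt)
    # Attribution is exact for the retrieval step: these documents, and
    # only these, were available to the model for this answer.
    return AttributedAnswer(text=answer, sources=[d for d, _ in docs])

# Toy stand-ins to show the wiring end to end:
corpus = {"doc-7": "Loans require DTI below 40%.", "doc-2": "Minimum score is 650."}
retrieve = lambda q, k: list(corpus.items())[:k]
generate = lambda prompt: "DTI must be below 40% [doc-7]."
result = answer_with_attribution("What DTI is required?", retrieve, generate)
```

The guarantee is deliberately narrow: it tells you what the model could have used, not how it weighted those passages internally — which is why the table above rates it high-faithfulness for the retrieval step only.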
Regulatory Requirements for Explainability
| Regulation | What it requires | Minimum acceptable method |
|---|---|---|
| GDPR Art. 22 | “Meaningful information about the logic involved” for automated decisions | SHAP/LIME for tabular; retrieval attribution for RAG; natural language summary |
| EU AI Act Art. 13 | “Sufficient transparency to enable users to interpret the system’s output” | Model card + explanation mechanism appropriate to risk level |
| US ECOA (fair lending) | Specific reasons for adverse action | Counterfactual or SHAP — must identify specific factors |
| NYC LL144 | Bias audit summary published annually | SHAP feature importance for bias audit; counterfactual for individual decisions |
The compliance gap: Regulations require “explainability” but don’t define what constitutes an adequate explanation. Document your explainability method choice, its limitations, and why it’s appropriate for your use case. A well-documented limited explanation is more defensible than an undocumented complex one.
How to Apply This
Use the token-counter tool to estimate token-level explanation costs for LLM applications.
Match your explainability method to your primary audience using the method selection matrix — different audiences need different explanations from the same system.
For regulatory compliance, SHAP on tabular models and counterfactual explanations for user-facing decisions provide the strongest defense.
For LLM applications, retrieval attribution (showing source documents) is the most reliable and production-ready approach — implement it before attempting more sophisticated methods.
Never use attention weights for compliance explanations — the research consensus is clear that attention does not reliably indicate causal importance.
Build explainability into your architecture from the start — retrofitting explanations to opaque systems is 5-10x more expensive than designing for explainability.
Honest Limitations
SHAP values for correlated features can be misleading — they distribute importance across correlated features rather than identifying the causal one. LIME’s instability makes it unsuitable as the sole compliance evidence. Counterfactual explanations may suggest infeasible changes if not constrained. No current method provides faithful explanations for LLM decisions at production scale. Regulatory requirements for “explainability” are ambiguous and evolving. This guide covers post-hoc explainability; inherently interpretable models (linear, decision trees) are an alternative but sacrifice accuracy. Multi-step reasoning explanations (chain-of-thought) may not reflect actual model computation.
Continue reading
AI Bias Detection — Demographic Parity, Equal Opportunity, Calibration, and When Each Metric Applies
Fairness metric decision tree per use case, measurement methodology, regulatory requirements, and practical implementation for production AI systems.
AI Content Filtering — Guardrails That Block Without Breaking User Experience
False positive and negative rate comparison across filtering approaches, latency impact, implementation patterns, and the tradeoff between safety and usability.
Types of AI Hallucinations — Factual, Logical, Attribution, and How to Detect Each
Taxonomy of AI hallucination types with detection methods, failure rates by model and task, and a diagnostic decision tree for production systems.