Your AI Made a Decision — Can You Explain Why in a Way That’s Both Accurate and Useful?

A lending AI rejects a loan application. The applicant has a legal right to know why. Your customer success team needs to explain the rejection in plain language. Your compliance team needs to verify the explanation is truthful. Your engineering team needs to debug why the model rejected a seemingly qualified applicant. These four audiences need four different types of explanation from the same decision — and most explainability tools serve only one of them well. This guide compares explainability methods by use case, computational cost, and a critical dimension most guides ignore: faithfulness — whether the explanation actually reflects what the model did.

The Explainability Problem Is Four Problems

| Audience | What they need | Explanation type | Faithfulness requirement |
|---|---|---|---|
| End user | Actionable reason for the decision | Natural language, simplified | Low — approximate is acceptable |
| Customer support | Talking points for explaining decisions | Key factors, plain language | Medium — must not contradict model behavior |
| Compliance/audit | Verifiable evidence the decision was lawful | Feature attribution, bias metrics | High — must reflect actual model reasoning |
| Engineering | Debug signal for model behavior | Full feature importance, decision boundary | Highest — must be faithful to model internals |

The core tension: Faithful explanations are complex. Simple explanations are unfaithful. No single method satisfies all four audiences.

Explainability Methods Compared

SHAP (SHapley Additive exPlanations)

| Dimension | Value |
|---|---|
| What it measures | Marginal contribution of each feature to the prediction |
| Theoretical basis | Shapley values from cooperative game theory |
| Faithfulness | High for tree models; approximate for neural networks |
| Computational cost | O(2^n) exact; O(n·k) with sampling approximation |
| Output format | Per-feature importance values (positive/negative contribution) |
| Best for | Tabular data, tree-based models, compliance documentation |
| Limitation | Slow on high-dimensional data; correlated features produce misleading values |

SHAP provides the most theoretically grounded feature attribution. For tree-based models (XGBoost, LightGBM, Random Forest), TreeSHAP computes exact Shapley values efficiently. For neural networks, KernelSHAP approximates but can be unreliable on correlated features.
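The exact computation is easy to illustrate. The sketch below (pure Python, with a hypothetical three-feature scoring function — not the shap library's API) enumerates every feature coalition, which is exactly why exact Shapley values cost O(2^n) and why TreeSHAP's polynomial-time algorithm matters for tree models:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    predict:  function mapping a feature vector (list) to a score.
    x:        the instance to explain.
    baseline: reference values used for 'absent' features.
    The triple loop over coalitions is what makes exact SHAP O(2^n).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                # Shapley coalition weight: |S|! * (n - |S| - 1)! / n!
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical scoring model with an interaction term, for illustration only
def score(v):
    return 2.0 * v[0] + 1.0 * v[1] + 0.5 * v[0] * v[2]

x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(score, x, base)
# Efficiency property: attributions sum to f(x) - f(baseline)
assert abs(sum(phi) - (score(x) - score(base))) < 1e-9
```

Note how the interaction term is split equally between features 0 and 2 — this symmetric treatment of interactions is also why correlated features end up sharing importance rather than one being identified as causal.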

Production cost at scale:

| Model type | Features | SHAP computation time per prediction | Scalability |
|---|---|---|---|
| XGBoost (TreeSHAP) | 50 | 5-20ms | Real-time viable |
| Random Forest (TreeSHAP) | 50 | 10-50ms | Real-time viable |
| Neural network (KernelSHAP) | 100 | 2-30 seconds | Batch only |
| LLM (token-level SHAP) | 1,000+ tokens | 10-60 minutes | Research only |

LIME (Local Interpretable Model-agnostic Explanations)

| Dimension | Value |
|---|---|
| What it measures | Local linear approximation of model behavior around a specific input |
| Theoretical basis | Perturbation-based local surrogate model |
| Faithfulness | Moderate — the linear approximation is inherently unfaithful to non-linear models |
| Computational cost | O(n·k) where n=features, k=perturbation samples (typically 1,000-5,000) |
| Output format | Per-feature weights in a local linear model |
| Best for | Quick explanations, model-agnostic applications, user-facing summaries |
| Limitation | Unstable — different random seeds produce different explanations; unfaithful on complex decision boundaries |

LIME explains by asking “if I perturb the input slightly, how does the output change?” and fits a linear model to the perturbation results. This is intuitive, but the linear approximation can be misleading for highly non-linear models.
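The perturb-and-fit loop can be sketched in a few lines. This is a minimal LIME-style surrogate, not the lime library's implementation; the toy model, noise scale, and kernel width are all illustrative assumptions:

```python
import numpy as np

def lime_explain(predict, x, n_samples=1000, kernel_width=0.75, seed=0):
    """Minimal LIME-style local surrogate:
    1. sample Gaussian perturbations around x,
    2. weight each sample by proximity to x,
    3. fit a weighted linear model and return its coefficients.
    """
    rng = np.random.default_rng(seed)
    d = len(x)
    Z = x + rng.normal(scale=0.1, size=(n_samples, d))  # local perturbations
    y = np.array([predict(z) for z in Z])
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)        # proximity kernel
    # Weighted least squares with an intercept column
    X = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # per-feature local weights (intercept dropped)

# Toy non-linear model (hypothetical): near x, feature 0 has local slope ~1.0
predict = lambda v: np.sin(v[0]) + 0.1 * v[1]
x = np.array([0.0, 0.0])
weights = lime_explain(predict, x)
# Near x = 0, the local weights approximate the gradients: ~1.0 and ~0.1
```

On a smooth model like this, the local weights approximate the gradient at x. On a model with a sharp decision boundary near x, the same procedure can return weights that bear little relation to the model's actual behavior, which is the unfaithfulness noted in the table.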

LIME stability problem: On the same input, running LIME twice with different random seeds can produce different top features. This instability makes LIME problematic for compliance and audit use cases where explanations must be consistent.

| Configuration | Stability (same top-3 features across 10 runs) | Computation time |
|---|---|---|
| 100 perturbations | 55-65% | 0.5-2s |
| 1,000 perturbations | 75-85% | 2-10s |
| 5,000 perturbations | 88-93% | 10-30s |
| 10,000 perturbations | 93-97% | 30-60s |

Attention-Based Explanations

| Dimension | Value |
|---|---|
| What it measures | Which input tokens the model attended to during generation |
| Theoretical basis | Transformer attention weights |
| Faithfulness | LOW — attention weights do not reliably indicate causal influence |
| Computational cost | Near zero (attention weights are a byproduct of inference) |
| Output format | Token-level attention heatmap |
| Best for | Debugging, rough intuition, visualization demos |
| Limitation | Jain & Wallace (2019) showed attention often does not correlate with feature importance |

The attention faithfulness debate is settled. Multiple studies have demonstrated that:

  • Attention can be redistributed across tokens without changing the output
  • High attention on a token doesn’t mean that token influenced the prediction
  • Alternative attention patterns exist that produce identical outputs

Attention visualizations are useful for building intuition about model behavior. They are not reliable explanations of why a specific decision was made. Do not use attention-based explanations for compliance or audit purposes.

Counterfactual Explanations

| Dimension | Value |
|---|---|
| What it measures | Minimum input change needed to flip the decision |
| Theoretical basis | Nearest counterfactual in feature space |
| Faithfulness | High — directly tests what the model responds to |
| Computational cost | Varies; optimization-based methods: 1-30 seconds per explanation |
| Output format | “If X were Y instead, the decision would have been different” |
| Best for | End-user explanations (actionable), loan rejections, hiring decisions |
| Limitation | May suggest infeasible changes; multiple valid counterfactuals exist |

Counterfactual explanations are the most user-friendly format because they’re actionable: “Your loan was rejected. If your debt-to-income ratio were 35% instead of 48%, the loan would have been approved.” The user knows exactly what to change.
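At its simplest, counterfactual search tries candidate values for mutable features and keeps the smallest change that flips the decision. The loan rule, feature names, and candidate grid below are hypothetical; production systems treat this as a constrained optimization over plausibility and actionability constraints rather than a brute-force scan:

```python
def find_counterfactual(predict, x, candidates):
    """Greedy single-feature counterfactual search.

    predict:    maps a feature dict to True (approve) / False (reject).
    x:          the rejected instance, as a feature dict.
    candidates: {feature_name: [alternative values to try]} -- only
                features the user can actually change belong here.
    Returns (feature, new_value) for the smallest flipping change, or None.
    """
    assert not predict(x), "decision was already favorable"
    best = None
    for feat, values in candidates.items():
        for v in values:
            trial = dict(x, **{feat: v})
            if predict(trial):                # this change flips the decision
                delta = abs(v - x[feat])
                if best is None or delta < best[2]:
                    best = (feat, v, delta)   # keep the most proximate flip
    return None if best is None else (best[0], best[1])

# Hypothetical loan rule: approve if DTI <= 0.40 and credit score >= 650
approve = lambda a: a["dti"] <= 0.40 and a["score"] >= 650
applicant = {"dti": 0.48, "score": 700}
cf = find_counterfactual(approve, applicant, {"dti": [0.40, 0.35, 0.30]})
# cf == ("dti", 0.40): the smallest DTI reduction that flips the decision
```

Restricting `candidates` to changeable features is what enforces actionability; without that restriction, the nearest counterfactual might be “be three years younger.”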

Counterfactual quality metrics:

| Quality dimension | Definition | Good threshold |
|---|---|---|
| Proximity | How small is the change? | Minimum feature changes (ideally 1-3) |
| Plausibility | Is the changed input realistic? | Must pass domain constraints |
| Diversity | Are multiple paths to a different decision shown? | 2-3 diverse counterfactuals |
| Actionability | Can the user actually make the suggested change? | Only changeable features |
| Sparsity | How many features change? | Fewer is better |

Method Selection Matrix

| Requirement | SHAP | LIME | Attention | Counterfactual |
|---|---|---|---|---|
| Regulatory compliance | Best | Adequate | Inadequate | Good |
| End-user explanation | Poor (too technical) | Moderate | Poor | Best |
| Debugging | Good | Moderate | Good (quick intuition) | Moderate |
| Stability/reproducibility | High (exact SHAP) | Low-moderate | High | Moderate |
| Speed (real-time) | Tree models only | Moderate | Best (free) | Slow |
| LLM applicability | Very limited | Limited | Available but unfaithful | Promising |
| Theoretical soundness | Strongest | Moderate | Weakest | Strong |

Explainability for LLMs — The Hard Problem

Traditional explainability methods were designed for tabular models with interpretable features. LLMs operate on thousands of tokens with complex interactions. Applied to LLMs, traditional methods are either computationally infeasible or produce unfaithful explanations.

Current LLM Explainability Approaches

| Approach | What it provides | Faithfulness | Production viability |
|---|---|---|---|
| Extended thinking (Claude) | Model shows its reasoning process | Medium-high (reasoning is real but compressed) | Production-ready |
| Chain-of-thought | Model generates reasoning steps | Medium (reasoning may be post-hoc rationalization) | Production-ready |
| Token-level attribution | Which input tokens influenced output | Low-medium | Research only (very slow) |
| Retrieval attribution (RAG) | Which documents informed the answer | High (for the retrieval step) | Production-ready |
| Natural language explanation | Model explains its own decision | Low (models confabulate explanations) | Available but unreliable |

The honest assessment for LLMs in 2026: There is no production-ready method for faithfully explaining why an LLM generated a specific response. Extended thinking and chain-of-thought show reasoning, but the displayed reasoning may not perfectly match the model’s actual computation. Retrieval attribution (showing which documents were used) is the most reliable form of LLM explainability currently available.
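Retrieval attribution needs no model introspection: the application simply records which documents were retrieved for each answer. The sketch below uses a toy bag-of-words retriever over a hypothetical document store; a production system would use embedding search, but the attribution mechanism (returning document ids alongside the retrieved text) is the same:

```python
import math
from collections import Counter

def retrieve_with_attribution(query, docs, k=2):
    """Toy retriever that returns the top-k documents *with their ids*,
    so every generated answer can cite the sources it was grounded in.

    docs: {doc_id: text}. Scoring is cosine similarity over term counts;
    a real system would swap in embeddings without changing the interface.
    """
    def vec(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(c * c for c in a.values()))
               * math.sqrt(sum(c * c for c in b.values())))
        return num / den if den else 0.0

    q = vec(query)
    ranked = sorted(docs.items(), key=lambda kv: cosine(q, vec(kv[1])),
                    reverse=True)
    return ranked[:k]  # list of (doc_id, text) pairs to cite in the answer

# Hypothetical document store
docs = {
    "policy-7": "loan applications require a debt to income ratio below forty percent",
    "faq-2": "office hours are nine to five on weekdays",
}
sources = retrieve_with_attribution("why was my loan application rejected", docs, k=1)
# sources[0][0] == "policy-7": the answer can now cite its source document
```

The ids passed into the prompt and surfaced back to the user are the explanation; whatever the model then generates, the audit trail of what it was shown is exact.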

Regulatory Requirements for Explainability

| Regulation | What it requires | Minimum acceptable method |
|---|---|---|
| GDPR Art. 22 | “Meaningful information about the logic involved” for automated decisions | SHAP/LIME for tabular; retrieval attribution for RAG; natural language summary |
| EU AI Act Art. 13 | “Sufficient transparency to enable users to interpret the system’s output” | Model card + explanation mechanism appropriate to risk level |
| US ECOA (fair lending) | Specific reasons for adverse action | Counterfactual or SHAP — must identify specific factors |
| NYC LL144 | Bias audit summary published annually | SHAP feature importance for bias audit; counterfactual for individual decisions |

The compliance gap: Regulations require “explainability” but don’t define what constitutes an adequate explanation. Document your explainability method choice, its limitations, and why it’s appropriate for your use case. A well-documented limited explanation is more defensible than an undocumented complex one.

How to Apply This

Use the token-counter tool to estimate token-level explanation costs for LLM applications.

Match your explainability method to your primary audience using the method selection matrix — different audiences need different explanations from the same system.

For regulatory compliance, SHAP on tabular models and counterfactual explanations for user-facing decisions provide the strongest defense.

For LLM applications, retrieval attribution (showing source documents) is the most reliable and production-ready approach — implement it before attempting more sophisticated methods.

Never use attention weights for compliance explanations — the research consensus is clear that attention does not reliably indicate causal importance.

Build explainability into your architecture from the start — retrofitting explanations to opaque systems is 5-10x more expensive than designing for explainability.

Honest Limitations

SHAP values for correlated features can be misleading — they distribute importance across correlated features rather than identifying the causal one. LIME’s instability makes it unsuitable as the sole compliance evidence. Counterfactual explanations may suggest infeasible changes if not constrained. No current method provides faithful explanations for LLM decisions at production scale. Regulatory requirements for “explainability” are ambiguous and evolving. This guide covers post-hoc explainability; inherently interpretable models (linear, decision trees) are an alternative but sacrifice accuracy. Multi-step reasoning explanations (chain-of-thought) may not reflect actual model computation.