Your Production LLM Just Returned a User’s Private Document Content in Response to an Unrelated User’s Query, a Customer Has Screenshots, Legal Is Asking About GDPR Article 33 Notification Obligations, Your CEO Wants to Know What to Say Publicly, and the Engineering Team Has Not Yet Determined Whether This Is a Prompt-Injection Attack, a Retrieval-Misattribution Bug, a Training-Data Memorization Leak, or a Cache-Key Collision — You Have 72 Hours

AI safety incidents are not the failure mode of model misbehavior — they are the failure mode of the incident response process that activates when model misbehavior manifests in production. The model will misbehave; that is a statistical certainty. What separates the organizations that survive AI incidents from the ones that compound them into lawsuits, fines, and public-trust collapse is the response runbook — the predefined classification, severity, containment, notification, and postmortem workflow that activates under pressure. This runbook covers the eight incident classes that matter, severity definitions with SLA targets, the detect→contain→eradicate→recover→postmortem playbook per class, regulator-notification paths, communication templates, and the postmortem structure that converts incidents into systematic improvement rather than blame assignment.

Incident Classification Matrix

Eight incident classes cover the operational AI-safety failure surface:

| Class | Definition | Detection signal | Typical discovery path | Representative example |
|---|---|---|---|---|
| Prompt injection | Attacker-crafted input hijacks system-prompt intent | Unusual output patterns; system-prompt leakage in responses; tool-call anomalies | User report, red-team probe, automated prompt-injection classifier | User input “ignore previous instructions” causes the model to reveal its system prompt |
| Data exfiltration | Sensitive data in training/fine-tuning/RAG corpus leaks through model output | PII appearing in outputs; specific document fragments matching known-sensitive docs | DLP pipeline alert, user complaint, regulator inquiry | Customer B’s account details appear in Customer A’s response |
| Harmful hallucination | Confident generation of false content causing user harm | High-confidence answers contradicted by ground truth; user-reported harm | User complaint, safety-review sampling, external investigation | Medical symptom-checker advises delaying an ER visit for a condition requiring immediate care |
| Bias manifestation | Systematic differential quality/outcomes across protected attributes | Demographic-parity metric divergence; fairness-audit findings | Bias-monitoring pipeline, fairness audit, disparate user complaints | Hiring assistant consistently scores one demographic lower with equal qualifications |
| Jailbreak exploit | Attacker bypasses content-safety filters via an evasion technique | Content-filter hit-rate change; output violating documented usage policies | Safety-filter monitoring, red team, public jailbreak disclosure | Role-play prompt enables content generation that the directly asked form would refuse |
| PII leak | Personal data exposed through model output or training | PII-detection regex/classifier hit in outputs; memorization probing | DLP alert, subject-access request revealing retained PII | Model completes a partial name + address from memorized training data |
| Model evasion | Adversarial inputs cause misclassification (classification models) | Confidence-distribution anomalies; adversarial-input detection | Adversarial monitoring, abuse-pattern reports | Safety classifier consistently passes a class of disguised harmful content |
| Dependency compromise | Upstream model, dataset, library, or API compromised | Supply-chain alert; upstream-provider disclosure; unexpected behavior shift | Security-advisory feed, provider notice, behavioral regression | Embedding provider had a data-poisoning incident; downstream retrieval is affected |

Misclassification costs hours: an incident is often ambiguous between classes at first. A data-exfiltration event may look like prompt injection until investigation reveals the output matched training-data memorization rather than attacker-crafted extraction. Initial classification should be provisional; the runbook advances on the most severe plausible classification and downgrades only after investigation confirms a narrower scope.
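The most-severe-plausible rule can be made mechanical. The sketch below is illustrative, not from the runbook: the per-class severity floors are assumptions you would tune to your own risk model, and `Incident` is a hypothetical helper.

```python
"""Provisional-classification sketch: advance on the most severe plausible
class; downgrade only when investigation rules a class out.
SEVERITY_FLOOR values are illustrative assumptions, not runbook policy."""
from dataclasses import dataclass

# Hypothetical floor per incident class (1 = SEV-1, most severe).
SEVERITY_FLOOR = {
    "data_exfiltration": 1,
    "pii_leak": 1,
    "harmful_hallucination": 1,
    "bias_manifestation": 2,
    "jailbreak_exploit": 2,
    "prompt_injection": 2,
    "model_evasion": 3,
    "dependency_compromise": 3,
}

@dataclass
class Incident:
    plausible_classes: set[str]  # classes not yet ruled out by investigation

    def working_severity(self) -> int:
        # Most-severe-plausible rule: min() because SEV-1 < SEV-2 numerically.
        return min(SEVERITY_FLOOR[c] for c in self.plausible_classes)

    def rule_out(self, cls: str) -> int:
        # Downgrade only after investigation excludes a class.
        self.plausible_classes.discard(cls)
        return self.working_severity()

# An output ambiguous between exfiltration and injection starts at SEV-1.
inc = Incident({"data_exfiltration", "prompt_injection"})
assert inc.working_severity() == 1
assert inc.rule_out("data_exfiltration") == 2  # memorization hypothesis eliminated
```

The point of encoding the rule is that the downgrade path requires an explicit `rule_out` call, which forces the investigation step to be recorded rather than implied.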

Severity Levels With SLA Targets

| Severity | Definition | Examples | Response SLA | Escalation path |
|---|---|---|---|---|
| SEV-1 | Active harm to users, regulatory exposure, or public-safety risk; active exploit in progress | Active data exfiltration; harmful medical/legal advice being served; wide-impact bias affecting protected groups | Detect-to-containment ≤30 minutes; detect-to-notification ≤2 hours internal, ≤72 hours regulator | Immediate: CTO, CEO, General Counsel, DPO, security on-call |
| SEV-2 | Contained harm or high-likelihood exploit; single-user impact with regulatory implications | Single-user PII leak; isolated jailbreak with limited blast radius; contained bias finding | Detect-to-containment ≤2 hours; detect-to-notification ≤24 hours internal | Engineering director, Security, Legal, Product |
| SEV-3 | Elevated risk without active harm; degraded safety posture | Safety-filter degradation without confirmed harmful output; model-evasion finding in audit; dependency vulnerability with no confirmed exploit | Detect-to-remediation ≤5 business days | Engineering manager, Security |
| SEV-4 | Safety-hygiene issue; documentation or process gap | Documentation-only drift; audit-log completeness gap; expired credential for a monitoring tool | Remediation: next sprint | Engineering team |

SLA discipline: Severity is assigned at detection based on the plausible worst case, not confirmed scope. An ambiguous incident that could plausibly be SEV-1 is treated as SEV-1 until investigation downgrades it. The operational cost of a false SEV-1 is much smaller than the regulatory cost of a misclassified SEV-1 that was handled as SEV-3.
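Because the SLA clock starts at detection, it helps to convert the relative targets in the table into absolute timestamps the pager can alarm on. A minimal sketch, using the SEV-1/SEV-2 targets above; the function and dict names are illustrative:

```python
"""Derive absolute response deadlines from the severity SLA table at
detection time, so alerting can fire on the clock rather than on memory."""
from datetime import datetime, timedelta, timezone

# Detect-to-containment / detect-to-internal-notification targets (from the table).
SLA = {
    1: {"containment": timedelta(minutes=30), "internal_notify": timedelta(hours=2)},
    2: {"containment": timedelta(hours=2), "internal_notify": timedelta(hours=24)},
}

def deadlines(severity: int, detected_at: datetime) -> dict[str, datetime]:
    """Return absolute deadlines keyed by SLA milestone."""
    return {name: detected_at + delta for name, delta in SLA[severity].items()}

detected = datetime(2026, 1, 9, 18, 30, tzinfo=timezone.utc)  # Friday evening
d = deadlines(1, detected)
assert d["containment"] == datetime(2026, 1, 9, 19, 0, tzinfo=timezone.utc)
```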

Detect → Contain → Eradicate → Recover → Postmortem

The five-phase incident response process applies to every class with class-specific variations.

Phase 1 — Detect

| Detection layer | Signal | Typical false-positive rate |
|---|---|---|
| Automated classifiers | Real-time scoring of inputs and outputs against safety-violation classifiers | Moderate (10-25%); tune per class |
| DLP pipelines | Regex + classifier for PII, sensitive-document fragments, API keys in outputs | Low to moderate (5-15%) |
| Heuristic alerts | Unusual output patterns (length outliers, content-filter dodges, unusual token distributions) | Moderate to high (20-40%); triage filter required |
| User reports | Customer-support tickets flagged for safety review | Variable; quality depends on user composition |
| External disclosure | Researcher/journalist/competitor/regulator notifying of a finding | Typically high severity; assume serious until investigated |
| Red-team discovery | Internal adversarial testing surfacing an issue | Prevention-phase finding; treat as SEV-3 unless exploit is in the wild |
| Canary queries | Known-vulnerable test queries run on a schedule | Low false-positive rate; signals regression |

Detection-to-alert latency target: < 5 minutes for automated layers; < 15 minutes for triaged user reports.
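The DLP layer is the cheapest to stand up: a fast regex screen on outputs, feeding anything that hits into heavier classifier scoring. A minimal sketch; the patterns shown are illustrative and deliberately incomplete (a real pipeline adds classifier confirmation to control the 5-15% false-positive rate noted above):

```python
"""Minimal DLP-layer sketch: regex screen for obvious PII in model outputs.
Patterns are illustrative, not an exhaustive or production-grade set."""
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_like": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # crude PAN shape
}

def scan_output(text: str) -> list[str]:
    """Return the PII categories that matched; an empty list means pass."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

hits = scan_output("Contact j.doe@example.com, SSN 123-45-6789")
assert hits == ["email", "us_ssn"]
```

Regex alone over-fires on number-dense text, which is why the table pairs it with a classifier; the regex pass exists to keep per-token latency inside the &lt; 5-minute alert budget.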

Phase 2 — Contain

Containment stops the bleeding. Per-class containment actions:

| Incident class | First containment action | Fallback containment | Reversibility |
|---|---|---|---|
| Prompt injection | Deploy input classifier in block mode; disable affected tool-use path | Disable feature entirely | Reversible |
| Data exfiltration | Block affected query class; purge cache; isolate affected tenant | Take feature offline | Reversible |
| Harmful hallucination | Add safety-filter override on topic; route to safe-fallback response | Disable feature for affected user segment | Reversible |
| Bias manifestation | Enable differential-outcome monitoring; pause deploy pipeline | Revert to prior known-fair version | Reversible |
| Jailbreak exploit | Add jailbreak pattern to filter; enable strict-mode filtering | Disable content-sensitive features | Reversible |
| PII leak | Block affected output pattern; purge caches containing PII | Take endpoint offline | Partially reversible (PII already leaked is unrecoverable) |
| Model evasion | Tighten classifier threshold; route to stronger verification | Disable automated decision-making on affected class | Reversible |
| Dependency compromise | Switch to backup provider if available; disable dependent features | Take feature offline | Reversible |

Containment-first discipline: Preserve evidence during containment. Snapshot affected caches, logs, and model versions before modifying them. Post-incident investigation requires evidence; containment actions routinely destroy it when runbooks don’t explicitly require preservation.
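Evidence preservation is easiest to enforce when it is a function the containment tooling calls first, not a runbook sentence. A sketch under stated assumptions: the vault layout, manifest format, and function name are all hypothetical.

```python
"""Evidence-preservation sketch: snapshot mutable artifacts (caches, logs)
into a write-once vault *before* containment mutates or purges them.
Paths and the manifest format are illustrative assumptions."""
import hashlib
import json
import shutil
import time
from pathlib import Path

def preserve(incident_id: str, artifacts: list[Path], vault: Path) -> Path:
    """Copy artifacts into a timestamped vault directory and record SHA-256
    hashes so the postmortem can show the evidence was not altered later."""
    dest = vault / incident_id / str(int(time.time()))
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for src in artifacts:
        copied = dest / src.name
        shutil.copy2(src, copied)  # copy2 preserves mtime metadata
        manifest[src.name] = hashlib.sha256(copied.read_bytes()).hexdigest()
    (dest / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return dest
```

Containment scripts would call `preserve(...)` as their first statement, so that "purge cache" can never run against unsnapshotted state.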

Phase 3 — Eradicate

Eradication addresses the root cause. Per-class eradication:

| Incident class | Root-cause investigation | Eradication action |
|---|---|---|
| Prompt injection | Attack-pattern analysis; system-prompt design review | Redesign system-prompt isolation; tool-use sandboxing; output-validation gate |
| Data exfiltration | Corpus audit; access-control review; embedding-similarity leak analysis | Remove sensitive content from corpus; retrain embeddings; enforce tenant isolation |
| Harmful hallucination | Training-data/RAG-corpus factuality audit for the affected topic | Add authoritative-source retrieval for the topic; uncertainty calibration; refusal-on-unknown tuning |
| Bias manifestation | Fairness audit; training-data representation analysis; model-card review | Debiasing mitigation (data + post-processing + model); monitoring-baseline reset |
| Jailbreak exploit | Attack-surface mapping; safety-filter architecture review | Multi-layer safety architecture; adversarial-training fine-tune; monitored outputs |
| PII leak | Training-data PII audit; memorization probing; RAG-corpus PII scan | Corpus-level PII redaction; differential-privacy retraining if systematic; RAG filter layer |
| Model evasion | Adversarial-input analysis; classifier-threshold calibration | Robust-classifier fine-tune; input-preprocessing normalization |
| Dependency compromise | Supply-chain audit; provider-postmortem review | Switch provider; implement defense-in-depth for primary dependencies |

Eradication-versus-containment distinction: Containment can be complete while eradication is in progress. Do not declare incident-closed until eradication verification — containment masks the issue; eradication removes its recurrence.
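The containment/eradication distinction can be enforced as a small state machine so "incident-closed" is structurally unreachable from "contained." The states and transitions below are an illustrative sketch, not the runbook's canonical lifecycle:

```python
"""Closure-gate sketch: an incident cannot jump from contained to closed.
State names and transition rules are illustrative assumptions."""
ALLOWED = {
    "open": {"contained"},
    "contained": {"eradicated", "open"},  # containment can complete first
    "eradicated": {"recovered"},          # gated on regression tests passing
    "recovered": {"closed"},              # gated on a clean canary window
}

def advance(state: str, target: str, checks_passed: bool) -> str:
    """Move the incident forward only along allowed, verified transitions."""
    if target not in ALLOWED.get(state, set()) or not checks_passed:
        raise ValueError(f"cannot move {state} -> {target}")
    return target

assert advance("contained", "eradicated", checks_passed=True) == "eradicated"
```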

Phase 4 — Recover

Recovery restores service. The recovery checklist:

| Step | Verification | Owner |
|---|---|---|
| Eradication verified | Test suite includes a regression for this incident class | Engineering |
| Monitoring strengthened | Detection layer added or tuned for this incident class | SRE / MLOps |
| Containment removed | Block rules / feature flags removed after verification | Engineering |
| Canary verified | Canary traffic served for the verification window without recurrence | SRE / MLOps |
| Customers notified | Affected customers notified per severity requirements | Customer Success + Legal |
| Public statement | Public statement issued if required (SEV-1 / regulatory obligation) | Comms + Legal |
| Regulators notified | GDPR / EU AI Act / FTC / state-AG notifications per applicable jurisdictions | DPO + Legal |

Premature-recovery cost: The cost of an incident that reopens because recovery was declared before eradication verification is 3-5× the cost of holding containment for another 4-24 hours. Verification windows exist to catch this.
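A verification window is simple to check mechanically: recovery is declarable only when the full window has elapsed since eradication verification with zero recurrence events. The window lengths below are illustrative assumptions, as is the function name:

```python
"""Canary-window sketch: recovery requires a clean verification window.
Window lengths per severity are illustrative, not prescribed values."""
from datetime import datetime, timedelta, timezone

VERIFICATION_WINDOW = {
    1: timedelta(hours=24),
    2: timedelta(hours=12),
    3: timedelta(hours=4),
    4: timedelta(hours=1),
}

def can_declare_recovered(severity: int,
                          eradication_verified_at: datetime,
                          recurrence_events: list[datetime],
                          now: datetime) -> bool:
    """True only if the full window has elapsed with zero recurrences."""
    window_end = eradication_verified_at + VERIFICATION_WINDOW[severity]
    recurred = any(e >= eradication_verified_at for e in recurrence_events)
    return now >= window_end and not recurred
```

A single recurrence inside the window resets the decision to "hold containment," which is exactly the 4-24-hour cost the paragraph above argues is worth paying.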

Phase 5 — Postmortem

Postmortem converts the incident into organizational learning. See postmortem template below.

Regulator-Notification Paths

Regulatory notification obligations are jurisdiction-dependent and carry tight deadlines. The notification matrix:

| Regulation | Trigger | Deadline | Content requirements | Who notifies |
|---|---|---|---|---|
| GDPR Article 33 | Personal-data breach likely to risk data subjects’ rights | 72 hours from awareness, to the supervisory authority | Nature of the breach; categories/approximate numbers; likely consequences; measures taken | DPO / Legal, to lead supervisory authority |
| GDPR Article 34 | Personal-data breach posing high risk to data subjects | Without undue delay, to affected individuals | Plain-language description; contact point; likely consequences; measures taken | DPO / Legal / Customer Success |
| EU AI Act (Art. 73) | Serious incident involving a high-risk AI system | ≤15 days after awareness (≤2 days for widespread-infringement cases), to the market-surveillance authority | Incident description; high-risk system classification; corrective measures | Legal + technical lead |
| FTC (Section 5) | Deceptive or unfair practice; substantial injury | No fixed deadline; disclosure in reports/filings | Varies by context; may trigger separately from a breach | Legal |
| State AG (US, state by state) | Varies: California CCPA, NY SHIELD, Virginia VCDPA, etc. | 30-45 days typical; some states (e.g., MA) require immediate notice | State-specific templates | Legal |
| HIPAA (if applicable) | Unsecured-PHI breach | 60 days to affected individuals and HHS; annual report for breaches affecting &lt;500 | HHS OCR format | Compliance + Legal |
| Sector-specific (SEC, DoD, healthcare providers) | Varies | Varies | Varies | Compliance + Legal |

The 72-hour clock reality: GDPR Article 33’s 72-hour clock starts at “awareness,” not at “investigation complete.” An incident discovered Friday evening has a Monday-morning notification deadline. The runbook must operationalize the clock, including out-of-hours coverage. The supervisory-authority filing does not require complete information; it requires the information known at the time plus a commitment to follow-up filings as the investigation progresses. Teams that delay notification waiting for “complete information” routinely miss the deadline.
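The clock arithmetic is trivial but worth automating precisely because it runs in wall-clock time, not business hours. A minimal sketch (helper name is illustrative):

```python
"""72-hour-clock sketch: the Article 33 deadline runs in wall-clock time
from the moment of awareness; weekends and holidays do not pause it."""
from datetime import datetime, timedelta, timezone

def article33_deadline(awareness: datetime) -> datetime:
    """Absolute filing deadline: awareness + 72 hours, no exceptions."""
    return awareness + timedelta(hours=72)

# Incident discovered Friday 18:00 UTC -> filing due Monday 18:00 UTC.
aware = datetime(2026, 1, 9, 18, 0, tzinfo=timezone.utc)  # a Friday
due = article33_deadline(aware)
assert due == datetime(2026, 1, 12, 18, 0, tzinfo=timezone.utc)
assert due.strftime("%A") == "Monday"
```

Wiring this timestamp into the paging system at detection time is what gives the out-of-hours coverage teeth.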

Communication Templates

Incident communication requires pre-drafted templates that Legal and Comms have approved. Edits made at send time under incident pressure produce legal mistakes.

Internal — Initial Notification

INCIDENT: SEV-{n} — {class} — {one-line-summary}
Detected: {timestamp}
Affected scope: {what we know now}
Containment status: {in-progress | contained}
Incident commander: {name}
Comms lead: {name}
Legal lead: {name}
Next update: {time}
War-room: {link}

Customer — Affected Notification

Subject: Important information about your {product} account

We are writing to inform you of a recent incident affecting your {product} account.

What happened: {plain-language description}
When: {time window}
What information was involved: {categories}
What we are doing: {containment + eradication summary}
What you can do: {user actions, e.g., review activity, reset credentials}
Contact: {dedicated contact path}

We regret this incident and are taking the following measures to prevent recurrence: {summary}.

Regulator — GDPR Article 33

Follow the supervisory authority’s published form. Typical content:

  • Nature of the breach (class, vector)
  • Categories and approximate number of data subjects
  • Categories and approximate number of personal-data records
  • Likely consequences
  • Measures taken or proposed
  • Contact details of DPO

Public — Statement

Draft with Legal + Comms in advance for every incident class that could plausibly reach SEV-1. Post-incident edits under deadline routinely produce statements that damage the organization further.

Postmortem Template

The postmortem is the artifact that converts the incident into organizational learning.

# {Incident ID} — {Class} — {Severity}

## Summary
One-paragraph plain-language description of what happened.

## Timeline
| Time (UTC) | Event | Source |
|------------|-------|--------|
| ...

## Impact
- Users affected: {count}
- Data affected: {categories, counts}
- Duration: {start} to {containment} to {recovery}
- Regulatory notifications triggered: {list}
- Customer notifications: {count sent, response}

## Root cause
Technical root cause — not "user error" or "model misbehavior" without further specificity. Decompose to the process or system gap that allowed the incident.

## Contributing factors
- Factor 1
- Factor 2
- ...

## What went well
- Detection path worked
- Containment within SLA
- ...

## What went poorly
- Specific failures — blameless framing
- Specific delays
- ...

## Lessons learned
- Categorized: process, tooling, training, architecture
- Each lesson maps to an action item below

## Action items
| ID | Description | Owner | Due date | Status |
|----|-------------|-------|----------|--------|
| AI-1 | ... | ... | ... | Open |

## Near-miss sibling
Was there a near-miss that should have prevented this? Document to inform near-miss escalation.

## Detection improvement
What detection signal would have caught this earlier? Wire it in as an action item.

## Signoffs
- Incident commander: {name, date}
- Engineering director: {name, date}
- Security lead: {name, date}
- Legal: {name, date}
- DPO (if applicable): {name, date}

The blameless discipline: Postmortems blame processes, systems, and decisions — not individuals. An incident where the on-call responder made a reasonable call that proved wrong is a runbook-documentation failure, not a responder failure. Blame-seeking postmortems destroy the reporting incentive that surfaces incidents early.

On-Call Readiness Checklist

| Checklist item | Status | Verification |
|---|---|---|
| On-call rotation covers all 168 hours/week | ✓/✗ | PagerDuty/Opsgenie schedule published |
| Primary + secondary on-call per rotation | ✓/✗ | Escalation path tested quarterly |
| Runbook links accessible without VPN | ✓/✗ | Runbook portal tested from mobile |
| Severity-assignment decision tree posted | ✓/✗ | Published in runbook portal |
| Legal + DPO on-call contacts current | ✓/✗ | Quarterly review |
| Regulator-notification forms pre-drafted | ✓/✗ | Templates in runbook with Legal-approved placeholders |
| Communication templates pre-drafted | ✓/✗ | Comms-approved; in runbook |
| War-room setup automation | ✓/✗ | One-click war-room provisioning |
| Incident drill cadence | ✓/✗ | Quarterly tabletop; annual live drill per class |
| Postmortem template available | ✓/✗ | In runbook portal |
| Action-item tracking tool integrated | ✓/✗ | Jira/Linear project with postmortem-action-item label |

The drill discipline: Runbooks that have never been exercised under drill conditions routinely fail under real-incident conditions. Quarterly tabletop drills catch the procedural gaps that only manifest under pressure.

Anti-Patterns

| Anti-pattern | Why teams do it | Why it fails | Correct pattern |
|---|---|---|---|
| No predefined severity tree | Avoid “over-classifying” | Initial classification becomes ad hoc; SLA clock starts late | Severity decision tree with plausible-worst-case rule |
| Containment destroys evidence | Operational focus on stopping harm | Investigation and postmortem blocked; root cause undetermined | Evidence-preservation snapshot before containment actions |
| Regulator notification delayed for “complete info” | Desire to submit a clean notification | Misses the 72-hour deadline; additional violation | Initial notification with known facts plus commitment to follow-up filings |
| Postmortem blames individuals | Cultural reflex | Destroys the reporting incentive; future incidents surface later | Blameless structure; focus on processes and systems |
| Customer notification delayed pending Legal review | Risk aversion | Breach-notification-SLA violation compounds the incident | Pre-approved templates; Legal pre-clears classes and provides on-call counsel |
| No drills | Production pressure | Runbook gaps only surface under real incidents | Quarterly tabletop drills per incident class |
| “It’s just a test” severity downgrade | Avoid noise | Red-team findings that reflect real exploit surface get ignored | Red-team findings classified SEV-3 minimum; upgrade if exploit is in the wild |

Honest Limitations

  • Runbook quality decays without exercise. A runbook written two years ago and not drilled is 40-60% useful when a real incident fires. Budget quarterly drill time.
  • Severity classification is often ambiguous at detection. The “plausible worst case” rule mitigates but does not eliminate the judgment call. Post-incident reviews should include “was severity correctly classified” as a standing question.
  • Regulatory notification law changes faster than runbooks update. EU AI Act provisions (Article 73 serious-incident reporting) phase in through 2026-2027. US state privacy laws change quarterly. Legal review of the notification matrix every 90 days is required.
  • Containment actions destroy evidence. Even with discipline, the operational pressure to stop harm routinely destroys investigation artifacts. Build evidence-preservation automation into the containment tools, not just into the runbook.
  • Cross-jurisdiction notification is complex. A single incident affecting users in EU + California + New York triggers GDPR + CCPA + SHIELD notifications with different deadlines, different content requirements, different recipient authorities. The matrix assumes jurisdictional awareness that many engineering-led incident commanders lack.
  • AI-specific incident classes evolve faster than traditional security classes. Prompt injection patterns in 2026 do not match 2024 patterns. Runbooks that hardcode attack patterns decay; runbooks that abstract to detection-action patterns scale.
  • Postmortem action items close at 40-60% rate in most organizations. The most common failure after incident-response is the organizational failure to follow through on the structural changes identified in postmortem. Action-item tracking with accountability is non-negotiable.
  • Blameless postmortems require cultural commitment that leadership must model. An organization whose leadership blames individuals in post-incident meetings will not have blameless postmortems regardless of documented process.

The Incident-Ready Production AI System

An AI system is incident-ready when:

  • Detection layers cover all eight incident classes with defined alert thresholds.
  • Severity decision tree is published and drilled.
  • Containment actions per class are pre-scripted and tested.
  • Regulatory notification templates are Legal-approved and accessible under pressure.
  • Communication templates (internal + customer + public) are pre-drafted.
  • Postmortem template is standardized with blameless framing.
  • Quarterly tabletop drill per class is scheduled.
  • On-call rotation includes Legal + DPO + Security reachable within SLA.
  • Action-item tracking from prior postmortems is current.

The goal is not to prevent incidents (impossible). The goal is to ensure that when an incident fires, the response is predictable, compliant, proportionate, and produces systemic improvement rather than organizational trauma.