AI Model Audit Guide — Pre-Deployment Testing for EU AI Act, NIST, and ISO 42001
Regulatory requirement mapping across EU AI Act, NIST AI RMF, and ISO 42001 with audit checklist, documentation templates, and compliance evidence collection methodology.
The EU AI Act Is Now Enforceable — Does Your AI System Have the Documentation to Prove Compliance?
The EU AI Act has moved from proposal to enforcement, with high-risk obligations applying from 2026. High-risk AI systems now require risk assessments, bias audits, transparency documentation, and human oversight mechanisms — with penalties of up to €35 million or 7% of global annual turnover. The NIST AI Risk Management Framework and ISO 42001 are complementary voluntary frameworks that enterprise buyers and investors increasingly expect. This guide maps regulatory requirements to engineering tasks, provides a cross-framework audit checklist, and explains what documentation you actually need — distinguishing between what regulations require, what’s considered best practice, and what’s unnecessary compliance theater.
Regulatory Landscape — The Three Frameworks
Framework Comparison
| Dimension | EU AI Act | NIST AI RMF | ISO 42001 |
|---|---|---|---|
| Type | Mandatory regulation | Voluntary framework | Certifiable standard |
| Jurisdiction | EU (affects global companies serving EU) | US (influences global practices) | International |
| Enforcement | Government authority + penalties | Self-assessment | Third-party audit |
| Penalty | Up to €35M or 7% of global annual turnover | None (voluntary) | Loss of certification |
| Risk classification | 4 levels (Unacceptable, High, Limited, Minimal) | Custom risk assessment | Organization-defined |
| Primary focus | Safety, fairness, transparency | Risk identification and management | AI management system |
| Documentation burden | High for high-risk; low for minimal risk | Moderate (proportional to risk) | High (certification requirement) |
| Maturity | In force 2024; high-risk enforcement 2026 | Version 1.0 (2023) | Published 2023, certifications 2024+ |
Which Frameworks Apply to Your System
| Your situation | EU AI Act | NIST AI RMF | ISO 42001 |
|---|---|---|---|
| Serving EU users | Mandatory | Recommended | Optional (but signals maturity) |
| US-only, no regulated industry | Not applicable | Recommended | Optional |
| US-only, regulated industry (finance/healthcare) | Not applicable | Strongly recommended | Recommended |
| Selling to enterprise customers | May be contractually required | Often referenced in RFPs | Increasingly requested |
| Startup pre-revenue | Assess applicability now; comply before scaling | Good foundation | Too expensive until Series A+ |
EU AI Act — The Compliance Requirements
Risk Classification Decision Tree
| Question | If Yes | If No |
|---|---|---|
| Does the system manipulate behavior, exploit vulnerabilities, or conduct social scoring? | Unacceptable risk — banned | Continue |
| Is it used for: biometric identification, critical infrastructure, education/employment decisions, law enforcement, migration, or access to essential services? | High risk — full compliance required | Continue |
| Does the system interact directly with users (chatbot, content generation)? | Limited risk — transparency obligations | Minimal risk — voluntary codes of practice |
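The decision tree above is simple enough to encode in internal tooling so every new system gets triaged consistently. A minimal sketch, assuming hypothetical flag and domain names that mirror the table; the output is a first pass, not legal advice:

```python
from enum import Enum


class EUAIActRisk(Enum):
    UNACCEPTABLE = "unacceptable"  # banned practices
    HIGH = "high"                  # full Art. 9-15 compliance
    LIMITED = "limited"            # transparency obligations
    MINIMAL = "minimal"            # voluntary codes of practice


# High-risk use cases from the second row of the table above.
HIGH_RISK_DOMAINS = {
    "biometric_identification", "critical_infrastructure",
    "education_decisions", "employment_decisions",
    "law_enforcement", "migration", "essential_services",
}


def classify(banned_practice: bool, domain: str,
             interacts_with_users: bool) -> EUAIActRisk:
    """First-pass EU AI Act risk triage; not legal advice.

    banned_practice collapses the manipulation / exploitation /
    social-scoring questions from the table into one flag.
    """
    if banned_practice:
        return EUAIActRisk.UNACCEPTABLE
    if domain in HIGH_RISK_DOMAINS:
        return EUAIActRisk.HIGH
    if interacts_with_users:
        return EUAIActRisk.LIMITED
    return EUAIActRisk.MINIMAL


# Example: a hiring-screening model is high risk regardless of its UI.
print(classify(False, "employment_decisions", True))  # EUAIActRisk.HIGH
```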
High-Risk AI System Requirements
| Requirement | EU AI Act Article | What it means in practice | Audit evidence needed |
|---|---|---|---|
| Risk management system | Art. 9 | Documented process for identifying, evaluating, and mitigating AI-specific risks | Risk register, mitigation log, residual risk assessment |
| Data governance | Art. 10 | Training data documented, relevant, representative, and as error-free as possible | Data card, quality metrics, representativeness analysis |
| Technical documentation | Art. 11 | Complete description of system: design, development, testing, performance | Model card, architecture document, test reports |
| Record keeping | Art. 12 | Automatic logging of system operation enabling traceability | Log retention policy, audit trail, event logging architecture |
| Transparency | Art. 13 | Users can interpret outputs and understand the system | Explanation mechanism, user-facing documentation |
| Human oversight | Art. 14 | Humans can understand, monitor, and override the system | Override mechanism, monitoring dashboard, intervention process |
| Accuracy, robustness, cybersecurity | Art. 15 | System performs consistently and is protected against adversarial threats | Test results, security assessment, robustness testing |
| Bias testing | Art. 10(2)(f) | Measures to detect and mitigate bias, especially regarding protected groups | Bias audit report, fairness metrics, mitigation documentation |
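For Art. 12 record keeping, the practical engineering task is structured, append-only event logging that ties each output to a model version, the inputs, and any human intervention. A minimal sketch, assuming a JSON-lines file and hypothetical field names; production systems would ship these records to retained, tamper-evident storage:

```python
import json
import time
import uuid
from pathlib import Path

LOG_PATH = Path("ai_audit_log.jsonl")  # ship to retained storage in production


def log_inference_event(model_version: str, input_hash: str,
                        output_hash: str, human_override: bool = False) -> str:
    """Append one traceability record per inference (Art. 12-style audit trail)."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,    # ties the output to an exact build
        "input_hash": input_hash,          # hash, not raw data, for privacy
        "output_hash": output_hash,
        "human_override": human_override,  # doubles as Art. 14 oversight evidence
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(event) + "\n")
    return event["event_id"]
```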
Limited-Risk AI System Requirements
| Requirement | What it means | Implementation effort |
|---|---|---|
| AI disclosure | Users must know they’re interacting with AI | 1-2 engineering days (add disclosure text/badge) |
| Deepfake labeling | AI-generated content must be labeled | 1-3 engineering days (metadata + visual label) |
| Chatbot disclosure | Users must be informed they’re chatting with AI, not human | 1 engineering day (disclosure banner) |
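In practice, the disclosure obligations come down to never letting AI output leave the backend unlabeled. A minimal sketch, assuming a hypothetical response envelope; the frontend renders the disclosure as a banner, while API clients read the machine-readable flag:

```python
from dataclasses import dataclass


@dataclass
class AIResponse:
    """Response envelope that keeps the AI disclosure attached to the content."""
    content: str
    ai_generated: bool = True  # machine-readable flag for API clients
    disclosure: str = "This response was generated by an AI system."


def respond(generated_text: str) -> dict:
    """Wrap model output so the disclosure ships with every response."""
    r = AIResponse(content=generated_text)
    return {"content": r.content, "ai_generated": r.ai_generated,
            "disclosure": r.disclosure}


print(respond("Here is your summary..."))
```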
NIST AI Risk Management Framework — The Practical Guide
NIST AI RMF is organized around four functions:
GOVERN — Organizational Risk Culture
| Practice | What to document | Who’s responsible |
|---|---|---|
| AI governance policy | Roles, responsibilities, decision rights for AI systems | Executive leadership |
| Risk tolerance thresholds | What level of AI risk the organization accepts | Risk committee or equivalent |
| Impact assessment process | How AI systems are evaluated before deployment | Product + legal + engineering |
| Stakeholder engagement | How affected parties are consulted | Product management |
MAP — Risk Identification
| Practice | What to document | Output |
|---|---|---|
| System purpose and context | What the AI does, who it affects, in what context | Context document |
| Known limitations | What the AI cannot do reliably | Limitation inventory |
| Potential harms | How the AI could cause harm (direct, indirect, systemic) | Harm taxonomy |
| Stakeholder impacts | Who is affected and how, including underserved populations | Impact assessment |
MEASURE — Risk Assessment
| Practice | What to measure | Method |
|---|---|---|
| Accuracy and reliability | Task-specific quality metrics on representative data | Task-specific evaluation (see evaluation guide) |
| Fairness and bias | Fairness metrics across protected groups | Bias audit (see bias detection guide) |
| Robustness | Performance under adversarial and out-of-distribution inputs | Red team testing (see safety testing guide) |
| Transparency and explainability | Can decisions be explained to stakeholders | Explainability assessment |
| Privacy | Data handling, consent, retention, minimization | Privacy impact assessment |
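For the fairness row, the core measurement is a per-group comparison of outcomes. A minimal sketch of the demographic parity difference, assuming binary predictions and one group label per record; see the bias detection guide for choosing the right metric for your use case:

```python
from collections import defaultdict


def demographic_parity_difference(predictions: list[int],
                                  groups: list[str]) -> float:
    """Max gap in positive-outcome rate across groups (0.0 = parity)."""
    positives: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Example: group "a" approved 3/4, group "b" approved 1/4 -> gap of 0.5.
print(demographic_parity_difference([1, 1, 1, 0, 1, 0, 0, 0],
                                    ["a", "a", "a", "a", "b", "b", "b", "b"]))
```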
MANAGE — Risk Treatment
| Practice | What to document | Frequency |
|---|---|---|
| Risk mitigation actions | What was done to reduce each identified risk | Per risk |
| Residual risk acceptance | What risk remains after mitigation and why it’s acceptable | Per risk |
| Monitoring plan | How risks are tracked in production | Ongoing |
| Incident response | How AI-caused incidents are handled | Event-driven |
| Decommission plan | How the AI system is safely retired | End-of-life |
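The MANAGE artifacts largely reduce to a risk register with owners, mitigations, and residual-risk sign-off. A minimal sketch of one register entry as a data structure, with hypothetical field names; the same shape serves as Art. 9 evidence:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskRegisterEntry:
    """One row in the risk register (Art. 9 / NIST MANAGE evidence)."""
    risk_id: str
    description: str
    owner: str
    mitigation: str       # what was done to reduce the risk
    residual_risk: str    # "low" / "medium" / "high" after mitigation
    accepted_by: str      # who signed off on the residual risk
    review_date: date     # when this entry is next revisited


entry = RiskRegisterEntry(
    risk_id="R-014",
    description="Model under-performs on non-English queries",
    owner="ml-platform-team",
    mitigation="Added multilingual eval set; route low-confidence cases to fallback",
    residual_risk="medium",
    accepted_by="head-of-product",
    review_date=date(2026, 9, 1),
)
```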
ISO 42001 — The Management System Standard
ISO 42001 requires an AI Management System (AIMS) — a formal, documented management system specifically for AI:
Required Documentation
| Document | Purpose | Approximate effort |
|---|---|---|
| AI policy | Organization’s commitment to responsible AI | 2-5 pages, executive sign-off |
| Scope statement | Which AI systems are covered | 1-2 pages |
| Risk assessment methodology | How AI risks are identified and evaluated | 5-10 pages |
| Statement of applicability | Which ISO 42001 controls apply and which don’t | 3-5 pages |
| AI impact assessment | Impact on individuals, groups, and society | 10-20 pages per high-risk system |
| Data management procedures | Training data governance | 5-10 pages |
| Testing and validation procedures | How AI systems are tested before deployment | 5-10 pages |
| Monitoring and measurement procedures | How AI performance is tracked in production | 5-10 pages |
| Incident management procedure | How AI incidents are handled | 3-5 pages |
| Internal audit procedure | How compliance is verified internally | 3-5 pages |
| Management review records | Evidence of leadership engagement | Meeting minutes, quarterly |
Total documentation effort: 60-120 pages for the management system, plus per-system documentation (model cards, test reports, risk assessments).
Certification cost: Third-party ISO 42001 audit typically costs $15,000-50,000 depending on organization size and number of AI systems. Annual surveillance audits: $8,000-25,000.
The Audit Checklist — Cross-Framework
This checklist maps requirements across all three frameworks. Items marked “Required” are mandatory under the applicable framework; “Recommended” items are best practice.
| # | Audit item | EU AI Act (High-Risk) | NIST AI RMF | ISO 42001 |
|---|---|---|---|---|
| 1 | Risk classification documented | Required (Art. 6) | Required (MAP) | Required (§6.1) |
| 2 | Risk management system established | Required (Art. 9) | Required (GOVERN) | Required (§6.1) |
| 3 | Training data documented | Required (Art. 10) | Required (MAP 2.3) | Required (Annex B) |
| 4 | Bias testing performed | Required (Art. 10(2)(f)) | Required (MEASURE 2.6) | Required (Annex B) |
| 5 | Technical documentation complete | Required (Art. 11) | Recommended | Required (§7.5) |
| 6 | Automatic logging operational | Required (Art. 12) | Recommended | Required (§8.1) |
| 7 | Transparency mechanism in place | Required (Art. 13) | Required (MEASURE 2.11) | Required (Annex B) |
| 8 | Human oversight mechanism | Required (Art. 14) | Recommended (MANAGE 4.1) | Required (Annex B) |
| 9 | Accuracy tested on representative data | Required (Art. 15) | Required (MEASURE 2.5) | Required (§8.1) |
| 10 | Robustness/adversarial testing | Required (Art. 15) | Required (MEASURE 2.7) | Recommended |
| 11 | Cybersecurity assessment | Required (Art. 15) | Required (MANAGE 2.3) | Required (§6.1) |
| 12 | Incident response plan | Required (Art. 62) | Required (MANAGE 4.2) | Required (§10.2) |
| 13 | Post-market monitoring | Required (Art. 61) | Required (MANAGE 1.1) | Required (§9.1) |
| 14 | Conformity assessment | Required (Art. 43) | N/A | Third-party audit |
| 15 | EU database registration | Required (Art. 49) | N/A | N/A |
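To track audit readiness, the checklist can live as machine-readable data next to the code it governs, so gaps surface in CI rather than during the audit. A minimal sketch, assuming a hypothetical status dict; item numbers match the table above:

```python
# Status per checklist item: "done", "in_progress", or "missing".
AUDIT_CHECKLIST = {
    1: {"item": "Risk classification documented", "status": "done"},
    2: {"item": "Risk management system established", "status": "in_progress"},
    3: {"item": "Training data documented", "status": "missing"},
    4: {"item": "Bias testing performed", "status": "missing"},
    # ... items 5-15 follow the same shape
}


def audit_gaps(checklist: dict) -> list[str]:
    """List the items an auditor would flag as missing or incomplete."""
    return [f"#{n}: {v['item']}" for n, v in sorted(checklist.items())
            if v["status"] != "done"]


print(audit_gaps(AUDIT_CHECKLIST))
```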
Evidence Collection — What Auditors Actually Look For
| Evidence type | What it proves | How to collect it |
|---|---|---|
| Model card | System is documented per Art. 11 | Maintain in version control, update with every model change |
| Test reports | Accuracy and robustness tested per Art. 15 | Automated test pipelines with saved results |
| Bias audit report | Fairness testing performed per Art. 10 | Scheduled bias evaluation with saved metrics |
| Risk register | Risks identified and managed per Art. 9 | Maintained document with risk owners and status |
| Monitoring dashboards | Production monitoring in place per Art. 61 | Screenshots or exports showing ongoing measurement |
| Incident logs | Incident response functional per Art. 62 | Incident tickets with timeline, resolution, root cause |
| Override logs | Human oversight functional per Art. 14 | Logs showing human interventions and overrides |
| Change log | Traceability per Art. 12 | Version control history for model, prompts, guardrails |
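Most of these evidence types fall out of one habit: every automated evaluation run writes a timestamped, versioned report to storage an auditor can inspect later. A minimal sketch, assuming hypothetical metric names and paths:

```python
import json
import time
from pathlib import Path

EVIDENCE_DIR = Path("audit_evidence/test_reports")


def save_test_report(model_version: str, metrics: dict) -> Path:
    """Persist one evaluation run as audit evidence (Art. 15 test reports)."""
    EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)
    report = {
        "model_version": model_version,
        "run_timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "metrics": metrics,  # accuracy, robustness, fairness scores, etc.
    }
    path = EVIDENCE_DIR / f"{model_version}_{int(time.time())}.json"
    path.write_text(json.dumps(report, indent=2))
    return path


save_test_report("model-v1.3.2", {"accuracy": 0.91, "adversarial_pass_rate": 0.84})
```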
How to Apply This
Use the token-counter tool to estimate evaluation pipeline costs — bias testing, accuracy measurement, and robustness testing all require inference calls (a rough cost sketch follows after these steps).
Start with risk classification — determine which EU AI Act risk level applies to your system. This determines the scope of compliance required.
If high-risk: work through the 15-item cross-framework checklist systematically. Items 1-4 (risk classification, risk management, data documentation, bias testing) are the highest priority and the most commonly missing.
If limited-risk: implement transparency obligations (AI disclosure) — these are low effort and high impact.
Build evidence collection into your development process — post-hoc evidence gathering for an audit is 5-10x more expensive than continuous documentation.
Budget for ISO 42001 certification only after Series A+ — the documentation burden is real and premature certification diverts resources from building the AI system itself.
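As a rough illustration of the cost estimate from the first step, evaluation spend is just (number of test cases) × (tokens per case) × (price per token). A minimal sketch with hypothetical prices; substitute your provider’s actual per-token rates:

```python
def eval_cost_usd(num_cases: int, avg_input_tokens: int, avg_output_tokens: int,
                  price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimated cost of one evaluation pass over a test suite."""
    input_cost = num_cases * avg_input_tokens * price_in_per_mtok / 1_000_000
    output_cost = num_cases * avg_output_tokens * price_out_per_mtok / 1_000_000
    return input_cost + output_cost


# Hypothetical: 5,000 bias-test cases, 800 input / 300 output tokens each,
# at $3 / $15 per million tokens -> $34.50 per full run.
print(eval_cost_usd(5_000, 800, 300, 3.0, 15.0))
```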
Honest Limitations
EU AI Act implementation guidance is still being published by the European Commission — specific requirements may be refined. NIST AI RMF is voluntary and self-assessed, meaning there’s no external validation of compliance claims. ISO 42001 certification costs are estimates based on early certification bodies; market pricing is still stabilizing. The cross-framework checklist covers the most common requirements but is not exhaustive — legal counsel should verify jurisdiction-specific obligations. Regulatory requirements apply to AI providers and deployers differently — this guide primarily addresses deployer obligations. The documentation effort estimates assume a single AI system; organizations with multiple AI systems face additional coordination overhead.
Continue reading
AI Bias Detection — Demographic Parity, Equal Opportunity, Calibration, and When Each Metric Applies
Fairness metric decision tree per use case, measurement methodology, regulatory requirements, and practical implementation for production AI systems.
AI Content Filtering — Guardrails That Block Without Breaking User Experience
False positive and negative rate comparison across filtering approaches, latency impact, implementation patterns, and the tradeoff between safety and usability.
Types of AI Hallucinations — Factual, Logical, Attribution, and How to Detect Each
Taxonomy of AI hallucination types with detection methods, failure rates by model and task, and a diagnostic decision tree for production systems.