A single misapplied CPT code in an emergency department can trigger a federal investigation and years of reputational damage. In 2024, University of Colorado Health (UCHealth) paid $23 million to resolve allegations that its automated billing system routinely assigned the highest-severity E/M code, CPT 99285, to emergency visits that did not warrant it. The case sent a clear signal to every health system in the country: the way you deploy AI medical coding in your ED matters as much as whether you deploy it at all. Yet the pressure to automate is not going away. There is a 30% shortage of medical coders, a gap projected to widen through the end of the decade. Emergency departments feel this shortage more acutely than almost any other specialty. EDs operate around the clock and face claim submission windows that leave little room for backlogs.
Why Emergency Departments Are the Hardest Coding Environment in Healthcare
The 99281–99285 Severity Spectrum
Emergency department evaluation and management (E/M) coding uses a five-level hierarchy, CPT codes 99281 through 99285, that maps to escalating levels of patient severity and resource consumption. A 99281 represents a straightforward, self-limited problem. A 99285 represents a high-complexity encounter requiring comprehensive examination and medical decision-making under conditions of significant risk. Since 2023, ED E/M level assignment has been based solely on medical decision-making (MDM), rather than on a combination of history, exam, and MDM.
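As a rough illustration, and assuming the standard post-2023 CPT descriptors, the hierarchy can be expressed as a simple lookup from MDM level to E/M code. The descriptions below are paraphrased for brevity, not official CPT language:

```python
# Paraphrased post-2023 ED E/M hierarchy: each level above 99281 corresponds
# to an MDM level. Descriptions are simplified, not official CPT text.
ED_EM_LEVELS = {
    "99281": "Visit that may not require the presence of a physician or other qualified professional",
    "99282": "Straightforward MDM",
    "99283": "Low MDM",
    "99284": "Moderate MDM",
    "99285": "High MDM",
}

def em_code_for_mdm(mdm_level: str) -> str:
    """Return the ED E/M code for a given MDM level (straightforward/low/moderate/high)."""
    by_mdm = {
        "straightforward": "99282",
        "low": "99283",
        "moderate": "99284",
        "high": "99285",
    }
    return by_mdm[mdm_level]
```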

Systems trained on pre-2023 documentation patterns, where the volume of history and physical exam documentation influenced code level, can misinterpret the current framework. An AI model that equates documentation length with severity will over-code short, high-acuity encounters and under-code long, low-acuity ones. The MDM-only standard demands that AI systems parse the clinical reasoning embedded in notes, not just the quantity of information recorded.
Volume, Speed, and Documentation Gaps
At a typical urban hospital, the emergency department sees between 130,000 and 160,000 patient visits per year. Patients cycle through triage, assessment, treatment, and disposition in hours, sometimes minutes. Physicians document under time pressure, often using templated notes, voice dictation, or EHR macros that may omit the specific clinical reasoning a coder needs to assign the correct E/M level.
For human coders, this means constantly reconciling incomplete documentation with precise coding requirements. For AI systems, it means processing inconsistent input data where critical clinical details may be buried in nursing notes, radiology reports, or medication administration records rather than stated explicitly in the physician's note. The AI medical coding systems that succeed in the ED are those capable of synthesizing information across multiple document types within a single encounter.
How AI Medical Coding Actually Works in the ED
From Clinical Notes to CPT and ICD Codes
Contemporary AI medical coding platforms use a pipeline that begins with natural language processing (NLP) to parse unstructured clinical text. The system ingests the physician's note, nursing documentation, procedure records, and ancillary reports associated with an encounter. It then extracts clinical entities and maps them to coding ontologies such as ICD-10-CM for diagnosis codes and CPT for procedure and E/M codes.
For ED encounters specifically, the AI must evaluate the complexity of medical decision-making by assessing three components: the number and complexity of problems addressed, the amount and complexity of data reviewed, and the risk of complications or morbidity associated with management decisions. Platforms like Sully approach this by integrating directly with major EHR systems such as Epic, embedding real-time code suggestions into the clinician's existing workflow rather than operating as a separate post-encounter review step. This integration model reduces the lag between patient discharge and claim submission, a critical factor in ED revenue cycles where delayed coding can mean delayed reimbursement.
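The sketch below shows one way such a system might derive an MDM level from features extracted across an encounter's documents. The field names and levels are illustrative, and the "highest level met by two of three elements" rule is a simplification of the AMA MDM table, not any vendor's actual scoring logic:

```python
from dataclasses import dataclass

# Illustrative only: a simplified scoring of the three MDM elements.
# Real MDM tables are far more granular; field names here are hypothetical.
LEVELS = ["straightforward", "low", "moderate", "high"]

@dataclass
class EncounterFeatures:
    problems_level: str   # number and complexity of problems addressed
    data_level: str       # amount and complexity of data reviewed
    risk_level: str       # risk of complications or morbidity of management

def mdm_level(f: EncounterFeatures) -> str:
    """Overall MDM is the highest level met by at least two of the three elements."""
    scores = sorted(LEVELS.index(x) for x in (f.problems_level, f.data_level, f.risk_level))
    # With the three element scores sorted, the middle value is the highest
    # level that at least two elements reach.
    return LEVELS[scores[1]]

# Example: moderate problems, high data, moderate risk -> moderate MDM -> 99284
print(mdm_level(EncounterFeatures("moderate", "high", "moderate")))  # "moderate"
```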
The best AI medical coding systems do not simply assign a code and move on. They generate a confidence score for each suggestion and flag encounters that fall below a configurable threshold for human review. This human-in-the-loop design is what separates compliant AI coding from the kind of blind automation that led to the UCHealth settlement.
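A minimal sketch of that human-in-the-loop routing, assuming a hypothetical suggestion object and an illustrative threshold; a real deployment would tune these values per payer, code family, and audit history:

```python
from dataclasses import dataclass

@dataclass
class CodeSuggestion:
    encounter_id: str
    cpt_code: str
    confidence: float  # model-reported confidence, 0.0 to 1.0

# Illustrative threshold: new deployments typically start conservative.
REVIEW_THRESHOLD = 0.90

def route(suggestion: CodeSuggestion) -> str:
    """Send low-confidence or highest-severity suggestions to a human coder."""
    if suggestion.confidence < REVIEW_THRESHOLD:
        return "human_review_queue"
    if suggestion.cpt_code == "99285":
        # Highest-severity ED E/M code: always worth a second look given
        # the enforcement focus on upcoding.
        return "human_review_queue"
    return "auto_submit_queue"
```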
Retrieval-Augmented Approaches
One of the most significant advances in AI medical coding came from Mount Sinai Health System. The research team tested a retrieval-augmented approach on 500 emergency department patient encounters. The AI model first read the physician's note and generated a plain-language description of the diagnosis. A retrieval system then matched that description against a database of more than one million historical hospital records to identify the ten most similar ICD entries, along with their frequency data. The model then used this retrieved context to select the most accurate ICD-10-CM code.
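Schematically, that retrieval-augmented flow looks something like the sketch below. The `llm` and `retriever` objects and their methods are placeholders standing in for a language model client and a similarity index built over historical coded encounters; this outlines the technique, not the Mount Sinai team's actual code:

```python
def assign_icd10(note_text: str, llm, retriever, k: int = 10) -> str:
    """Retrieval-augmented ICD-10-CM assignment (schematic sketch)."""
    # Step 1: the model summarizes the primary diagnosis in plain language.
    description = llm.generate(
        "Summarize the primary diagnosis in this ED note in one plain-language sentence:\n"
        + note_text
    )
    # Step 2: retrieve the k most similar historical ICD entries, with frequency data.
    candidates = retriever.search(description, top_k=k)  # [(code, description, frequency), ...]
    # Step 3: the model selects from the shortlist instead of the full ~72,000-code ontology.
    candidate_text = "\n".join(f"{code} | {desc} | seen {n}x" for code, desc, n in candidates)
    return llm.generate(
        f"Diagnosis: {description}\nCandidate ICD-10-CM codes:\n{candidate_text}\n"
        "Return the single most specific matching code."
    )
```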
The results were striking. Across nine different AI models tested, including small open-source systems, the retrieval-augmented versions consistently outperformed their non-augmented counterparts. In many cases, the AI-assigned codes were more accurate than the original physician-assigned codes. Mount Sinai is currently piloting this approach within its EHR system, with plans to expand beyond primary diagnosis codes to secondary and procedural coding.
This matters for ED coding because the retrieval step addresses one of the biggest failure modes: the sheer size of the ICD-10-CM code set. With over 72,000 diagnostic codes, even experienced human coders struggle to achieve specificity. A retrieval layer narrows the search space dramatically, giving the AI a curated shortlist of contextually relevant codes rather than forcing it to select from the entire ontology.
What the Research Says About AI Coding Accuracy
The Mount Sinai findings show that retrieval-augmented large language models can match or exceed physician accuracy in assigning primary ICD-10-CM codes for ED encounters, the strongest evidence to date for AI coding in the emergency department context. A July 2025 crossover randomized controlled trial tested an AI tool called Easy-ICD with coders in Norway and Sweden. The study found a 46% reduction in median coding time for complex clinical texts when the AI tool was used, a difference of 123 seconds per encounter that was statistically significant (P<.001). Accuracy improvements were observed but did not reach statistical significance, suggesting that AI's primary near-term value may be speed rather than precision.
The Regulatory Landscape: False Claims Act and Federal Oversight
Any discussion of AI medical coding that skips the regulatory environment is incomplete and potentially reckless. The Department of Justice recovered over $5.7 billion in healthcare-related False Claims Act settlements in fiscal year 2025, the highest annual total in the statute's history. Understanding the enforcement climate is essential for any ED deploying AI coding tools.
The DOJ and HHS have created a dedicated False Claims Act Working Group to coordinate healthcare fraud investigations. That working group has explicitly flagged AI-driven EHR manipulation as a top enforcement priority for 2026. The concern is specific: automated prompts, defaults, or nudges embedded in EHR systems that drive claim submissions for medically unnecessary services or services coded at a higher level than the documentation supports.
What This Means for ED Coding Teams
For emergency departments specifically, the enforcement focus carries three practical implications. First, any AI coding system deployed in the ED must maintain auditable logic: a record of what evidence drove each code assignment and who, if anyone, reviewed it. Second, human oversight is a compliance requirement in all but name. Third, statistical monitoring for outlier billing patterns should be built into every AI coding workflow.
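One way to make "auditable logic" concrete is to persist, for every AI-assigned code, the evidence and decision trail that produced it. The record below is a hypothetical minimal schema for that purpose, not a regulatory standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CodingAuditRecord:
    """Minimal audit trail for one AI-assigned code (illustrative schema only)."""
    encounter_id: str
    assigned_code: str                 # e.g., "99284"
    model_version: str                 # exact model and prompt version used
    confidence: float                  # model-reported confidence
    evidence_spans: list[str]          # note excerpts the system relied on
    retrieved_candidates: list[str]    # shortlist considered, if retrieval is used
    human_reviewer: str | None = None  # populated when a coder confirms or overrides
    final_code: str | None = None
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
```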

Building a Compliant AI Coding Workflow for Your ED
Deploying AI medical coding in an emergency department is not a plug-and-play proposition. The health systems that get the most value follow a structured implementation approach. Based on the research and emerging best practices, here is a framework that works:
Start with a documentation quality audit. Before deploying any AI coding tool, assess your ED's documentation patterns. Are physicians consistently documenting the medical decision-making elements that drive E/M level assignment? AI cannot code what is not documented. If your notes are thin, invest in documentation improvement first.
Select AI tools built for ED workflows. General-purpose AI coding platforms often struggle with the speed, volume, and case-mix complexity of emergency medicine. Look for systems that process multiple document types per encounter, support the 99281–99285 E/M framework, and integrate with your EHR in real time rather than as a batch process.
Configure confidence thresholds conservatively. Set your AI system to route any encounter with a confidence score below a high-confidence threshold to a human coder for review. For a new deployment, err on the side of more human review, not less. You can loosen thresholds as you build confidence in the system's accuracy for your specific patient population.
Implement statistical outlier monitoring. Build dashboards that track your ED's code distribution against national benchmarks. If your 99285 rate drifts above peer norms, investigate immediately. Do not wait for CMS to flag it; a minimal monitoring sketch appears after this framework.
Establish a feedback loop between coders and clinicians. When AI coding identifies documentation gaps that prevent accurate code assignment, that feedback should flow back to the clinical team. The best AI implementations improve documentation quality over time, not just coding speed.
Conduct quarterly compliance audits. Sample a statistically representative portion of AI-coded encounters each quarter and have certified coders review them against the original documentation. Document your audit methodology and results. This paper trail is your defense in any future investigation.
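The monitoring sketch referenced in the outlier-monitoring step above: a one-sided z-test of an ED's observed 99285 share against a benchmark rate. The benchmark value here is a placeholder, not a published norm; real peer comparisons would come from CMS or payer data:

```python
from math import sqrt
from statistics import NormalDist

def level5_rate_flag(level5_count: int, total_visits: int,
                     benchmark_rate: float = 0.40, alpha: float = 0.01) -> bool:
    """Flag if the observed 99285 share is significantly above the benchmark.

    One-sided z-test for a single proportion; benchmark_rate is illustrative.
    """
    observed = level5_count / total_visits
    se = sqrt(benchmark_rate * (1 - benchmark_rate) / total_visits)
    z = (observed - benchmark_rate) / se
    p_value = 1 - NormalDist().cdf(z)
    return p_value < alpha

# Example: 5,200 level-5 visits out of 11,000 (about 47%) against a 40% benchmark.
print(level5_rate_flag(5200, 11000))  # True -> investigate before a payer does
```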
What Makes ED-Specific AI Coding Different From Other Specialties
Emergency medicine coding differs from other specialties in ways that matter for AI system design. Understanding these differences helps operational leaders evaluate tools accurately and set realistic expectations. The first distinction is acuity variance. In a single shift, an ED sees everything from a sprained ankle (99281) to a STEMI with cardiac arrest (99285). AI systems must handle this full spectrum without bias toward either end. Many AI coding tools are trained disproportionately on high-volume, moderate-complexity encounters, which can create blind spots at the extremes.
The second is multi-provider documentation. ED encounters often involve an attending physician, a resident, a mid-level provider, and multiple nurses. The AI must reconcile these overlapping, sometimes contradictory records into a single coherent coding picture. This multi-source synthesis challenge is far less prevalent in outpatient specialties where a single physician authors the entire note.
The third is time sensitivity. ED revenue cycles operate on tighter timelines than most specialties. Patients present, are treated, and are discharged or admitted within hours. Claims must be submitted promptly to avoid payer-specific filing deadlines. AI medical coding adds value here by reducing the lag between patient disposition and final code assignment; in some implementations, that lag approaches zero for straightforward encounters. These distinctions explain why platforms designed for the ED, rather than adapted from other specialties, tend to outperform.
The Coder's Evolving Role in an AI-Augmented ED
The most forward-looking health systems are investing in upskilling their coding teams: training them to audit AI outputs, interpret confidence scores, and serve as the human checkpoint in an automated workflow. This hybrid model, where AI handles volume and humans handle judgment, is emerging as the industry standard. The key capabilities that define an effective AI-augmented ED coder include:
Ability to interpret AI confidence scores and prioritize review queues based on risk
Deep familiarity with ED-specific E/M guidelines and the MDM-only framework for 99281–99285
Understanding of False Claims Act liability and the documentation standards required to defend coded encounters
Skill in identifying systematic AI errors (such as consistent over-coding of a specific diagnosis cluster) and escalating them for model retraining
The trajectory of AI medical coding in emergency departments points toward deeper integration, not replacement of existing systems. Mount Sinai's retrieval-augmented approach is already expanding from primary diagnosis coding to secondary and procedural codes. Even imperfect AI tools save coders meaningful time on complex encounters.

The health systems that will lead in this space are not the ones that deploy AI fastest. They are the ones that deploy it with the clearest understanding of what ED coding actually requires: clinical specificity and a genuine human-in-the-loop workflow. For emergency departments navigating this transition, the path forward is clear. Audit your documentation. Choose AI tools purpose-built for emergency medicine. Set conservative confidence thresholds. Monitor your code distribution. And never forget that every code your AI assigns carries the weight of a federal compliance obligation.
Sources:
DOJ Hits University of Colorado Health with $23 Million Penalty for Allegedly Upcoding Emergency Services — Constantine Cannon
Beware of Automated or AI-Generated Billing Coding to Government Healthcare Programs — Arnold & Porter
Assessing Retrieval-Augmented Large Language Models for Medical Coding — NEJM AI
Adding a Lookup Step Makes AI Better at Assigning Medical Diagnosis Codes — Mount Sinai Health System
Artificial Intelligence to Improve Clinical Coding Practice in Scandinavia: Crossover Randomized Controlled Trial — Journal of Medical Internet Research
DOJ's Record-Breaking 2025 False Claims Act Recoveries and Key Healthcare Fraud Enforcement Trends — White & Case LLP
False Claims Act Enforcement in Healthcare: How Practices Can Protect Themselves in 2026 — Doctors Management
ED Facility Level Coding Guidelines — American College of Emergency Physicians
UCHealth Says It Will Pay $23 Million in Fraudulent Emergency Department Billing Case — Colorado Public Radio
The Staffing Storm: How Workforce Shortages Are Crippling RCM Performance — Currance