BLOG · Apr 16, 2026 · 1 min read

Value-Based Care and AI: Coding for Quality Metrics That Determine Reimbursement

In value-based care, quality metrics drive reimbursement. Sully.ai explores how AI coding helps capture the data that determines what providers get paid.

The rules of healthcare reimbursement have changed. For decades, payment followed volume: the more services a provider delivered, the more they were paid, with limited regard for whether those services produced measurably better outcomes. Value-based care AI adoption is accelerating because the model has reversed. Under the payment structures now governing a growing share of Medicare and commercial contracts, what providers are paid depends not just on what they did but also on how well they documented it, how thoroughly their coding captured patient complexity, and how consistently their quality measure data reflect the care they actually delivered.

How Value-Based Care Tied Coding to Reimbursement

The Shift from Fee-for-Service to Pay-for-Performance

The transition from fee-for-service to value-based reimbursement began in earnest with the Affordable Care Act of 2010 and accelerated through the Medicare Access and CHIP Reauthorization Act of 2015, which created the Merit-based Incentive Payment System and the Alternative Payment Model pathways that now govern how most Medicare clinicians are paid. Under these frameworks, payment is adjusted based on performance across quality, cost, and care improvement dimensions - with both bonuses for high performers and penalties for low performers.

Value-based care coding sits at the center of this framework because quality performance measurement depends entirely on the accuracy and completeness of documentation and coding at the point of care. A physician who delivers excellent chronic disease management but documents it imprecisely will score worse than a physician who delivers equivalent care and documents it with specificity. The payment difference between those two physicians is a documentation problem, not a clinical one.

Why Quality Metric Documentation Is Now a Financial Function

Healthcare AI adoption around quality metrics is being driven by the recognition that quality reporting is now a financial function, not just a clinical one. Under MIPS, the Merit-based Incentive Payment System, the quality performance category accounts for 30% of the composite score that determines whether a clinician receives a payment bonus or penalty of up to 9% of Medicare Part B reimbursement. Under Medicare Advantage risk adjustment, the HCC codes assigned to each patient determine the capitation payments the plan receives, which in turn shape the resources available for care delivery. In both contexts, the financial output is a direct function of coding quality. Organizations that treat documentation and coding as billing operations rather than strategic functions are leaving measurable revenue on the table and absorbing penalties they could prevent.

The Coding Precision That Value-Based Programs Demand

Value-based programs impose documentation requirements that are more specific than what fee-for-service billing historically demanded. Risk adjustment models require that every chronic condition be documented as present, actively managed, and coded to the highest specificity the clinical record supports - every year, for every patient. Quality measure reporting requires that specific clinical actions be documented in structured fields that reporting algorithms can read and count. Neither requirement is satisfied by documentation written solely for clinical communication.

 

The gap between what value-based programs require and what clinical documentation typically produces is where AI healthcare coding tools are delivering their most significant financial impact: closing the specificity and completeness gaps that cost organizations points, scores, and reimbursement.

MIPS and the Documentation Requirements That Determine Your Score

How MIPS Quality Measures Are Calculated and Reported

MIPS quality performance is calculated from a set of measures that clinicians select from a menu of options defined by the Centers for Medicare and Medicaid Services. Each measure tracks whether a specific clinical action was performed and documented for eligible patients - a blood pressure reading below a target threshold, a preventive screening completed, a care plan documented for a patient with multiple chronic conditions. The measure is counted as met when the documentation supports it and missed when it does not, regardless of whether the clinical action was actually performed.
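In simplified terms, a measure's performance rate is the share of eligible encounters where the required action was captured in a structured field. A minimal sketch of that counting logic - the data model and field names are hypothetical illustrations, not a CMS specification:

```python
from dataclasses import dataclass

@dataclass
class Encounter:
    eligible: bool               # patient qualifies for the measure denominator
    documented_structured: bool  # action recorded where the reporting algorithm can count it

def performance_rate(encounters):
    """Share of eligible encounters counted as met by the measure logic."""
    eligible = [e for e in encounters if e.eligible]
    if not eligible:
        return 0.0
    met = sum(1 for e in eligible if e.documented_structured)
    return met / len(eligible)

panel = [Encounter(True, True), Encounter(True, False),
         Encounter(True, True), Encounter(False, False)]
print(performance_rate(panel))  # 2 of 3 eligible encounters counted as met
```

Care that was delivered but recorded only in narrative text shows up here as `documented_structured=False` - a miss, exactly as the reporting algorithm would score it.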

Documentation Gaps That Cost Practices Points

The most common MIPS documentation failures are not the result of care that was not delivered - they are the result of care that was delivered but not documented in the form the measure algorithm requires. A blood pressure reading documented in narrative text rather than in a structured field may not be counted by the reporting algorithm. A preventive screening discussed with the patient, but not linked to the appropriate diagnostic code, may not register as complete. A care coordination conversation documented without the specific language the measure definition requires may score as a miss.

 

These gaps compound across a patient panel. A primary care practice with 1,500 Medicare patients and a 10-percentage-point documentation gap across its top MIPS measures can lose significant bonus revenue, not because the care was deficient, but because the documentation architecture the practice uses does not align with the reporting architecture the program applies.
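The arithmetic behind that compounding is straightforward. In the sketch below, the panel size comes from the text above; the delivery and documentation rates are hypothetical illustrations:

```python
# Illustrative arithmetic only: the 1,500-patient panel is from the text;
# the rates below are hypothetical assumptions, not benchmarks.
patients = 1500
delivered_rate = 0.85    # share of eligible patients who actually received the action
documented_rate = 0.75   # share the reporting algorithm can count as met

# The 10-point gap is care that was delivered but will score as a miss.
gap = delivered_rate - documented_rate
missed_patients = round(patients * gap)
print(missed_patients)  # encounters per measure lost to documentation form alone
```

Every one of those encounters scores identically to care that was never delivered at all, which is why the gap is a documentation-architecture problem rather than a clinical one.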

How AI Quality Reporting Automates MIPS Measure Capture

AI quality reporting platforms address the MIPS documentation gap by integrating measure logic into the clinical documentation workflow rather than applying it after the fact in a separate reporting process. Quality reporting automation also handles the extraction and submission of measure data from the EHR to the MIPS reporting registry, eliminating a manual data pull that consumes significant staff time during the annual reporting cycle and that introduces errors capable of suppressing scores below what the clinical documentation would actually support.

 

The four MIPS performance categories each carry distinct documentation requirements that determine how AI tools should prioritize their support. Understanding each category is essential for evaluating where AI will produce the highest score improvement and financial return:

 

  • Quality Performance Category. Accounting for 30% of the composite MIPS score, the quality category tracks clinician performance across selected clinical quality measures. AI tools that flag missing quality measure documentation in real time during the clinical encounter address this category most directly, capturing measures that would otherwise be missed when documentation does not match the structured format the reporting algorithm reads.

  • Cost Performance Category. Accounting for 30% of the composite score, cost performance is calculated from claims data rather than documentation submissions, meaning it is not directly addressable through AI documentation tools. However, AI coding accuracy that produces cleaner, more complete claims reduces attribution errors that can inflate cost calculations by assigning resource utilization to the wrong clinician or episode of care.

  • Promoting Interoperability Category. Accounting for 25% of the composite score, this category measures the use of certified EHR technology for specific clinical functions, including patient engagement, electronic prescribing, and health information exchange. AI tools that integrate with certified EHR platforms and support electronic data exchange across care transitions contribute to performance in this category by enabling the structured data workflows it rewards.

  • Improvement Activities Category. Accounting for 15% of the composite score, this category credits clinicians for participation in specific care improvement activities such as care coordination programs and patient safety initiatives. AI tools that support care coordination documentation, chronic care management, and population health monitoring can contribute to evidence of participation in improvement activities when those tools align with the activity definitions CMS recognizes.

 

Organizations that understand which MIPS category their documentation gaps are concentrated in can target AI tool deployment to the areas with the highest score impact, rather than implementing broad solutions that dilute the financial return.
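The category weights above combine into a single weighted composite. A minimal sketch of that calculation - the individual category scores fed in are hypothetical:

```python
# Weights for the four MIPS performance categories described above.
WEIGHTS = {
    "quality": 0.30,
    "cost": 0.30,
    "promoting_interoperability": 0.25,
    "improvement_activities": 0.15,
}

def composite(scores):
    """Weighted MIPS composite from per-category scores (0-100 scale)."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)

# Hypothetical category scores for illustration.
scores = {"quality": 80, "cost": 70,
          "promoting_interoperability": 90, "improvement_activities": 100}
print(composite(scores))  # weighted sum of the four category scores
```

Because quality and cost each carry a 0.30 weight, a ten-point gain in either moves the composite three points - triple the leverage of the same gain in improvement activities.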

HCC Risk Adjustment Coding and Its Direct Impact on Revenue

Foundation for Risk-Adjusted Reimbursement

Hierarchical Condition Category coding is the mechanism through which Medicare Advantage and other risk-adjusted payment models calibrate capitation payments to the actual health complexity of a provider's patient panel. Each HCC code assigned to a patient contributes to a Risk Adjustment Factor score that determines how much the plan receives to care for that patient. Higher RAF scores indicate more complex patients and result in higher per-patient payments. Undercoded patients receive lower RAF scores and, correspondingly, lower payments - leaving the organization with the cost of caring for a complex patient and the reimbursement calibrated for a healthier one.
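The payment mechanics can be sketched as follows. The HCC coefficient values, demographic factor, and base rate below are invented placeholders for illustration, not CMS figures:

```python
# Hypothetical HCC coefficients - illustrative values only, not CMS figures.
HCC_COEFFICIENTS = {
    "HCC_diabetes_with_complications": 0.302,
    "HCC_heart_failure": 0.331,
}

def raf_score(demographic_factor, coded_hccs):
    """RAF as (roughly) a demographic base plus the coefficients of coded HCCs."""
    return demographic_factor + sum(HCC_COEFFICIENTS[h] for h in coded_hccs)

base_monthly_rate = 850.0  # hypothetical plan benchmark, dollars per member per month

fully_coded = raf_score(0.40, ["HCC_diabetes_with_complications", "HCC_heart_failure"])
undercoded = raf_score(0.40, ["HCC_diabetes_with_complications"])

# One uncoded chronic condition, priced out monthly.
print(base_monthly_rate * (fully_coded - undercoded))
```

The same patient, the same care burden, one missing code: the difference is a recurring monthly shortfall for as long as the condition goes uncaptured.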

HCC coding AI tools address the RAF score gap by identifying, from clinical documentation, the chronic conditions that qualify for HCC assignment but have not been coded in the current encounter - the documented but unclaimed diagnoses that accumulate into significant payment shortfalls across a Medicare Advantage patient panel.

The V28 Model Transition and What It Means for Coding Strategy in 2025 and 2026

The transition to CMS-HCC Model V28 represents the most significant structural change to risk adjustment coding in more than a decade. The V28 model increases the total number of HCCs from 86 to 115, remaps the relationship between ICD-10 codes and HCC categories, and recalibrates the RAF score values assigned to individual conditions. Approximately 2,200 ICD-10 codes that previously mapped to HCCs under V24 no longer map to HCCs under V28, and RAF score values for commonly coded conditions have been reduced.

 

For organizations coding under the transition blend, the practical effect is that coding strategies calibrated to V24 will produce lower RAF scores and lower reimbursement under V28 unless they are updated to reflect the new model's structure. Risk adjustment AI tools trained on V28 mapping rules are positioned to identify HCC opportunities created by the new model while flagging codes that no longer carry risk adjustment value, preventing both missed revenue and overcoding compliance risk.
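A transition check of this kind can be sketched as a diff between the two models' diagnosis-to-HCC maps. The mapping tables below are invented placeholders, not the actual CMS crosswalk:

```python
# Invented placeholder mappings - not the actual CMS V24/V28 crosswalk.
V24_MAP = {"DX_A": "HCC19", "DX_B": "HCC59", "DX_C": "HCC85"}
V28_MAP = {"DX_A": "HCC38", "DX_C": "HCC226"}  # DX_B no longer maps under V28

def transition_flags(coded_dx):
    """Split an encounter's diagnoses into codes that lost risk-adjustment
    value under V28 and codes whose HCC category was remapped."""
    dropped = [dx for dx in coded_dx if dx in V24_MAP and dx not in V28_MAP]
    remapped = {dx: (V24_MAP[dx], V28_MAP[dx])
                for dx in coded_dx
                if dx in V24_MAP and dx in V28_MAP and V24_MAP[dx] != V28_MAP[dx]}
    return dropped, remapped

dropped, remapped = transition_flags(["DX_A", "DX_B", "DX_C"])
print(dropped)   # diagnoses that no longer carry risk-adjustment value
print(remapped)  # diagnoses whose HCC category changed between models
```

A V24-calibrated coding strategy keeps chasing the `dropped` codes for revenue that no longer exists, which is exactly the failure mode the blended transition period exposes.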

How AI Identifies Undocumented HCCs Before They Cost You Revenue

Natural language processing applied to clinical documentation can identify undocumented or inadequately documented chronic conditions that qualify for HCC assignment and flag them for physician review before the encounter closes. Research on AI in value-based reimbursement finds that NLP-based HCC identification reduces missed diagnoses by up to 30% compared to human coding review alone - a gap that translates directly into RAF score points and the capitation revenue they represent. Sully's AI Medical Coder applies this NLP capability to the clinical documentation generated during each encounter, surfacing HCC-relevant diagnoses that are present in the note but not yet coded and routing them for physician confirmation before the encounter is finalized.
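A heavily simplified sketch of the "documented but not coded" check is shown below. Production systems use clinical NLP models rather than keyword matching, and the phrase-to-code table here is a hypothetical placeholder:

```python
import re

# Hypothetical phrase-to-diagnosis-code table for illustration only.
CONDITION_PATTERNS = {
    "E11.9": r"type 2 diabetes",
    "N18.32": r"ckd stage 3b|chronic kidney disease,? stage 3b",
}

def uncoded_hcc_candidates(note_text, claimed_codes):
    """Return diagnosis codes whose condition appears in the note text
    but is missing from the encounter's claimed diagnosis list."""
    text = note_text.lower()
    return [code for code, pattern in CONDITION_PATTERNS.items()
            if re.search(pattern, text) and code not in claimed_codes]

note = "Patient with type 2 diabetes and CKD stage 3b, stable on current meds."
print(uncoded_hcc_candidates(note, claimed_codes={"E11.9"}))  # ['N18.32']
```

The flagged candidates are routed to the physician for confirmation rather than coded automatically - the AI surfaces the gap; the clinician closes it.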

How AI Closes the Documentation Gap in Value-Based Programs

Flagging During Clinical Encounters

Quality measure documentation AI operates most effectively when it is embedded in the clinical encounter rather than applied after the fact. AI tools that surface quality measure gaps in real time give the clinician the opportunity to complete the measure, add the required documentation, or perform the clinical action that is still outstanding. The same checks applied after the encounter can only flag the gap; they cannot close it.

AI-Assisted CDI

Clinical documentation improvement programs have traditionally operated through retrospective query workflows: a CDI specialist reviews a coded claim, identifies a specificity gap, and sends a query to the physician requesting additional documentation. That workflow is slow, expensive, and limited to the volume of claims a CDI team can manually review.

 

Artificial intelligence in healthcare CDI platforms applies the same specificity analysis prospectively and at scale, reviewing every encounter for the documentation gaps that value-based programs penalize and surfacing targeted queries to physicians within the EHR workflow before the claim is submitted. The result is a CDI function that operates at the volume and speed of AI rather than of a human review team.

AI for Population Health and Longitudinal Quality Performance

Managing Quality Metrics Across a Patient Population with AI

Managing quality metrics at the population level requires tracking and measuring performance across thousands of patients simultaneously - identifying which patients are due for measure completion, which have documentation gaps from prior encounters, and which are approaching the end of a measurement period without the required clinical actions on record. Human workflows cannot manage this volume with the specificity and timeliness that value-based program performance requires.

 

AI population health platforms aggregate measure status across the full patient panel, surface the patients with the most actionable gaps, and integrate that information into care team workflows - appointment scheduling, outreach prioritization, and care coordination protocols - so that quality gaps are addressed through clinical action rather than documentation retrofit.
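A minimal sketch of that panel-level prioritization, with hypothetical field names and patients:

```python
# Hypothetical panel records - field names are illustrative.
panel = [
    {"patient": "A", "open_gaps": ["bp_control", "a1c_test"], "visit_scheduled": True},
    {"patient": "B", "open_gaps": ["a1c_test"], "visit_scheduled": False},
    {"patient": "C", "open_gaps": [], "visit_scheduled": True},
]

def outreach_order(panel):
    """Rank patients with open quality gaps for outreach: more gaps first;
    among ties, patients with a visit already scheduled come first, since
    their gaps can be closed at the next encounter without extra outreach."""
    return sorted((p for p in panel if p["open_gaps"]),
                  key=lambda p: (-len(p["open_gaps"]), not p["visit_scheduled"]))

print([p["patient"] for p in outreach_order(panel)])  # ['A', 'B']
```

Patients with no open gaps drop out of the worklist entirely, so care teams spend outreach effort only where a clinical action can still change a measure outcome.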

Predictive Analytics

Predictive analytics applied to claims and clinical data can identify patients likely to exhibit high utilization or HCC capture gaps in the upcoming measurement period, enabling care teams to intervene proactively rather than retrospectively. Medical AI risk stratification models trained on longitudinal patient data identify the specific chronic condition management gaps and care coordination failures most likely to affect both quality scores and total cost of care outcomes.

 

That predictive capability is particularly valuable for organizations in shared savings programs where the total cost of care for a patient panel determines whether the organization earns shared savings or owes repayment. Identifying high-risk patients early enough to support intensive care management is as much a financial function as a clinical one under those contracts.

Closing Care Gaps

Care gaps generate both quality measure misses and clinical risk. AI tools that identify care gaps from claims and clinical documentation data, and surface them to care teams at the point of scheduling or encounter, connect population health management with the clinical interactions where gaps are resolved. Sully's AI platform is designed to surface this documentation intelligence within the clinical workflow rather than in a separate population health dashboard, ensuring that care gap information reaches the physician at the moment it can influence clinical decision-making and documentation.

Building an AI-Supported Value-Based Coding Strategy

Selecting an AI platform for value-based coding requires evaluating solutions against the dimensions that determine whether the tool will improve quality program performance and financial outcomes or introduce new compliance and workflow risks. The following criteria reflect what value-based care program directors and CMOs consistently identify as decisive in platform assessments:

 

  1. Quality Measure Coverage and Real-Time Integration. Confirm that the AI tool covers the specific MIPS quality measures your organization has selected and integrates with your EHR at the point of care rather than through a retrospective reporting workflow. Real-time integration is the dimension that determines whether the tool closes documentation gaps during the encounter - when they can be addressed - or flags them after the fact, when the opportunity to act has passed.

  2. HCC Model Currency and V28 Compliance. Verify that the AI coding tool is trained on CMS-HCC V28 mapping rules and updates its model in alignment with CMS annual model revisions. A tool still calibrated to V24 logic will miss new HCC opportunities created by the V28 model and may recommend codes that pose overcoding risk under the current adjudication framework.

  3. Audit Trail and Documentation Support. Evaluate whether the platform generates a complete, physician-confirmed audit trail for every code recommendation - documenting the clinical evidence cited, the AI confidence level assigned, and the clinician review action taken. This audit trail is the compliance infrastructure that protects the organization in a RADV audit environment.

  4. Population Health Reporting and Gap Management. The most effective value-based AI platforms surface care gaps and quality measure status at both the population and encounter levels, enabling proactive outreach and care coordination that address gaps before the measurement period closes. Confirm that the platform's population reporting integrates with your existing care management workflows rather than requiring a parallel management process.

 

Organizations that evaluate AI value-based coding solutions against these criteria are positioned to select a platform that improves program performance, supports compliance, and delivers the financial returns enabled by accurate risk adjustment and quality reporting. Sully's integrations with more than 50 EHR platforms ensure that AI value-based reimbursement tools connect directly to the clinical documentation systems where quality measures and HCC data originate.

AI-driven quality metrics management, HCC coding accuracy, and real-time MIPS measure capture are not incremental improvements to billing efficiency. They are the foundation of a sustainable financial model under value-based contracts - one that aligns payment with the clinical complexity organizations actually manage and the care quality they actually deliver. Sully's AI Medical Coder and AI Scribe work together to close the gap between clinical documentation and value-based program requirements, ensuring every encounter is coded with the specificity required by quality metrics and risk adjustment.
