Engineering Case Study

MedCode Pro: Building an AI-Powered Claim Scrubber

Medical billing teams face constant pressure from payers, and claims are often denied for simple coding errors, missing modifiers, or insufficiently specific documentation. These denials create substantial administrative overhead and delay revenue. Any solution must also comply with regulations such as HIPAA in a complex, heavily regulated environment.

To solve this, I designed and built MedCode Pro, a unified dashboard that acts as an AI Claim Scrubber and Clinical Documentation Improvement (CDI) assistant. The system intercepts the manual coding process and applies real-time validation rules to ensure claims are clean, compliant, and ready for submission, thereby significantly reducing the denial rate.


The Denial Problem: Errors are Expensive

In healthcare, a small coding error, such as using an "unspecified" diagnosis code when a more specific one is available or omitting a required CPT modifier, can stop an entire claim dead in its tracks. Tracking, correcting, and resubmitting a denied claim can cost a provider over $100 per instance.

The goal of MedCode Pro is to integrate the denial risk check directly into the coder's workflow. Instead of validating after the claim is denied, the system validates the claim set before the first submission, using intelligent rules based on common payer policies (like NCCI edits and local coverage determinations).


From Documentation to a Clean Claim

The MedCode Pro workflow is designed to be highly reliable, separating concerns into three distinct stages: Extraction, Validation, and Output. The key innovation is the AI Scrubber, a validation layer applied before the claim is generated.
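
The three stages can be sketched as plain functions composed in order. This is a minimal illustration rather than the project's actual code: `extractCodes`, `validateClaim`, `buildRiskReport`, and the claim shape are hypothetical names, and a regular expression stands in for the AI extraction step.

```javascript
// Hypothetical three-stage pipeline: Extraction -> Validation -> Output.
// All function names and the claim shape are illustrative assumptions.

// Stage 1: Extraction. A regex stands in for the AI extraction step and
// simply pulls code-like tokens out of the note text.
function extractCodes(note) {
  const diagnoses = note.match(/\b[A-Z]\d{2}(?:\.\d{1,4})?\b/g) ?? [];
  const procedures = note.match(/\b\d{5}\b/g) ?? [];
  return { diagnoses, procedures };
}

// Stage 2: Validation. Each rule inspects the claim and returns a finding or null.
const rules = [
  (claim) =>
    claim.diagnoses.length === 0
      ? { severity: "High", message: "No diagnosis code on claim" }
      : null,
  (claim) =>
    claim.procedures.includes("99215")
      ? { severity: "Medium", message: "99215 needs clear supporting documentation" }
      : null,
];

function validateClaim(claim) {
  return rules.map((rule) => rule(claim)).filter((finding) => finding !== null);
}

// Stage 3: Output. Summarize the findings into a report the UI could render.
function buildRiskReport(claim, findings) {
  return {
    claim,
    findings,
    status: findings.length === 0 ? "Clean" : "Needs Review",
  };
}

function scrubNote(note) {
  const claim = extractCodes(note);
  return buildRiskReport(claim, validateClaim(claim));
}

const report = scrubNote("Dx: I10, J45.909. Established patient visit, CPT 99215.");
console.log(report.status, report.findings.length); // Needs Review 1
```

Keeping each stage a pure function means the extraction step can later be swapped for a real model call without touching validation or output.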

The application leverages modern APIs and custom logic to manage highly specific clinical data, including ICD-10-CM, CPT, and HCPCS Level II codes, ensuring that all data elements required for a clean claim are present and accurate.

System Architecture Diagram

The system is built on a modular architecture to allow for flexible integration with different electronic health records (EHR) and billing systems.

Input (Clinical Note) → Process (AI Extraction) → Logic (Claim Scrubber) → Output (Risk Report) → Endpoint (CMS-1500/EHR)

Data Modeling: Mapping JSON to EDI Standards

For interoperability, the lightweight JSON object used in the demo maps directly to the required elements of the HIPAA EDI 837 transaction set (the industry standard for electronic claim submission). This architectural choice proves an understanding of enterprise healthcare data mandates.

Data Mapping: JSON → EDI 837

JSON Input Key         837 Loop/Segment     Purpose/Context
diagnoses              2300 HI              Diagnosis Codes
procedures.code        2400 LX, SV1         Procedure Code
procedures.modifier    2400 SV101-3         Service Modifier
patient_status         2010BA DMG           Patient Demographics
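
To make the mapping concrete, the sketch below renders the diagnosis and service-line portions of a demo-style claim object as 837P-style segments. It is a simplified illustration under stated assumptions: the `charge` and `units` fields are hypothetical additions, the function names are invented for this example, and a real 837 also requires the ISA/GS/ST envelopes, a diagnosis pointer on each service line, and many other loops.

```javascript
// Sketch only: renders part of a claim as 837P-style segments.
// `charge` and `units` are assumed fields, not part of the original mapping.
const claim = {
  diagnoses: ["I10", "J45.909"],
  procedures: [{ code: "99213", modifier: "25", charge: 150, units: 1 }],
};

// Loop 2300 HI: ABK marks the principal ICD-10-CM diagnosis, ABF the others.
// The decimal point is dropped from diagnosis codes in the transaction.
function toHiSegment(diagnoses) {
  const elements = diagnoses.map(
    (code, i) => `${i === 0 ? "ABK" : "ABF"}:${code.replace(".", "")}`
  );
  return `HI*${elements.join("*")}~`;
}

// Loop 2400: LX numbers the service line; SV1 carries the procedure, with
// the modifier as the third component of the first (composite) element.
// The diagnosis pointer element is omitted here for brevity.
function toServiceLines(procedures) {
  return procedures.flatMap((proc, i) => {
    const composite = ["HC", proc.code, proc.modifier].filter(Boolean).join(":");
    return [`LX*${i + 1}~`, `SV1*${composite}*${proc.charge}*UN*${proc.units}~`];
  });
}

const segments = [toHiSegment(claim.diagnoses), ...toServiceLines(claim.procedures)];
console.log(segments.join("\n"));
// HI*ABK:I10*ABF:J45909~
// LX*1~
// SV1*HC:99213:25*150*UN*1~
```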

The AI Scrubbing Logic and Denial Patterns

The core scrubbing engine uses a set of deterministic rules combined with a simulated large language model (LLM), such as the Gemini API, to execute complex validation checks. This hybrid approach ensures both compliance (deterministic NCCI edits) and intelligence (contextual risk scoring).

src/scrubbing_rules.js
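// Each pattern is tested against a text rendering of the claim's codes.
const HighRiskPatterns = [
  // Pattern 1: non-specific ICD-10-CM diagnosis codes
  // (I10 essential hypertension, R51.9 unspecified headache, J45.909 unspecified asthma)
  /I10|R51\.9|J45\.909/,
  // Pattern 2: modifier misuse (modifier 59 on an E/M code).
  // Note: ".+" matches any later "59", so this can also flag unrelated codes.
  /992\d{2}.+59/,
  // Pattern 3: high-risk E/M level (level 5 established-patient visit).
  // The regex only flags the code; supporting documentation is reviewed separately.
  /99215/,
];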

The scrubber checks for these and other denial patterns, categorizing them by severity (High, Medium, Low) and providing an immediate, actionable alert to the coder.
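
One way the severity categorization could work is to pair each pattern with a severity and message, then sort the matches so the most serious alert comes first. The severity assignments, messages, and the `scrub` function below are assumptions for illustration, not the project's actual rules.

```javascript
// Sketch: attach a severity and message to each pattern and scan a claim line.
// Severity levels and the text format of `claimLine` are illustrative assumptions.
const ratedPatterns = [
  { pattern: /I10|R51\.9|J45\.909/, severity: "Medium", message: "Non-specific diagnosis code" },
  { pattern: /992\d{2}.+59/, severity: "High", message: "Modifier 59 used on an E/M code" },
  { pattern: /99215/, severity: "Low", message: "Level 5 E/M: confirm supporting documentation" },
];

const severityOrder = { High: 0, Medium: 1, Low: 2 };

// Returns the matching alerts, most severe first.
function scrub(claimLine) {
  return ratedPatterns
    .filter(({ pattern }) => pattern.test(claimLine))
    .map(({ severity, message }) => ({ severity, message }))
    .sort((a, b) => severityOrder[a.severity] - severityOrder[b.severity]);
}

const alerts = scrub("I10 99215-59");
console.log(alerts.map((alert) => alert.severity)); // [ 'High', 'Medium', 'Low' ]
```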

CDI: Driving Clinical Documentation Improvement

A primary cause of denials is the use of non-specific codes. For example, ICD-10-CM code `I10`, "Essential (primary) hypertension," may be questioned when the record suggests a more specific hypertensive condition. MedCode Pro's CDI functionality addresses this by identifying the non-specific code and immediately presenting a "Physician Query" to the user, asking for the missing clinical detail (e.g., "Is the hypertension associated with heart failure or chronic kidney disease?").

This feature effectively closes the loop between the coder and the physician's documentation, leading to richer data, higher specificity, and improved payment integrity.
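
A physician query lookup can be as simple as a map from non-specific codes to question text. The sketch below assumes a hypothetical `physicianQueries` table and `buildQueries` helper; the query wording is illustrative and not taken from the application.

```javascript
// Sketch: map non-specific ICD-10-CM codes to a physician query.
// The table contents and query wording are illustrative assumptions.
const physicianQueries = new Map([
  ["I10", "Is the hypertension associated with heart failure or chronic kidney disease?"],
  ["R51.9", "Can the headache type or underlying cause be documented?"],
  ["J45.909", "What is the asthma severity, and is it intermittent or persistent?"],
]);

// Returns one query per diagnosis that has an entry in the table.
function buildQueries(diagnoses) {
  return diagnoses
    .filter((code) => physicianQueries.has(code))
    .map((code) => ({ code, query: physicianQueries.get(code) }));
}

const queries = buildQueries(["I10", "E11.9"]);
console.log(queries.length, queries[0].code); // 1 I10
```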

Design Tradeoffs: Speed vs. Data Completeness

A key architectural decision was balancing user experience speed with the depth of data validation. Since full, real-time NCCI (National Correct Coding Initiative) checking involves querying massive, often slow, proprietary databases, the system initially favors an extremely fast, high-confidence local risk model (simulated via JavaScript rules).

The tradeoff is that a full, *perfect* scrub is deferred until the final submission layer. This approach ensures the coder gets instant, actionable feedback 90% of the time, dramatically improving workflow efficiency without crippling performance with a slow, synchronous API call on every keystroke. This demonstrates an understanding of realistic application performance versus absolute compliance perfection.
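
The two tiers can be sketched as a synchronous local check that runs on every edit and an asynchronous full check that runs once at submission. `quickCheck`, `fullNcciCheck`, and `submitClaim` are hypothetical names, and `fullNcciCheck` is a stub: no real NCCI service is called.

```javascript
// Sketch of the two-tier approach. `fullNcciCheck` is a stub standing in
// for a slow remote service; all names here are illustrative assumptions.
const localRules = [/I10|R51\.9|J45\.909/, /992\d{2}.+59/, /99215/];

// Tier 1: cheap enough to run synchronously on every keystroke.
function quickCheck(claimLine) {
  return localRules.some((rule) => rule.test(claimLine)) ? "risk" : "ok";
}

// Tier 2: awaited once, when the coder submits the claim.
async function fullNcciCheck(claimLine) {
  // Placeholder result; a real implementation would query an NCCI edit source.
  return { claimLine, passed: quickCheck(claimLine) === "ok" };
}

async function submitClaim(claimLine) {
  const result = await fullNcciCheck(claimLine);
  return result.passed ? "submitted" : "held for review";
}

console.log(quickCheck("E11.9 99213")); // ok
submitClaim("E11.9 99213").then((status) => console.log(status)); // submitted
```

Because the slow call sits behind `submitClaim`, typing latency depends only on the local rules.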

Engineering Principles Behind the Dashboard

The dashboard follows several principles necessary for a mission-critical financial application:

Single Source of Truth (SSOT)

All code data (ICD-10-CM, CPT, Modifiers) is fetched from centralized look-up tables to ensure consistency across the entire application, from search to the final claim form.

State Management (React)

Using modern React state management, the dashboard ensures a seamless, instantaneous user experience where adding a code instantly updates the risk report and the form preview without page refreshes.
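
One pattern that gives this behavior is a pure reducer that recomputes the risk report whenever the code list changes, which could then be passed to React's `useReducer` hook. The action names, state shape, and `riskyCodes` set below are assumptions for illustration; the source does not say which state management approach the dashboard uses.

```javascript
// Sketch: a pure reducer suitable for React's useReducer hook.
// The report is derived from the code list on every action, so the UI
// never shows a stale report. Action names and state shape are assumptions.
const riskyCodes = new Set(["I10", "R51.9", "J45.909", "99215"]);

function computeRiskReport(codes) {
  const flagged = codes.filter((code) => riskyCodes.has(code));
  return { flagged, status: flagged.length === 0 ? "Clean" : "Needs Review" };
}

const initialState = { codes: [], report: computeRiskReport([]) };

function claimReducer(state, action) {
  switch (action.type) {
    case "ADD_CODE": {
      const codes = [...state.codes, action.code];
      return { codes, report: computeRiskReport(codes) };
    }
    case "REMOVE_CODE": {
      const codes = state.codes.filter((code) => code !== action.code);
      return { codes, report: computeRiskReport(codes) };
    }
    default:
      return state;
  }
}

// In a component: const [state, dispatch] = useReducer(claimReducer, initialState);
let state = claimReducer(initialState, { type: "ADD_CODE", code: "E11.9" });
state = claimReducer(state, { type: "ADD_CODE", code: "I10" });
console.log(state.report.status); // Needs Review
```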

UI Fidelity and Context

The application visually mimics standard forms (CMS-1500/UB-04) and highlights risk areas in real time, giving coders immediate context and reducing the cognitive load of manual cross-referencing.

Extensible API Integration

The core logic is structured to easily integrate with various third-party APIs for real-world functionality like eligibility checks (270/271), claims submission (837), and prior authorization.

Key Takeaways for Platform Engineering

  • Operational Value: Designing systems that directly address business-critical functions like Revenue Cycle Management (RCM) to drive measurable cost reduction (e.g., reducing >$100 denial resubmission costs).
  • End-to-End Development: Full-stack ability to build highly interactive UIs (React/Tailwind) and integrate complex logic using APIs and simulated LLM/AI services (Gemini API) for robust back-end validation.
  • System Design: Experience creating extensible, modular architectures that adhere to compliance standards (HIPAA context) and simplify complex regulatory logic (NCCI edits, CDI).
  • Compliance & Security: Proven understanding of data sensitivity (PHI) and the importance of secure, compliant system design (HIPAA, **EDI 837/277**) in a heavily regulated industry.

Future Scope: From Tool to System of Record

The architectural foundation supports significant future expansion:

Automated Code Suggestion

Integration of an LLM to read full clinical notes and suggest a pre-scrubbed set of CPT and ICD-10 codes.

Denial Triage & Benchmarking

Tracking historical denial patterns to continuously refine the scrubbing rules and benchmark coder performance.

Cost Estimation API

Integrating with payment APIs (e.g., Stripe, custom payment gateway) for accurate patient cost estimates before service.

Data Transparency & Disclaimer

For educational purposes, the data sources and financial models are detailed below:

  • ICD and HCPCS Search: Uses public NLM (National Library of Medicine) Clinical Tables application programming interfaces (APIs).
  • CPT Content: The CPT® code list is a handcrafted mock list to avoid proprietary material held by the AMA.
  • Financial and Payer Rules: All financial values, fee schedules, RVUs, and simulated payer denial rules are fictional and for demonstration only.
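
For the ICD search, a request to the NLM Clinical Tables ICD-10-CM endpoint might be built and parsed as sketched below. The helper names are hypothetical, and the array-shaped response layout is assumed from the public documentation, so verify it against the current API reference before relying on it. A fixture is used so the example runs without network access.

```javascript
// Sketch: build a search URL for the NLM Clinical Tables ICD-10-CM endpoint
// and parse its response. The response layout is an assumption to verify.
const ICD10_SEARCH_URL = "https://clinicaltables.nlm.nih.gov/api/icd10cm/v3/search";

function buildSearchUrl(terms, maxList = 7) {
  const params = new URLSearchParams({ sf: "code,name", terms, maxList: String(maxList) });
  return `${ICD10_SEARCH_URL}?${params}`;
}

// Assumed response shape: [totalCount, codes, extraData, displayRows]
function parseSearchResponse([total, , , rows]) {
  return { total, results: rows.map(([code, name]) => ({ code, name })) };
}

// Fixture in the assumed shape, standing in for a live response.
const sample = [1, ["I10"], null, [["I10", "Essential (primary) hypertension"]]];
const parsed = parseSearchResponse(sample);
console.log(parsed.results[0].code); // I10
```

In the application, the URL from `buildSearchUrl` would be passed to `fetch` and the decoded JSON handed to `parseSearchResponse`.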

View the complete codebase

The source code for MedCode Pro is open-source and available on GitHub.
