The Consensus Mechanism

How Sully AI built a multi-expert AI system that outperforms the world's best models at clinical reasoning.

Amit Kumthekar

Clinical AI is hitting a wall. A single model can be deprecated overnight, costs can shift without warning, and even the best LLMs still get complex medical reasoning wrong, which is the kind of reasoning that matters most at the bedside.

Inside this paper, you'll get the research every clinical AI team needs to understand what's possible today:

  • Why Sully's Consensus Mechanism outperforms OpenAI's o3 and Google's Gemini 2.5 Pro across every major medical benchmark tested

  • How a triage-inspired architecture routes queries to specialist AI experts — mirroring the way real clinical teams actually make decisions

  • The real accuracy gap: 61.2% vs. 53.5% (7.7 percentage points) on MedXpertQA, a benchmark designed specifically for complex clinical reasoning, not just medical trivia

  • Why single models are systematically overconfident — and how ensemble consensus produces calibration you can actually trust in high-stakes settings

  • How the modular design lets you swap in newer models without rebuilding — so your clinical AI doesn't go stale every six months
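
To make the routing, consensus, and swap-in ideas above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Sully's implementation: the keyword-based `triage` function, the `Expert` callable type, the always-consulted "general" expert, and the majority vote are all hypothetical stand-ins, and the agreement fraction is only a crude confidence signal rather than a calibrated probability.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: an expert is any callable that maps a question
# to an answer label (for example, a multiple-choice option).
Expert = Callable[[str], str]


@dataclass
class ConsensusResult:
    answer: str            # most common answer across the panel
    agreement: float       # fraction of the panel that gave that answer
    votes: Dict[str, int]  # raw vote counts per answer


def triage(question: str, keywords: Dict[str, List[str]]) -> List[str]:
    """Pick the specialties whose keywords appear in the question.

    Assumption: simple keyword matching stands in for whatever routing
    a real system uses. A "general" expert is always consulted.
    """
    text = question.lower()
    matched = [s for s, words in keywords.items() if any(w in text for w in words)]
    return matched + ["general"]


def consensus(
    question: str,
    experts: Dict[str, Expert],
    keywords: Dict[str, List[str]],
) -> ConsensusResult:
    """Route the question, collect one answer per expert, take a majority vote."""
    panel = [s for s in triage(question, keywords) if s in experts]
    if not panel:
        raise ValueError("no expert available for this question")
    votes = Counter(experts[s](question) for s in panel)
    answer, count = votes.most_common(1)[0]
    return ConsensusResult(answer, count / len(panel), dict(votes))


if __name__ == "__main__":
    # Stub experts for illustration; in practice each would wrap a model call.
    experts: Dict[str, Expert] = {
        "cardiology": lambda q: "B",
        "pharmacology": lambda q: "B",
        "general": lambda q: "A",
    }
    keywords = {"cardiology": ["chest pain"], "pharmacology": ["warfarin"]}
    question = "Chest pain in a patient on warfarin: what is the next step?"

    print(consensus(question, experts, keywords))

    # Swapping in a newer model is a one-line change to the registry.
    experts["general"] = lambda q: "B"
    print(consensus(question, experts, keywords))
```

Because experts are looked up by name in a plain dictionary, replacing one model does not touch the routing or voting code, which is the property the modular-design bullet describes.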

If you're building, deploying, or evaluating AI for clinical decision support, this is the architecture paper you need to read first.

Dive Deeper

Download the full paper for more information about this article.

Ready for the future of healthcare?