Building the autonomy layer for future healthcare systems

Published & Cited by

Featured publications and resources

20 Jun 2025

Whitepaper

The Consensus Mechanism: Toward Trustworthy AI Collaboration

Defines Sully’s Consensus Mechanism, a protocol enabling AI agents to reach verifiable agreement through structured proposal-and-critique cycles, weighted scoring, and reputation tracking. This mechanism underpins transparent, multi-agent decision-making across healthcare, legal, and knowledge domains.
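As a rough illustration of how proposal-and-critique cycles, weighted scoring, and reputation tracking might fit together, here is a minimal sketch. It is not the protocol from the whitepaper: the `Agent` class, the `run_round` and `update_reputation` functions, the score range, and the update rate are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    reputation: float = 1.0  # hypothetical starting weight, not from the whitepaper


def run_round(agents, proposals, critiques):
    """Score one proposal-and-critique round (illustrative only).

    proposals: {author_name: proposal_text}
    critiques: {critic_name: {author_name: score between 0 and 1}}
    Returns (winning_author, {author_name: reputation-weighted mean score}).
    """
    weights = {a.name: a.reputation for a in agents}
    scores = {}
    for author in proposals:
        total, weight_sum = 0.0, 0.0
        for critic, given in critiques.items():
            # Agents do not score their own proposal.
            if critic == author or author not in given:
                continue
            total += weights[critic] * given[author]
            weight_sum += weights[critic]
        scores[author] = total / weight_sum if weight_sum else 0.0
    winner = max(scores, key=scores.get)
    return winner, scores


def update_reputation(agents, winner, rate=0.1):
    """Assumed update rule: reward the winner, slightly decay the others."""
    for a in agents:
        if a.name == winner:
            a.reputation += rate
        else:
            a.reputation = max(0.1, a.reputation - rate / 2)
```

In this sketch, each round's critiques are averaged with the critics' current reputations as weights, and the winning author's reputation rises so that its later critiques count for more. A real system would also need to record each round so that the agreement can be verified afterward.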

20 Jun 2025

Whitepaper

Scalable architecture for multimodal healthcare agents

Introduces Sully’s SuperAgent architecture — a composable ecosystem of isolated, self-contained agent packages. Each agent supports multimodal input (voice, web, phone, SMS) and integrates with common authentication, billing, and access layers.

20 Jun 2025

Whitepaper

QnA benchmarks: LLM performance analysis

A large-scale benchmark across 12+ models and medical specialties. Finds O1 leading in overall accuracy (45%), with GPT-4.5-Preview dominating treatment tasks and O3-Mini excelling in lymphatic diagnosis. Recommends ensemble model use for real-world applications.

Topics covered

Research performance highlights

Improvement in clinical note quality with agentic workflows

17.3%

Lower processing cost per note using optimized models

50%

Greater efficiency achieved by open-source models

3.5×

Increase in template compliance through automation

98%

Insights from our data

Model strengths vary by medical domain

Benchmarking 12+ models across medical specialties revealed clear domain strengths: O1 excelled in general medical Q&A, O3-Mini led in lymphatic diagnosis, and GPT-4.5-Preview dominated treatment and endocrine system tasks.

Agentic models deliver measurable gains

Multi-agent workflows improved clinical note quality by 10–20% across five tested models, with GPT-OSS-120B achieving a 17.3% quality increase at half the baseline cost. This validates structured, multi-step reasoning pipelines for medical scribing tasks.

Papers and Reports

What Experts Say About Sully

Partner with Sully Labs

Work with our research team to design, test, and deploy next-generation multi-agent AI.

Book a 30-min call

Press and Citations

The consensus mechanism

Second opinion matters: Towards adaptive clinical AI via the consensus of expert model ensemble

Join Sully!
