Building the foundation of an autonomous healthcare system

Engineers, researchers, and clinicians from...

Research performance highlights

Improvement in clinical note quality with agentic workflows

17.3%

Lower processing cost per note using optimized models

50%

Decreased hallucinations using agents in real clinical practice

0.0328%

Increase in template compliance through automation

98%

Partner with Sully Labs

Work with our research team to design, test, and deploy next-generation multi-agent AI.

Book a 30-min call

Insights from our data

Model strengths vary by medical domain

Benchmarking 12+ models across medical specialties revealed clear domain strengths: O1 excelled in general medical Q&A, O3-MINI led in lymphatic diagnosis, and GPT-4.5-Preview dominated treatment and endocrine system tasks.
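Results like these suggest routing each request to whichever model scored best in its domain. The following is a minimal sketch under that assumption, not Sully's system: `best_model_per_domain`, `route`, and the `scores[domain][model]` layout are hypothetical, and scores are assumed to be higher-is-better.

```python
from typing import Dict

# Hypothetical layout: scores[domain][model] -> benchmark score (higher is better).
Scores = Dict[str, Dict[str, float]]


def best_model_per_domain(scores: Scores) -> Dict[str, str]:
    """Return the top-scoring model for each benchmarked domain."""
    return {
        domain: max(model_scores, key=lambda m: model_scores[m])
        for domain, model_scores in scores.items()
        if model_scores  # skip domains with no recorded results
    }


def route(domain: str, routing_table: Dict[str, str], default_model: str) -> str:
    """Pick the model for a request, falling back when the domain was never benchmarked."""
    return routing_table.get(domain, default_model)
```

The fallback model covers domains that were never benchmarked. Ties go to whichever model appears first in a domain's scores mapping, since `max` returns the first maximal element it encounters.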

Agentic models deliver measurable gains

Multi-agent workflows improved clinical note quality by 10–20% across five tested models, with GPT-OSS-120B achieving a 17.3% quality increase at half the baseline cost. These results support the use of structured, multi-step reasoning pipelines for medical scribing tasks.
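The page does not describe how the agentic pipeline is structured. A common pattern for multi-step note generation is a draft, review, revise loop, sketched below under that assumption. `agentic_note`, the `ModelFn` interface, the prompt wording, and the `APPROVED` stop signal are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion string.
ModelFn = Callable[[str], str]


@dataclass
class NoteResult:
    note: str
    critiques: List[str] = field(default_factory=list)


def agentic_note(transcript: str, template: str, model: ModelFn, max_rounds: int = 2) -> NoteResult:
    """Draft a note, then alternate reviewer and editor passes until the reviewer approves."""
    # Step 1: a drafting agent turns the transcript into a note that follows the template.
    draft = model(f"Write a clinical note using this template:\n{template}\n\nTranscript:\n{transcript}")
    result = NoteResult(note=draft)
    for _ in range(max_rounds):
        # Step 2: a reviewer agent checks the current note against the transcript.
        critique = model(
            "List unsupported statements or missing template sections, "
            f"or reply APPROVED.\n\nTranscript:\n{transcript}\n\nNote:\n{result.note}"
        )
        if critique.strip().upper() == "APPROVED":
            break
        result.critiques.append(critique)
        # Step 3: an editor agent revises the note to address the critique.
        result.note = model(f"Revise the note to address:\n{critique}\n\nNote:\n{result.note}")
    return result
```

Capping `max_rounds` bounds cost: each round adds at most two model calls, which matters when the goal is higher quality at a lower cost per note.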

Press and Citations

The consensus mechanism

Second opinion matters: Towards adaptive clinical AI via the consensus of expert model ensemble
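The cited work's exact method is not described here. As a hedged illustration of the general idea of ensemble consensus, the sketch below takes a simple majority vote over answers from several models and returns no answer, signalling that the case needs a second opinion, when agreement does not exceed a threshold. `consensus` and the `min_agreement` default are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(answers: List[str], min_agreement: float = 0.5) -> Tuple[Optional[str], float]:
    """Majority vote over normalized answers from several expert models.

    Returns (answer, agreement). answer is None when the most common option
    does not exceed min_agreement, meaning the case should be escalated
    (for example, to a clinician) rather than answered automatically.
    """
    if not answers:
        return None, 0.0
    # Normalize whitespace and case so trivially different strings count as one vote.
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    return (top if agreement > min_agreement else None), agreement
```

Exact string matching is the simplest possible agreement test; free-text clinical answers would likely need a more tolerant comparison before voting.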

Join Sully!