Building the autonomy layer for future healthcare systems

Published or cited by

Where our team comes from

20+

Patents Filed

10+

Academic Papers Published

12

Papers Cited

Research performance highlights

Improvement in clinical note quality with agentic workflows

17.3%

Lower processing cost per note using optimized models

50%

Decreased hallucinations using agents in real clinical practice

0.0328%

Increase in template compliance through automation

98%

Active research topics

Partner with Sully Labs

Work with our research team to design, test, and deploy next-generation multi-agent AI.

Book a 30-min call

Insights from our data

Model strengths vary by medical domain

Benchmarking 12+ models across medical specialties revealed clear domain strengths: O1 excelled in general medical Q&A, O3-MINI led in lymphatic diagnosis, and GPT-4.5-Preview dominated treatment and endocrine system tasks.

Agentic models deliver measurable gains

Agentic multi-agent workflows improved clinical note quality by 10–20% across the five models tested, with GPT-OSS-120B achieving a 17.3% quality increase at half the baseline cost. These results support the use of structured, multi-step reasoning pipelines for medical scribing tasks.

Press and Citations

The consensus mechanism

Second opinion matters: Towards adaptive clinical AI via the consensus of expert model ensemble

Join Sully!