Building the autonomy layer for future healthcare systems
Published & Cited by




Featured publications and resources



20 Jun 2025
Whitepaper
The Consensus Mechanism: Toward Trustworthy AI Collaboration
Defines Sully’s Consensus Mechanism, a protocol enabling AI agents to reach verifiable agreement through structured proposal-and-critique cycles, weighted scoring, and reputation tracking. This mechanism underpins transparent, multi-agent decision-making across healthcare, legal, and knowledge domains.
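The scoring loop described above can be sketched minimally. This is an illustrative toy, not Sully's published protocol: the reputation-weighted averaging rule, the 0.1 reputation increment, and all agent and proposal names are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    reputation: float = 1.0  # grows when the agent backs winning proposals

def weighted_consensus(proposals, scores, agents):
    """One critique cycle: pick the proposal with the highest
    reputation-weighted score, then update critic reputations.

    scores[i][j] is the score agent j assigns to proposal i.
    """
    weight = sum(a.reputation for a in agents)
    totals = [
        sum(s * a.reputation for s, a in zip(row, agents)) / weight
        for row in scores
    ]
    best = max(range(len(proposals)), key=totals.__getitem__)
    # illustrative reputation update: credit agents proportional to how
    # strongly they endorsed the winning proposal
    for a, s in zip(agents, scores[best]):
        a.reputation += 0.1 * s
    return proposals[best], totals

agents = [Agent("reviewer-1"), Agent("reviewer-2"), Agent("reviewer-3")]
proposals = ["plan A", "plan B"]
scores = [[0.9, 0.8, 0.7], [0.4, 0.5, 0.6]]
best, totals = weighted_consensus(proposals, scores, agents)
```

Running several such cycles, with reputations carried forward, gives a rough picture of how structured proposal-and-critique rounds can converge on a verifiable group decision.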



20 Jun 2025
Whitepaper
Scalable architecture for multimodal healthcare agents
Introduces Sully’s SuperAgent architecture — a composable ecosystem of isolated, self-contained agent packages. Each agent supports multimodal input (voice, web, phone, SMS) and integrates with common authentication, billing, and access layers.



20 Jun 2025
Whitepaper
QnA benchmarks: LLM performance analysis
A large-scale benchmark across 12+ models and medical specialties. Finds O1 leading in overall accuracy (45%), with GPT-4.5-Preview dominating treatment tasks and O3-Mini excelling in lymphatic diagnosis. Recommends ensemble model use for real-world applications.
Topics covered

Research performance highlights
17.3% improvement in clinical note quality with agentic workflows
50% lower processing cost per note using optimized models
3.5× greater efficiency achieved by open-source models
98% increase in template compliance through automation
Insights from our data
Model strengths vary by medical domain
Benchmarking 12+ models across medical specialties revealed clear domain strengths: O1 excelled in general medical Q&A, O3-Mini led in lymphatic diagnosis, and GPT-4.5-Preview dominated treatment and endocrine system tasks.



Agentic models deliver measurable gains
Agentic multi-agent workflows improved clinical note quality by 10–20% across five tested models — with GPT-OSS-120B achieving a 17.3% quality increase at half the baseline cost. This validates structured, multi-step reasoning pipelines for medical scribing tasks.



Papers and Reports

One-shot eval report
A benchmark of 16 LLMs across 786 medical transcripts that identifies Kimi-K2-Instruct as the most efficient model for single-shot note generation — 3.5× faster than GPT-4o at comparable quality. Highlights universal failures in template compliance and the rising competitiveness of open-source models.

One-shot model analytics
A visual analytics companion to the one-shot evaluation, revealing correlations among metrics like safety, accuracy, and completeness. Confirms Kimi-K2 and GPT-OSS-120B as statistically equivalent in balanced performance.

Sully AI scribing agent v1 research report
A direct comparison of agentic vs. one-shot scribing methods. Shows GPT-OSS-120B delivers +17.3% quality at half the cost, outperforming GPT-5 in practical deployments. Includes cost, latency, and template compliance metrics, validating multi-agent systems in production.

The Consensus Mechanism: Toward Trustworthy AI Collaboration
Defines Sully’s Consensus Mechanism, a protocol enabling AI agents to reach verifiable agreement through structured proposal-and-critique cycles, weighted scoring, and reputation tracking. This mechanism underpins transparent, multi-agent decision-making across healthcare, legal, and knowledge domains.

Design agentic evals
System diagram detailing Sully’s evaluation stack — connecting system prompts, transcripts, templates, and outcome metrics. Defines how sub-agents score performance across eight dimensions such as reasoning, safety, and compliance.
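The per-dimension scoring idea can be sketched as follows. Only reasoning, safety, and compliance are named above (three of the eight dimensions); the judge functions here are toy stand-ins for LLM sub-agent calls, and the aggregation by simple mean is an assumption for illustration.

```python
from statistics import mean

# Three of the eight dimensions are named in the diagram; the rest are
# omitted here rather than guessed.
DIMENSIONS = ["reasoning", "safety", "compliance"]

def score_note(note: str, judges: dict) -> dict:
    """Run one sub-agent (judge callable) per dimension on a clinical
    note and aggregate the scores into an overall figure."""
    per_dim = {d: judges[d](note) for d in DIMENSIONS}
    per_dim["overall"] = mean(per_dim[d] for d in DIMENSIONS)
    return per_dim

# toy judges returning fixed scores in place of real LLM evaluations
judges = {
    "reasoning": lambda note: 0.9,
    "safety": lambda note: 1.0,
    "compliance": lambda note: 0.8,
}
report = score_note("Patient presents with chest pain...", judges)
```

In a real stack each judge would be a prompted sub-agent with access to the system prompt, transcript, and template, so that scores trace back to the same inputs the scribe saw.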

QnA benchmarks: LLM performance analysis
A large-scale benchmark across 12+ models and medical specialties. Finds O1 leading in overall accuracy (45%), with GPT-4.5-Preview dominating treatment tasks and O3-Mini excelling in lymphatic diagnosis. Recommends ensemble model use for real-world applications.

Scalable architecture for multimodal healthcare agents
Introduces Sully’s SuperAgent architecture — a composable ecosystem of isolated, self-contained agent packages. Each agent supports multimodal input (voice, web, phone, SMS) and integrates with common authentication, billing, and access layers.

What Experts Say About Sully
Sully’s agentic framework redefines medical documentation. The accuracy and structure it brings to clinical scribing are unmatched.

Dr. Maya Linton
Chief Clinical Informatics Officer, Veritas Health Systems
Their evaluation methodology sets a new benchmark for transparency in LLM performance. Sully’s data-driven rigor is exactly what the field needs.

Prof. Daniel Cho
Director of Applied AI Research, Stanford MedX Lab
The modular design of Sully’s SuperAgent system proves that scalability and safety can coexist. It’s a model for real-world AI deployment.

Arjun Patel
Head of AI Architecture, Nimbus Biotech
Partner with
Sully Labs
Work with our research team to design, test, and deploy next-generation multi-agent AI.
Book a 30-min call
Press and Citations
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur tempus urna at turpis condimentum lobortis. Ut commodo efficitur neque.
Neil J. Rubenking
12 Aug 2025
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur tempus urna at turpis condimentum lobortis. Ut commodo efficitur neque.
David Pierce
7 Jul 2024
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vulputate libero et velit interdum, ac aliquet odio mattis. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Curabitur tempus urna at turpis condimentum lobortis. Ut commodo efficitur neque.
Andrew Milich
11 Jan 2024
The consensus mechanism
Second opinion matters: Towards adaptive clinical AI via the consensus of expert model ensemble
Resources
© Sully AI 2025. All Rights Reserved.
Epic is a registered trademark of Epic Systems Corporation.
