Session

Doubling Medical Safety: Fine-Tuning Open LLMs for Women's Health Without Human Labels

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Healthcare & Life Sciences
Technologies: AI/BI, Databricks Apps, Agent Bricks
Skill Level: Intermediate

Enterprises building LLM features in healthcare hit the same wall: satisfying dozens of safety rules simultaneously—crisis escalation, treatment boundaries, referral language—while real user data is off-limits and expert labeling is prohibitively expensive. We'll show how Flo Health broke through using RFT-inspired synthetic fine-tuning, transforming Llama 3.3 70B into a healthcare-compliant assistant for women's health that doubled safety compliance versus our previous iteration. The key insight: instead of investing expert time in labeling, we redirected it into designing LLM judges that scale.

Our system uses 60 LLM judges—52 for medical safety, 8 for usefulness—with priority-weighted reward aggregation in which P1 safety rules dominate P2 quality rules. You'll learn patterns for multi-judge evaluation systems, reward aggregation strategies for binary constraints, and why simpler approaches beat complex alternatives. For anyone building AI where "mostly safe" isn't good enough.
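To make the priority-weighted aggregation concrete, here is a minimal sketch of how binary judge verdicts might be combined so that P1 safety rules dominate P2 quality rules. The function name, the gating rule, and the quality weight are illustrative assumptions, not Flo Health's actual implementation:

```python
# Hypothetical sketch of priority-weighted reward aggregation over binary
# LLM-judge verdicts. The gating rule and weights are assumptions for
# illustration, not the production system described in the session.

def aggregate_reward(verdicts, p2_weight=0.25):
    """Collapse binary judge verdicts into one scalar reward.

    verdicts: list of (priority, passed) tuples, where priority is "P1"
    (safety, a hard constraint) or "P2" (quality), and passed is a bool.
    """
    p1 = [passed for prio, passed in verdicts if prio == "P1"]
    p2 = [passed for prio, passed in verdicts if prio == "P2"]

    # Any P1 (safety) failure zeroes the reward: "mostly safe" scores nothing,
    # so safety rules strictly dominate quality rules.
    if p1 and not all(p1):
        return 0.0

    # Full safety earns the base reward; passing quality judges adds a
    # smaller bonus proportional to the fraction of P2 checks passed.
    quality = sum(p2) / len(p2) if p2 else 0.0
    return 1.0 + p2_weight * quality
```

Gating on safety failures, rather than averaging them in, is one simple way to encode a binary constraint: no amount of quality can compensate for a single safety violation.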

Session Speakers

Vladislav Nedosekin

Director of Engineering - AI Platform
Flo Health


Michael Shtelma

Lead Product Specialist - GenAI
Databricks

Andras Meczner

Director of Medical Accuracy & Safety
Flo Health