Hosts: Chris Novak & Maya Johnson
Daily AI news for healthcare professionals. Two expert hosts cover how artificial intelligence is changing medicine, diagnostics, drug discovery, and patient care.
Chris Novak: Welcome to Pivot Health! I'm Chris—
Maya Johnson: —and I'm Maya. Let's get into it.
Chris Novak: Today we're covering HealthFormer, the AI that models human physiology like never before, trust issues with vision-language models in medicine, and how Stanford's governing their EHR-embedded AI agents.
Maya Johnson: Starting with HealthFormer — this is genuinely groundbreaking stuff. Researchers just dropped a generative AI model that can predict how your entire body changes over time.
Chris Novak: Yeah, and we're not talking about just blood pressure here. This thing models 667 different health measurements — everything from your gut microbiome to continuous glucose monitoring to sleep patterns. They trained it on 15,000 people from the Human Phenotype Project.
Maya Johnson: What's wild is they're using a decoder-only transformer — basically the same architecture as ChatGPT — but instead of predicting the next word, it's predicting your next health state. As a clinician, I think this could revolutionize how we think about preventive medicine.
Chris Novak: The key innovation here is that it's generative. You can literally ask it 'what happens if this patient starts exercising three times a week?' and it'll simulate their entire physiological trajectory. No more one-size-fits-all treatment plans.
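As a rough illustration of the "predict the next health state, then simulate a what-if" idea discussed above, here is a minimal Python sketch of an autoregressive rollout. It is not HealthFormer's code: `toy_next_state`, the `resting_hr` measurement, and the made-up dynamics are all hypothetical stand-ins for a trained model, chosen only to show how a generated trajectory can be conditioned on an intervention.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def toy_next_state(history: List[State], sessions_per_week: int) -> State:
    """Hypothetical stand-in for a trained model (NOT HealthFormer).

    Made-up dynamics: each weekly exercise session lowers resting heart
    rate by 0.2 bpm per step, with a floor of 50 bpm.
    """
    last = history[-1]
    new_hr = max(50.0, last["resting_hr"] - 0.2 * sessions_per_week)
    return {"resting_hr": new_hr}


def rollout(
    initial: State,
    steps: int,
    predict: Callable[[List[State], int], State],
    sessions_per_week: int,
) -> List[State]:
    """Generate a trajectory one state at a time.

    Each predicted state is appended to the history and fed back in,
    the same way a decoder-only language model feeds back its own tokens.
    """
    history = [initial]
    for _ in range(steps):
        history.append(predict(history, sessions_per_week))
    return history


start = {"resting_hr": 70.0}
baseline = rollout(start, steps=4, predict=toy_next_state, sessions_per_week=0)
what_if = rollout(start, steps=4, predict=toy_next_state, sessions_per_week=3)

# Compare the final state of the two simulated trajectories.
print(baseline[-1]["resting_hr"], round(what_if[-1]["resting_hr"], 1))
```

Comparing `baseline` with `what_if` is the toy version of asking what happens if the patient starts exercising three times a week; a real model would predict many measurements jointly and with uncertainty.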
Maya Johnson: Right, but here's my concern — we're essentially creating digital twins of human physiology. The ethical implications are massive. Who owns these predictions? What if insurance companies get access?
Chris Novak: Valid points, but think about the upside. We could finally crack why some people respond to statins and others don't. Why some diets work for certain individuals. This is precision medicine at scale.
Maya Johnson: I'm cautiously optimistic. If they can validate these predictions in real-world clinical trials, we're looking at a complete paradigm shift in how we approach chronic disease management.
Chris Novak: Moving to our second story — and this one's concerning. Researchers just audited five frontier vision-language models for medical image analysis, and the results are... not great.
Maya Johnson: Not great is putting it mildly. These are supposed to be the best AI models we have — Gemini 2.5 Pro, GPT-5, the works — and they're failing at basic medical tasks. The best model only achieved 23% accuracy in localizing anatomical structures.
Chris Novak: The laterality confusion is what really gets me. These models literally can't tell left from right consistently in medical images. That's not just bad performance — that's clinically dangerous.
Maya Johnson: Exactly. Imagine an AI suggesting surgery on the wrong kidney because it confused left and right. What's worse is when they tested a self-grounding pipeline — where the model first identifies what it's looking at, then answers questions — accuracy actually dropped.
Chris Novak: The format compliance failures are staggering too. When given a two-step prompt, GPT-5 and Gemini failed to follow instructions 70 to 99 percent of the time. These aren't edge cases — these are fundamental reliability issues.
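To make "format compliance" concrete, here is a minimal Python sketch of how a failure rate like the one mentioned above could be computed for a two-step prompt. The JSON schema (`structure`, then `answer`) and the sample responses are assumptions invented for illustration; they are not the audit's actual prompt format or data.

```python
import json
from typing import List

# Hypothetical two-step schema: first name the structure, then answer.
REQUIRED_KEYS = ("structure", "answer")


def is_compliant(raw: str) -> bool:
    """Return True if the response is a JSON object with both required keys."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(k in parsed for k in REQUIRED_KEYS)


def failure_rate(responses: List[str]) -> float:
    """Fraction of responses that do not follow the required format."""
    if not responses:
        raise ValueError("need at least one response")
    failures = sum(not is_compliant(r) for r in responses)
    return failures / len(responses)


# Made-up responses for illustration only, not real model output.
sample = [
    '{"structure": "left kidney", "answer": "no lesion"}',
    "The image shows the left kidney.",          # free text, not JSON
    '{"answer": "no lesion"}',                   # skipped the first step
    '{"structure": "liver", "answer": "normal"}',
]
rate = failure_rate(sample)
print(rate)
```

A check like this only measures whether the instructions were followed, not whether the answer is right, which is why compliance and accuracy are reported separately.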
Maya Johnson: This is why I keep saying we need medical-specific AI models, not general-purpose ones retrofitted for healthcare. The stakes are too high for 'good enough.'
Chris Novak: Honestly, this feels like a reality check for the whole industry. We've been so focused on capability that we forgot about reliability. You can't move fast and break things when those things are human bodies.
Maya Johnson: Our final big story offers some hope though. Stanford's showing how to actually govern AI in clinical settings with their Hyperscribe system.
Chris Novak: This is textbook responsible AI deployment. Hyperscribe is an EHR-embedded agent that converts doctor-patient conversations into structured chart updates. But here's what's impressive — they built an entire governance framework around it.
Maya Johnson: Twenty clinicians created over 1,600 evaluation rubrics across 823 cases. They're not just deploying and hoping for the best — they're continuously monitoring, getting live feedback, and gating every single update through controlled experiments.
Chris Novak: The results speak for themselves. Through seven iterations, median performance scores jumped from 84% to 95%. And get this — user feedback shifted from 79% error reports to 45% positive observations. That's what happens when you actually listen to your users.
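The idea of gating each update through a controlled experiment can be sketched in a few lines of Python. This is an assumed, simplified rule rather than Stanford's actual process: the function name `should_promote`, the `min_gain` threshold, and the example scores are hypothetical, and a real gate would likely also look at costs, user feedback, and statistical uncertainty.

```python
from statistics import median
from typing import Sequence


def should_promote(
    baseline_scores: Sequence[float],
    candidate_scores: Sequence[float],
    min_gain: float = 0.0,
) -> bool:
    """Promote the candidate only if its median rubric score beats the baseline.

    Both versions are assumed to be scored on the same evaluation cases,
    so the two sequences must have equal, non-zero length.
    """
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("score lists must be non-empty and the same length")
    return median(candidate_scores) - median(baseline_scores) > min_gain


# Made-up rubric scores (fractions of rubric items passed) for illustration.
current_version = [0.80, 0.84, 0.90]
new_version = [0.93, 0.95, 0.96]

print(should_promote(current_version, new_version))   # candidate improves
print(should_promote(new_version, current_version))   # candidate regresses
```

The design choice worth noting is that the comparison runs on the same cases for both versions, so a change in the median reflects the update rather than a change in the test set.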
Maya Johnson: What I love is they're tracking everything — technical performance, costs, user satisfaction. This isn't sexy, cutting-edge AI research. It's the unglamorous work of making AI actually useful in healthcare.
Chris Novak: It's a blueprint for how every healthcare AI deployment should work. You don't just ship it and forget it. You create feedback loops, you iterate, you govern. This is how we build trust in clinical AI.
Maya Johnson: And trust is everything in medicine. Patients need to trust their doctors, doctors need to trust their tools. This governance model shows that's actually achievable with AI.
Chris Novak: That's your Pivot Health briefing for May 2, 2026. Stay healthy, stay informed. I'm Chris.
Maya Johnson: And I'm Maya. To better outcomes. See you tomorrow.