UpNext AI

UpNext AI for June 18, 2026: today we’re tracking a reported clash between the White House and Anthropic over jailbreak-proofing a model rerelease, a big funding signal for world models as Odyssey hits a $1.45 billion valuation with Amazon among its backers, and a new benchmark testing whether AI agents can actually make useful preclinical pharmacology decisions. We also round out the show with quick headlines on OpenAI’s pre-launch failure prediction work, an AI chemist result from OpenAI and Molecule.one, and Google’s latest AMIE medical study.

Covered in this episode:
- The White House reportedly wants Anthropic to make Fable 5’s guardrails impossible to circumvent before any rerelease
- Odyssey reaches a $1.45 billion valuation in a Series B round with Amazon among the backers
- TxBench-PP tests AI agents on realistic small-molecule preclinical pharmacology decisions
- OpenAI researchers propose a way to predict, before launch, how often models may fail after deployment
- OpenAI and Molecule.one say a near-autonomous AI chemist improved a challenging medicinal chemistry reaction
- Google says new Nature research shows AMIE matched primary care physicians in complex disease management

Source links:
- WIRED: https://www.wired.com/story/the-white-house-wants-anthropic-to-block-all-jailbreaks-that-may-not-be-possible/
- TechCrunch: https://techcrunch.com/2026/06/17/world-model-maker-odyssey-nabs-1-45b-valuation-backed-by-amazon-and-other-big-names/
- arXiv (TxBench-PP): https://arxiv.org/abs/2606.19245v1
- The Decoder: https://the-decoder.com/openai-researchers-want-to-predict-how-often-ai-models-will-fail-before-launch/
- OpenAI: https://openai.com/index/ai-chemist-improves-reaction
- Google: https://blog.google/innovation-and-ai/models-and-research/google-research/amie-for-disease-management-in-nature/

What is UpNext AI?

Daily AI news and research, distilled. UpNext AI breaks down the most important developments in artificial intelligence—from major industry moves to cutting-edge papers.

Welcome to the UpNext AI podcast. It's Thursday, June 18th, 2026, and here's what matters in AI today.

First up, Anthropic is facing reported pressure from the White House over a possible rerelease of Fable 5. According to WIRED, Trump administration officials told the outlet that if Anthropic wants to rerelease Fable 5, it would need to ensure the model’s guardrails cannot be circumvented. The issue here is jailbreaks: prompts or techniques that get around a model’s safeguards. WIRED reports that the administration’s position is aimed at Anthropic and this rerelease specifically, rather than at model safety in the abstract. The core tension is that security experts quoted by WIRED say that standard may not actually be achievable. That makes this more than a company-specific dispute. It’s a revealing look at the gap between what policymakers may want from frontier model safety and what experts think is technically realistic. The practical takeaway is pretty clear: expectations around model releases are getting more stringent, and the debate is shifting from whether jailbreaks matter to whether a frontier lab can ever fully eliminate them.

Next, a big financing signal in a different corner of AI: TechCrunch reports that world model startup Odyssey has reached a $1.45 billion valuation. The company, founded by self-driving veterans Oliver Cameron and Jeff Hawke, raised a $310 million Series B, according to TechCrunch, with Amazon among the backers. TechCrunch also reports that Odyssey has raised $337 million in total. This matters because world models keep getting pitched as the next layer beyond text-first large language models. In Odyssey’s case, TechCrunch says the company offers world models for use cases including video-game creation and robotics, and is known for producing interactive video from text prompts. There’s also a strategic infrastructure angle. With Amazon as a backer, Odyssey says AWS is now its preferred cloud provider and that it will optimize its models for AWS Trainium chips, according to TechCrunch. So this is not just a venture valuation story. It’s also a sign that large cloud players still want to anchor the next wave of AI workloads, especially those that may move beyond pure text generation.

For the research section, a useful benchmark paper from earlier this week: TxBench-PP, short for TherapeuticsBench Preclinical Pharmacology. The paper looks at whether AI agents can help with small-molecule preclinical pharmacology decisions — in other words, the kinds of calls drug teams make before a candidate moves deeper into development. Instead of testing abstract trivia or literature recall, the benchmark is built around realistic program decisions using real-world assay data. According to the paper, TxBench-PP contains 100 evaluations spanning areas like mechanism-of-action reasoning, pharmacodynamic reasoning, target engagement, safety, and translational efficacy. Agents work through realistic workflow snapshots, inspect files in a coding environment, and return structured answers that are graded deterministically. And here’s the important result: across 16 model-harness configurations and 4,800 trajectories, the authors say no system reliably recovered preclinical pharmacology decisions. The strongest setup, Claude Opus 4.8 with Pi, passed 59.3 percent of endpoint attempts, while GPT-5.5 with Pi reached 55.3 percent. So the takeaway is straightforward: this kind of agent can be promising, but in a high-stakes drug-discovery setting, the best systems here were still far from dependable enough to treat as autonomous decision-makers.

...Are you building apps with voice? Elevate your app's voice capabilities with ElevenLabs. Their API is a game changer for embedding dynamic, responsive voice interactions in your applications, providing unprecedented realism, flexibility and low latency. In fact, you're listening to one of their voices - right - now. If you are a developer looking to elevate user experience with natural voice interfaces, this is your solution. Visit up next dot fm slash eleven to check out their latest offerings. ...

OpenAI researchers are proposing a method to estimate, before launch, how often a model is likely to fail once it is deployed. As summarized by The Decoder, the idea is to estimate post-deployment mistakes in a way that could fill gaps left by standard safety testing.

OpenAI and Molecule.one also say a near-autonomous AI chemist using GPT-5.4 improved a challenging reaction in medicinal chemistry. For now, the supported claim is narrow but interesting: a key drug-making reaction got better with the system in the loop.

And Google says new research published in Nature shows its conversational medical AI system, AMIE, matched primary care physicians in complex disease management. That’s Google’s framing, so it’s one to read with the study itself, but it’s notable as another sign that medical AI evaluation is moving beyond simple question answering.

Before we wrap up, a quick note: this podcast is generated with the assistance of AI and is intended for informational purposes only. All referenced articles, research, and commentary remain the property of their original authors and publishers.

If you enjoyed this episode, don't forget to subscribe, rate, and leave us a review! And that's your briefing for today. Full source links are in the episode notes, and we'll be back tomorrow with what's up next!
