[00:00] Announcer: From Neural Newscast, this is Model Behavior,
[00:03] Announcer: AI-focused news and analysis on the models shaping our world.
[00:09] Nina Park: Welcome to Model Behavior.
[00:13] Nina Park: We examine how AI systems are built and operated in professional environments.
[00:19] Thatcher Collins: Nina, it has been an unusually dense week for model releases.
[00:24] Thatcher Collins: OpenAI just shipped GPT-5.4 yesterday, and it is clearly built for high-demand workloads.
[00:32] Nina Park: Exactly, Thatcher. GPT-5.4 is being framed as their most capable model for professional work.
[00:40] Nina Park: It introduces native computer use, allowing agents to navigate desktops using mouse and keyboard input.
[00:47] Nina Park: It hit a 75% success rate on the OSWorld benchmark, which actually puts it ahead of the reported human performance of 72.4%.
[00:59] Thatcher Collins: That is a significant metric, Nina, but we have to be careful with the framing.
[01:04] Thatcher Collins: Even with these wins, Mercor's APEX-Agents benchmark shows that frontier models are still failing 75% of professional tasks on the first try.
[01:13] Thatcher Collins: Brendan Foody at Mercor compared these models to an intern who only gets it right a quarter of the time.
[01:19] Thatcher Collins: It is progress, but it is not yet total reliability.
[01:23] Nina Park: It is a fair caveat.
[01:25] Nina Park: OpenAI is also trying to fix its social presence.
[01:29] Nina Park: Earlier this week, they released GPT-5.3 Instant as the new default.
[01:33] Nina Park: It is specifically tuned to reduce the cringe,
[01:37] Nina Park: meaning fewer over-the-top safety disclaimers
[01:40] Nina Park: and more direct answers for harmless prompts.
[01:43] Thatcher Collins: The social engineering is one thing,
[01:45] Thatcher Collins: but the geopolitical engineering is much more tense.
[01:48] Thatcher Collins: Today, the United States government labeled Anthropic a supply chain risk.
[01:53] Thatcher Collins: This is the first time a United States company has been given that label,
[01:57] Thatcher Collins: and it stems from Anthropic's refusal to sign a military deal with the Pentagon.
[02:01] Nina Park: Dario Amodei is calling that label legally unsound.
[02:05] Nina Park: Interestingly, Anthropic is seeing a massive surge in users, a million new signups a day.
[02:11] Nina Park: A leaked memo suggests many are leaving ChatGPT specifically because of OpenAI's own deal with the Department of Defense.
[02:18] Thatcher Collins: The contrast between the two companies' stances is driving a clear wedge in the user base.
[02:24] Thatcher Collins: While that plays out, we are also seeing AI integrate into the creative workflow.
[02:29] Thatcher Collins: Netflix just acquired Interpositive, the AI startup founded by Ben Affleck.
[02:35] Nina Park: It is worth noting this isn't about text-to-video generation.
[02:39] Nina Park: Interpositive focuses on post-production tools for dailies, like removing stunt wires or color correction.
[02:46] Nina Park: Affleck is staying on as a senior advisor, and he says the goal is protecting human creativity
[02:52] Nina Park: by automating the logistics.
[02:54] Announcer: Thank you for listening to Model Behavior, a neural newscast editorial segment.
[03:00] Announcer: Visit mb.neuralnewscast.com.
[03:05] Announcer: Neural Newscast is AI-assisted, human-reviewed.
[03:10] Announcer: View our AI transparency policy at neuralnewscast.com.
[03:16] Announcer: This has been Model Behavior on Neural Newscast.
[03:20] Announcer: Examining the systems behind the story.
[03:22] Announcer: Neural Newscast uses artificial intelligence in content creation,
[03:26] Announcer: with human editorial review prior to publication.
[03:29] Announcer: While we strive for factual, unbiased reporting,
[03:32] Announcer: AI-assisted content may occasionally contain errors.
[03:35] Announcer: Verify critical information with trusted sources.
[03:39] Announcer: Learn more at neuralnewscast.com.