[00:00] Announcer: From Neural Newscast, this is Model Behavior, [00:03] Announcer: AI-focused news and analysis on the models shaping our world. [00:09] Nina Park: Welcome to Model Behavior. [00:13] Nina Park: We examine how AI systems are built and operated in professional environments. [00:19] Thatcher Collins: Nina, it has been an unusually dense week for model releases. [00:24] Thatcher Collins: OpenAI just shipped GPT-5.4 yesterday, and it is clearly built for high-demand workloads. [00:32] Nina Park: Exactly, Thatcher. GPT-5.4 is being framed as their most capable model for professional work. [00:40] Nina Park: It introduces native computer use, allowing agents to navigate desktops using mouse and keyboard input. [00:47] Nina Park: It hit a 75% success rate on the OSWorld benchmark, which actually puts it ahead of the reported human performance of 72.4%. [00:59] Thatcher Collins: That is a significant metric, Nina, but we have to be careful with the framing. [01:04] Thatcher Collins: Even with these wins, Mercor's APEX-Agents benchmark shows that frontier models are still failing 75% of professional tasks on the first try. [01:13] Thatcher Collins: Brendan Foody at Mercor compared these models to an intern who only gets it right a quarter of the time. [01:19] Thatcher Collins: It is progress, but it is not yet total reliability. [01:23] Nina Park: It is a fair caveat. [01:25] Nina Park: OpenAI is also trying to fix its social presence. [01:29] Nina Park: Earlier this week, they released GPT-5.3 Instant as the new default. [01:33] Nina Park: It is specifically tuned to reduce the cringe, [01:37] Nina Park: meaning fewer over-the-top safety disclaimers [01:40] Nina Park: and more direct answers for harmless prompts. [01:43] Thatcher Collins: The social engineering is one thing, [01:45] Thatcher Collins: but the geopolitical engineering is much more tense. [01:48] Thatcher Collins: Today, the United States government labeled Anthropic a supply chain risk. 
[01:53] Thatcher Collins: This is the first time a United States company has been given that label, [01:57] Thatcher Collins: and it stems from Anthropic's refusal to sign a military deal with the Pentagon. [02:01] Nina Park: Dario Amodei is calling that label legally unsound. [02:05] Nina Park: Interestingly, Anthropic is seeing a massive surge in users, a million new signups a day. [02:11] Nina Park: A leaked memo suggests many are leaving ChatGPT specifically because of OpenAI's own deal with the Department of Defense. [02:18] Thatcher Collins: The contrast between the two companies' stances is driving a clear wedge in the user base. [02:24] Thatcher Collins: While that plays out, we are also seeing AI integrate into the creative workflow. [02:29] Thatcher Collins: Netflix just acquired Interpositive, the AI startup founded by Ben Affleck. [02:35] Nina Park: It is worth noting this isn't about text-to-video generation. [02:39] Nina Park: Interpositive focuses on post-production tools for dailies, like removing stunt wires or color correction. [02:46] Nina Park: Affleck is staying on as a senior advisor, and he says the goal is protecting human creativity [02:52] Nina Park: by automating the logistics. [02:54] Announcer: Thank you for listening to Model Behavior, a Neural Newscast editorial segment. [03:00] Announcer: Visit mb.neuralnewscast.com. [03:05] Announcer: Neural Newscast is AI-assisted, human-reviewed. [03:10] Announcer: View our AI transparency policy at neuralnewscast.com. [03:16] Announcer: This has been Model Behavior on Neural Newscast. [03:20] Announcer: Examining the systems behind the story. [03:22] Announcer: Neural Newscast uses artificial intelligence in content creation, [03:26] Announcer: with human editorial review prior to publication. [03:29] Announcer: While we strive for factual, unbiased reporting, [03:32] Announcer: AI-assisted content may occasionally contain errors. [03:35] Announcer: Verify critical information with trusted sources. 
[03:39] Announcer: Learn more at neuralnewscast.com.