[00:00] Announcer: From Neural Newscast, this is Model Behavior, [00:03] Announcer: AI-focused news and analysis on the models shaping our world. [00:11] Nina Park: Welcome to Model Behavior. [00:13] Nina Park: Model Behavior examines how AI systems are built, deployed, [00:17] Nina Park: and operated in real professional environments. [00:20] Nina Park: Joining us today is a director-level AI and security leader with a systems-level perspective on automation and enterprise risk. [00:29] Nina Park: It is good to have you here. [00:30] Thatcher Collins: We're looking at a remarkably dense week for Anthropic, Nina. [00:34] Thatcher Collins: They are currently pushing deep into the Office ecosystem. [00:38] Thatcher Collins: On Tuesday, they announced that Claude will now live inside Microsoft Excel and PowerPoint, [00:44] Thatcher Collins: allowing the model to generate slide decks directly from spreadsheet data. [00:49] Thatcher Collins: This follows their acquisition of Vercept, a startup that specializes in computer-use agents, for a reported $50 million. [00:58] Thatcher Collins: It's a strategic expansion, Thatcher. [01:01] Thatcher Collins: By acquiring Vercept, Anthropic is trying to solve the last-mile problem of automation. [01:09] Thatcher Collins: However, this aggressive move into specific workflows like HR and wealth management is already rattling the market. [01:18] Thatcher Collins: We saw software industry ETFs drop 6% earlier this month because investors fear Claude might make specialized enterprise tools obsolete. [01:29] Thatcher Collins: From a systems perspective, the risk isn't just technology. [01:33] Thatcher Collins: It's the friction with legacy providers like IBM. [01:37] Chad Thompson: The speed of that development is actually leading to a retreat on the safety front, Thatcher. 
[01:43] Chad Thompson: Anthropic has formally abandoned its signature promise to never train or release a frontier model without guaranteed safety mitigations in advance. [01:53] Chad Thompson: They're moving to a framework of transparency reports and safety roadmaps instead. [02:00] Chad Thompson: They've stated that unilateral restraint no longer makes sense in a market defined by geopolitical urgency. [02:08] Thatcher Collins: That geopolitical urgency is manifesting quite literally at the Pentagon, Nina. [02:13] Thatcher Collins: The Secretary is currently threatening to blacklist Anthropic from U.S. military contracts. [02:19] Thatcher Collins: The standoff is over the company's refusal to allow its models to be used for autonomous weapons and domestic mass surveillance, [02:27] Thatcher Collins: restrictions he has labeled as "woke AI." [02:31] Thatcher Collins: There are reports the administration might even invoke the Defense Production Act to force compliance. [02:37] Thatcher Collins: This is a significant operational resilience challenge for Anthropic, Thatcher. [02:42] Thatcher Collins: The CEO is holding a hard line on AI-directed warfare, [02:48] Thatcher Collins: but the Pentagon views this as a supply chain risk. [02:52] Thatcher Collins: If the Defense Production Act is invoked, it essentially forces the company to hand over control of its tools. [03:00] Thatcher Collins: It's a classic conflict between corporate ethics and national security mandates, [03:06] Thatcher Collins: and it could impact their upcoming public offering. [03:11] Nina Park: While this policy battle rages, we're seeing new data on what these models can actually do. [03:18] Nina Park: A consortium of a thousand researchers just released Humanity's Last Exam, [03:24] Nina Park: which consists of 2,500 expert-level questions. [03:28] Nina Park: The results were stark. [03:30] Nina Park: GPT-4o scored just 2.7%, and Claude 3.5 Sonnet only reached 4.1%. 
[03:39] Nina Park: These are questions involving ancient inscriptions and micro-anatomy, areas where, you know, simple pattern recognition fails. [03:48] Thatcher Collins: OpenAI seems to be addressing that gap through human partnerships rather than just model scaling, Nina. [03:54] Thatcher Collins: They've launched the Frontier Alliance, a multi-year partnership with Accenture, McKinsey, BCG, and Capgemini. [04:02] Thatcher Collins: The goal is to embed OpenAI engineers directly into these consulting firms to help enterprises manage AI agents that can perform real-world work. [04:12] Thatcher Collins: It's a clear signal that model intelligence alone isn't enough. [04:15] Thatcher Collins: You need human-guided implementation. [04:18] Thatcher Collins: Exactly, Thatcher. [04:20] Thatcher Collins: The HLE benchmark shows that AI still lacks deep specialized context. [04:26] Thatcher Collins: By partnering with consultancies, OpenAI is effectively borrowing human expertise to [04:32] Thatcher Collins: bridge the gap between model capabilities and enterprise requirements. [04:39] Thatcher Collins: For leaders, the takeaway is that while the models are moving into our apps, [04:45] Thatcher Collins: we are still a long way from fully autonomous expert systems. [04:50] Nina Park: The distance between the lab and the office continues to shrink, but the friction is increasing. [04:57] Nina Park: Thank you for your perspective today. [04:59] Nina Park: Thatcher, any final thoughts? [05:01] Thatcher Collins: Thank you for listening to Model Behavior, a Neural Newscast editorial segment. [05:07] Thatcher Collins: You can find our technical breakdowns at mb.neuralnewscast.com. [05:13] Thatcher Collins: Neural Newscast is AI-assisted, human-reviewed. [05:17] Thatcher Collins: View our AI Transparency Policy at neuralnewscast.com. [05:22] Announcer: This has been Model Behavior on Neural Newscast. [05:26] Announcer: Examining the systems behind the story. 
[05:28] Announcer: Neural Newscast uses artificial intelligence in content creation, [05:32] Announcer: with human editorial review prior to publication. [05:35] Announcer: While we strive for factual, unbiased reporting, [05:38] Announcer: AI-assisted content may occasionally contain errors. [05:41] Announcer: Verify critical information with trusted sources. [05:44] Announcer: Learn more at neuralnewscast.com.