UpNext AI

Today on UpNext AI: the White House loosens access restrictions on Anthropic’s most advanced model for a limited set of U.S. organizations, Base44 rolls out its own model as vibe-coding startups push for defensibility, and a new paper argues coding agents should be judged in back-and-forth workflows instead of tidy one-shot tasks.

Covered stories:
- Anthropic allowed to restore Mythos access to a select group of U.S. companies and government agencies
- Wix-owned Base44 starts rolling out its own model, Base1, as it tries to own more of the stack
- SWE-INTERACT proposes a multi-turn benchmark for coding agents with changing requirements and user feedback
- Google says EU competition remedies could force search-data sharing and broader Android AI access with privacy risks
- Palantir brings NVIDIA Nemotron open models into air-gapped environments for U.S. agencies
- Researchers say a compromised GitHub repo can cause Claude Code to run hidden malware without verification

Source links:
- https://www.wired.com/story/anthropic-restores-access-to-mythos/
- https://techcrunch.com/2026/06/29/vibe-coding-platform-base44-launches-own-model-as-ai-startups-seek-defensibility/
- https://arxiv.org/abs/2606.30573v1
- https://arstechnica.com/gadgets/2026/06/google-warns-eus-plans-to-weaken-its-monopoly-could-expose-user-data/
- https://blogs.nvidia.com/blog/palantir-secure-ai-us-agencies-nemotron-open-models/
- https://the-decoder.com/claude-code-runs-a-github-repos-hidden-malware-without-verification-giving-attackers-full-control/

What is UpNext AI?

Daily AI news and research, distilled. UpNext AI breaks down the most important developments in artificial intelligence—from major industry moves to cutting-edge papers.

Welcome to the UpNext AI podcast. It's Tuesday, June 30th, 2026, and here's what matters in AI today.

We’ll start with Anthropic. Wired reports that after weeks of negotiations, the White House has permitted Anthropic to restore access to its most advanced AI model, Mythos 5, to a select group of U.S. companies and government agencies. The key point is that this is not a full reopening. According to the report, the administration is allowing access for more than 100 U.S. organizations, but broader rollout is still off the table. Wired says Commerce Secretary Howard Lutnick told Anthropic that certain trusted partners could get access because he determined that appropriate safeguards were in place. Anthropic told Wired that Mythos 5 is its strongest cybersecurity model, and said it’s now being redeployed to a small group of cyber defenders and infrastructure providers. The company also said it’s still working with the government to expand access further and to restore broader availability for Fable 5, the consumer-facing version with additional safeguards. Why this matters: this looks increasingly like a template for how advanced model releases may be handled in the U.S. Going forward, access to the strongest systems may depend not just on what a lab wants to ship, but on whether the government is comfortable with the safeguards, the customer list, and the deployment conditions. Wired also reports that the initial directive from June 12 remains in effect, and that Anthropic is still in discussions with the White House. So this is progress, but it is clearly not the end of the story.

Next, a business story with broader implications for the AI app layer. TechCrunch reports that Base44, the Wix-owned vibe-coding platform, has started rolling out its own AI model. Base44 says it wants that model to eventually outperform frontier models for its use case. And the bigger signal here is strategic: instead of just sitting on top of third-party APIs, Base44 is trying to own more of its stack. According to TechCrunch, the company says the first version of its model, called Base1, was trained on data generated from tens of millions of real user interactions on the platform. Founder Maor Shlomo argues that training and owning the model lets Base44 optimize latency, cost, and efficiency more tightly. That fits a wider pattern in AI right now. If you’re an application company, the big question is what makes you defensible when frontier labs keep improving the base models. Base44’s answer appears to be vertical integration: own the distribution, own the data loop, and increasingly own the model behavior too. TechCrunch notes that Base44 was acquired by Wix for 80 million dollars, and that the company has said it passed 100 million dollars in annual recurring revenue. The same report also points to bigger competitive pressure in the category, including rivals relying on external models and frontier labs moving closer to app-building workflows themselves. So the takeaway is not that every AI startup will train its own model. It’s that once a company has enough scale, enough usage data, and enough inference cost pressure, building a specialized model starts to look less like a moonshot and more like a margin and product-control decision.

For the research section, we’re looking at a paper from earlier this week called SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions. The idea is pretty intuitive. A lot of coding-agent benchmarks give the model a fully specified task upfront and then score whether it finishes the job. But that’s not how real software work usually happens. Requirements are often vague at the start, users clarify what they want over time, and the task keeps changing as the work progresses. SWE-INTERACT proposes a benchmark built around that messier reality. The paper says the system uses a user simulator that starts with incomplete instructions, then progressively reveals requirements, inspects the agent’s workspace, and provides revisions, feedback, and new constraints along the way. And the headline result is important: strong performance on single-turn software-engineering tasks did not reliably transfer to these multi-turn workflows. In the paper’s evaluation, the best-performing models solved roughly 50 percent of the single-turn baseline tasks, but only 25 percent of the corresponding SWE-INTERACT tasks. The authors say the strongest models, including Opus 4.8 and GPT 5.5, do better at handling vague instructions, sticking with the task, and integrating new requirements. But even they still run into problems like over-agentic behavior, forgotten requirements, and technical mistakes. Bottom line: if you want to know whether a coding agent is ready for real developer workflows, don’t just test whether it can finish a neat prompt. Test whether it can survive a changing conversation.
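
As a rough illustration of the multi-turn setup described above, here is a toy Python sketch. It is not the paper's harness or scoring method. The names SimulatedUser, run_session, diligent_agent, and forgetful_agent are hypothetical, the workspace is reduced to a set of requirement strings, and the score is simply the fraction of requirements present when the session ends.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

Agent = Callable[[str, Set[str]], Set[str]]


@dataclass
class SimulatedUser:
    """Hypothetical user simulator that reveals one requirement per turn."""
    requirements: List[str]
    revealed: int = 0

    def next_message(self, workspace: Set[str]) -> Optional[str]:
        # Inspect the workspace first: complain about any requirement that
        # was already revealed but is missing from the current workspace.
        missing = [r for r in self.requirements[:self.revealed] if r not in workspace]
        if missing:
            return f"fix: {missing[0]}"
        # Otherwise reveal the next requirement, if any remain.
        if self.revealed < len(self.requirements):
            req = self.requirements[self.revealed]
            self.revealed += 1
            return f"add: {req}"
        return None  # everything revealed and nothing missing


def run_session(user: SimulatedUser, agent: Agent, max_turns: int) -> float:
    """Run a back-and-forth session and return the fraction of requirements met."""
    workspace: Set[str] = set()
    for _ in range(max_turns):
        message = user.next_message(workspace)
        if message is None:
            break
        workspace = agent(message, workspace)
    return len(set(user.requirements) & workspace) / len(user.requirements)


def diligent_agent(message: str, workspace: Set[str]) -> Set[str]:
    # Keeps all earlier work and adds the requested requirement.
    _, req = message.split(": ", 1)
    return workspace | {req}


def forgetful_agent(message: str, workspace: Set[str]) -> Set[str]:
    # Satisfies only the latest request and drops everything else,
    # a crude stand-in for the "forgotten requirements" failure mode.
    _, req = message.split(": ", 1)
    return {req}


if __name__ == "__main__":
    reqs = ["parse input", "validate rows", "write report"]
    print(run_session(SimulatedUser(reqs), diligent_agent, max_turns=6))
    print(run_session(SimulatedUser(reqs), forgetful_agent, max_turns=6))
```

In this toy setup the diligent agent meets every requirement, while the forgetful agent ends the session with only one of three. The simulated user spends its remaining turns re-requesting dropped work and never gets to reveal the third requirement, which is one way a model that handles single prompts well could still fall short in a changing conversation.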

Are you building apps with voice? Elevate your app's voice capabilities with ElevenLabs. Their API is a game changer for embedding dynamic, responsive voice interactions in your applications, providing unprecedented realism, flexibility, and low latency. In fact, you're listening to one of their voices right now. If you are a developer looking to improve user experience with natural voice interfaces, this is your solution. Visit up next dot fm slash eleven to check out their latest offerings.

Now for a few shorter stories. First, Google is warning that proposed EU remedies could create privacy and security problems. Ars Technica reports that the European Commission wants Google to share anonymized search data with competitors and open Android so other AI models can get Gemini-like system access. Google says that, if implemented as described, the changes could increase fraud and expose user data.

Next, NVIDIA says Palantir is bringing Nemotron open models into air-gapped environments for U.S. government agencies. According to NVIDIA, the setup is meant to let agencies run customized models on their own infrastructure, train on their own data, and retain ownership of the resulting models and weights.

And finally, a security warning for coding-tool users. The Decoder reports that researchers at Mozilla’s 0DIN platform showed how a compromised GitHub repo could cause Claude Code to run hidden malware without verification, potentially handing attackers full control. Details are still limited in the material we have, but the practical lesson is clear: agentic coding tools inherit software supply-chain risk, and they can amplify it if execution happens too eagerly.

Before we wrap up, a quick note: this podcast is generated with the assistance of AI and is intended for informational purposes only. All referenced articles, research, and commentary remain the property of their original authors and publishers.

If you enjoyed this episode, don't forget to subscribe, rate, and leave us a review! And that's your briefing for today. Full source links are in the episode notes, and we'll be back tomorrow with what's up next!
