UpNext AI

A funding wave in AI infrastructure is turning the routing and inference layer into a story of its own, while users push back on Google’s AI-first vision for Search. Plus, a new paper argues many of the metrics we use to judge AI text can miss outright contradictions.
In this episode:
- AI infrastructure funding gets the spotlight as Latent Space frames Fireworks, Baseten, and OpenRouter as part of a new decacorn moment
- DuckDuckGo says installs jumped after Google’s AI Search overhaul, suggesting some users want more control over how much AI shows up in search
- A new arXiv paper, MATCHA, proposes a better way to evaluate model-generated text by rewarding semantic agreement and penalizing contradictions
- Forbes examines Anthropic’s publicly available Claude system prompt for handling mental health chats
- Simon Willison highlights Daniel Stenberg’s warning that curl is facing a surge of credible AI-assisted security reports
- The Financial Times reports that UK law firm Pinsent Masons was reprimanded by a court over an AI-related error
Sources:
- Latent Space: https://www.latent.space/p/ainews-new-ai-infra-decacorns-fireworks
- TechCrunch: https://techcrunch.com/2026/05/26/duckduckgo-installs-are-up-30-as-users-reject-being-force-fed-googles-ai-search/
- arXiv (MATCHA): https://arxiv.org/abs/2605.27345v1
- Forbes: https://www.forbes.com/sites/lanceeliot/2026/05/27/analysis-of-anthropic-claude-system-prompt-instruction-that-shapes-the-handling-of-ai-mental-health-chats/
- Simon Willison: https://simonwillison.net/2026/May/26/the-pressure/#atom-everything
- Financial Times: https://www.ft.com/content/5ba4690b-8b98-43b3-ba0b-f2ec5591a572

What is UpNext AI?

Daily AI news and research, distilled. UpNext AI breaks down the most important developments in artificial intelligence—from major industry moves to cutting-edge papers.

Welcome to the UpNext AI podcast. It's Wednesday, May 27th, 2026, and here's what matters in AI today.

First up, AI infrastructure.

Latent Space has a practitioner-oriented roundup titled “New AI Infra decacorns: Fireworks, Baseten, and OpenRouter on the way,” and the big takeaway is that investors are still piling into the tooling layer beneath model apps. The piece frames this as more than routine funding chatter. The argument is that inference companies, model-serving platforms, and routing layers are becoming durable parts of the stack.

The clearest reported data point inside that broader theme comes from OpenRouter. As TechCrunch reported separately, OpenRouter raised a 113 million dollar Series B led by CapitalG, more than doubling its valuation to 1.3 billion dollars within a year. TechCrunch also reports that usage grew 5x over the last six months.

Put together, the message is pretty straightforward. If the AI world is becoming more multi-model, then the companies that help developers switch between models, optimize cost and performance, and manage inference at scale may be turning into a very valuable layer of infrastructure.

And that matters because it suggests the market is not just rewarding whoever trains the biggest frontier model. It’s also rewarding the companies that make a messy model ecosystem usable in production.

Our second story is the reaction on the user side.

TechCrunch reports that DuckDuckGo says its app installs are up by as much as 30 percent as users push back on Google’s AI-heavy Search overhaul announced at I/O 2026. According to the report, DuckDuckGo said U.S. app installs rose 18.1 percent on average from May 20th to May 25th compared with the prior week, with growth sustained for six straight days and peaking at 30.5 percent on May 25th. On iOS, the average increase was even higher, at 33 percent, with a peak of 69.9 percent.

The framing from DuckDuckGo is user choice. CEO Gabriel Weinberg said Google is, in his words, “force-feeding AI with no way to opt out,” while DuckDuckGo is pitching itself as a place where users can decide how much or how little AI they want.

What makes this more than a one-day traffic blip is that it points to a real split in the market. One group of users wants AI to handle more of the web for them. Another group seems to want the opposite: simpler search, more control, and fewer generated layers between the question and the source.

DuckDuckGo is still a small player relative to Google, with TechCrunch noting it has only around 2 percent of the U.S. search market. But the growth spike is a useful signal that AI product changes can create openings for competitors, especially when those changes feel mandatory rather than optional.

Now to the research pick.

A new arXiv paper is called “MATCHA: Matching Text via Contrastive Semantic Alignment.” And even if the title sounds technical, the idea is easy to grasp. The researchers argue that a lot of common evaluation metrics for AI-generated text, including ROUGE and BERTScore, can give very similar scores to outputs that actually contradict each other.

That’s a problem. If your benchmark can’t reliably tell the difference between a correct answer and a polished but wrong one, then your model evaluation can look better while the product is quietly getting worse.

The paper’s proposed fix is to score generated text from two directions at once: how close it is to the reference answer, and how far it is from a counterfactual contradiction. In other words, don’t just reward overlap or generic similarity. Also penalize saying the opposite of what’s true.
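To make that two-sided idea concrete, here is a minimal sketch. It is not the paper's implementation: it swaps in a plain bag-of-words cosine similarity where a real metric would presumably use learned text representations, and the function names and example sentences are made up for illustration. It only shows the general shape: similarity to the reference minus similarity to a contradiction.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contrastive_score(candidate: str, reference: str, contradiction: str) -> float:
    """Toy two-sided score (an assumption, not the paper's formula):
    similarity to the reference minus similarity to the contradiction."""
    cand = bag_of_words(candidate)
    return cosine(cand, bag_of_words(reference)) - cosine(cand, bag_of_words(contradiction))


# Hypothetical example sentences, invented for this sketch.
reference = "the eiffel tower is in paris"
contradiction = "the eiffel tower is not in paris"
good = "the eiffel tower is located in paris"
bad = "the eiffel tower is not located in paris"

# One-sided similarity to the reference: both candidates look strong.
good_overlap = cosine(bag_of_words(good), bag_of_words(reference))
bad_overlap = cosine(bag_of_words(bad), bag_of_words(reference))

# Two-sided score: the contradicting candidate drops below zero.
good_margin = contrastive_score(good, reference, contradiction)
bad_margin = contrastive_score(bad, reference, contradiction)

print(f"overlap only: good={good_overlap:.3f} bad={bad_overlap:.3f}")
print(f"two-sided:    good={good_margin:.3f} bad={bad_margin:.3f}")
```

In this toy case, the one-sided overlap score rates both candidates highly even though one says the opposite of the reference. Subtracting similarity to the contradiction is what pushes the wrong answer below zero.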

According to the paper, MATCHA outperformed popular metrics across eight public benchmarks spanning question answering, image captioning, natural language inference, summarization, and semantic textual similarity. On TruthfulQA, the authors report improvements of 18.38 percent over ROUGE-L and 20.82 percent over BERTScore in matching texts with a reference.

The bottom line is simple: if you’re using AI systems to summarize, rewrite, or answer questions, the quality of your metric matters almost as much as the quality of your model. A weak metric can hand a high score to a polished but wrong answer.

A few more headlines. Forbes has a new analysis of Anthropic’s publicly available Claude system prompt, focusing on how the model is instructed to handle mental health chats. The piece looks at the prompt as a policy layer that shapes how the system responds when users ask for emotionally sensitive support.

Simon Willison highlighted a post from curl creator Daniel Stenberg, who says the curl team is dealing with an unprecedented flood of credible AI-assisted security reports. Stenberg says reports are now arriving at four to five times the 2024 rate, which works out to more than one per day on average. The notable twist is that he describes the reports as high quality, even as the volume is creating serious pressure on the maintainers.

And the Financial Times reports that UK law firm Pinsent Masons was reprimanded by a court over an AI-related error. The FT says Judge Mark Mullen warned lawyers against outsourcing legal research or reasoning, another sign that courts are drawing a clearer line between using AI as assistance and treating it as a substitute for professional judgment.

Before we wrap up, a quick note: this podcast is generated with the assistance of AI and is intended for informational purposes only. All referenced articles, research, and commentary remain the property of their original authors and publishers.

If you enjoyed this episode, don't forget to subscribe, rate, and leave us a review! And that's your briefing for today. Full source links are in the episode notes, and we'll be back tomorrow with what's up next!