The Harness

OpenAI enters the biodefense vertical

Show Notes

OpenAI launches Rosalind Biodefense, gated access to its GPT-Rosalind life-sciences model, free to governments for pandemic preparedness, with partners at Lawrence Livermore and CEPI. Microsoft Build 2026 opens Tuesday with Windows positioned as an AI agent platform and a multi-model Copilot stack that formally adds Anthropic. Meanwhile, a 624-point Hacker News thread argues that domain expertise, not technical skill, is the durable moat in the agent era.

What is The Harness?

A daily summary of what's interesting and happening in the AI industry, focused on what it means for people building harness experiences that people actually use.

Good morning, it's Sunday, May thirty-first. Today we're looking at a structural shift in how AI infrastructure is consolidating, with model providers absorbing the orchestration layer that builders previously owned. We'll start with what smol.ai sees emerging in the market, then move through the day's broader themes: biodefense, platform positioning, hiring evolution, and compute reservations.

Let's start with what smol.ai is highlighting. The lead story centers on vertically integrated AI stacks, and it's a pattern worth understanding because it reshapes where builders invest their energy. Google shipped Gemini Spark, a twenty-four-seven personal agent that runs continuously in the background on phone or laptop for Ultra subscribers, alongside Managed Agents in the Gemini API. That API bundles sandboxed Linux environments with code execution, web access, and file input-output into a single hosted service. OpenAI extended Codex to Windows computer use and added remote steering from the ChatGPT mobile app. The pattern across all three launches: model providers aren't stopping at models. They're absorbing the scaffolding layer: the orchestration logic, the execution environments, the infrastructure. Builders who assumed models would remain commodities while infrastructure stayed in their hands are now watching that orchestration layer turn into a hosted service instead. The architecture is consolidating, and control is moving up the stack to the model providers.
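To make "the scaffolding layer" concrete, here is a minimal sketch of the loop builders have typically owned themselves: call a model, dispatch any tool call it requests, feed the result back, and stop on a final answer or a step limit. Everything here is hypothetical. `run_agent`, `ToolCall`, `ModelReply`, and the scripted stand-in model are illustrative names, not the Gemini Managed Agents API or Codex, and the "tool" is a plain Python function with no real sandboxing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class ModelReply:
    text: str
    tool_call: Optional[ToolCall] = None


# Hypothetical model interface: reads the transcript so far, returns a reply.
Model = Callable[[List[str]], ModelReply]
Tool = Callable[[str], str]


def run_agent(model: Model, tools: Dict[str, Tool], task: str, max_steps: int = 5) -> str:
    """Minimal orchestration loop: model -> tool -> model, until a final answer."""
    transcript = [f"user: {task}"]
    for _ in range(max_steps):
        reply = model(transcript)
        if reply.tool_call is None:
            return reply.text  # the model produced a final answer
        call = reply.tool_call
        tool = tools.get(call.name)
        # Real scaffolding would run this call inside an isolated sandbox.
        result = tool(call.argument) if tool else f"error: unknown tool {call.name}"
        transcript.append(f"tool[{call.name}]: {result}")
    return "stopped: step limit reached"


def scripted_model(transcript: List[str]) -> ModelReply:
    # Stand-in for a real model: request a tool once, then answer with its output.
    if not any(line.startswith("tool[") for line in transcript):
        return ModelReply(text="", tool_call=ToolCall("word_count", "the scaffolding layer is moving"))
    return ModelReply(text="final: " + transcript[-1])


tools: Dict[str, Tool] = {"word_count": lambda s: str(len(s.split()))}
print(run_agent(scripted_model, tools, "count the words"))  # final: tool[word_count]: 5
```

Hosted offerings like the ones above take over roughly this loop, plus the execution environment around each tool call, which is exactly the piece builders used to own.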

The open-model story is shifting in a parallel direction. One in three AI teams now runs open-weight models, up from one in five just nine months ago, according to community survey data. The gap between "technically possible" and "actually deployable by my team" is narrowing fast. Hugging Face shipped OpenJarvis, a local personal AI that runs on Ollama, and llama.app launched with unified installers and a clean command-line interface. These are surface-area improvements, not architectural breakthroughs, but they matter. Open models currently lag frontier proprietary models by roughly four months, and that four-month lag is the cost a product team has to weigh when justifying a self-hosted path. There's also a quiet entrant worth watching: StepFun's Step 3.7 Flash, a one-hundred-ninety-eight-billion-parameter mixture-of-experts model with eleven billion active parameters. It's Apache 2.0 licensed, runs at four hundred tokens per second, and offers a two-hundred-fifty-six-thousand-token context window with three reasoning tiers. It benchmarks near frontier on coding tasks while undercutting Sonnet on price, at twenty cents to one dollar fifteen per million tokens. For teams experimenting across providers, it's a new data point to factor in.
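The mixture-of-experts figures explain how a model this large can be fast and cheap: only a small slice of the weights does work on each token. Here is a rough back-of-the-envelope sketch, assuming the common approximation of about two FLOPs per active parameter per token for a forward pass, plus sixteen-bit (two-byte) weights. Both are simplifications; real serving cost also depends on expert routing, batching, and hardware.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the weights that participate in each token's forward pass."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter."""
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Every expert must stay resident, so memory scales with TOTAL params."""
    return total_params * bytes_per_param / 1e9


TOTAL = 198e9   # total parameters (from the episode)
ACTIVE = 11e9   # active parameters per token (from the episode)

print(f"active share of weights: {active_fraction(TOTAL, ACTIVE):.1%}")
print(f"forward FLOPs per token: {forward_flops_per_token(ACTIVE):.2e}")
print(f"weights at 2 bytes/param: {weight_memory_gb(TOTAL, 2):.0f} GB")
```

The design point for self-hosters: only about 5.6 percent of the weights do work per token, so compute is cheap, but all 198 billion parameters (about 396 GB at two bytes each) still have to fit somewhere.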

Finally, smol.ai highlighted the economics of operating at scale. A single user consumed one point one five billion input tokens in one month. The tactics that made that workable: batch processing at a fifty percent discount, prompt caching at ninety percent savings on repeated prefixes, aggressive trimming of verbose JSON, and billing alerts. None of these is novel on its own, but at that token scale they change the equation. Production agentic workloads are outgrowing the intuition behind "just use the API."
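To see why those tactics compound at this scale, here's a rough monthly cost model. The per-million-token price is a made-up placeholder, the cached and batched fractions are assumptions, and the code assumes the batch and caching discounts stack multiplicatively, which varies by provider. The last part shows one possible reading of "trimming verbose JSON": serializing payloads without whitespace.

```python
import json


def monthly_cost(
    input_tokens: float,
    price_per_million: float,
    cached_fraction: float = 0.0,   # share of tokens served from the prompt cache
    cache_discount: float = 0.90,   # 90% off cached prefixes (from the episode)
    batch_fraction: float = 0.0,    # share of traffic sent through batch processing
    batch_discount: float = 0.50,   # 50% off batched work (from the episode)
) -> float:
    """Estimated spend in dollars. ASSUMES the two discounts stack multiplicatively."""
    cache_multiplier = cached_fraction * (1 - cache_discount) + (1 - cached_fraction)
    batch_multiplier = batch_fraction * (1 - batch_discount) + (1 - batch_fraction)
    base = input_tokens / 1e6 * price_per_million
    return base * cache_multiplier * batch_multiplier


def over_budget(cost: float, monthly_budget: float, alert_ratio: float = 0.8) -> bool:
    """Billing alert: fire once spend crosses a share of the monthly budget."""
    return cost >= alert_ratio * monthly_budget


TOKENS = 1.15e9   # input tokens in one month (from the episode)
PRICE = 3.00      # HYPOTHETICAL $ per million input tokens

naive = monthly_cost(TOKENS, PRICE)
tuned = monthly_cost(TOKENS, PRICE, cached_fraction=0.8, batch_fraction=1.0)
print(f"naive: ${naive:,.0f}  tuned: ${tuned:,.0f}")

# Compact JSON drops the whitespace that pretty-printing adds to every payload.
record = {"id": 42, "status": "ok", "tags": ["a", "b"]}
pretty = json.dumps(record, indent=2)
compact = json.dumps(record, separators=(",", ":"))
print(len(pretty), len(compact))
```

Under these placeholder numbers, caching 80 percent of tokens and batching everything cuts the bill from about $3,450 to about $483, roughly seven-fold; the real ratio depends on how much of your traffic actually hits the cache.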

Beyond smol.ai's lens, four infrastructure and strategy threads deserve attention.

First, OpenAI is making a structural move into biodefense. The company launched Rosalind Biodefense, offering its GPT-Rosalind life-sciences model to governments and vetted developers building pandemic-preparedness tools. Early partners include Lawrence Livermore National Laboratory, Johns Hopkins Applied Physics Laboratory, and CEPI, which is applying the model to its Hundred Days Mission, currently focused on the Bundibugyo Ebola outbreak. The model outperforms GPT-5, 5.2, and 5.4 on chemistry, biochemistry, and experiment-design benchmarks. Here's what matters: this is the clearest template yet for how a model provider carves out a regulated, access-gated vertical, distinct from the general-purpose API playbook. For AI product managers deciding where to place infrastructure bets, it's a live example of a model provider entering a domain where accuracy and accountability matter more than reach, and it's worth tracking for anyone planning health, defense, or critical-infrastructure builds.

Meanwhile, Microsoft Build opens Tuesday, June second, in San Francisco, and its AI agenda is the most product-consequential in years. Microsoft is unveiling the Windows Agent Framework for production deployments in .NET and Python; a rebuilt multi-model Copilot that formally integrates Anthropic models alongside OpenAI; Azure AI Foundry, going generally available with multi-modal fine-tuning and RAG pipelines; and next-generation autonomous capabilities for GitHub Copilot. The positioning is explicit: Windows as the platform for AI agents. Microsoft is trying to own the client-side scaffolding layer the same way cloud providers own the API layer. The open question is whether client-side ownership sticks in a world where orchestration is also drifting toward hosted services. For teams building on Azure or using VS Code, Tuesday's keynote is load-bearing for near-term architecture decisions.

On a different track, a practitioner consensus crystallized today around what the agent era actually rewards in hiring. Aaron Brethorst's essay on the subject hit six hundred twenty-four points on Hacker News with three hundred seventy-five comments. The core argument: agentic AI shifts the bottleneck from writing code to verifying whether the output is correct, and that verification requires deep domain knowledge no model can substitute. The hiring implication is structural. Teams that hired for technical velocity now need people who can check outputs in vertical domains, and that's a fundamentally different profile. You can train someone to code. You can't necessarily train someone to know whether a physics simulation is plausible.
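Here is a small illustration of what "knowing whether a physics simulation is plausible" can look like in practice: someone with the domain knowledge knows a drag-free falling object must conserve mechanical energy, so they can check a trajectory without rerunning the simulation. This is a hypothetical sketch. The trajectory data, the one-percent tolerance, and the function names are all assumptions, and real verification would use invariants specific to the simulation at hand.

```python
from typing import List, Tuple

G = 9.81  # gravitational acceleration, m/s^2

Sample = Tuple[float, float]  # (height in m, speed in m/s)


def specific_energy(height_m: float, speed_m_s: float) -> float:
    """Mechanical energy per unit mass: potential (g*h) plus kinetic (v^2 / 2)."""
    return G * height_m + 0.5 * speed_m_s ** 2


def conserves_energy(samples: List[Sample], rel_tol: float = 0.01) -> bool:
    """Plausibility check: energy should stay within rel_tol of its starting value."""
    reference = specific_energy(*samples[0])
    return all(
        abs(specific_energy(h, v) - reference) <= rel_tol * abs(reference)
        for h, v in samples
    )


def free_fall(h0: float, dt: float, steps: int) -> List[Sample]:
    """Closed-form drag-free fall from rest: h = h0 - g t^2 / 2, v = g t."""
    return [(h0 - 0.5 * G * (i * dt) ** 2, G * i * dt) for i in range(steps)]


good = free_fall(h0=100.0, dt=0.1, steps=20)
# Plausible-looking but wrong output: speeds doubled (e.g. a units or integration bug).
bad = [(h, 2 * v) for h, v in good]

print(conserves_energy(good), conserves_energy(bad))  # True False
```

The code itself is trivial; the value is in knowing which invariant to check, which is the part the essay argues a model can't supply for you.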

Last up, Amazon's custom-chip business just hit a significant milestone. It crossed twenty billion dollars in annual run rate, with two hundred twenty-five billion dollars in multi-year Trainium revenue commitments. If Amazon's chip business stood alone, it would rank in the top three globally alongside NVIDIA and AMD. Trainium-3 is nearly fully subscribed, and Trainium-4 is already significantly reserved at launch. The signal: frontier model providers are locking in compute positions years ahead. Anthropic reserved five gigawatts of Trainium capacity, versus two gigawatts for OpenAI. Custom silicon stopped being a hyperscaler differentiator years ago; now it's table stakes for any serious model company, and the reservation data shows who's betting biggest on the next three years.
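The figures in this story are easier to read as ratios. Here is a quick back-of-the-envelope sketch using only the numbers quoted above. The commitments are multi-year and Trainium-specific while the run rate covers the whole chip business, so the first ratio is a rough scale comparison, not a forecast.

```python
run_rate_billion = 20.0        # annual run rate of the AI chip business, $B
commitments_billion = 225.0    # multi-year Trainium revenue commitments, $B
anthropic_gw = 5.0             # Anthropic's reserved Trainium capacity, gigawatts
openai_gw = 2.0                # OpenAI's reserved Trainium capacity, gigawatts

# How many years of today's run rate the committed backlog represents.
backlog_years = commitments_billion / run_rate_billion

# How much more capacity Anthropic has reserved than OpenAI.
capacity_ratio = anthropic_gw / openai_gw

print(f"backlog ~ {backlog_years:.2f} years of current run rate")
print(f"Anthropic vs OpenAI reserved capacity: {capacity_ratio:.1f}x")
```

By that measure, the committed backlog is more than eleven years of today's run rate, and Anthropic's reservation is two and a half times OpenAI's.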

That's the briefing.