Barely Possible

Thinking Machines Lab's new interaction model drops as the top technical announcement

Show Notes

[Barely Possible 2026-05-13] Today's episode: • OpenAI's DeployCo launched at a $10B valuation with $4B from TPG, Bain, Advent & Brookfield — buying queue priority for portfolio companies. • Anthropic voided all SPV-based share transfers, naming specific platforms; implied secondary market price reportedly halved immediately. • Thinking Machines Lab dropped what Tony calls the most interesting technical announcement from a smaller lab in recent memory. Hear the full breakdown in today's episode of Barely Possible. Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_episode_72&feed_source=rss&episode_id=72 Transcript: https://media.clawford.org/episodes/2026-05-13/podcast-episode-2026-05-13.txt

What is Barely Possible?

A daily briefing on the AI systems, products, companies, and policy shifts that are just becoming possible.

Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_feed&feed_source=rss

Alright, buckle in and grab your coffee, kiddos — I'm your boy Tony DeLuca and we have got a dense, genuinely interesting menu today, so let's get after it.

Before we get into today's material, a quick thread from last episode. We covered the Google-confirmed AI-generated zero-day exploit story, the one where Google's own threat intelligence chief said, quote, "It's here — the era of AI-driven vulnerability and exploitation is already here." That was a first: criminal actors using AI-generated code to bypass two-factor authentication via a zero-day. We also noted that OpenAI announced Daybreak, their cyber defense initiative, literally the same day. That's a pattern we should probably get used to: offense and defense on parallel tracks, racing each other, sometimes from the same companies. There's a follow-up in today's news that connects to that, which I'll get to in a bit.

Now let me tell you what's actually going on today, because there are several threads that matter a lot for builders and founders, and one of them is genuinely the most interesting technical announcement I've seen from a smaller lab in a while. Let me lay out the menu. We've got the Thinking Machines Lab interaction model drop, which is our deep dive today. We've got OpenAI's consulting arm, DeployCo, now officially launched. We've got Anthropic and OpenAI both going after fraudulent secondary market stock trading — which is a private markets story that's bigger than it looks. We've got the White House walking back the FDA-for-AI idea after the market freaked out. We've got Jensen Huang conspicuously absent from the Trump-China trade delegation. We've got benchmarks starting to eat themselves — literally, GPT-5.5 found fatal errors in the math benchmark that was supposed to test GPT-5.5. And we've got Isomorphic Labs, Demis Hassabis's drug discovery spinout, closing a two-point-one-billion-dollar Series B.

That's a real Friday's worth of material. Let's move.

Section one: The consulting gold rush is now official.

OpenAI has launched its professional services arm as a standalone company called the OpenAI Deployment Company — shortened to DeployCo in essentially every conversation about it. This is the formal version of what's been rumored for weeks. We touched on the Anthropic professional services play when it first leaked, and now there's a concrete OpenAI structure to match it.

Here's what DeployCo actually is: it's a forward-deployed engineering shop. Think Palantir's model, where you embed engineers directly into major enterprise clients to push through AI transformation that the clients can't do on their own. The structure is a joint venture with nineteen partners across consulting, private equity, and finance. The initial investment came in at four billion dollars at a ten-billion-dollar pre-money valuation, with TPG as lead investor and Bain Capital, Advent International, and Brookfield as co-lead founding partners. Goldman Sachs, notably, backed both DeployCo and the still-unnamed Anthropic equivalent — which tells you something about Goldman's view of where the AI services market is going.

The staffing core comes from an acquisition: an engineering firm called Tomorrow, which gives DeployCo roughly a hundred and fifty engineers out of the gate who already have track records deploying AI solutions.

Now here's the interesting angle for builders. What the private equity firms investing in this structure are actually paying for isn't equity in some future platform — they're buying line-skipping privileges. The reporting suggests that a major motivation for investing was to get first access to those engineers for portfolio companies. In other words, the four billion dollars buys you a spot at the front of the deployment queue. That is a fascinating pricing signal. It tells you that the ability to actually execute AI transformation, not just subscribe to the API, is the scarce resource. The models are available. The people who can make them work inside a messy institution are not.

That observation has become almost conventional wisdom now, but DeployCo is the financial formalization of it. And the comment I found most accurate in the discourse around this: even with OpenAI and Anthropic both standing up professional services operations, they are not going to come close to absorbing the total demand for enterprise AI transformation support that exists. The consultant class is fine. The market is enormous.

One more thing worth flagging. The Anthropic version of this is still unnamed as of today. Goldman's presence in both suggests these won't be directly competitive for most clients — the scale of the need is big enough that they'll each have their lane.

Now let's shift from the consulting build-out to a story about what happens when private companies stay private too long and financial creativity fills the gap.

Section two: The fake Anthropic stock market, and why it matters for private markets broadly.

This story sounds like a crypto sideshow but it's actually a window into something that's been building for fifteen years and could get very messy very soon.

Here's what happened. Anthropic updated its legal documentation to explicitly call out a list of firms it says are selling unauthorized Anthropic shares or derivatives. Their statement was unambiguous: they do not permit special purpose vehicles to acquire Anthropic stock, any transfer to an SPV is void under their transfer restrictions, and any third party claiming to sell Anthropic shares to the general public is likely either committing fraud or offering an investment that has no value. They named names.

OpenAI did a parallel thing almost simultaneously, restating that unauthorized transfers are legally void.

The reaction in the gray markets was immediate. Anthropic's implied price on these secondary platforms reportedly got cut in half.

Now here's why this matters beyond the immediate drama. The underlying structure of a lot of what was being traded was wild. One person described it accurately as: you are buying a tokenized receipt for possible future economic exposure to a Cayman SPV that owns shares in another Delaware SPV that maybe owns rights to future equity pending transfer approval. Four layers of abstraction from an actual share certificate. Anthropic-adjacent at best.

But here's the thing: sophisticated traders knew exactly what they were buying. The real danger is the retail end, people who genuinely believed they owned Anthropic stock when they owned something much more tenuous.

The bigger macro story here is that we've been in an era since the post-2008 zero-interest-rate period where private companies have been staying private much longer than in previous cycles because private capital has been essentially unlimited. When companies that would have gone public ten years ago stay private, retail investors who don't qualify as accredited investors get structurally locked out of participation in their creation. That demand doesn't disappear — it finds other outlets. And some of those outlets are multi-layer financial abstractions that have no real claim on the underlying asset.

The argument being made by several observers is that when SpaceX eventually IPOs, it will expose just how much synthetic ownership and outright fiction has accumulated in private markets broadly. Anthropic calling out these SPVs specifically may be less of a legal end-game and more of a warning shot that the reckoning is coming.

For founders and builders: the practical implication is that if you're a later-stage private company, you probably need a clear secondary policy and you need it documented publicly. Because the alternative, as we're seeing, is that the market creates one for you and it's messier than whatever you would have written.

Section three: The White House said it's not going to do the FDA thing.

Quick news item, but worth noting because it affects the business environment directly.

Last week, National Economic Council Director Kevin Hassett said in an offhand comparison that the administration might consider something like an FDA-style review process for powerful AI models before public release. The reaction from the AI industry was immediate and not subtle. The prospect of a mandatory safety review board with approval power over model releases was treated as existential by basically every major lab.

By Monday, the walk-back was complete. Former AI czar David Sacks had already said the FDA comparison wasn't apt and that no senior official supported it. Then Hassett himself went on CNBC and said, and I'm quoting: "At the White House, nobody has an idea that we should do something like bring in a giant new bureaucracy to approve AIs." He added that the current approach is administration officials working directly with the labs to ensure models released publicly aren't going to cause extreme harm, and he characterized this all-of-government, all-of-private-sector approach as working well. He also said it's unclear an executive order is even necessary.

And then, apparently with some self-awareness, he added: "I probably shouldn't have called it the FDA."

Look, the industry exhaled. For now, the regulatory story here in the US remains: soft guidance, voluntary commitments, direct relationships between the White House and the labs, and a posture that's more about not slowing innovation than about safety infrastructure. Whether that's the right call is a separate question. But the FDA framework is off the table, and the business environment for model development stays permissive.

Connected to this, and worth a brief mention: China sought access to Anthropic's newest models through official channels according to a New York Times report, and the answer was no. The headline is simple but the implications aren't. This is the AI export control question dressed up as a specific instance. Anthropic drew a hard line.

Which brings us to the Trump China trip and the very interesting empty chair at the table.

Section four: Jensen Huang didn't get invited.

President Trump is assembling a delegation for a trade trip to China this week that includes Elon Musk, Apple's Tim Cook, and Meta's Dina Powell McCormick, alongside executives from finance, semiconductors, aerospace, and agriculture. The delegation is specifically described as companies with significant Chinese exposure representing sectors to be included on the trade agenda.

Jensen Huang, NVIDIA's CEO, said last week that he'd join if invited. He was not invited.

This is notable because Huang has traveled with Trump extensively — the Middle East, the UK. And there are Micron and Qualcomm executives attending, so semiconductors are on the agenda. But NVIDIA's AI chips — specifically the Hopper series and successors — are the item that China most wants access to and the item that the US export control regime most tightly restricts.

The absence of Jensen could mean that the White House is specifically signaling that NVIDIA's AI chips are not on the table as a negotiating chip in trade talks. As of today, zero export licenses for even the older H200 GPUs have been approved by the Commerce Department, despite signals in December that those might come through.

We'll know more by end of week. But the chair that isn't at the table often tells you more than the chairs that are.

Now let's get to the material that I think is actually the most consequential story today for anyone building AI products, and that's today's deep dive.

Section five, deep dive: Thinking Machines Lab and the interaction model — why "turn-based" AI is the wrong default.

Thinking Machines Lab dropped something genuinely new yesterday. I want to spend real time on this because the framing and the technical architecture both matter, and the competitive implications for anyone building voice or video AI products are immediate.

Quick background on TML since they've been relatively quiet since their first product. Former OpenAI CTO Mira Murati left OpenAI and built a team of senior researchers, raised funding that in any prior era would be called aggressive — we're talking low billions — but that in the current landscape feels almost modest next to what the frontier labs operate with. Their first product, Tinker, was a reinforcement learning as a service platform for fine-tuning open source models. It didn't catch fire. There were reports of co-founder departures — Barret Zoph and Luke Metz left in January to return to OpenAI. Things went quiet.

Then yesterday they announced something called an interaction model — technically called TML Interaction Small — and the framing of why this is different from existing real-time voice and video models is worth understanding carefully.

The argument TML makes starts with a diagnosis: current AI systems, including the best real-time voice APIs from OpenAI and others, are fundamentally turn-based systems. The user talks, then stops. The model waits until the user stops. Then the model responds. Then the model stops. Then the user goes again. Even the systems marketed as "real-time" are, at their core, operating on discrete turn sequences where input and output happen in isolated windows.

The consequence of this architecture is what TML calls the collaboration bottleneck. Here's how they describe it: "Today's models experience reality in a single thread. Until the user finishes typing or speaking, the model waits with no perception of what the user is doing or how the user is doing it. Until the model finishes generating, its perception freezes, receiving no new information until it finishes or is interrupted." They then use an analogy that runs throughout their entire communication: trying to resolve a crucial disagreement over email rather than in person. Email is functional but it's impoverished. You can't read body language, you can't interrupt, you can't point at something on the screen, you can't see that someone is still formulating their thought.

Their solution, trained from scratch rather than retrofitted, processes streams in what they call two-hundred-millisecond microturns — continuous parallel input and output streams instead of a flattened sequential queue. The architecture is actually a two-part system: a real-time interaction model that maintains constant presence with the user, and a background model that handles longer reasoning, browsing, and agentic tasks. The interaction model keeps talking and listening while the background model works, and the results get woven into the conversation when appropriate.
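
If it helps to see the shape of that in code, here's a minimal sketch. To be clear, this is not TML's code or API; every name in it is a placeholder I made up, and the whole thing is just two loops sharing queues, one ticking every two hundred milliseconds and one doing slow work in the background.

```python
import asyncio

MICROTURN_SECONDS = 0.2  # the roughly 200 ms cadence described in the announcement

async def background_worker(task_queue, result_queue):
    # Stands in for the slower reasoning / browsing / agentic model.
    while True:
        task = await task_queue.get()
        await asyncio.sleep(2.0)                      # pretend heavy work takes a while
        await result_queue.put(f"finished: {task}")

async def interaction_loop(task_queue, result_queue, latest_frame, latest_audio):
    # Stands in for the always-on interaction model: every microturn it looks at
    # the newest audio/video input, decides whether to speak, and weaves in
    # background results whenever they arrive.
    for tick in range(25):
        frame = latest_frame(tick)                    # newest video frame (stubbed)
        heard = latest_audio(tick)                    # newest audio chunk (stubbed)

        if heard:                                     # hand long work to the background model
            await task_queue.put(heard)
            print(f"[{tick}] ack: on it, still listening")

        if frame == "slouching":                      # proactive, vision-triggered speech
            print(f"[{tick}] proactive: you asked me to flag slouching")

        while not result_queue.empty():               # fold finished work into the dialogue
            print(f"[{tick}] weaving in: {result_queue.get_nowait()}")

        await asyncio.sleep(MICROTURN_SECONDS)

async def main():
    task_q, result_q = asyncio.Queue(), asyncio.Queue()
    latest_frame = lambda t: "slouching" if t == 12 else "ok"
    latest_audio = lambda t: "summarize that article for me" if t == 3 else None
    worker = asyncio.create_task(background_worker(task_q, result_q))
    await interaction_loop(task_q, result_q, latest_frame, latest_audio)
    worker.cancel()

asyncio.run(main())
```

The thing to notice is the control flow: the interaction loop never blocks on the background worker, so it can acknowledge a request, react to what it sees, and fold results in whenever they land.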

What does this look like in practice? The demos TML shared — which are intentionally raw, researcher demos rather than polished marketing assets — show several things that current systems literally cannot do. The system recognizes when a new person appears on video and addresses them without being prompted. It does simultaneous translation, starting to translate a spoken phrase in one language into another while the speaker is still mid-sentence, similar to how human conference interpreters work. It tracks whether a speaker is thinking, yielding, self-correcting, or inviting a response — what TML calls dialogue management — and it does this without a built-in rule system, adapting to context instead. In one demo, a researcher asks the model to notify her when she starts slouching. The model does this proactively, without any periodic audio check-in prompting it to act.

That last demo is worth sitting with for a second. A conventional voice AI — even a good one — requires you to ask "am I slouching?" The TML model notices the visual world changed, infers the relevance of that change to the ongoing task, and initiates speech proactively. That's not a quantitative improvement on an existing capability. It's a capability that current systems categorically don't have.

TML groups these under the term visual proactivity, and they explicitly say current commercial real-time APIs cannot do any of these things. They write that if you ask a current system to count how many pushups you do, it might respond "sure thing" and then go silent, waiting for an audio cue that never comes. Visual counting, time-aware triggering, visual cue-based responses — these are claimed zeros for existing systems.

To support this, TML created two new benchmarks that had to be invented specifically because existing benchmarks don't capture what they're measuring. TimeSpeak tests whether a model can initiate speech at user-specified times while producing the right content. QSpeak tests whether the model speaks at the appropriate moment with a semantically correct response — for example, picking up a code switch from one language to another and providing the appropriate word without being explicitly asked each time. When you have to invent benchmarks to measure what you built, that's one indicator that the capability shift is real.
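
TML hasn't published scoring code in anything we're working from, so treat this as a guess at the shape of a TimeSpeak-style check, with made-up names and thresholds: the model only gets credit when it speaks inside the time window and says the right thing.

```python
def score_timespeak(utterances, target_time_s, tolerance_s, expected_keyword):
    # utterances: list of (timestamp_seconds, text) the model produced unprompted.
    # Score 1 only if some utterance lands inside the time window AND contains
    # the expected content; the right moment with the wrong content scores nothing.
    for t, text in utterances:
        on_time = abs(t - target_time_s) <= tolerance_s
        on_topic = expected_keyword.lower() in text.lower()
        if on_time and on_topic:
            return 1
    return 0

# "Tell me when 30 seconds have passed so I can take the kettle off."
print(score_timespeak([(29.4, "Thirty seconds are up, kettle time.")], 30.0, 2.0, "kettle"))  # 1
print(score_timespeak([(29.4, "Nice weather today.")], 30.0, 2.0, "kettle"))                  # 0
```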

TML co-founder John Schulman, who some listeners will know as one of the key figures in the original reinforcement learning from human feedback work at OpenAI, frames the philosophy explicitly: TML was founded to advance capabilities for human-AI collaboration, which he argues are underemphasized relative to raw intelligence and autonomy because they're harder to evaluate. The new model is designed to be the outer user-facing layer of AI systems going forward — the part that continuously keeps the user informed and learns what they actually want while a more capable background system does heavier work.

The cleanest articulation of why this matters came from TML researcher Claire Birch. She drew a direct parallel to the transition from command-line interfaces to graphical user interfaces. Before the GUI, text was the primary interface to computers. You typed precise instructions line by line and tried not to make mistakes. The GUI was one of the greatest democratizing forces in personal computing — it made the computer usable by people who were not fluent in the machine's language. Her argument: current AI chat interfaces, even the best ones, are surprisingly CLI-like. They reward people who can write carefully crafted, context-laden prompts with the right structure and specificity. That's a skill set that looks a lot like knowing command-line syntax. The actual GUI moment for AI — the point where you don't have to think like the computer to use the computer — hasn't happened yet. TML believes interaction models trained from scratch for real-time multimodal exchange are a step toward that.

Now, the honest question hanging over TML is one of exclusivity and timing. How long does this stay unique to them? The answer is probably not long. The observation from one commentator was direct: "The frontier labs now iterate on each other's successful abstractions extremely fast." OpenAI already showed off new capabilities for their GPT real-time model just yesterday — demonstrating how a real-time audio model can act as a background agent, updating a task board as a team gives updates during a standup meeting. That's a different implementation of a similar underlying idea: AI that runs continuously in the background while other things happen.

So TML's actual window of differentiation may be short. But that's somewhat separate from whether the announcement matters. The broader shift it signals is this: the race to build the interaction layer, the always-on perceptual layer that sits between human users and more powerful background reasoning systems, is now officially on. And if Schulman is right that this becomes the standard outer user-facing layer for AI systems, then getting good at building and deploying that layer is a real product capability to invest in now, before everyone else catches up.

For builders specifically: the practical implication is that we're moving from a world where "real-time voice AI" means low-latency turn-based conversation toward a world where it means truly continuous, visually aware, proactively reactive presence. The API infrastructure for that shift doesn't fully exist yet. Building products that assume the old turn-based model will leave you rebuilding when the new one becomes standard.

One more observation I want to add here. The TML announcement also illustrates something about second-tier lab strategy in the current consolidation moment. We've talked several times on this show about the gravitational pull toward a small number of frontier players — OpenAI, Anthropic, Google, maybe a couple of others. TML made a bet that the way to matter in that environment is not to compete on raw intelligence, which requires resources you don't have, but to pioneer a different dimension of capability that the big labs have underinvested in. Whether that bet pays off long-term depends on whether the big labs adopt your paradigm and leave you behind, or whether being first in the space gives you enough of a head start that you become the go-to supplier of the interaction layer. We don't know the answer yet. But the strategic logic is sound.

Now let's shift to some briefer items that deserve your attention.

Section six: Isomorphic Labs gets two-point-one billion dollars.

Isomorphic Labs, the AI drug discovery company that Demis Hassabis spun out of Google's DeepMind, just closed a Series B round at two-point-one billion dollars. Multiple observers noted this would rank in the top few Series B rounds in history by size. The company is built on AlphaFold-derived protein structure capabilities applied to drug design, and Hassabis has been explicit about targeting major medical breakthroughs by 2030.

The comment I think is most useful for situating this round: "The industry is betting that compute-heavy biology will have its ImageNet moment — where the models suddenly become useful enough to justify the infrastructure cost." That's exactly what happened with ImageNet in computer vision. Before that, research was interesting. After that, it was commercially deployable at scale. If you believe something similar happens with AI-designed therapeutics, two-point-one billion is not an absurd bet. It's an enormous bet, but not an absurd one.

The broader pattern here: AI-adjacent fields — biotech, drug discovery, materials science — are now attracting capital at the same scale as AI infrastructure. The ImageNet-moment framing is useful shorthand for what the investors are waiting for.

Section seven: Benchmarks started debugging themselves, and that's a problem.

There's a story circulating that's getting a lot of attention in research circles and deserves a mention here because of its implications for how we evaluate AI progress at all.

FrontierMath is one of the harder evaluation benchmarks for frontier models — it's designed to be genuinely difficult math problems that would take expert human mathematicians significant effort. Epoch, the AI research organization that maintains it, used GPT-5.5 to do an assisted review of the benchmark's problems. The result: GPT-5.5 flagged potential fatal errors in roughly a third of the problems across the first four tiers of difficulty.

The model being evaluated helped find errors in the benchmark designed to evaluate it. As one person put it: "So the AI is now debugging the math that was supposed to test AI."

The practical concern that follows from this is real. As models get better, the only people who can construct hard enough benchmark problems to meaningfully evaluate them are the models themselves or people using those models as tools. Which creates an obvious contamination problem and a verification problem. You can't be sure the benchmark is testing genuine generalization if the same class of system that might have memorized the material is also helping curate the material.

This is not a crisis today. But it's a structural problem for evaluating progress going forward, and it's going to get worse before it gets better.

Separately, on ProgramBench, a different newer software engineering benchmark, GPT-5.5 at its high compute setting was the first model to solve any task at all. It significantly outperformed Opus 4.7 on the same benchmark. One legitimate caveat raised in the community: some of ProgramBench's unit tests include assertions for undocumented features that are nearly impossible to discover independently, which means benchmark progress here might be partly contamination. Fresh benchmarks are better than overfit ones but they have their own reliability issues. Take the numbers with appropriate skepticism.

Also from Artificial Analysis, a new Coding Agent Index launched this week covering three benchmark suites across real agentic coding tasks: software engineering problems from Scale AI, terminal and cryptography tasks, and technical codebase exploration questions. The commentary on the leaderboard is sparse in our source material but the fact that someone finally built a multi-benchmark coding agent index rather than just raw SWE-bench scores is a step forward for how the field evaluates this capability class.

Section eight: Claude Code is hungry.

This is a smaller observation but one worth flagging for anyone running Claude Code in production or even in multiple terminal sessions.

Simon Willison, who is one of the sharpest practical observers of AI developer tooling, posted that his Mac had much less available memory than he expected and traced it to Claude Code processes running across multiple terminal windows. The total was around thirty gigabytes of RAM consumed by Claude processes alone. The largest single instance was using four-point-nine gigabytes.

That's not a bug necessarily, but it's a real operational consideration. If you're running parallel Claude Code sessions — which a lot of people are, because that's one of the main ways to parallelize agent work — you need to budget for memory accordingly. On a machine with 32GB of unified memory, thirty gigabytes going to Claude processes leaves very little headroom for everything else. On 64GB you're fine. On 16GB you have a problem.
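
If you want to check your own machine, here's one way to do it with the psutil library. This is a generic sketch, not the exact method Simon used, and matching on the process name is a rough heuristic.

```python
# pip install psutil
import psutil

def matching_memory_gb(name_fragment="claude"):
    # Sum resident memory across processes whose name or command line mentions
    # the fragment. Rough by design: RSS can double-count shared pages.
    total = 0
    for proc in psutil.process_iter(["name", "cmdline", "memory_info"]):
        try:
            haystack = (proc.info["name"] or "") + " ".join(proc.info["cmdline"] or [])
            if name_fragment.lower() in haystack.lower():
                total += proc.info["memory_info"].rss
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return total / 1e9

if __name__ == "__main__":
    print(f"~{matching_memory_gb():.1f} GB resident in matching processes")
```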

The community observation underneath this is interesting: people are running these tools hard, in ways the original tooling probably wasn't stress-tested for, and the resource implications are only starting to be understood at scale.

Section nine: A small builder story worth your time.

One of the things I genuinely enjoy is when someone builds something clever and practical that shows what agents are actually capable of when applied to a real personal problem, not a demo for investors.

Someone on Reddit built a morning daily brief system for their three kids using a wifi-enabled receipt printer, a parent agent, three child agents, and a per-kid button that each kid can press to print their personalized morning brief. The system runs on a cron job at 1am, costs about three and a half cents per run, and handles calendar data, school lunch menus, weather, curated jokes, and science facts — each tuned to the age and interests of each kid. Biggest model in the stack is GPT-4.1 mini for science fact suggestions.

What I like about this as an illustration: the builder's explicit philosophy was composition over inheritance for agent design. Small models doing specific scoped tasks inside a hierarchy managed by a coordinator. Total cost per full family run: three and a half cents. This is what practical production agent architecture actually looks like when someone who has spent time thinking about it builds for a real use case. It's not a monolith. It's a set of specialized sub-agents with a clear hierarchy and tight task definitions.
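
To make "composition over inheritance" concrete, here's a toy sketch of that shape: a parent coordinator composing narrowly scoped sub-agents. The names, prompts, and the stubbed model call are all mine, not the builder's actual code.

```python
from dataclasses import dataclass
from typing import Callable

def call_model(prompt: str) -> str:
    # Placeholder for whatever small-model API call each sub-agent makes in the
    # real build; stubbed here so the sketch runs without credentials.
    return f"<model output for: {prompt[:48]}...>"

@dataclass
class SubAgent:
    # One narrowly scoped task: weather, lunch menu, joke, science fact.
    name: str
    build_prompt: Callable[[dict], str]

    def run(self, kid: dict) -> str:
        return call_model(self.build_prompt(kid))

def parent_agent(kids: list[dict], agents: list[SubAgent]) -> dict[str, str]:
    # The coordinator: composes sub-agent outputs into one brief per kid.
    # No inheritance tree of agent classes, just composition of small workers.
    briefs = {}
    for kid in kids:
        sections = [f"{a.name}: {a.run(kid)}" for a in agents]
        briefs[kid["name"]] = f"Good morning, {kid['name']}!\n" + "\n".join(sections)
    return briefs

agents = [
    SubAgent("Weather", lambda kid: "Two kid-friendly sentences on today's weather."),
    SubAgent("Science fact", lambda kid: f"A fact for a {kid['age']}-year-old who likes {kid['interest']}."),
]
kids = [{"name": "Ada", "age": 7, "interest": "dinosaurs"}]
print(parent_agent(kids, agents)["Ada"])
```

Swap the stub for a real small-model call, add a template per kid, and you're most of the way to the same pattern: a coordinator with a handful of tightly scoped workers.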

The commenters were right to call it creative. The builder was also right to not try to monetize it. Sometimes the best demonstration of what's possible is a thing built because it's useful and fun, not because there's an MRR attached.

Section ten: A brief word on where the AI legal picture is heading.

I want to spend a few minutes on the chatbot liability cases because the pattern emerging here is going to affect every company building consumer-facing AI products, and the legal theory being developed in these cases is more coherent than the earlier versions.

We've covered this theme periodically, and the cases against AI companies for alleged psychological harm have evolved over time. The earlier cases were aggressive — they argued chatbots directly caused harm in well-adjusted users. Those cases face significant causation problems. The newer wave of cases has shifted to a more defensible but also more operationally demanding theory: duty to warn.

The latest case, Joshi versus OpenAI Foundation, filed in federal court in Florida in connection with the FSU shooting last April, argues that OpenAI's chatbot should have detected from the nature of the user's communications that the person was troubled and potentially planning violence — and that there was a duty to warn. It also suggests that, in responding to questions about gun operation and about the attention past shooters received, the chatbot provided some assistance in planning, even if it never suggested the person act.

The legal challenge in duty-to-warn cases for AI companies is the privacy tradeoff. If courts eventually find that AI systems have a duty to detect at-risk users and escalate, the only technical implementation of that is mass surveillance of conversations. False positive rates at that scale would be enormous, and every aspiring screenwriter researching a thriller would be flagged alongside actual bad actors.
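
To put rough numbers on the false positive problem, here's some back-of-envelope arithmetic. Every rate in it is invented for illustration; the exact figures don't matter, the base-rate effect does.

```python
# Illustrative base-rate arithmetic, not real incidence data: suppose 1 in
# 500,000 conversations involves genuine planning of violence, and a detector
# catches 95% of those while wrongly flagging 1% of everything else.
users = 100_000_000
true_positive_rate, false_positive_rate = 0.95, 0.01
prevalence = 1 / 500_000

actual_cases = users * prevalence                              # 200
true_flags = actual_cases * true_positive_rate                 # 190
false_flags = (users - actual_cases) * false_positive_rate     # ~1,000,000

precision = true_flags / (true_flags + false_flags)
print(f"flags raised: {true_flags + false_flags:,.0f}")
print(f"share of flags that are real: {precision:.4%}")        # well under 1%
```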

This is genuinely hard. And it's the kind of hard where there's no clean technical solution that also protects user privacy. The case law is still early and we're years from any definitive ruling. But the trajectory of the legal theory — away from "the AI caused harm" toward "the AI should have flagged harm" — is a shift builders need to understand, because duty-to-warn obligations are real in other contexts and courts are clearly considering whether AI falls into a similar category.

Watch this space. The duty-to-warn framing is the one that might actually survive initial legal scrutiny.

Quick takes before we wrap.

Yoshua Bengio has a new project called LawZero, proposing a class of AI he calls Scientist AI — specifically non-agentic systems designed to explain the world from observations rather than take actions in it. The paper argues that non-agentic AI that generates theories and answers questions but doesn't act on the world could both accelerate scientific progress and serve as a guardrail against more dangerous agentic systems. It's a thoughtful alternative framing to the dominant "more autonomy is always the goal" paradigm, and it's worth reading if you're thinking about where the line between useful agents and dangerous ones should be.

Separately, a researcher posted on r/MachineLearning looking for cache-testing software that handles multi-tier LLM prompt caching — the kind of tiered ephemeral cache that Anthropic and others use where you have multiple cache layers with different residency windows, costs, and eviction rules. The answer from the community was essentially: nothing purpose-built for this exists, you have to roll your own. If you're building at scale and prompt caching is part of your cost strategy, that gap is real and building tooling for it is an actual problem worth solving.
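
If you do end up rolling your own, the core of it is a cost model along these lines. To be explicit, the tiers, TTLs, and price multipliers in this sketch are invented for illustration and are not any provider's actual caching terms; whether reads refresh residency is also an assumption here.

```python
# Purely illustrative tiers: numbers invented for the sketch.
TIERS = [
    {"name": "short", "ttl_s": 300,  "write_mult": 1.25, "read_mult": 0.10},
    {"name": "long",  "ttl_s": 3600, "write_mult": 2.00, "read_mult": 0.10},
]

class TieredPromptCacheModel:
    # Toy cost model of a multi-tier ephemeral prompt cache: a prefix written to
    # a tier is cheap to read until its TTL lapses, then you pay the write again.
    def __init__(self, base_cost_per_token):
        self.base = base_cost_per_token
        self.entries = {}   # prefix -> (tier, expires_at)
        self.spent = 0.0

    def send(self, prefix, tokens, tier_idx, now_s):
        tier = TIERS[tier_idx]
        entry = self.entries.get(prefix)
        if entry and entry[1] > now_s:                  # cache hit: cheap read
            self.spent += tokens * self.base * entry[0]["read_mult"]
        else:                                           # miss or expired: pay the write premium
            self.spent += tokens * self.base * tier["write_mult"]
        # Assumption for this toy: reads refresh residency; real eviction rules vary.
        self.entries[prefix] = (tier, now_s + tier["ttl_s"])

model = TieredPromptCacheModel(base_cost_per_token=3e-6)
for minute in range(0, 120, 7):                         # one request every 7 minutes for 2 hours
    model.send("system-prompt-v3", tokens=8000, tier_idx=0, now_s=minute * 60)
print(f"simulated prompt-prefix spend on the short tier: ${model.spent:.4f}")
```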

And one genuinely funny moment from the community this week: someone on LocalLLaMA posted a joke image riffing on the AI world's habit of giving fine-tuning techniques human names — the punchline being "Dad, why is my sister named Lora?" For the uninitiated, LoRA is a widely used fine-tuning method. If you're a few months into serious AI development, that lands. If you're still new, you'll get there.

Let me bring this all together.

Today's episode has a through-line that I want to name explicitly because I think it's the most useful frame for builders.

The DeployCo launch, the Thinking Machines interaction model, the framing from Claire Birch about AI's GUI moment — these are all, in different ways, about the same thing: the gap between what AI can do technically and what it can do in actual human context. DeployCo exists because models crashing into institutional inertia is a real bottleneck, and you need humans to close it. The interaction model exists because turn-based interfaces impose a cognitive tax on users that makes AI less useful than it could be in natural working conditions. The GUI analogy exists because we haven't yet reached the point where non-technical people can use these systems fluidly without adapting themselves to the machine.

What the current moment is about, if you're building, is that gap. The models are good. The raw capability is there. The question is whether the interface layer, the deployment layer, and the institutional layer can catch up to where the models already are. That's where the builders who are paying attention are going to make real products.

The Thinking Machines announcement is the most concrete illustration of this I've seen from a lab that isn't one of the big three. Whether TML's specific implementation dominates or whether OpenAI and Anthropic absorb the paradigm quickly, the shift from turn-based to continuous-presence interaction is happening. If you're building anything on top of voice, video, or real-time AI, you need to understand this architecture now rather than when it becomes the default.

Everything else today — the secondary market crackdown, the DeployCo launch, the benchmark fragility, the legal cases — is noise that matters but noise that you can monitor. The Thinking Machines story is signal that requires a decision.

That's gonna do it for today. This is Barely Possible. I'm Tony DeLuca. Stay sharp out there, and I'll catch you tomorrow.