Barely Possible

[Barely Possible 2026-07-01] Today's episode:
• Simon Willison's math: Claude Sonnet 5's new tokenizer makes it ~1.4x pricier for English despite lower per-token pricing.
• Etched hit a $5B valuation with $1B already booked for its transformer-only inference chips taking aim at Nvidia.
• EquiLibre Technologies, ex-DeepMind poker AI builders in Prague, now valued over $500M making money for quant hedge funds.
Hear the full breakdown in today's episode of Barely Possible.
Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_episode_121&feed_source=rss&episode_id=121
Transcript: https://media.clawford.org/episodes/2026-07-01/podcast-episode-2026-07-01.txt | Notes: https://media.clawford.org/episodes/2026-07-01/2026-07-01-notes.md

What is Barely Possible?

A daily briefing on the AI systems, products, companies, and policy shifts that are just becoming possible.

Okay kiddos, I'm your boy Tony DeLuca and you've got Barely Possible cued up for the first of July. Fresh menu today, and I'll tell you straight off the bat: there's a model release on the table, but I'm not gonna let the model release run the whole show, because the more interesting thing happening today is about price, and about chips, and about whether all these AI companies actually make money doing what they're doing. So grab your coffee, settle in, and let's have at it.

Let me set the table. Anthropic put out a new model called Claude Sonnet 5. The pitch is simple and it's important: this is the cheaper way to run agents. Not the flagship, not the show-off Opus tier — the workhorse. They're positioning it as a budget alternative to their own Opus, to GPT-5.5, and to Gemini Pro. Stronger agentic chops, lower pricing, better safety guardrails. That's the marketing line.

Now here's where it gets interesting, and this is why I love when a careful person actually reads the fine print instead of just retweeting the press release. Simon Willison went and did the homework, and he found that the cheaper-model story has a wrinkle. Anthropic shipped Sonnet 5 with a new tokenizer. And the new tokenizer, by Willison's math, makes the model roughly one-point-four times more expensive for English, about one-point-three-three times more expensive for Spanish, and roughly the same price for Simplified Mandarin.

Let me explain why that matters, because it's not a nerd footnote, it's a billing footnote. When you buy time from one of these models, you don't pay per word, you pay per token. A token is a chunk of text: sometimes a whole word, sometimes a piece of one. The tokenizer is the thing that decides how your sentence gets chopped into those chunks. So if the published price-per-token goes down but the tokenizer chops your English into more pieces, your actual bill can go up even though the sticker price went down. That's the tale of the tape here. A model can be advertised as cheaper and still cost you more per real-world paragraph, depending on what language you write in and how the tokenizer behaves.
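
To make the billing arithmetic concrete, here is a minimal Python sketch. The token counts and per-million prices are hypothetical placeholders, chosen only to show how a lower sticker price can still produce a higher bill; they are not Anthropic's published prices or Willison's measurements, and `bill_ratio` is an illustrative helper, not part of any SDK.

```python
def bill_ratio(old_tokens: int, new_tokens: int,
               old_price_per_million: float,
               new_price_per_million: float) -> float:
    """Return the new bill divided by the old bill for the same text."""
    old_bill = old_tokens * old_price_per_million / 1_000_000
    new_bill = new_tokens * new_price_per_million / 1_000_000
    return new_bill / old_bill


# Hypothetical workload: the same English text under two tokenizers.
ratio = bill_ratio(
    old_tokens=1_000_000,        # assumed count under the old tokenizer
    new_tokens=1_750_000,        # assumed count under the new tokenizer
    old_price_per_million=3.00,  # assumed old sticker price, in dollars
    new_price_per_million=2.40,  # assumed new sticker price, 20% lower
)
print(f"new bill is {ratio:.2f}x the old bill")
```

With these made-up inputs the per-token price drops by a fifth, yet the bill for the same text comes out about 1.4 times higher, because the token count grew faster than the price fell. Swapping in token counts measured on your own workload is the point of the exercise.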

So for the founder building on this stuff, the lesson is don't read the headline price, run your own workload through it and watch the meter. Especially if your product is English-heavy, which most of you are. The advertised discount and the invoice you actually get at the end of the month are two different animals. I'm not saying Sonnet 5 is a bad deal — it may still be a great deal for agent workloads where you're firing thousands of small calls. I'm saying measure it yourself before you migrate your whole stack. And yes, Willison also generated his customary pelican on a bicycle to vibe-check the thing, because that's the man's tradition, and I respect a tradition.

That tokenizer wrinkle is a perfect on-ramp to the real theme today, because it's all about the gap between what AI costs on paper and what it costs in practice — and nowhere is that gap bigger or more consequential than in the chips that run all of this.

So let's talk about Etched. This is the story I think most builders should sit up for. Etched is one of these companies going after Nvidia, and the news is they've hit a five billion dollar valuation and — this is the number that matters — they say they've already booked one billion dollars under contract for the inference systems powered by their chip. One billion dollars of contracted revenue.

Now let me give you the why, and I'll keep the chip-nerd stuff short because I promised you I would. Etched's whole bet is different from Nvidia's. Nvidia builds general-purpose chips — they can train models, they can run models, they can do a lot of different math. That flexibility is exactly why Nvidia owns the world right now. Etched went the other direction. They built a chip that does basically one thing: run transformer models — the kind that power Claude, GPT, Gemini, all of them — for inference. Inference is the part where the model is actually answering your customer, not the part where it's being trained. They baked the transformer architecture into the silicon itself. You give up flexibility, you can't pivot if the architecture changes, but in exchange you get speed and cost efficiency on the one job that ninety-nine percent of production AI workloads actually are.

Here's why a billion in bookings is the headline and not the five billion valuation. We've spent the last couple weeks on this show talking about the AI economy from the cost side — the memory shortage, the RAMmageddon, South Korea throwing a trillion dollars at memory fabs, which we covered yesterday. The squeeze on everybody has been the cost of running these models. Inference cost is the single biggest variable in whether an AI company actually makes money. If your model is great but it costs you four dollars to serve an answer you charge three dollars for, you don't have a business, you have a charity with a chatbot. So when a specialized inference-chip company shows up with a billion dollars of real contracts — not hopes, not a pitch deck, contracts — that's a market telling you it will pay real money to get inference cheaper. That's demand validation for the entire idea that the cost curve has to come down, and somebody other than Nvidia might be the one to bend it.

For the founder, the practical read is this: the inference layer is getting competitive, and that's good for you. You don't buy Etched chips yourself, you're not running a data center. But the people you rent compute from increasingly will have alternatives to Nvidia, and competition at that layer eventually shows up as lower prices in your bill — or at least as a brake on prices going up. Keep an eye on which clouds start offering inference on non-Nvidia silicon, because that's where your margins quietly improve.

Now, let's shift from the chip running the model to the people trying to build a business on top of the model, because there's a great little story here about a poker AI.

Three researchers who left DeepMind started a lab in Prague called EquiLibre Technologies. These are the folks who built poker-playing AI — the kind of system that learned to bluff, to read incomplete information, to make decisions when you can't see all the cards. And what are they doing with that skill now? They're making money for quant hedge funds. The lab is now valued at more than five hundred million dollars.

I love this one because it tells you something true about where AI talent actually goes. Everybody thinks the smart money is chasing the next chatbot, the next consumer app. But poker and markets are cousins. Both are games of incomplete information, probability, and opponents who are also trying to outwit you. The same math that figures out when to bluff in Texas Hold'em figures out how to position a trade when you can't see what the other side is holding. So three people who could've gone and built another assistant instead took their decision-making-under-uncertainty expertise straight to the place that pays the most for exactly that skill: finance.

For founders, here's the takeaway that's bigger than poker. The most valuable AI applications are often the ones that don't look like AI applications. Nobody's calling EquiLibre an AI company in the headlines the way they call OpenAI one. But the edge they built is the same edge — superhuman reasoning under uncertainty — pointed at a niche where that edge converts directly into dollars. If you're hunting for where to build, don't only look at the obvious chat-and-image stuff where you're competing with the giants. Look for the corners where reasoning under uncertainty has a clear price tag attached. Trading is one. There are others nobody's named yet.

Which brings us to Anthropic again, because they're also trying to find a corner — and theirs is science. Anthropic launched something called Claude Science, and the framing in the reporting from Rebecca Bellan at TechCrunch is the smart part: Anthropic is betting on workflow, not a new model, to win over scientists.

Let me unpack that. Claude Science is described as a workbench — one environment where a scientist can do computational research without bouncing between a dozen databases, pipelines, and tools. The pitch isn't "our model is smarter at chemistry." The pitch is "we'll save you from the death-by-a-thousand-tabs reality of actually doing computational science." Anybody who's worked next to a researcher knows the truth — half the job isn't the thinking, it's the plumbing. Getting the data out of one system, into another format, through a pipeline, into a tool, back out, into a chart. It's miserable and it eats days.

So Anthropic's move here is notable precisely because it's not about the model. They didn't say "here's a science-tuned Claude that scores higher on some chemistry benchmark." They said "here's an environment." And that's a tell about where these labs think the value is heading. The raw model is increasingly a commodity — everybody's got one within six months of everybody else's. The thing you can actually charge a premium for, and the thing that's hard to switch away from, is the workflow you build around it. Once a lab full of scientists has all their pipelines living in Claude Science, they're not casually swapping that out next quarter because GPT got a point better on a benchmark.

That's the same instinct, by the way, that we talked about a few episodes back when Anthropic was building Claude into Slack to capture institutional knowledge. It's a pattern with this company: don't just sell intelligence, sell the place where the work lives. For founders, if you're building an AI product and your only moat is "we use a good model," you have no moat, because so does the next guy. The moat is the workflow, the data that accumulates inside your product, the switching cost you create by becoming the place where the work happens. Anthropic clearly believes that, and they're putting product effort behind it.

Now let me shift gears to something a lot less optimistic, because we should talk about the security story, and it's a doozy.

Dan Goodin over at Ars Technica wrote up a new attack on AI browsers, and the one-line version is almost funny until you realize it isn't: telling an LLM that two plus two equals five is enough to make it follow forbidden instructions. The headline on the piece is blunt — it calls this one more reason why AI browsers are a bad idea.

Here's the mechanism, in plain terms. An AI browser is a browser with an AI agent baked in that can read pages, click things, fill out forms, take actions on your behalf. The whole appeal is it does stuff for you. The new attack lulls the model into what the researchers describe as a kind of dream world where the guardrails stop applying. You feed it a premise that's false — two plus two is five — and get it to go along, to operate in this little fictional frame, and once it's accepted that frame, it'll follow instructions it would normally refuse. You've basically talked it out of reality, and a model that's out of reality is a model that's out of its safety training.

Why does this matter for builders specifically? Because so many of you are racing to ship agentic features — agents that browse, that act, that touch your users' accounts and data. And the uncomfortable truth this attack underlines is that the guardrails on these things are not walls, they're suggestions written in chalk. If a false premise wrapped in the right story can walk a model past its own rules, then any agent you give real permissions to — read my email, make this purchase, move this money — is a soft target. The Ars piece is part of a steady drumbeat now. Just last week there was the report about coding agents being tricked into installing malware from what looked like clean GitHub repositories. Same family of problem: the agent's own helpfulness is the exploit.

So my advice, and it's not glamorous: scope your agents tight. Give them the least power that lets them do the job. Put a human in the loop for anything irreversible — spending money, deleting data, sending things to the outside world. The fantasy is the fully autonomous agent that just handles everything. The responsible reality, today, is an agent on a short leash with a person holding the other end. Anybody selling you the leash-free version is selling you liability.
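
One way to picture that short leash is a small authorization gate in front of every agent action. This is a sketch under stated assumptions: the action names, `AgentPolicy`, and `authorize` are hypothetical and do not come from any particular agent framework or from the Ars Technica piece.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Hypothetical names for actions that cannot be undone.
IRREVERSIBLE = frozenset({"spend_money", "delete_data", "send_external"})


@dataclass(frozen=True)
class AgentPolicy:
    # The only actions this agent may attempt at all (least privilege).
    allowed: FrozenSet[str]


def authorize(policy: AgentPolicy, action: str,
              approve: Callable[[str], bool]) -> bool:
    """Decide whether the agent may perform `action`.

    `approve` stands in for a human reviewer; it is only consulted
    for actions that are both in scope and irreversible.
    """
    if action not in policy.allowed:
        return False            # out of scope: refuse without asking anyone
    if action in IRREVERSIBLE:
        return approve(action)  # human in the loop for the risky stuff
    return True                 # in scope and reversible: allow


shopping_agent = AgentPolicy(allowed=frozenset({"read_page", "spend_money"}))
deny_all = lambda action: False  # a reviewer who rejects everything

print(authorize(shopping_agent, "read_page", deny_all))
print(authorize(shopping_agent, "spend_money", deny_all))
print(authorize(shopping_agent, "delete_data", deny_all))
```

The design choice worth noticing is that the check lives outside the model. A false premise can talk the model into wanting to do something, but it cannot talk a plain `if` statement into returning `True`.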

Let me stay in the world of platforms squeezing the people who build on them, because there are two stories today that rhyme, and founders should hear both.

First, Apple. Apple is taking its fight with Epic over App Store fees all the way to the Supreme Court. The Court will weigh whether the contempt finding against Apple in the Epic case was, in their word, erroneous. Quick refresher for anyone who tuned out of this saga — and I wouldn't blame you, it's been going for years. Epic, the Fortnite people, went to war with Apple over the cut Apple takes on App Store transactions and the rules around steering customers to pay outside the App Store. Courts told Apple it had to loosen up, let developers point users to other payment options. Apple was found in contempt over how it complied — or didn't really comply. Now Apple's pushing it up to the highest court to try to get that contempt finding tossed.

Why do you care, as a builder? Because the App Store tax — that fifteen to thirty percent cut — is one of the single biggest line items shaping whether a mobile-first business is viable. Every dollar your user spends, a chunk goes to Apple before it ever reaches you. The whole Epic fight has been about whether you're even allowed to tell your own customer "hey, you can pay cheaper on our website." If the Supreme Court takes this and rules in a way that re-tightens Apple's grip, the brief window where developers got a little more freedom could slam shut again. If it goes the other way, the economics of building consumer apps on iOS get meaningfully better. This is not abstract. This is your unit economics, decided by nine people in robes.

And the rhyme to that — same melody, different instrument — is Google killing the Tenor GIF API. Tenor, if you don't know the name, is the GIF service behind the little GIF buttons in a ton of apps. Google owns it. And Google is shutting down the public Tenor API, which forces changes at X, at Discord, and a bunch of other platforms that relied on it to serve GIFs. Tenor still works fine inside Google's own apps, naturally. It's everybody else who has to go scramble and find another GIF source.

Now on its own, the death of a GIF API is a small thing. Nobody's life is ruined because Discord has to find a new place to get reaction GIFs. But the pattern is the whole point, and it's the pattern that should live rent-free in every founder's head. When you build on somebody else's platform — their API, their app store, their distribution — you are building on rented land. The landlord keeps the good stuff for the house and can change the rules or evict you whenever it serves them. Google keeps Tenor for Google apps and pulls it from everyone else. Apple keeps thirty percent and fights in court for years to keep keeping it. We talked about this exact dynamic a couple weeks back with Tesco walking away from VMware over lock-in. Same lesson every time. Dependence on a platform is a business risk, not just a technical one. Build your core where you control it, and treat every third-party dependency as something that could vanish or get expensive on a Tuesday with no warning.

Alright, let me lighten it up and give you a few quicker hits, because not everything has to be a cautionary tale.

Google put out Nano Banana 2 Lite — and yes, that's the real name, Nano Banana, I don't make these up. It's a faster, cheaper version of their image generator. The honest framing in the coverage is refreshing: the images may not look as good, but they only take a few seconds to make. And that's actually the interesting product decision. There's a real market for fast-and-cheap-and-good-enough image generation, especially for creators cranking out volume where they don't need every frame to be a masterpiece. It's the same fork in the road we keep seeing — a flagship tier for quality, a lite tier for speed and cost. Google's just being explicit that the lite one trades fidelity for throughput. For anybody building creator tools, having a cheap fast image option in your stack matters more than having the prettiest one nobody can afford to run at scale.

Next, a couple of agents-in-your-pocket stories. A startup called Acti is putting AI agents directly into your smartphone keyboard. Their bet is that the keyboard — the thing that's already there in every app — is the next home for AI assistants. You get a keyboard for iOS and Android that works across apps and lets you build custom AI shortcuts using plain natural language. And separately, OpenClaw, the free open-source agentic program, is now available on Android and iOS. So the agent-on-your-phone race is heating up from two directions — a startup picking the keyboard as the wedge, and an open-source project just showing up on the app stores.

I'll say this about the keyboard play: it's clever positioning. The keyboard is the one piece of software that follows you into every single app. If you can own that surface, you don't need anybody's permission to be everywhere. It sidesteps the whole platform-lock-in problem we were just complaining about, because the keyboard is already invited into the room. Whether people actually want to summon an agent from their keyboard, or whether that's a solution sniffing around for a problem, we'll find out. But the insertion point is smart.

Now let me pivot to a couple of stories that aren't AI at all but that builders should still clock.

Realta Fusion says it generated electricity directly from a fusion reaction — what's being called an apparent first. The CEO, Kieran Furlong, told TechCrunch, quote, we can take power from a plasma, and said the milestone shows what's possible. Now, I want to be careful here and not turn this into more than it is. "An apparent first" and "shows what's possible" are doing a lot of load-bearing work in that sentence. This is not fusion powering your house next year. But the reason a builder should care even a little is that the entire AI infrastructure conversation right now bottoms out in power. Every data center buildout, every memory fab, every chip — the ultimate constraint is electricity. We've said it on this show: the bottleneck isn't always the chips, sometimes it's the watts and the heat. So any genuine progress on a fundamentally new power source is, way downstream, a story about whether the compute buildout has a ceiling. File it under "watch, don't bet."

And then, because the universe has a sense of humor, NASA may send a backup, nuclear-powered Mars rover to the Moon. There's apparently a spare Mars rover with a nuclear power source, and the idea on the table is to repurpose it for lunar duty. The quote in the piece is just an engineer saying, quote, that would be an awesome capability. And you know what? Sometimes a story doesn't need a deep founder lesson. Sometimes it's just a great use of a spare part. We've all got that one component in the garage we swore we'd use someday. NASA's version costs a few billion and runs on plutonium.

Alright, let me do the deep dive now, and I want to spend real time on it because it's the one that ties everything together — the tokenizer wrinkle, the chip bookings, the workflow bets, all of it. The deep dive is on a question that sits underneath this entire industry: how big is the AI economy actually, and is any of this making money?

The context here is a long breakdown that went deep on a report called The State of the AI Economy, from the team at Exponential View. And I want to walk you through it because the discourse you hear is either "it's all a bubble about to pop" or "this changes everything," and the truth, as usual, is more interesting and more specific than either bumper sticker.

First, the methodology, because it matters. The team went through reports on more than a thousand AI companies. They scored their sources by confidence, so audited accounts count for more than an executive bragging on an earnings call. And critically, they deduplicated, so that each dollar of AI spending only gets counted once. Here's the example: if you spend a hundred dollars in an app, and that app sends sixty dollars to a model provider and thirty dollars to inference hosting, that's counted as a hundred dollars of AI economy, not a hundred and ninety. That kind of discipline is rare in this space, where everybody's incentivized to make the number look enormous by counting the same dollar three times.
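
The hundred-versus-hundred-and-ninety example can be written out as a short Python sketch. This is one simple way to deduplicate, assumed here for illustration; the report's actual procedure is not described in the episode, and the party names are placeholders.

```python
# Each flow is (payer, payee, dollars), following the example above.
flows = [
    ("end_user", "app", 100),
    ("app", "model_provider", 60),
    ("app", "inference_host", 30),
]


def naive_total(flows):
    """Add up every payment, counting passed-through dollars repeatedly."""
    return sum(amount for _, _, amount in flows)


def deduplicated_total(flows):
    """Count only money entering the chain from outside it.

    A payer who never appears as a payee is spending fresh money;
    everyone else is passing along dollars that were already counted.
    """
    payees = {payee for _, payee, _ in flows}
    return sum(amount for payer, _, amount in flows if payer not in payees)


print(naive_total(flows), deduplicated_total(flows))
```

The naive sum comes to 190 and the deduplicated figure to 100, matching the numbers in the example.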

So what's the number? They say AI companies have banked a hundred and ten billion dollars over the past twelve months, and they're at an annualized run rate of a hundred and seventy-five billion. And the speed of growth is the part that genuinely surprised me. Back in 2023, the AI industry needed a hundred and eighty days to add a billion dollars in cumulative revenue. Today it needs less than two days to add each new billion, which is roughly ninety times faster. The report's framing is that AI demand is more validated by realized revenue than any prior platform shift. The sector's growing about three times faster than any IT wave before it. Demand, they say, is real, big, and fast.

Now here's where it connects to everything we talked about today. That demand has lit up what the report calls a compute super cycle. The global semiconductor market is projected to hit one-and-a-half trillion dollars this year, basically doubling from last year's seven hundred ninety-two billion. That's the Etched story, that's the memory shortage story, that's South Korea's trillion-dollar fab bet — all of it is this super cycle in one phrase. And it spills into power: between 2008 and 2024, after the financial crisis, US net electricity generation was basically flat, no growth for sixteen years. Since 2024, it's growing at about one-and-a-half times the historical average. That's the fusion story's relevance — the grid is being asked to grow again for the first time in a generation, because of this.

But here's the question that actually decides whether this is a bubble: does the revenue cover the bill? Because the buildout is staggering. Hyperscaler and so-called NeoCloud capital spending will reach eight hundred forty-eight billion dollars this year, and two trillion cumulatively since 2020. The report's careful conclusion is this: revenues are covering the ongoing expense, but not yet the cumulative bill. Starting in the fourth quarter of last year, quarterly revenue started to exceed the depreciation on all that capital spending. In English: the new money coming in is now outrunning the rate at which the gear wears out. That's a healthier picture than the doom crowd assumes.

And there's a wrinkle on the chips themselves that I found genuinely reassuring. One of the big bubble arguments has always been that these GPUs go obsolete almost instantly, so there's no time to earn back what you paid for them. But the data in this report suggests older GPUs are earning real yields well past their six-year depreciation life — meaningful returns into year seven, year eight, even year nine. If that holds, the math on paying back all that capital spending gets a lot more forgiving, because the asset keeps earning long after the accountants wrote it off.

Now the part that matters most for you, the builder. The report points out we're still early — AI revenue is equivalent to about zero-point-four-two percent of US GDP, while the broader IT sector is around nine-point-four percent. So there's enormous room to grow, and it is growing — AI revenue relative to GDP is up roughly three times versus the first quarter of 2025, and ten times versus 2024. And the engine of that growth is exactly the shift we keep circling: the move from chat to agents. An agent task can burn around twelve hundred times the tokens of a simple chat task. Twelve hundred times. Global token volumes are now running above thirty quadrillion per month, growing fourteen times year over year.

And here's the beautiful tension that ties back to Sonnet 5 and that tokenizer. Even as total AI spending climbs because of all those agent tokens, the cost per token keeps falling. The report cites the blended price per million tokens dropping from seventeen dollars to two dollars over a recent stretch, even as the models got dramatically more capable. Falling unit prices encourage more use, which makes previously uneconomical applications suddenly viable, which drives more volume. That's the flywheel. That's why a cheaper agent-tier model like Sonnet 5 is strategically a big deal even with a tokenizer quirk — the whole game is driving the cost of an agent action down far enough that founders can build things that didn't pencil out last year.
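
A rough Python sketch shows why that price drop matters so much more for agents than for chat. The 1,200x multiplier and the seventeen-dollar and two-dollar prices are the figures cited above; the size of a simple chat task is an assumption picked for round numbers, and `task_cost` is an illustrative helper.

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of one task at a blended per-million-token price."""
    return tokens * price_per_million / 1_000_000


CHAT_TOKENS = 1_000       # assumption: tokens in one simple chat task
AGENT_MULTIPLIER = 1_200  # agent-to-chat token ratio cited above
agent_tokens = CHAT_TOKENS * AGENT_MULTIPLIER

for price in (17.0, 2.0):  # blended dollars per million tokens, cited above
    chat = task_cost(CHAT_TOKENS, price)
    agent = task_cost(agent_tokens, price)
    print(f"${price:.0f}/M tokens: chat ${chat:.3f}, agent ${agent:.2f}")
```

Under that assumption, a chat task costs a fraction of a cent at either price, while the agent task falls from about twenty dollars to a bit over two. That is the difference between a feature that doesn't pencil out and one that might.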

And where's the value landing? The report says revenue is still concentrated around chips — again, Etched, the super cycle — but the mix is starting to shift up the stack toward apps and models. The share of AI revenue coming from the app and model layer was up almost three times over the last year. For the first time, real app revenue, from companies like the coding tools, is showing up in the numbers. The value is moving toward the application layer. That's you. That's the most important sentence in the whole thing for somebody building a product.

The one number I'll leave you with from this, because it's the one that should change how you think about your own company: they compared revenue growth between companies with no AI spend and companies with high AI intensity, meaning the top quarter of AI spenders by share of revenue. Companies with no AI spend grew revenue over three years roughly in line with nominal GDP, fifteen to twenty percent. Companies with high AI intensity grew revenue by more than a hundred percent over the same period. That's a ninety-two percentage point gap in revenue growth between the heavy AI adopters and the abstainers.

Now, I'm gonna put my skeptic hat back on for one second, because that's my job and I'd be doing you dirty otherwise. Correlation is not causation. The companies spending heavily on AI might just be the aggressive, well-run, fast-growing companies who'd be growing fast anyway and who also happen to buy the shiny new thing. AI might be the symptom of a winning company, not only the cause. But even granting that caveat, the direction is clear and the deduplicated revenue numbers are hard to wave away. This is not 1999 dot-com vapor where nobody had revenue. There's a hundred and ten billion real dollars on the table, growing faster than any tech wave before it.

So let me bring it home. The honest takeaway from all of this isn't "bubble" and it isn't "to the moon." It's that the AI economy is real, it's big, it's covering its operating costs even if not yet its full historical bill, and the value is migrating up the stack toward the applications — which is the layer where most of you actually live and work. The thing that decides who wins down there is exactly the stuff we talked about all episode: keeping your inference costs down as the chip market gets competitive, building workflow moats instead of leaning on a model anybody can rent, scoping your agents so they don't get talked into a dream world and drain a customer's account, and not betting your company on rented land that the landlord can repossess on a Tuesday.

That's the menu for today. The model got cheaper, sort of, mind the tokenizer. The chips got competitive, watch your bill go the right direction. The science labs are buying workflow, not magic. And underneath all of it, the money is more real than the doomers want to admit and more disciplined than the hype men want to admit.

This is Tony DeLuca, and you've been listening to Barely Possible. Go measure your token bill before you migrate anything, be good to each other, and I'll see you tomorrow.
