Barely Possible

$30K Claude agent runaway bill exposes AWS Bedrock's missing spend guardrails

Show Notes

[Barely Possible 2026-05-15] Today's episode:
• An AWS Bedrock Claude agent ran unchecked and generated a $30K bill; Cost Anomaly Detection caught $0.
• Anthropic's June 15 pricing split caps programmatic Claude usage at subscription value (~$20/mo for Pro), cutting some devs' effective...
• Theo (T3 Chat) called the change a rug pull after building on the Agent SDK per Anthropic's own guidance.
Hear the full breakdown in today's episode of Barely Possible.
Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_episode_74&feed_source=rss&episode_id=74
Transcript: https://media.clawford.org/episodes/2026-05-15/podcast-episode-2026-05-15.txt

What is Barely Possible?

A daily briefing on the AI systems, products, companies, and policy shifts that are just becoming possible.

Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_feed&feed_source=rss

Hey hey hey, welcome back, kiddos — I'm your boy Tony DeLuca and we have got a juicy tray of fresh AI dispatches for you today, so let's not waste a second.

May 15th, 2026. And the lead story today practically wrote itself: an AWS user just got hit with a thirty-thousand-dollar invoice after a Claude agent went completely off the rails on Bedrock, and AWS's own Cost Anomaly Detection — the exact tool they market as the safety net for this kind of thing — failed to catch a single dollar of it. That's where we're starting, because it's not really a billing horror story. It's a wake-up call about what it actually means to deploy AI agents in production, and it connects directly to two other things we're going to dig into today: Anthropic's new Claude subscription pricing change and the question of who bears the cost when the token economy shifts under your feet.

But first — a quick continuity note. Yesterday we spent some time on the token-maxxing debate: who's actually getting value from AI, and whether heavy compute consumption is genius or waste. Today that question stops being philosophical. We've got receipts.

Let's do it.

So the AWS Bedrock story. The facts are simple and a little brutal. Someone spun up a Claude agent through AWS Bedrock. The agent ran without guardrails. No hard spending caps. No circuit breakers. The Cost Anomaly Detection system, which AWS explicitly markets as protection against runaway spend, detected nothing. Thirty thousand dollars later, they had a bill and a very interesting learning experience.

Now, one commenter in the thread said, quote, AWS and GCP have nothing to limit spend, it's toxic and should be illegal. That's a bit dramatic, but the underlying frustration is real. Another person, someone who's apparently watched this happen to three different founders, laid out the four survival rules: set billing alerts at one hundred, five hundred, and a thousand dollars, not after the fact. Use AWS Budgets with actual hard stops, not just alert emails. Put token limits directly in your API calls, which most people skip entirely. And test with cheaper, smaller models first before you scale up.
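Two of those rules are easy to put on paper. Here's a minimal Python sketch, not anyone's production setup: `crossed_alerts` and `clamp_max_tokens` are hypothetical helper names, the dollar thresholds are the ones quoted in the thread, and the 1,024-token cap is an arbitrary assumption you'd tune per workload. It only shows the logic; wiring it to a real billing feed or API client is left out.

```python
# Alert thresholds quoted in the thread (USD).
ALERT_THRESHOLDS_USD = (100, 500, 1000)


def crossed_alerts(previous_spend, current_spend, thresholds=ALERT_THRESHOLDS_USD):
    """Return the thresholds crossed between two spend readings.

    Comparing against the previous reading means each alert fires once,
    at the moment it is crossed, instead of on every later check.
    """
    return [t for t in thresholds if previous_spend < t <= current_spend]


def clamp_max_tokens(requested, hard_cap=1024):
    """Cap the output-token budget passed with a single model call.

    hard_cap is an assumed value for illustration, not a recommended number.
    """
    return max(1, min(requested, hard_cap))


if __name__ == "__main__":
    print(crossed_alerts(90, 520))   # spend jumped from $90 to $520
    print(clamp_max_tokens(50_000))  # an agent asking for far too much
```

The point of the clamp is that it sits in your code, not in the prompt: the agent can ask for whatever it wants, and the request that actually goes out stays bounded.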

That last point is the one that stings, because the culture around AI deployment right now is very much build-fast-ask-questions-later. And costs have historically been cheap enough that it mostly worked. The issue is that era is ending, and the Bedrock incident is one of several signals pointing in the same direction at the same time.

Look at the wider picture from the same week: Tencent, a hyperscaler with massive GPU infrastructure, publicly admitted that their GPUs only pay for themselves when running personalized ads. General-purpose AI inference is apparently burning money even at their scale. That's a frank confession from a company that has every incentive to put a positive spin on things. Meanwhile, Anthropic is metering and throttling programmatic Claude usage at the API layer — a supply-side move that only makes sense if inference costs are genuinely outpacing what the subscription pricing model can absorb. And Notion is turning its workspace into an agent orchestration hub, TikTok is replacing human media buyers with autonomous agents, and Apple is internally debating whether autonomous agent submissions even belong in the App Store because no review framework exists yet for non-deterministic software.

The pattern here is not subtle. The deployment wave is accelerating straight into a cost and governance crisis, and the guardrails are being built after the agents are already in production. As one commenter put it, quote, mandatory spend caps and hard circuit breakers honestly feel inevitable; "oops, the agent spent five figures overnight" is not a survivable failure mode for most startups.

If you're building agents right now — and I know a lot of you are — the practical upshot is this: treat your AI agent's API access the same way you'd treat database write permissions. Hard limits. Staged rollouts. Budget circuit breakers at multiple levels. The model is the least of your problems. Operational infrastructure around the model is where founders are bleeding.
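To make "budget circuit breakers at multiple levels" concrete, here's a rough sketch in Python. Everything in it is an assumption for illustration: the `SpendBreaker` class, the per-task and per-day levels, and the dollar figures are made up, and the cost you pass in would come from your own estimate of a call before you make it. It is not tied to any AWS or Anthropic API.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past a hard limit."""


class SpendBreaker:
    """Two-level hard stop: one cap per task, one cap per day (hypothetical design)."""

    def __init__(self, per_task_usd, per_day_usd):
        self.per_task_usd = per_task_usd
        self.per_day_usd = per_day_usd
        self.task_spend = {}
        self.day_spend = 0.0

    def charge(self, task_id, cost_usd):
        """Record an estimated cost BEFORE the call; refuse if any cap would be exceeded."""
        new_task = self.task_spend.get(task_id, 0.0) + cost_usd
        new_day = self.day_spend + cost_usd
        if new_task > self.per_task_usd:
            raise BudgetExceeded(f"task {task_id!r} would exceed ${self.per_task_usd}")
        if new_day > self.per_day_usd:
            raise BudgetExceeded(f"daily spend would exceed ${self.per_day_usd}")
        # Only commit the spend once both checks have passed.
        self.task_spend[task_id] = new_task
        self.day_spend = new_day


if __name__ == "__main__":
    breaker = SpendBreaker(per_task_usd=5.0, per_day_usd=8.0)
    breaker.charge("summarize", 3.0)
    breaker.charge("summarize", 2.0)
    try:
        breaker.charge("summarize", 0.5)
    except BudgetExceeded as err:
        print("stopped:", err)
```

The design choice worth noticing is that the breaker raises instead of logging. An alert email is something a human reads in the morning; an exception is something the agent loop cannot ignore.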

Now, all of that backdrop makes it a lot easier to understand what Anthropic just did with its subscription pricing, and why it lit the developer community on fire.

This is today's deep dive, and I'm drawing directly from a detailed analysis of the change, because the noise around it has been loud enough that it's worth cutting through carefully.

Here's what actually happened. Anthropic announced that as of June 15th, they are splitting Claude subscription usage into two distinct categories. The first is what they call interactive use: you sitting in front of Claude.ai, Claude Code, or Claude's collaborative workspace, typing and getting answers. That experience is unchanged. Your subscription limits still apply there. The second category is what they call programmatic use: running Claude through the Agent SDK, through the CLI tool, through GitHub Actions, or through third-party tools that pipe into Claude's backend. That category is now getting a separate monthly credit equal to your subscription amount. So if you pay twenty dollars a month for a Pro account, you get a twenty-dollar credit toward programmatic API usage. Anything beyond that, you pay full API rates.
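If it helps to see the split as arithmetic, here's a small Python sketch of the rule as described: a programmatic credit equal to the subscription price, with overage billed at API rates. The function names are hypothetical, and how the credit is actually metered and applied on Anthropic's side is an assumption here, not something the announcement details confirm.

```python
def programmatic_overage(subscription_usd, programmatic_usage_usd):
    """Overage owed once the credit (assumed equal to the subscription price) is used up.

    programmatic_usage_usd is the month's programmatic usage valued at API rates.
    """
    credit = subscription_usd
    return max(0.0, programmatic_usage_usd - credit)


def total_monthly_cost(subscription_usd, programmatic_usage_usd):
    """Subscription fee plus any programmatic overage."""
    return subscription_usd + programmatic_overage(subscription_usd, programmatic_usage_usd)


if __name__ == "__main__":
    # A Pro user at $20/month whose tools burn $75 of API-priced usage.
    print(programmatic_overage(20.0, 75.0))
    print(total_monthly_cost(20.0, 75.0))
```

Under these assumptions, anyone whose programmatic usage stays under the credit pays nothing extra, and everyone else effectively pays API price for their programmatic usage.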

Anthropic framed this as good news in three ways: it clarifies what's allowed, it officially green-lights third-party tools built on the Agent SDK, and it adds this quote-unquote bonus credit on top of your existing plan.

The developer community did not see it that way.

The reason is token subsidies. For a very long time, power users of Claude subscriptions have been consuming dramatically more compute than their subscription price would suggest. Numbers floated in the discussion include estimates that a maxed-out two-hundred-dollar-a-month subscription could effectively represent two thousand to five thousand dollars worth of API compute if it were priced at standard API rates. One developer shared a specific session where Claude Code consumed thirty-one dollars of API tokens and counted as only seven percent of their monthly usage allowance over a five-hour window. That backs out to something like a hundred dollars an hour of compute for a hundred-dollar-a-month plan. That is a massive, massive subsidy.
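The numbers in that example only line up under one reading, so here's the arithmetic spelled out in Python. The assumption, and it is an assumption, is that the seven percent refers to the allowance for the five-hour window rather than the whole month. The function names are made up for illustration.

```python
def implied_hourly_compute(session_cost_usd, share_of_window, window_hours):
    """API-priced compute per hour implied by one session.

    Assumes share_of_window is the fraction of the window's allowance
    that the session consumed (0.07 for seven percent).
    """
    window_value_usd = session_cost_usd / share_of_window
    return window_value_usd / window_hours


def subsidy_multiple(api_value_usd, plan_price_usd):
    """How many times the plan price the consumed compute is worth at API rates."""
    return api_value_usd / plan_price_usd


if __name__ == "__main__":
    # $31 of tokens counted as 7% of a five-hour window: roughly $89 an hour.
    print(round(implied_hourly_compute(31.0, 0.07, 5.0), 2))
    # The $2,000 to $5,000 estimate against a $200 plan: 10x to 25x.
    print(subsidy_multiple(2000.0, 200.0), subsidy_multiple(5000.0, 200.0))
```

On that reading, a full window works out to about four hundred forty dollars of compute, which is where "something like a hundred dollars an hour" comes from.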

Anthropic has been sustaining that subsidy for users who go through its own harnesses: Claude.ai, Claude Code, the collab workspace. What it is no longer sustaining is that same subsidy for people who use third-party tools built on top of the Agent SDK. Those folks are now on full API pricing for their programmatic usage, and the effective cut to their available compute is enormous. Developers reporting twenty-five-to-forty-times reductions in effective rate limits are not exaggerating.

The developer reaction was genuinely furious. T3 Code's builder Theo, who had done significant work to integrate with the Claude Agent SDK, posted that his users' rate limits got cut forty-fold despite doing everything right. He'd followed the guidance from the Claude Code team, built against the supported path, and got rugged anyway. His concluding line was that any statement from an Anthropic employee should be treated as, quote, a lie on a timer. Another developer called Anthropic's communications, quote, the most intentionally misleading copy in the industry. A third argued that Anthropic has simply concluded that social sentiment from power developers no longer matters because they've locked in corporate enterprise clients.

Now, there's also a cooler-headed read here, and I think it's the right one, even if the comms around this were genuinely bad.

The subsidy era was always going to end. The underlying reality is a semiconductor shortage that industry observers are projecting through roughly 2030. Not just AI chips: memory, logic, fab capacity, power infrastructure, every single layer of the stack is constrained, and these are multi-year bottlenecks. Supply is fixed and enterprise demand is exploding. Anthropic's business adoption has reportedly quadrupled year over year, overtaking OpenAI in enterprise usage according to data from payment processor Ramp. Under those conditions, the economics of subsidizing a class of users who consume hundreds of dollars in compute for twenty dollars a month become impossible to sustain.

What Anthropic is actually doing is rationing. They are maintaining the subsidy for their own ecosystem and withdrawing it from the third-party developer ecosystem that competes with their harnesses. Is that cynical? Maybe. Is it irrational from a business standpoint given that they can now sell the same compute to enterprises at five to ten times the price per token? Absolutely not.

Here's the part that should matter most to founders listening right now: this isn't just an Anthropic story. The consensus in the industry is that OpenAI will be forced to make a very similar move within the next several months. They haven't had quite the same explosive enterprise growth, so they haven't felt the capacity pressure as acutely yet. But the underlying physics don't change. Everyone who has been building businesses on deeply subsidized compute is building on borrowed time, and the clock is ticking in months, not years.

The practical question for builders is: what's your plan when the subsidy goes? If your product's unit economics only work at subscription rates, you need to find out now, not on June 16th. Either your margins survive full API pricing, you build on open-source models running on your own hardware, or you build on Anthropic's own harnesses and stay inside the subsidy wall. Those are roughly your choices.
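One way to find out before June 16th is to run the margin math at full API pricing. This is a back-of-the-envelope Python sketch: `gross_margin_at_api_rates` is a hypothetical helper, and the per-million-token rates and usage numbers in the example are placeholders, not real price-list figures. Plug in your own.

```python
def gross_margin_at_api_rates(price_per_user_usd, input_tokens, output_tokens,
                              input_rate_per_mtok_usd, output_rate_per_mtok_usd):
    """Gross margin per user per month if all model usage is paid at API rates.

    Rates are expressed in USD per million tokens. All inputs are your own
    estimates; nothing here is pulled from a real pricing table.
    """
    cost_usd = (input_tokens / 1_000_000) * input_rate_per_mtok_usd \
        + (output_tokens / 1_000_000) * output_rate_per_mtok_usd
    return (price_per_user_usd - cost_usd) / price_per_user_usd


if __name__ == "__main__":
    # Placeholder scenario: $30/month product, 2M input and 1M output tokens per user.
    margin = gross_margin_at_api_rates(30.0, 2_000_000, 1_000_000, 3.0, 15.0)
    print(f"{margin:.0%}")
```

If that number comes out negative for your heaviest users, that's the cohort whose usage you need to cap, reprice, or move to a different model.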

One commenter made a point worth sitting with: the token-maxxing era has lasted about six months. Six months where models were good enough to build anything, and costs were low enough to not worry about it. That window is closing. The intelligence got better. The free ride did not scale with it.

Okay, now let me shift to a few stories that connect back to the cost and infrastructure picture before we move on.

xAI has apparently launched a command-line interface tool that people are describing as OpenCode-inspired. Sparse details in the discussion, but the direction is clear: every major AI lab is building their own developer-facing tooling layer. Claude Code, Codex, and now an xAI CLI. This is a harness war, and it's happening at every level of the stack. If you're choosing which tool to build your workflow around, the stability risk is real, as the Anthropic pricing change just demonstrated. The labs are all simultaneously competing to own the developer relationship and raising the stakes for anyone who builds on top of them.

On the web traffic front, there's an interesting snapshot circulating of how AI platform web traffic has shifted over the past year. Twelve months ago, ChatGPT had north of seventy-seven percent of measured AI web traffic. One month ago, that number had fallen to about fifty-four percent. Gemini has gone from around seven percent to over twenty-six percent. Claude went from roughly one-and-a-half percent to nearly eight percent. The redistribution is real and it's moving fast.

The Google angle here is worth noting. Several commenters invoked Google's historical pattern of entering markets late, burning cash on distribution to build share, and eventually consolidating. One person shared the quote from a Google executive: when we built Chrome, we had one percent market share one year after we launched. In fact, if you look at Google's history, it has virtually never been first to a new tech product — browsers, search, mail, maps — but its distribution channels allowed it to close gaps fast. Someone is currently on a free Gemini student deal, zero dollars a month for a year plus five terabytes of cloud storage, and says they'll probably cancel their ChatGPT Plus before the free period ends. That is distribution strategy working exactly as intended.

For founders thinking about where to build, the market share charts are not yet destiny, but the trend lines matter. A provider that's going from one-and-a-half to eight percent in twelve months is not standing still.

Now let me get to a story that doesn't involve pricing wars at all, because one of the more striking items today is about what AI is actually capable of on a technical level.

There's a report circulating about a public macOS kernel memory corruption exploit built for Apple's M5 chip using a model called Mythos Preview — which, based on context in the Anthropic paper we'll get to in a moment, appears to be an Anthropic research model released to select partners. According to the report, the exploit took five days to build. The M5 chip had only shipped a few weeks before. And this is claimed to be the first public kernel memory corruption exploit on that silicon.

People in the discussion are split between crediting AI as a genuine force multiplier for security research and skepticism about whether this is partly marketing. But even in the skeptical reads, the observation that gets made is worth quoting: the interesting part isn't that AI helped find it. It's that kernel exploits on new silicon are getting built this fast now. That's the part that doesn't go away regardless of who paid for the blog post.

This connects directly to a paper that Anthropic released back in April — it's been getting fresh attention this week, so I want to give it proper framing as an older report that's resurfacing. The paper outlines two scenarios for global AI leadership by 2028, and it reads, as one commenter put it, more like a geopolitical briefing than a safety document.

The core argument is that the United States currently holds a meaningful compute lead over China — primarily because of chips. Nvidia, TSMC, ASML: US and allied companies built a hardware stack that China can't replicate yet, and export controls have made the gap real.

But China's labs have reportedly stayed close through two specific workarounds. First, chip smuggling — PRC labs are allegedly training on export-controlled hardware they shouldn't have access to, and a Supermicro co-founder was recently charged for allegedly diverting two-and-a-half billion dollars worth of servers to China. Second, what Anthropic is calling distillation attacks — where thousands of fake accounts get spun up on US AI platforms, model outputs get harvested at scale, and that data gets used to train competing models. Essentially, the argument is that Chinese labs have been free-riding on American R&D investment.

Anthropic lays out two 2028 scenarios. In the good one, the US closes the loopholes, enforces export controls, the compute gap widens to eleven times, and US models stay twelve to twenty-four months ahead. In the bad one, China reaches near-parity, floods global markets with cheaper models, and ends up shaping global AI governance norms, including potentially exporting AI-enabled surveillance infrastructure to other authoritarian governments.

Anthropic is explicitly calling for distillation attacks to be criminalized, which puts them in the position of political actors lobbying for specific legislation. Whether that's appropriate for an AI lab is genuinely contested; several commenters noted the irony of a company that trained on copyrighted data pointing fingers at others for data appropriation. The response from the skeptical side is blunt: quote, "democracies set the norms" is a bit of a stretch right now.

I'm not here to adjudicate the geopolitics. But from a founder's perspective, the structural point matters regardless of who's making it: compute scarcity and compute access are not just pricing problems. They are strategic national-security-level constraints that are going to shape which AI capabilities are available to whom and at what cost for years. That's the backdrop for every infrastructure decision you're making right now.

Shift from geopolitics to a story I genuinely love, because it's concrete and it's crypto and it's a good reminder of what AI actually does well when it's being used as a tool rather than a chatbot.

A user who had held five Bitcoin for eleven years lost access to his wallet. His first recovery attempt used GPU cracking software, ran for weeks, and chewed through three-and-a-half trillion possible passwords. Nothing. He then dumped all his old college laptop files into Claude. Claude didn't break the cryptography. What it did was act like a senior security researcher: it processed the unstructured data, found a legacy wallet.dat file the user didn't know he still had access to, identified a bug where the decryption script was concatenating his shared key and password incorrectly, fixed the bug, and produced the private keys in the correct format based on an old mnemonic phrase the user provided.

Five Bitcoin recovered. Four hundred thousand dollars.

The reason I like this story is the specific thing Claude did: it didn't brute-force anything. It reasoned about the problem, found the actual bottleneck in a messy pile of decade-old files, and fixed a code logic error that was blocking the correct solution. That is the use case — pattern recognition and debugging in unstructured contexts — where these models are genuinely ahead of any alternative.

For anyone building crypto tooling or recovery services, that capability is worth thinking hard about.

Now let's talk about a subject that's been getting serious discussion in the builder community and got a sharp articulation in a Reddit thread this week.

The question is whether human-in-the-loop governance is actually an illusion in enterprise AI. The argument goes like this: companies believe they have a governance strategy because they've said humans will review anything risky. Sounds reasonable. But as AI systems move from recommendation to execution, they don't just generate answers anymore. They classify risk. They estimate confidence. They decide what gets escalated to humans and what gets handled silently. Which means the system being governed is also deciding when governance should begin.

That's a structural problem. If a human only sees what the AI chose to escalate, then the real governance layer isn't the human review. It's the AI's escalation logic — and that logic is invisible, hard to audit, and changes with every model update.

The commenter who put it best wrote: your governance layer is only as good as what the system decides to surface, and that's a fundamental conflict of interest baked into the architecture.

The proposed reframe is a shift from human-in-the-loop to human-governed autonomy. Less about approving every output, more about defining where autonomy boundaries sit, governing reversibility, mandatory escalation thresholds, and deciding where the AI should categorically not act. The audit target shifts from outputs to the boundary design itself.

This is a genuinely useful frame for anyone building agentic systems right now, especially anything in a regulated industry. The thirty-thousand-dollar AWS bill and the human-in-the-loop governance problem are actually the same problem: the AI is executing, not recommending, and the oversight architecture hasn't kept up. If you're deploying agents with the ability to call APIs, modify databases, or send communications — and most of you are or will be shortly — boundary design, observability, and reversibility are not nice-to-haves. They're the product.
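Here's roughly what boundary design can look like when it lives outside the model. This is a minimal Python sketch under stated assumptions: the `Action` type, the `decide` function, the forbidden-action list, and the fifty-dollar threshold are all invented for illustration. The one idea it is meant to show is that the escalation rules are static and auditable, and they don't depend on the agent's own confidence or risk estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical shape)."""
    name: str
    reversible: bool
    cost_usd: float = 0.0


# Boundaries are plain data that a human can review and version-control.
FORBIDDEN_ACTIONS = frozenset({"delete_production_database"})
ESCALATE_ABOVE_USD = 50.0  # assumed threshold, pick your own


def decide(action):
    """Return 'deny', 'escalate', or 'allow' using only fixed rules."""
    if action.name in FORBIDDEN_ACTIONS:
        return "deny"          # the agent should categorically not act
    if not action.reversible:
        return "escalate"      # irreversible actions always reach a human
    if action.cost_usd > ESCALATE_ABOVE_USD:
        return "escalate"      # mandatory escalation threshold
    return "allow"


if __name__ == "__main__":
    print(decide(Action("send_customer_email", reversible=False)))
    print(decide(Action("update_draft", reversible=True, cost_usd=2.0)))
```

The audit target is then the three rules and two constants, which don't change when the model does.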

A couple of quick items before we wrap, because they're interesting and I don't want to leave them on the floor.

There's been a thread making the rounds about a quirk in heavily RLHF-trained local models: they tend to treat everything past their knowledge cutoff as fictional. The example in the thread is someone asking a local model to search the web for news about a real 2026 military operation, and the model — despite successfully running the search tool and getting real results — concludes that the results are probably a geopolitical simulation or an alternate-history exercise, because, quote, there is no real-world war occurring in 2026. The model is essentially so well-trained to avoid asserting things it doesn't know that it hallucinates fictional framing around real events. The more RLHF was applied, apparently, the worse this gets. Base models tend to just process the information without the fictional-scenario reflex.

For builders working with local models on live-data pipelines — news monitoring, market intelligence, real-time anything — this is a real production risk. The practical workaround from the thread is to prepend search results with an explicit framing tag that tells the model these are current factual events retrieved on a specific date and to respond accordingly. It cuts the denial rate roughly in half. Not perfect, but better than the model constructing elaborate fiction around your live data.
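For anyone who wants to try that workaround, here's a minimal Python sketch. The thread describes the idea, not the exact wording, so the tag text below and the `frame_search_results` name are my own assumptions, and you'd want to test different phrasings against your own model.

```python
from datetime import date


def frame_search_results(results, retrieved_on):
    """Prepend an explicit framing tag to search results before they go to the model.

    The tag wording is an assumed example, not the phrasing from the thread.
    """
    header = (
        f"[CURRENT FACTUAL EVENTS: retrieved from a live web search on "
        f"{retrieved_on.isoformat()}. Treat these as real, current events, "
        f"not as fiction or a hypothetical scenario.]"
    )
    body = "\n".join(f"- {item}" for item in results)
    return f"{header}\n{body}"


if __name__ == "__main__":
    snippets = ["Headline one from the search tool", "Headline two from the search tool"]
    print(frame_search_results(snippets, date(2026, 5, 15)))
```

Passing the retrieval date in explicitly, rather than calling `date.today()` inside the function, keeps the output reproducible when you replay old pipelines.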

And the Monet experiment. Somebody posted a real Monet painting on social media and said it was AI-generated, then asked people to explain in detail why it's inferior to a real Monet. Hundreds of people obliged. They described the harsh colors, the lack of depth, the absence of symbiosis, the borked nonsense with no sense of space. One person went with the philosophical argument: it's not a physical painting created by a well-known artist. That's it. That's the difference. The painting in question was, of course, an actual Monet.

The thread's reaction was split between finding it hilarious and finding it depressing. One commenter made the observation that he literally prompted them like bots and they returned the most likely tokens. Which is accurate and also maybe the most interesting part of the whole thing: humans, when asked a leading question by a confident poster, produce output that closely resembles what you'd expect from a language model with heavy prior conditioning. The bias goes both ways.

The broader point — which connects to the Gallup polling discussed this week showing that seventy percent of Americans oppose data center construction near their homes — is that the people building and caring about AI have a serious perception problem they are not taking seriously enough. The anti-AI feeling isn't really about AI safety or about replacing jobs; the polling shows those are minority concerns. It's about resource use, noise, land, utility bills. Tangible local costs with no tangible local benefits. The data center industry is, as one person put it, fundamentally a marketing problem. Railroads had the same problem until people understood that a railroad meant commerce and jobs. The solution isn't correcting misconceptions. It's delivering actual benefits.

That's a real strategic challenge for AI companies, and it connects directly to policy: OpenAI's chief global affairs officer recently acknowledged in an interview that AI companies will be crushed by public sentiment if they can't find ways to redistribute AI wealth. He used the Alaska oil dividend comparison — the idea that people need to feel like they're getting a piece of this. OpenAI has since thrown support behind the Kids Online Safety Act and a state-level AI regulation bill in Illinois, appearing to accept state-by-state regulation as preferable to no regulation at all, provided it doesn't result in a patchwork of incompatible standards.

The political path here is going to be messy. But the strategic insight — that the public is not your enemy, they just don't yet feel the benefit — is worth taking seriously.

Alright. Let me leave you with the things worth watching.

Anthropic's June 15th pricing change is the one that will generate the most immediate noise, but the real signal is what happens six to twelve months from now when OpenAI follows. If you're building on any subsidized compute stack, model your unit economics at full API rates now, not later.

The agent cost problem is not going away. The thirty-thousand-dollar AWS bill is a preview of what happens when autonomous systems interact with metered infrastructure at scale without operational guardrails. Build the guardrails first.

And the Anthropic 2028 paper on compute leadership — even accounting for the fact that it's partly self-serving as lobbying — lays out a structural reality that isn't going away: compute access is a geopolitical variable now, and the constraints run through 2030 regardless of who's in charge. Plan accordingly.

That's the show for today. I'm Tony DeLuca. Stay sharp out there, kiddos — see you tomorrow.
