Barely Possible

[Barely Possible 2026-05-31] Today's episode: • Bernhard Hauser's May 30 piece argues productized services are back, with AI cutting back-of-house costs so fixed-scope work hits... • Simon Willison calls token-maxing leaderboards "a stupid idea," worse on enterprise Anthropic/OpenAI plans billed at full API price... • Granite-4.1-30b lands in the shadow of Qwen3.6 and Gemma4, raising the question of which models are worth your bench time. Hear the full breakdown in today's episode of Barely Possible. Want a podcast for your own topics? Join early access: https://www.barelypossible.to/waitlist/?source_path=public_episode_90&feed_source=rss&episode_id=90 Transcript: https://media.clawford.org/episodes/2026-05-31/podcast-episode-2026-05-31.txt | Notes: https://media.clawford.org/episodes/2026-05-31/2026-05-31-notes.md

What is Barely Possible?

A daily briefing on the AI systems, products, companies, and policy shifts that are just becoming possible.

Okay kiddos, it's your boy Tony DeLuca, and welcome back to Barely Possible, where we read the technical stuff so you don't have to, and then we tell you which parts are actually worth your time. Today's a lighter menu than usual, but there's a real meal in here if you know where to look. We're gonna talk about a business model that died, got buried, and is apparently crawling back out of the grave. We're gonna talk about why a guy who knows this industry cold thinks one popular way of bragging about AI usage is, his words, a stupid idea. And we're gonna kick around the question of how you actually figure out what your AI spend is buying you, because spoiler, nobody really knows. Plus a few quick hits from the engineering bench. Buckle up.

Let's start with the one that'll matter most to you if you're running a small shop or thinking about starting one. There's a piece out, dated the thirtieth of May, from Bernhard Hauser over at Growing Ventures, and the headline is plain as day: Productized Services Are Back. And before you roll your eyes, let me tell you why I think this one's worth chewing on, because it connects to a thread we've been pulling at all week on this show.

Here's the setup in Hauser's words. He says, do you remember when everyone online was talking about productized services? Just two years ago, productized services were everywhere. He says a lot of the founders he knew who were doing consulting on the side were packaging up that work into fixed-scope offerings. And then, the implication is, the conversation moved on. The hype cycle rolled over to the next thing. Productized services got quietly shelved as a 2024 idea, the kind of thing you did before you found a real SaaS product to build.

Now, let me explain what a productized service actually is, because the term gets thrown around loosely. It's the middle ground between consulting and software. Consulting is custom, hourly, every engagement is a snowflake, and it doesn't scale because you're selling your hours. Software is the dream: build it once, sell it a thousand times, the marginal cost rounds down to nothing. A productized service is the thing in between. You take a repeatable chunk of expert work, you put a fixed price on it, you put a fixed scope around it, and you sell it like a product even though there's a human doing the work behind the curtain. Logo design for four hundred bucks, flat. A landing page audit, here's the deliverable, here's the price, here's the turnaround. You're not selling your time, you're selling an outcome with a sticker on it.

Now why would that be coming back right now? The piece is short, so I'm not going to pretend I have every line of his reasoning. But here's the thing that jumps out at me, and it ties directly into what we've been hammering on this whole week. We spent the last few episodes talking about the economics of AI agents. Yesterday we were on the harness economics, the token pricing, who actually pays full freight. The day before, indie builders stitching together free tiers to close the capability gap. And the running theme has been: the model is cheap-ish, but turning the model into something a customer will actually pay for is where all the real work hides.

Productized services are the answer to that, and here's why. If you're a small team and you've got AI that can do eighty percent of a task, you don't have a product yet. You have a capability. The gap between a capability and a product is all the unglamorous stuff: the quality control, the edge cases, the customer hand-holding, the thing where the AI does something dumb and a human has to catch it before it ships. A productized service lets you sell the outcome today while a human babysits the AI in the back. You're not waiting two years to build the perfect autonomous SaaS. You're charging money this month, you're learning what customers actually want, and the AI is your margin lever, not your whole product.

That's the bet. AI makes the back-of-house cheaper, so the productized service that used to barely pencil out now has real margins. You used to need three people to deliver that fixed-scope package profitably. Now maybe you need one person and a good agent setup. The economics flip. The thing that was a stepping stone on the way to software becomes a destination, because the margins start looking like software margins without the years of build time and the burn.
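
To make the margin flip concrete, here's a minimal sketch. The only number taken from the episode is the four-hundred-dollar flat logo-design price. The labor hours, the fifty-dollar loaded hourly cost, and the twenty-dollar AI cost per package are hypothetical assumptions, not figures from Hauser's piece.

```python
def gross_margin(price, labor_hours, hourly_cost, ai_cost=0.0):
    """Fraction of a fixed package price left after delivery costs."""
    delivery_cost = labor_hours * hourly_cost + ai_cost
    return (price - delivery_cost) / price

# Hypothetical: the same $400 fixed-scope package, delivered two ways.
before = gross_margin(price=400, labor_hours=6, hourly_cost=50)             # all human
after = gross_margin(price=400, labor_hours=2, hourly_cost=50, ai_cost=20)  # human plus agent
print(f"before AI: {before:.0%}, after AI: {after:.0%}")  # before AI: 25%, after AI: 70%
```

The price on the sticker never moves. Only the cost side does, and that's the whole productized-service bet.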

Now I want to be careful here, because I'm extrapolating from a short post and my own read of the moment. Hauser's the one calling the trend; the deeper mechanics, I'm inferring. But the logic holds together, and it lines up with a pattern we keep seeing. Everybody wants to be a product company because that's where the valuation multiples live. But a product company that's pre-revenue and pre-product-market-fit is just a money furnace with a logo. A productized service is revenue from day one. It's customers from day one. It's a feedback loop from day one. And in an AI world where the underlying capability is getting commoditized fast, the durable thing isn't the model, it's the relationship with the customer and the judgment about which eighty percent the AI handles and which twenty percent still needs a human who knows what they're doing.

So if you're sitting there with technical skills and you've been told the only respectable path is to raise a seed round and build a platform, here's a different read. Package the thing you already know how to do. Put a price on it. Let the AI eat your cost structure from the inside. You can always platform-ify it later once you actually understand the demand. The founders who survive the next couple years might not be the ones with the cleverest models. They might be the ones who figured out how to charge money while everybody else was still demoing.

That's my take, and I'll flag it as a take. The piece is the seed; the tree I just grew is mine. Watch this space, because if productized services really are coming back, it's a tell about where the AI economy actually pays out, and it's not where the headlines have been pointing.

Now, that question of where the money actually goes and what it buys you, that's a perfect bridge into the next thing, because it comes from a guy who's been beating this exact drum, and he's getting sharper about it.

Simon Willison, who if you build software you already know, he's been in a thread recently going back and forth with Ed Zitron and Madison Mills, and he dropped a line on the thirtieth that I want to read you, because it's small but it's pointed. He's talking about what he calls token maxing leaderboards. And he says, those token maxing leaderboards, which were clearly a stupid idea to begin with, are even more of a stupid idea if you're on an enterprise Anthropic or OpenAI plan where you get billed at full API token price, not consumer-subscriber discounts.

Okay, let me unpack what a token maxing leaderboard even is, because this is one of those inside-baseball things that tells you a lot about the moment we're in. Somewhere along the way, burning tokens became a flex. People started bragging about how many tokens they pushed through Claude or GPT in a week, like it's a badge of honor, like the number itself proves you're getting value. There are leaderboards. People are competing to consume the most. And Willison is calling that exactly what it is: dumb. Burning more tokens isn't a measure of output. It's a measure of input. It's a measure of cost. Bragging about your token count is like bragging about how much gas your car burned on the way to work. Congratulations, you spent more money. That's not the win you think it is.

And then he twists the knife with the enterprise pricing point, and this is the part that connects to where we've been all week. On a consumer subscription, you pay a flat monthly fee and you can hammer the thing, and the cost per token to you feels like nothing because you've already paid your twenty bucks or your two hundred bucks and you're just running it. But on an enterprise plan, you're often paying full API token rates. No flat-fee buffet. Every token is metered, every token has a price, and that token-maxing leaderboard behavior that felt free on consumer suddenly shows up on a real invoice at full freight.
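
Here's a back-of-the-envelope sketch of why the same habits cost more under metered billing. The per-million-token prices and the monthly token volumes are hypothetical placeholders, not Anthropic's or OpenAI's actual rates, so check the current price sheet before trusting any output.

```python
def metered_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost in USD when every token is billed at a per-million-token rate."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

# Hypothetical month of heavy "token-maxing" usage.
monthly_input = 200_000_000   # 200M input tokens
monthly_output = 20_000_000   # 20M output tokens

flat_subscription = 200.0     # consumer-style flat fee, the same at any volume
metered = metered_cost(monthly_input, monthly_output,
                       usd_per_m_input=3.0, usd_per_m_output=15.0)  # placeholder rates
print(f"flat: ${flat_subscription:,.0f}  metered: ${metered:,.0f}")  # flat: $200  metered: $900
```

The flat number stays put no matter how hard you push. The metered number scales linearly with every token the leaderboard rewards you for burning.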

We talked yesterday about harness economics and who pays full price for tokens, and this is the same wound from a slightly different angle. The behavior the consumer-tier culture trained people into, just run it, throw more at it, max it out, is the most expensive possible behavior the moment you cross into enterprise billing. The habits don't transfer cleanly. The consumer experience taught people to treat tokens as free, and then enterprise hands them the actual bill.

For you as a builder, here's the practical takeaway. If you're building on top of these APIs and you're billing your own customers, your token discipline is your margin. The leaderboard mentality is the enemy of healthy unit economics. Every time your product does something lazy, like stuffing the whole conversation history back into the context window on every call, or re-running an agent loop three times because you didn't bother to cache the intermediate result, that's tokens, that's money, and at enterprise rates it compounds fast. The folks who win on margin are the ones treating tokens like a scarce resource, not a flex.
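
Context stuffing compounds because each call re-sends everything that came before it, so total input tokens grow roughly with the square of the conversation length. The minimal sketch below assumes a hypothetical fixed 500 tokens per turn. It compares full-history resends against a hypothetical sliding window of the last three turns. A real system would more likely summarize old turns or lean on provider-side prompt caching.

```python
def total_input_tokens(turns, tokens_per_turn, window=None):
    """Sum of input tokens across all calls when each call re-sends prior turns.

    window=None re-sends the full history; window=k keeps only the last k turns.
    """
    total = 0
    for i in range(1, turns + 1):
        kept = i if window is None else min(i, window)
        total += kept * tokens_per_turn
    return total

full = total_input_tokens(turns=10, tokens_per_turn=500)                # 27,500
windowed = total_input_tokens(turns=10, tokens_per_turn=500, window=3)  # 13,500
```

The re-run half of that sentence works the same way. If you memoize an intermediate result, for example with Python's `functools.lru_cache` keyed on the step's inputs, a repeated step gets paid for once.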

And this dovetails right into the deeper question that thread was actually circling, which is the real puzzle nobody's solved: how do you even measure what AI is doing for you? Because the token count is the easy thing to measure, which is exactly why people glommed onto it. It's a number. It goes up. But it tells you about cost, not value. And the value side, that's where everybody's flying blind.

The deeper point in that conversation, and it's been a running theme in Willison's commentary, is that measuring the productivity of software teams and knowledge workers in general remains an unsolved problem. We've never had a clean way to do it. Lines of code is a famously terrible metric; the best engineers often delete more than they write. Story points are made up. Velocity charts are theater half the time. And now you bolt AI on top of this already-fuzzy measurement problem and you try to calculate an ROI on your AI spend, and it's furiously difficult, because you can't even cleanly measure the denominator.

Think about what that means for the enterprise buyer right now. The CFO signs off on a big Anthropic or OpenAI contract. Six months later somebody asks, what did we get for that. And the honest answer is, we don't have a rigorous way to know. We can measure the spend down to the penny, because that's metered. We can't measure the productivity gain with any confidence, because we never could measure productivity in the first place. So you've got a perfectly precise cost number sitting next to a completely fuzzy benefit number, and that asymmetry is dangerous. When the precise number is the cost and the fuzzy number is the benefit, the cost wins the argument every time the budget gets tight.

This is, by the way, the exact mechanism behind the annual AI slowdown panic we touched on earlier this week. It's not necessarily that the tools stopped working. It's that the bills are concrete and the gains are vibes, and at some point in the budget cycle, vibes lose to invoices. The companies that figure out even a rough, defensible way to tie AI usage to actual business outcomes, dollars saved, tickets closed, revenue influenced, those companies will keep their AI budgets. The ones treating it as a faith-based line item are going to get squeezed the first time somebody with a spreadsheet starts asking hard questions.

So what do you do about it as a builder? You instrument the outcome, not the activity. Don't tell your customer how many tokens your agent burned. Tell them how many support tickets it resolved, how many hours it gave back, how many documents it processed that a human would've had to slog through. Tie your pricing and your reporting to the thing the customer actually cares about. Which, by the way, brings us right back around to the productized services point. A productized service prices the outcome. You're charging for the deliverable, not the tokens. The customer never sees the messy metered cost; they see a fixed price and a result. That's not a coincidence. The business models that work in this environment are the ones that hide the token meter behind an outcome the customer can understand. The ones that struggle are the ones that pass the raw metered chaos straight through to the buyer and ask them to do the ROI math nobody knows how to do.
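
One way to instrument the outcome rather than the activity is to log both the spend and the result for every agent run, then report cost per outcome instead of raw token counts. Here's a minimal sketch. The `AgentRun` record and its fields are hypothetical, and `resolved` stands in for whatever outcome your customer actually pays for.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    ticket_id: str
    tokens: int        # activity: what we spent
    cost_usd: float    # activity: what it cost us
    resolved: bool     # outcome: what the customer cares about

def outcome_report(runs):
    """Summarize runs by outcome, with cost expressed per resolved ticket."""
    resolved = sum(1 for r in runs if r.resolved)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "tickets_attempted": len(runs),
        "tickets_resolved": resolved,
        "resolution_rate": resolved / len(runs) if runs else 0.0,
        # failed runs still cost money, so their spend is folded into the per-outcome cost
        "cost_per_resolved_ticket": total_cost / resolved if resolved else None,
    }

runs = [
    AgentRun("T-1", tokens=12_000, cost_usd=0.30, resolved=True),
    AgentRun("T-2", tokens=45_000, cost_usd=1.10, resolved=False),
    AgentRun("T-3", tokens=8_000, cost_usd=0.20, resolved=True),
]
print(outcome_report(runs))
```

A customer can weigh "about eighty cents per resolved ticket" against what a human costs. "65,000 tokens" gives them nothing to compare against.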

Alright, let me shift gears, because there's a quieter conversation happening among builders that I think is worth surfacing, and it's the antidote to all this hype-leaderboard nonsense. There was a recent thread over on the artificial subreddit asking a simple question: what AI or dev tools are people actually sleeping on right now? Not the same five products everybody recommends. The stuff flying under the radar.

Now this is a Reddit discussion, single thread, take it for what it is. But the answers were more interesting than the question, and one comment in particular stuck with me. Somebody said the category people still underrate is observability and orchestration tooling for AI systems. Not the flashy models themselves, but structured tracing, eval pipelines, async task coordination, guardrail layers. And here's the line that nails it: because once projects move beyond demos, that infrastructure starts mattering more than whichever model is hottest that week.

That is exactly right, and it's the unglamorous truth of this whole business. The demo is easy. The demo is one prompt, one happy path, one clean output that makes everybody in the room go ooh. Production is the part where the model does something weird at three in the morning and you need to know what it did, why it did it, and how to stop it from doing it again. That's tracing. That's evals. That's guardrails. And nobody's bragging about their eval pipeline on a leaderboard, which is precisely why it's underrated and precisely why it's where the real durability is.
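
You can sketch what structured tracing means in practice with nothing but the standard library. Wrap each step of an AI pipeline so every call records its name, timing, and whether it failed. This is a toy illustration, not any particular observability product, and the pipeline step is hypothetical. Real setups typically export spans to a tracing backend rather than keeping them in a list.

```python
import functools
import time
import uuid

TRACE = []  # in-memory span store; a real system would export these

def traced(step_name):
    """Decorator that records a structured span for every call to the wrapped step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"id": uuid.uuid4().hex, "step": step_name, "status": "ok", "error": None}
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                span["status"] = "error"
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                TRACE.append(span)
        return wrapper
    return decorator

@traced("extract_invoice_total")
def extract_invoice_total(text):
    # hypothetical pipeline step standing in for a model call
    return float(text.split("TOTAL:")[1].strip())

extract_invoice_total("TOTAL: 42.50")
try:
    extract_invoice_total("no total here")
except IndexError:
    pass
for span in TRACE:
    print(span["step"], span["status"], span["error"])
```

The second call is the three-in-the-morning case. Its span records which step failed and why, which is exactly the information a demo never needs and production always does.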

Another commenter put it beautifully: the best tools often disappear into your workflow so smoothly that people stop noticing they're using them. And another said the most useful tools are never the flashy demos, half the time it's some random repo with eight hundred stars that quietly saves you hours every week. That's the truth of building. The stuff that actually moves the needle is boring, it's reliable, and it doesn't trend.

Somebody also flagged NotebookLM as underrated for non-developer, research-heavy work. The pitch there: you upload your own documents, you ask questions, and it only answers from what you gave it, so you're not getting hallucinations pulled in from the open internet. That grounding, answering only from the source material you control, is genuinely useful for anybody doing research where being wrong is expensive. I'd just gently note that grounded doesn't mean infallible; a model can still misread the document you handed it. But the principle, constrain the source, reduce the surface area for nonsense, is sound.
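
Here's a deliberately crude sketch of the constrain-the-source principle. It only returns a sentence from the user's own documents, and it refuses when nothing overlaps enough with the question. This is word-overlap matching, not how NotebookLM or any real retrieval system works, and the 0.5 threshold is an arbitrary assumption. It also illustrates the caveat above: a sentence can match the question's words and still be the wrong answer.

```python
import re

def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def grounded_answer(question, documents, min_overlap=0.5):
    """Return the best-matching sentence from the supplied documents, or None.

    min_overlap is the fraction of question words that must appear in the sentence.
    """
    q = _words(question)
    best, best_score = None, 0.0
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = len(q & _words(sentence)) / len(q) if q else 0.0
            if score > best_score:
                best, best_score = sentence, score
    return best if best_score >= min_overlap else None  # refuse rather than guess

docs = ["The audit costs 400 dollars. Turnaround is five business days."]
print(grounded_answer("What is the turnaround?", docs))   # Turnaround is five business days.
print(grounded_answer("Who founded the company?", docs))  # None
```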

The through-line from the token-maxing rant to this thread is the same lesson. The hype rewards the visible and the loud. The actual work rewards the invisible and the quiet. The leaderboard culture says max your tokens. The people actually shipping say instrument your system so you know what it's doing, and reach for the boring tool that disappears into your workflow. If you're deciding where to spend your attention as a builder, that's the tell. Chase the boring infrastructure. That's where the moat is.

Now let me pivot to something a little more unsettling, and it's been bouncing around the artificial subreddit in a recent post. The framing was blunt: nothing is real anymore, we're reaching the point where crowd scenes can be entirely generated by AI. And the claim is that AI can now realistically simulate massive crowds, public events, the whole thing, convincingly enough that you can't easily tell.

Now I want to be careful with this one, because it's a Reddit post with a video and a lot of people arguing in the comments, not a verified benchmark or a research result. So I'm not going to tell you the technology definitively crossed some line on a specific date. What I'll tell you is what the conversation reveals, because the conversation is the actual story here.

The poster's framing was that the scary part isn't the quality anymore, it's how fast people are finding creative ways to use it. And one of the sharper comments reframed the whole thing: we're moving from seeing is believing to seeing means basically nothing. A few years ago, faking a crowd at this quality would've needed a movie studio budget. Now, the claim goes, somebody can generate it overnight for engagement, marketing, propaganda, fake hype, whatever.

Here's the part that actually matters for you as a builder, and the same commenter landed on it. The shift is that trust online is moving away from the content itself and toward source reputation and community verification. Read that again, because it's a real architectural insight buried in a Reddit argument. When the content can't be trusted on its face, trust relocates. It moves to who's vouching for it. It moves to provenance. It moves to the reputation of the source and the verification of the community around it.

That's not just a doomer observation, that's a product opportunity and a product constraint at the same time. If you're building anything that handles media, anything that surfaces content to users, anything where authenticity matters, the question stops being is this image real and becomes can I prove where this came from and who stands behind it. Provenance becomes a feature. Verification becomes a feature. The cryptographic chain of custody, the trusted-source badge, the community attestation layer, that stuff goes from nice-to-have to load-bearing. And on the flip side, if your product generates content, you'd better be thinking about how you label it, because the regulatory and reputational pressure on unlabeled synthetic media is only going one direction.
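
You can sketch a cryptographic chain of custody as a hash chain. Each custody record commits to the content's hash and to the previous record, so editing the media or rewriting its history breaks verification. For brevity this toy uses an HMAC with a shared secret. Real provenance schemes, C2PA content credentials for example, use public-key signatures so anyone can verify without holding a secret. The key, actors, and actions here are hypothetical.

```python
import hashlib
import hmac
import json

SECRET = b"hypothetical-signing-key"  # stand-in; real systems sign with private keys

def _sign(payload):
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def append_record(chain, content, actor, action):
    """Add a custody record that commits to the content bytes and the previous record."""
    payload = {
        "actor": actor,
        "action": action,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "prev_sig": chain[-1]["sig"] if chain else None,
    }
    chain.append({**payload, "sig": _sign(payload)})
    return chain

def verify(chain, content):
    """Check every signature and back-link, and that the last record matches the content."""
    prev = None
    for record in chain:
        payload = {k: v for k, v in record.items() if k != "sig"}
        if payload["prev_sig"] != prev or not hmac.compare_digest(record["sig"], _sign(payload)):
            return False
        prev = record["sig"]
    return bool(chain) and chain[-1]["content_sha256"] == hashlib.sha256(content).hexdigest()

photo = b"raw camera bytes"
chain = append_record([], photo, actor="camera-01", action="captured")
edited = photo + b" (cropped)"
append_record(chain, edited, actor="newsroom", action="cropped")
print(verify(chain, edited))            # True
print(verify(chain, b"swapped image"))  # False
```

The point isn't whether the pixels look real. It's whether there's an unbroken, verifiable record of who touched them.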

I'll be honest, some of the commenters thought the whole panic was overblown, pointing out the generated stuff still has tells if you look closely. And that's fair, today. But the smart money doesn't bet on the tells lasting. The trajectory has been one direction, and planning your product around fakes always being detectable by eye is planning to be wrong. Build for the world where you can't tell, and make provenance your answer. That's the move.

Alright, let me bring it down to the engineering bench for a few quick ones, the kind of stuff that doesn't make headlines but tells you where the craft is heading. And I'll keep these tight because they're more for the builders in the back than the strategy folks up front.

First, there's a recent thread on the LocalLLaMA subreddit asking whether IBM's Granite 4.1 30B model is getting overshadowed by Qwen 3.6 27B and Gemma 4. And the consensus in the comments was pretty blunt: yeah, it's overshadowed, partly because Qwen 3.6 27B and Gemma 4 31B are just benchmarking better, and partly because IBM doesn't do hype marketing, so Granite flies under the radar. One commenter who'd tried it on a simple task said it performed poorly compared to even a much smaller Qwen model. But here's the nuance worth keeping: a couple of folks pointed out that a 30B dense model is still solid for the unsexy production jobs, function calling, text extraction, the structured stuff where you don't need flashy reasoning, you need reliable, budget-friendly output. And IBM itself noted in the model discussion that reasoning-focused Granite models are in the works; the current one is deliberately aimed at compact use cases with strict token budgeting. Which, notice, is the same token-discipline theme we've been circling all episode. There's a real market for a model that does the boring extraction job cheaply and predictably, even if it never trends on a leaderboard. Don't sleep on the workhorse just because it's not the show pony.
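
For the structured jobs a workhorse model gets pitched at, reliability usually comes from checking the output rather than trusting it. Here's a minimal sketch that validates a model's JSON extraction against an expected shape before anything downstream uses it. The field names and sample outputs are hypothetical. A production system would more likely use a schema library or the provider's own structured-output features.

```python
import json

# Hypothetical extraction target: field name -> required Python type
INVOICE_SCHEMA = {"vendor": str, "total": float, "currency": str}

def parse_extraction(raw, schema=INVOICE_SCHEMA):
    """Parse model output as JSON and check required fields and types.

    Returns (data, errors); data is None whenever errors is non-empty,
    so callers retry or escalate instead of passing bad output downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["expected a JSON object"]
    errors = []
    for field, expected_type in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        ok = isinstance(value, expected_type)
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            ok = True  # JSON has a single number type, so accept 130 as well as 130.0
        if not ok:
            errors.append(f"{field}: expected {expected_type.__name__}, got {type(value).__name__}")
    return (None if errors else data), errors

good, errs = parse_extraction('{"vendor": "Acme", "total": 129.5, "currency": "EUR"}')
bad, bad_errs = parse_extraction('{"vendor": "Acme", "total": "129.50"}')
print(errs)      # []
print(bad_errs)  # ['total: expected float, got str', 'missing field: currency']
```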

Second, a research project that caught my eye, mostly because the framing was so vivid. It's called NeuroFlow, posted recently on the MachineLearning subreddit, and the one-liner is great: Vision Transformers waste ninety percent of their compute recalculating stationary asphalt. The idea is that when you run a vision model on video, most of the frame isn't changing, the road, the sky, the background, it's the same from frame to frame. NeuroFlow tracks what they call semantic surprise, basically which patches of the image actually changed, and it throws away the boring unchanged tokens before they hit the encoder. The headline number is a claimed fifty-five-times wall-clock speedup on high-resolution video while keeping around ninety-seven percent fidelity, and notably with no fine-tuning required. Now, this is a single self-posted project with the code on a public repo, and those eye-popping speedup numbers are the author's own claims, not independently verified, so file it under promising-if-it-holds. But the underlying intuition, don't spend compute re-deriving the parts of the input that didn't change, is exactly the kind of efficiency thinking that turns into real cost savings when you're running vision at scale. The link's in the show notes for anybody who wants to kick the tires.
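
Here's a sketch of the skip-what-didn't-change intuition using plain frame differencing. Split each frame into patches, score how much each patch changed since the previous frame, and only pass the changed patches on to the encoder. This is an assumption-laden stand-in, not NeuroFlow's method. Its "semantic surprise" signal is presumably richer than raw pixel differences, and the patch size and threshold here are arbitrary.

```python
def patches(frame, size):
    """Yield (index, flat pixel list) for non-overlapping size x size patches, row-major."""
    rows, cols = len(frame), len(frame[0])
    idx = 0
    for top in range(0, rows, size):
        for left in range(0, cols, size):
            yield idx, [frame[r][c] for r in range(top, top + size) for c in range(left, left + size)]
            idx += 1

def changed_patch_indices(prev, curr, size=4, threshold=0.1):
    """Indices of patches whose mean absolute pixel change exceeds threshold.

    Assumes both frames are equal-sized 2D lists whose sides are multiples of size.
    """
    keep = []
    for (i, p_prev), (_, p_curr) in zip(patches(prev, size), patches(curr, size)):
        change = sum(abs(a - b) for a, b in zip(p_prev, p_curr)) / len(p_prev)
        if change > threshold:
            keep.append(i)
    return keep

H = W = 8
prev = [[0.0] * W for _ in range(H)]
curr = [row[:] for row in prev]
for r in range(0, 4):
    for c in range(4, 8):
        curr[r][c] = 1.0  # something moved in the top-right patch only

kept = changed_patch_indices(prev, curr)
print(kept, f"-> encode {len(kept)} of {(H // 4) * (W // 4)} patches")  # [1] -> encode 1 of 4 patches
```

In a real pipeline something has to stand in for the skipped patches, for example cached encoder results from the previous frame. So any speedup depends on how much of the scene is static, which is why the stationary-asphalt framing fits driving video so well.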

Third, a quick practical one for anybody training models, also from the MachineLearning subreddit. The note was about profiling PyTorch training without accidentally stalling your GPU. The core observation is one of those things that sounds obvious once you hear it but bites everybody: the more you measure, the more you change the behavior of the thing you're measuring. Specifically, the common move of calling synchronize to get clean timing boundaries actually inserts synchronization points into an otherwise asynchronous workload, which slows the real run down. The suggested alternative is using CUDA events around the boundaries you care about and reading them later, so you capture timing without forcing the GPU to stop and wait. A commenter pushed back that for most training you can just use the built-in profiler with a few warmup steps and export the trace, which is fair. But the deeper lesson, and the reason I'm mentioning it at all, is the one a commenter nailed: a lot of people coming from CPU profiling badly underestimate how asynchronous GPU execution really is. Your mental model from the CPU world will lie to you. If you're moving into GPU-heavy work, that's a trap worth knowing about before it costs you a week.
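
Here's a minimal sketch of the event-based pattern, assuming PyTorch and a CUDA device. `torch.cuda.Event(enable_timing=True)`, `Event.record()`, `Event.elapsed_time()`, and `torch.cuda.synchronize()` are real PyTorch APIs. `step_fn`, `train_step`, and the default warmup count are hypothetical placeholders. The design choice that matters is recording events inside the loop but synchronizing only once, after it, so the measurement doesn't add a stall to every step. Only the small summary helper runs without a GPU.

```python
def time_steps_cuda(step_fn, n_steps):
    """Per-step GPU time in milliseconds using CUDA events, with one sync at the end.

    Assumes PyTorch and a CUDA device; step_fn is a hypothetical training step.
    """
    import torch  # imported here so steady_state_ms below works without PyTorch

    starts = [torch.cuda.Event(enable_timing=True) for _ in range(n_steps)]
    ends = [torch.cuda.Event(enable_timing=True) for _ in range(n_steps)]
    for i in range(n_steps):
        starts[i].record()  # enqueued on the current stream; does not block the CPU
        step_fn()
        ends[i].record()
    torch.cuda.synchronize()  # single sync point, after all the work is queued
    return [s.elapsed_time(e) for s, e in zip(starts, ends)]

def steady_state_ms(timings_ms, warmup=3):
    """Mean step time after discarding warmup steps (allocator warmup, autotuning, caches)."""
    steady = timings_ms[warmup:]
    if not steady:
        raise ValueError("need more steps than warmup")
    return sum(steady) / len(steady)

# Usage on a GPU machine (not run here; train_step and batch are hypothetical):
#   timings = time_steps_cuda(lambda: train_step(batch), n_steps=50)
#   print(f"{steady_state_ms(timings):.2f} ms/step")
```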

And one last bench note, because it's a fun one. Someone posted a recent project: a tiny open-source self-driving AI, seven megabytes, that they say runs on a phone and learns navigation, lane following, and even drift recovery from visual and sensor input. The pitch is autonomous driving on lightweight edge hardware without server-scale infrastructure. Now, a commenter immediately and fairly poked at the framing, noting that designed for real-time autonomous driving on a phone doesn't entirely make sense as a serious claim. And I'd agree, nobody should be putting their actual car in the hands of a seven-megabyte phone model. But as a demo of how small you can squeeze a competent control model, it's a neat data point. The trend that matters underneath the overclaim is real: capable models keep getting smaller and cheaper to run at the edge, and that's the same cost-and-margin story driving everything else today. When the model fits on the edge, you're not paying for the cloud round trip, and that changes the economics of what you can build.
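
To put seven megabytes in perspective, the rough parameter budget depends on the numeric precision of the weights. Here's a back-of-the-envelope sketch, assuming 1 MB = 10^6 bytes and ignoring file-format overhead. The post didn't say what precision the model uses.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def max_params(size_bytes, dtype):
    """Upper bound on parameter count for a given file size and weight precision."""
    return size_bytes // BYTES_PER_PARAM[dtype]

size = 7_000_000  # "seven megabytes", taken as 10**6 bytes per MB
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{max_params(size, dtype) / 1e6:.2f}M parameters")
# float32: ~1.75M parameters
# float16: ~3.50M parameters
# int8: ~7.00M parameters
```

By parameter count, that's thousands of times smaller than the 27B to 31B models from the Granite thread.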

So let me tie the bow on this, because there's actually one clean thread running through everything today, and it's not the one the hype machine wants you to chase. The leaderboard culture says burn more tokens, ship the flashy demo, brag about the big numbers. And every single story today quietly argues the opposite. Productized services win because they hide the messy cost behind a clean outcome. Willison's token-maxing rant says the big consumption number is a cost, not an achievement. The unsolved productivity-measurement problem says the precise cost will always beat the fuzzy benefit when budgets tighten, so you'd better measure outcomes. The underrated-tools thread says the durable stuff is the boring observability layer nobody brags about. Even the Granite workhorse and the GPU-profiling note are about discipline, efficiency, knowing what your system is actually doing under the hood.

The builders who make it through the next stretch aren't going to be the ones with the highest token counts or the loudest demos. They're going to be the ones who quietly figured out the unit economics, instrumented the outcomes, and charged real money for a result a customer could understand. That's not glamorous. It never is. But that's where the survivors live.

That's the menu for today. A little lighter than usual, but I'd rather give you four things worth chewing on than twelve things you'll forget by lunch. Links to everything are in the show notes, as always, and if a productized service is what gets you off the sidelines and charging money this month, don't let me stop you. This has been Barely Possible. I'm Tony DeLuca, take care of yourselves out there, and I'll catch you on the next one.
