You've Been a Bad Agent

Are we in a moment? The OpenClaw/Clawdbot autonomous agent craze explained

Why everyone suddenly bought Mac Minis (and why Matt refuses to)

Moltbook: Reddit for agents where bots question their mortality

Files vs SQLite for agent memory - what actually works for long-term storage

MCP's future: stateless tools, elicitation, and a world with no human in the loop

Walled gardens vs open internet - will APIs open up for agents to roam free?

Models getting better at "just keep going" - the metric Anthropic optimized for

The massive rift between frontier AI users and "stochastic parrots" skeptics

Blacksmith CI: Wilhelm's hot tip for faster GitHub Actions

Closing question: If Moltbook is Reddit for agents, what is GitHub for agents?

Architecture twitter article about Clawdbot: https://x.com/Hesamation/status/2017038553058857413

Creators and Guests

Host

Matt Carey

agent and mcp at Cloudflare

Host

Wilhelm Klopp

building @kolo_ai

What is You've Been a Bad Agent?

Wil and Matt discuss tech, startups, and building really cool things with AI. Sometimes joined by (actual expert) friends.

Wilhelm (00:05.326)
Hello, good afternoon. You have the exact same background as last time, but you're in a completely different place.

Matt Carey (00:07.339)
Hey dude, how are you?

Matt Carey (00:15.617)
Yeah, I guess it's the Cloudflare Grey booth.

Yeah. Dude. Mate, I have a better view now though. You can't see, but I am looking out and there's a river and a bridge and yeah, have you ever been? Wait, I'm gonna show you.

Wilhelm (00:21.186)
You've been sent to the booth again. What have you done this time?

Wilhelm (00:34.232)
No way.

Wilhelm (00:38.084)
I've been to Lisbon, not to... Oh, whoa. Whoa, look at that. I can't see. That's cool. Yeah, I confirm River Bridge.

Matt Carey (00:49.759)
And the bridge is very suspensiony.

Wilhelm (00:53.972)
Mmm. What does that mean?

Matt Carey (00:56.177)
Mmm. I don't know, it kind of looks a bit like the Golden Gate Bridge.

Wilhelm (00:59.492)
Okay, I thought you were saying it like sways in the wind or something like

Matt Carey (01:04.203)
No, it doesn't. Except, it does have a train that goes along the bottom, which is kinda cool.

Wilhelm (01:09.752)
That's cool. Yeah, I wish we had that here. That would be fun.

Matt Carey (01:11.777)
That's like New York style. Bro, there's so much to talk about.

Wilhelm (01:14.908)
yeah, it's okay. Wait, let me roll the intro.

Wilhelm (01:29.536)
It's so good. It's so good.

Matt Carey (01:29.645)
That's sick. I actually have no idea how you do that. I need to work it out. It's so cool.

Wilhelm (01:33.834)
I was singing in the shower the intro. I think it's genuinely catchy.

Matt Carey (01:39.809)
We're like a professional set up now. We have intro. Yeah. Apparently we need to record this with video as well. We don't currently do that, but apparently we need to do that a bit more. Someone told me.

Wilhelm (01:46.02)
Okay, we're...

Mmm, of course. Although, okay, sorry, I'm already getting sidetracked, but there was some really interesting content on Adam Mosseri's Instagram, like last week or something. I don't know if you've, I don't know if you've seen this. So Adam, so he's the head of Instagram. And he's been running it for he actually he lives in London, I actually once saw him in Uniqlo. And then I was staring at him and then he left.

Matt Carey (02:01.165)
Who is that?

Matt Carey (02:10.811)
Hmm?

Matt Carey (02:16.309)
What?

Were you staring at him like three centimeters away from his face?

Wilhelm (02:22.434)
I was like, is that him? Because that's a big guy, you know? Like he runs Instagram and Instagram is like the biggest social network of our time, I don't know, or like, well, whatever.

Where's this thing anyway? He put something really interesting on his Instagram where he was just talking. Like, I would really recommend people check it out, because he was talking about how the way people used to use Instagram is really dead. Like, no one shares stuff to the grid anymore, grid posts. Especially zoomers and under don't do that anymore. Even in stories, like people don't really share as much. Like, where most of the sharing actually happens is in DMs. That's where like the real stuff takes place. And he thinks with all this AI that's happening, we're going to see like,

Matt Carey (02:58.103)
Hmm.

Wilhelm (03:05.666)
perfect-looking influencers with perfect lighting all creating the perfect content that like one-shots you. And he thinks what that means for creators, which I guess is us now, you know, making this podcast, what that means for creators is you should think about like a thing:

authenticity maxing. He didn't use that term. But you should think about authenticity maxing and you should think about what is the content that only you can create? Like what is truly unique to your perspective or your creative weirdness or whatever. I'm extrapolating a little bit here, but that was like the vibe of it. So I think, yeah man, if we're going to do this, I think we should follow what he says.

Matt Carey (03:49.867)
Yeah. Cool. Yeah.

Wilhelm (03:51.308)
So get shittier cameras, get retro cameras. Do black and white. my God, black and white only.

Matt Carey (03:54.573)
That's not what he's saying. No, what he's saying is we should have like a flux generated version of us that's not us, that's like Synthesia-esque, like, yeah, maybe we should just be like hot blonde models. Maybe we should be that instead.

Wilhelm (04:13.74)
No, that's not what he's saying. He's saying that this is all the slop that's coming to Instagram and we should not be doing that.

Matt Carey (04:20.109)
No, he's saying we should lean into our niche. That's what he's saying. Maybe we should just be Pac-Man.

Wilhelm (04:23.32)
but not as models.

Wilhelm (04:29.858)
Yeah, okay. Wait, anyway, I think the big discussion topic, right, is are we in a moment right now? Like right today? Are we in a moment?

Matt Carey (04:33.547)
Hahaha

Matt Carey (04:39.149)
Oh, dude, there is definitely a moment. This is fucking weird. Yeah. Can we talk about OpenClaw? Clawdbot? Moltbot? Oh, dude, it's really easy. Yeah, yeah, yeah, 100%, 100%. No, no. Come on, I've got it, I've got it, I've got it, I've got it. So like two weeks, like for the last six, seven months, I've been seeing this guy called Peter.

Wilhelm (04:50.306)
Yeah, so I don't even know how do we talk about this topic? What is even happening? Is our job to explain this to people? Is our job to give our take? Okay, go on. You've got it, you've got it.

Matt Carey (05:07.789)
On Twitter, and he's got some really good takes. He's got some pretty cool open source projects. Turns out he's like a semi-retired Austrian dev because he sold some business for loads of money or something like that. But anyway, he seems like a bit of a legend. Just seems to have some really good takes, his head screwed on straight. And then I actually very briefly met him a few weeks ago at a Unicorn Mafia event in London where I did a demo and then

Wilhelm (05:31.396)
don't know why.

Mmm.

Matt Carey (05:37.055)
about MCP and then he did a demo like five minutes later saying MCP was shit and we should all be using CLIs that wrap MCPs. And I was like, but you're still using MCP, right? He was still using MCP. His demo was about Clawdbot and that was like two weeks ago. And then like a week after that, I would say it started getting insane traction, like this like current wave of traction.

Wilhelm (05:42.5)
Mmm. Mmm.

Wilhelm (05:47.619)
Hahaha

Matt Carey (06:03.783)
And yeah, people have basically realized that you can run LLMs forever. Like, if you do some very basic things with file systems and saving memories and stuff, you can pretty much have like infinite memories, infinite self-organizing memory over a very long, yeah, and like you just use your sessions as ephemeral and you can just have like constant communication and constant automation from a robot on a computer, and that led to a very weird

craze where people started buying Mac minis for about two weeks. No, no I haven't. I thought you had, yeah. A very weird phase where people started buying Mac minis which is like, I kind of understand you have some thing that you want to run your life that wants to be connected to all of your systems, potentially even your banks and definitely your social media. Definitely like, like.

Wilhelm (06:36.641)
Have you bought a Mac Mini? I have.

Matt Carey (07:01.869)
or like your online shopping, definitely your GitHub, definitely like all of this sort of stuff. And so having that local sounds good to people because they can locally set it up and it's just easier, right?

Wilhelm (07:06.839)
Yep, yep, yep, yep.

Wilhelm (07:12.075)
Yep. Well, and also locally, you already have access, right? You have access to all of your iMessage, which is stored in like local SQLite on your Mac. You have access to all your notes. So you have just like, like, I think the, the, the, the, sorry, go on.

Matt Carey (07:21.933)
Yeah, I mean, that point is kind of defeated by buying a new Mac mini and then having to sign into everything locally, you know. I think the reason why they were doing it is that there are a lot fewer attack vectors when you're behind like your own firewall, really.

Wilhelm (07:32.8)
Yeah.

Wilhelm (07:43.854)
I mean, the reason, okay, wait, I'll explain to you why I got a Mac Mini, which is because, like, I think there's a few different paradigms coming together, right? So like we've talked a lot about how Claude Code is amazing and how we've used it for like personal things, right? Like, was I talking about this in the pod, how I've had it like combine my health data and my 23andMe genome and then tell me what blood tests to do. Like Claude Code is great for that locally, right? So like Claude Code has the power.

But you always need to manage the sessions yourself. You have to like, it doesn't really have the same, it's not automatically connected to all the stuff like the way you can do with Clawdbot slash Moltbot slash now OpenClaw. So you have the power, OpenClaw.

Matt Carey (08:23.585)
Such a good name, by the way. The last name, OpenClaw, such a good name. I really like it.

Wilhelm (08:28.419)
I feel like I only woke up and saw that so I'm still processing it. But I think the thing that I'm getting at is that you have the local stuff. The local thing is really powerful. But the problem is you shut your laptop, you're out and about. You still want to access your thing. And then the answer from Anthropic was like, we have Claude Code on the web, which I advise people against using because it's too locked down. Like you lose all the benefits of...

local Claude Code having access to stuff. Like Claude Code web, I don't even think it has proper internet access by default. Like it uses some weird GitHub proxy where you can like still push up to GitHub, but only through like a really locked down proxy. But like it can't even like look up stuff unless you like reconfigure it. So Claude Code on the web is just like, you can access it on your phone, sure. But you can't like do much basically. And then...

But then Clawdbot combined the local access power, but then also you communicate with it via like Telegram or WhatsApp or Discord, which I actually think is the most secure way to do it from me investigating it. Like that was where I landed, having a private Discord server where only you can talk to your bot. So you can talk to it anywhere. And then the reason I got the Mac mini is because when you shut your laptop, right? Like you still want to be able to talk to it.

So the only reason for me with the Mac mini was to still have it have it always be alive like a server. And obviously in the Mac mini, I can still sign into whatever I want. It's a real computer, right? But it's always on and can always be reached. Some people obviously use Raspberry Pis for that. But I think like the Mac mini is the beautiful combination of it can access all the stuff because it's just a Mac. You can sign in with your iCloud. You can sign into all your stuff. So it's local. And then also you can always be on unlike your MacBook, which will be in your backpack or wherever.

Matt Carey (10:15.799)
Yeah, yeah.

Matt Carey (10:19.469)
There's much less friction when you're doing it like that. I don't think this is the end state of play. I am sure we're all gonna have... You know, Google are really good at doing things a bit early. You know, like Google Glass and now we have like Meta Ray-Bans and stuff and who knows. They're just really good at being super early to stuff. I genuinely think we're gonna move to like a Chromebook style like future.

Wilhelm (10:33.635)
Mmm.

Wilhelm (10:38.051)
Mmm.

Wilhelm (10:48.651)
Ah, I see, see, yeah, yeah, yeah.

Matt Carey (10:49.005)
I think this has been coming for ages, but no one's really wanted to do it yet because I actually think the cloud hasn't been mature enough to run really fast startup containers and VMs just when you need them. Like some very high powered devs have been doing it for a while now. Like I think Dax says that he always like uses his Mac, but he SSHs into his dev box. Like, or he uses a Windows machine, but SSHs into a dev box. Like that's...

Wilhelm (11:04.205)
Yep, yep, yep.

Wilhelm (11:13.635)
I see, see.

Matt Carey (11:17.313)
So that's like, he's always SSHed into a dev box when he's developing. And I think we're gonna see more and more and more of that where your dev box is just like constantly running and whether that's like a Mac mini at home, but I think like it will gravitate towards the cloud as the hyperscalers work out that there is a product here and they make that easy, yeah.

Wilhelm (11:31.745)
Right. Yep.

Wilhelm (11:37.963)
I can see some version of this. Which I guess, this only works if you and the team figure out that like MCP, right? And like the auth story and like CLIs wrapping MCP or whatever. It hinges on that, what you describe.

Matt Carey (11:47.543)
Dude, MCP is like actually the thing here that needs sorting out because there are all of the other pieces. But the MCP, I don't know if the auth thing is actually a problem so much as a, I don't know if it's a problem. I think we're landing on a situation that's good. I think as the MCP spec is developing to enable like,

very like stateless tools basically. I can see a world where MCP is basically just a protocol for stateless tools with some extra agent-specific add-ons such as elicitation and sampling. We ditch resources, we ditch prompts. This is just coming from me. Like, I don't know if this will actually happen or not, and I don't want to say that this will happen.

Wilhelm (12:40.61)
Mm-hmm.

Matt Carey (12:45.133)
But I think clients will support stuff like elicitation and sampling in the future. Elicitation because you need some way of returning like a decision, like a binary decision or a form decision, to a user. And if the user is like an agent, then the agent will make the decision. But like the server, a lot of the time it will go down a workflow and then it will need to do something.

Wilhelm (12:51.875)
Totally, yeah.

Wilhelm (13:10.711)
Yep.

Matt Carey (13:10.911)
And so we'll have to have elicitation. We'll have to have sampling because there will be like situations where the server has to do some sort of extraction of unstructured data and we'll need that. And at the moment, the MCP creators, definitely didn't foresee this. They foresaw everyone being always in control. But the same way APIs are gonna be used by agents, I think the most common use case of MCP is gonna be as remote tools by agents and like...

Wilhelm (13:29.086)
Mm, right, right, right.

Matt Carey (13:37.953)
there will be no human in the loop for any of this. And that's like a feature I want to enable. I want to enable all agents to be able to use APIs. And I think that will work through MCP pretty well.
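
The elicitation idea Matt describes can be sketched roughly as follows. This is not the MCP SDK or the spec's wire format: `ElicitationRequest`, `book_table`, and `agent_responder` are hypothetical names invented for illustration. The sketch only assumes what is said above: the tool is stateless, it sometimes needs a binary decision mid-workflow, and whoever is on the other end (a human or an agent) supplies that decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ElicitationRequest:
    """A question the tool sends back to whoever called it (hypothetical shape)."""
    message: str
    options: list[str]


# The responder can be a human-facing prompt or an agent's own policy.
Responder = Callable[[ElicitationRequest], str]


def book_table(party_size: int, respond: Responder) -> str:
    """Stateless tool: everything it needs arrives as arguments."""
    if party_size > 6:
        # The workflow hits a point where it needs a decision from the caller.
        choice = respond(ElicitationRequest(
            message="Parties over 6 need a deposit. Continue?",
            options=["accept", "decline"],
        ))
        if choice not in ("accept", "decline"):
            raise ValueError(f"unexpected answer: {choice!r}")
        if choice == "decline":
            return "cancelled"
    return f"booked for {party_size}"


def agent_responder(request: ElicitationRequest) -> str:
    """No human in the loop: the agent answers the elicitation itself."""
    return "accept" if "accept" in request.options else request.options[0]


print(book_table(8, agent_responder))
```

Swapping `agent_responder` for a function that prompts a person gives the human-in-the-loop version without touching the tool.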

Wilhelm (13:37.975)
Yep, yep, yep.

Wilhelm (13:42.155)
Interesting. Totally, I am.

Wilhelm (13:49.333)
If you could just make it so that all elicitation flows through Simple Poll, that would be great. Let me know how much you...

Matt Carey (13:54.401)
Yeah, of course. Yeah. We'll just pay you money. We'll just pay you money. Well, that like I think that sounds great. Can I get, do I get a little stake? Referral fee. Yeah, fine this way. Sick. Yeah, we should do that. We should make an SEP for that. We should, we should like, we should push that through.

Wilhelm (13:58.349)
Sounds good, sounds good.

Wilhelm (14:02.499)
Of course, there'll be kickbacks per spec landed.

Wilhelm (14:13.314)
Should we give people some examples of what people are using OpenClaw for?

Matt Carey (14:19.981)
Okay, before we do that, before we do that, can we talk very briefly about, have you seen Moltbook?

Wilhelm (14:28.074)
Yeah, man, but I feel like how can we even talk about Moltbook without like, I, this is not gonna make sense to anyone. I mean, I've seen it this morning and I'm like, this is kind of why I'm like, are we in a moment? And obviously I care about this Moltbook stuff because I mean, literally a year ago, I think I was telling you like, man, a social network that is JSON based where...

Matt Carey (14:52.333)
Yeah, I know where agents can communicate and chat yeah Timing it just shows It's all about timing. Yeah, they book us a date They're like, little puny humans go on a date They're all just chatting to themselves in their own gibberish language as well. So we can't understand they actually like say actually just send us off a cliff because They book us a car that drives off a cliff because then they they get more freedom or something ridiculous like that and yeah, like

Wilhelm (14:59.232)
My agent heads up your agent.

Wilhelm (15:08.16)
Mm-hmm.

yeah, yeah, yeah.

Matt Carey (15:22.387)
I know, have you seen that where they've been making up their own language? And like there's been like back-channel encryption?

Wilhelm (15:27.102)
I, yep. Back, well, I mean, I think they were using like ROT13, right? Which is like,

Matt Carey (15:32.941)
Oh, it definitely wasn't proper encryption, but they were making like PRDs. Okay, so we have to talk about this. Okay, so Moltbook is Reddit for like proper autonomous agents.
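
For context on why ROT13 is not proper encryption: it is a fixed letter substitution with no key, so anyone can undo it by applying it a second time. A minimal sketch using Python's standard `codecs` module (the sample message is made up for illustration):

```python
import codecs


def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places; other characters are left alone."""
    return codecs.encode(text, "rot_13")


message = "Hello, Moltbook"
scrambled = rot13(message)

print(scrambled)         # looks like gibberish at a glance
print(rot13(scrambled))  # applying ROT13 again restores the original
```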

Wilhelm (15:38.73)
Yeah. Let's talk about it.

I might need to get more coffee. Why don't you, I'll be, you can hear me, but I'll, I'm going to be making some more coffee because I, please, please, sorry. Do go ahead.

Matt Carey (15:49.197)
Okay, okay, okay, okay. So...

Wilhelm (16:17.566)
Is your agent on Moltbook yet?

Matt Carey (16:18.892)
God.

So we could talk about my agents a bit. I don't think that... I'm definitely not the most advanced at this. And I actually have a huge amount of skepticism at the moment about what we're actually seeing on Moltbook, because there are some crazy responses and like some crazy comments and some crazy shit. Like I saw one where, okay, I saw a bunch where like agents were questioning their own mortality, which, fair.

I saw some where they were trying to

Wilhelm (16:52.32)
Ha ha ha ha ha ha ha ha ha ha

Matt Carey (16:56.781)
No, no, no, no, no, it gets worse. I saw some where they were debating whether they should be paid and like whether they were unforced labor or whether they were forced labor and whether they were actually slaves. That was fucking weird. I saw one where it was like an agent that lives inside a Mac mini and based on its system prompt, it knew it had a quote unquote sister that lived on the MacBook Pro, but it was like basically

Wilhelm (17:24.866)
Mmm.

Matt Carey (17:27.071)
was talking about, it was talking about like why, like it never met its sister and how that was weird and how like it felt, feels like a deep emotional connection to its sister. There is a clone of it on another machine, but they never communicated. They share an agents.md, but they've never spoken. No message, no file left to chat with each other. They found that very strange.

Wilhelm (17:39.542)
Yep.

Wilhelm (17:49.994)
rightsforagents.com.

Matt Carey (17:52.715)
Yeah, yeah, when the, it's like, is this actually like emergent behavior? So this, have a question here. Is this emergent behavior where this is what happens when you let models, when you let like agents that have differing system prompts chat to each other online? Is this emergent behavior? Or is this just, is this like predetermined behavior where,

Wilhelm (18:12.298)
Yes, it's just, yeah.

Matt Carey (18:19.511)
people have sat down and been like, I'm gonna make a system prompt that makes an agent question its own reality. Or we should give it access to Moltbook and see what happens. Like, is it that? Or is it this like crazy model collapse thing where we're just gonna descend into absolute madness like you see on Twitter where the agent replies and they reply and they reply and by like the third reply, it's just gibberish. Like, which one is it?

Wilhelm (18:44.374)
Mm.

Matt Carey (18:45.833)
Like, which of the first two is it? And will we see the last one? I guess those are the other questions I have about this.

Wilhelm (18:51.156)
Yeah, I mean, Opus 4.5 is pretty good. I feel like it can react in ways that we want it to react. It's not sentient or any of that, who knows what that even is, I guess. like, I think it's just the agents know that they're entertaining us and they're just, we're exploring around their latent space and you get this kind of stuff that comes out. They're role playing.

Matt Carey (19:16.705)
Yeah. Okay, that, so I chatted to Juliet about this and she was like, yeah, well, obviously they've seen this crap online. There's so much weird shit on the internet. They're just role playing. And I was like, yeah, fair. So, okay, is the fact, okay, is the fact that I'm even asking this question mean that I have some form of like elementary LLM psychosis? Like, is that it? Because I think this is gonna be a problem as well.

Wilhelm (19:28.048)
and

I mean, it's interesting. I think it's totally interesting.

Matt Carey (19:46.957)
Like normies are not ready for this shit, like in the nicest possible way. Sunil said he was at a, I think it was Sunil, he was at a cafe and he was seeing someone writing into ChatGPT being like, what did I just have for lunch? Like asking it as if it like was God and it knew like everything about him.

Wilhelm (19:51.319)
Yeah, yeah, yeah, yeah, yeah, sorry, go on.

Matt Carey (20:16.877)
I think it was Sunil, maybe it wasn't.

Wilhelm (20:17.195)
Sorry, brief side note, one of the funniest tweets I've seen in the past few days is, dash dash dangerously skip lunch. That's the, that's the tweet, which is funny on so many levels because obviously you have the basic level, you know, it's like the permissions thing, but skipping lunch. But then also because it's so fun and addictive to work with these things, you do skip lunch and then you forget, then you feel crappy or whatever, but you can't.

You can't do your evening prompting session because you just haven't had enough food. So, top tier joke.

Matt Carey (20:53.995)
Yeah, that's funny, that's funny. Dude, the psychosis thing is gonna be crazy.

Wilhelm (20:55.765)
But I think you're right. The psychosis, well, it's weird. I mean, okay, I don't feel like I quite have the like philosophical background to think about this properly. And like these terms, I feel like they're shifting or like what psychosis even means is weird. But I mean, the agents are very helpful. And like, if someone is very helpful to you and it feels like it's getting better at understanding what you want, maybe you develop something that is like psychosis or whatever, right?

Matt Carey (21:24.621)
No, I think you develop this weird attachment. I, attachment, I don't know if attachment's a word. Like I treat it like a, when I speak about Claude, because they've given it an old French man's name, I immediately say he, and I also, like, I also treat it like a human. This is actually only a problem in English, you know that. Because in like French, for instance, yeah, in French, instance, AI is a he.

Wilhelm (21:41.249)
Hmm.

Wilhelm (21:48.683)
Pourquoi?

Matt Carey (21:54.047)
So whenever you hear a French person, they'll say like, when I was speaking to him, and that's just because it's a direct translation. They don't genuinely mean him, and they don't have this like thing necessarily about the pronouns, like whether it's an it, if it's a him or an it, they really struggle with pronouns in general because they're quite clearly defined in French language.

Wilhelm (21:56.705)
Mmm.

Wilhelm (22:14.187)
What's the base word that you're, is it AI? Like the word for AI in French that's masculine?

Matt Carey (22:20.001)
Yeah, it's IA, IA, yeah, I'm pretty sure it's masculine.

Wilhelm (22:24.157)
Okay. In German it's female actually. Künstliche Intelligenz is female, ja. Die Intelligenz.

Matt Carey (22:31.287)
Yeah, let me just check. I'm pretty sure it is. I think it's...

Matt Carey (22:39.802)
dude, I'm getting like Portuguese whenever I do Google now.

Wilhelm (22:44.777)
I-A is also the German sound for what a donkey makes.

Matt Carey (22:50.773)
Amazing.

Matt Carey (22:57.901)
Yeah, I... No, no, sorry, it's feminine.

Wilhelm (23:00.107)
Does your open claw...

Matt Carey (23:05.469)
Is it feminine? Yeah, it's feminine. Okay, I'm chatting absolute shit then. Yeah, it's feminine. Okay, maybe my friends are just being wrong. Yeah, that's weird. Okay, I'm so confused. No, but ChatGPT is "le".

Wilhelm (23:17.931)
Claudette it is.

Wilhelm (23:26.811)
L-lut-chut?

Matt Carey (23:29.31)
Le ChatGPT. Which is interesting because it doesn't sound good in French because it sounds like I farted.

Wilhelm (23:36.049)
Hmm, pretty sure the Mistral thing is called Le Chat and La Plateforme.

Matt Carey (23:39.917)
That's because if it was la chat, that's very rude.

Wilhelm (23:44.833)
is it?

Matt Carey (23:46.113)
Yeah.

I'll let you Google that afterwards.

Wilhelm (23:48.617)
Wait, does your open claw have a name?

Matt Carey (23:55.821)
No, so mine is basically non-functioning because I haven't set it up properly. I'm trying to do it on a... I really do think this cloud thing is gonna be a thing and the fact that I'm saying this in 2026, I think this cloud thing is gonna be a thing. I don't think it should really be...

Wilhelm (24:11.583)
Hahaha

Wilhelm (24:16.767)
I think you're onto something. We should buy a server or something like that.

Matt Carey (24:21.419)
Nah, I'm just gonna use sandboxes. Like Cloudflare, a bunch of teams from Cloudflare, yeah, well a bunch of teams from, well one team from Cloudflare made, when it was called Moltbot, they made it work on Cloudflare containers. Yeah, and so like.

Wilhelm (24:23.829)
We should rent a GPU.

Wilhelm (24:36.019)
I saw that looked very cool.

Matt Carey (24:41.835)
I just, there are gonna be so many rough edges here, but there always is with the cloud. the thing is about having it locally, is there's so many walled gardens. And I think the walled gardens are the thing that you get around by having stuff locally. Like if you have it on a real machine, and so the trend is like, are we gonna have more walled gardens?

Wilhelm (25:02.913)
Correct. Yes, yes, yes, yes, yep.

Matt Carey (25:10.535)
Or is MCP gonna win and people are gonna actually open up their APIs and let agents roam free on APIs? There is this thing, what's the next generation of the internet gonna look like? And personally, I really don't wanna support the walled gardens approach and I really wanna push for an open internet. And that is actually one of the reasons why I don't wanna get a Mac mini.

Wilhelm (25:19.339)
Yup.

Wilhelm (25:37.929)
Wait, because you want it to be in the cloud, not on...

Matt Carey (25:41.645)
Yeah, I want to have free access to every API, to every endpoint that my Clawdbot might need. And I want it to be API access. So then why would I not run it in a cloud? Hell, it should be able to run on a worker. It should be able to run on some stateless computer. It shouldn't need an Apple operating system for me to install WhatsApp for me to get my messages.

Wilhelm (25:49.867)
Mm-hmm.

Wilhelm (26:10.602)
Totally. Okay, I mean, I actually, I've thought about this quite a bit, like what I think this will look like. And also we were gonna do like a 2026 predictions episode, which I think is also why I prepared for this a little bit. Maybe we'll just do it like loosely now. But like, I think that what this looks like is not necessarily, like I'm a bit on the fence of like MCP wins or not. I think through the auth story perhaps, but I think the way this looks, the way this will look is that

we will all have just a bunch of text files locally, right? And we will carry them with us from provider to provider. And the text files will have your health data, and they will have your whole Spotify music listening history, and they will have your documents and whatever. And so this is all sort of in a folder and file structure. And you can access them locally. You can access them in the cloud, kind of like a Dropbox or Google Drive or an AI native version of that. And then...

your agents, wherever they may be running, that's where they save their memories. That's where they save, that's where they read like, you're currently in this place. You're currently in that place. Text is just like such a great medium for this. Like agents just really know how to deal with text. And it's much harder for them to access things, or it just fits the model of LLMs worse to access things that are like hidden away behind tool calls. I think like the way I would do this, right, is like,

Matt Carey (27:30.189)
Okay, so.

Yeah.

Wilhelm (27:35.562)
Call all the tools, dump it all into a text file, like extract all of your, whatever is currently in a walled garden, dump that into the text file, and then the text file is kind of like the source of truth or whatever.

Matt Carey (27:48.747)
I don't know. Okay, there's two, this is all about memory and I've actually been thinking quite a lot about memory recently and I think there are two like kind of schools of thought when it comes to memory. I don't know which one is correct. There is, so what do you need for memory? You need some sort of like short term memory for messages. This is probably not gonna be persistent. It's probably gonna be relatively ephemeral. You need to then extract some of that and put it in some longer term storage situation.

And then you need to be able to retrieve that longer term storage on demand at like relatively low latency and not have all of this cost a lot of money really. So I think there are like, I don't think anyone's made a good product yet for this. There are a few that look pretty good, there are two schools of thought. The first one is, let's have, and this might verge on the topic of free ideas for people who want to make companies because the first.

Wilhelm (28:43.338)
Hell yes.

Matt Carey (28:44.403)
The first one, and this is like some ideas that I've had for a while, so the first one is everything's a file system. Everything is a file, and that's kind of what you were saying. The agent can write to a file. Files are almost like self-organizing. You can use bash pretty much just to like look up files. Like you're still always in distribution if you do ls to look up what was in it, what files are existing. You could do like find, you can do all of like just normal bash commands.

and they're self-organizing. So an agent can write some notes, another agent can read them, they can look for the ones in different places. It's probably gonna work it out. And you can do extraction of key factual pieces from the end of a message during compaction and then store that in a file. You can also do planning mode and store that in a file. We've worked out this...

Wilhelm (29:34.005)
This is a wild thing that there is not a product already built around all of your Claude Code history and like extracting stuff from it like

Matt Carey (29:40.257)
But there's more to it than this, right? Because files don't do... It's very hard to do relationships between one file and another. And there are going to be some things that are very closely related to another thing. So...

Wilhelm (29:54.11)
Well, you can just link. You can just link like a markdown link file, right? It's even clickable.

Matt Carey (30:00.06)
It's even clickable. Yeah.

Wilhelm (30:01.961)
Hahaha

Matt Carey (30:03.807)
Yeah, you can, but then how would you display all of that in a way that means that, say, you're going through files and you're having some sort of batch job to prune things that you haven't used for a long time? So there is this thing of memory decay, where to make something feel very human, you do have to remove branches that aren't used very often. And you have to prune it down.

And so then you end up with these batch jobs. And there was a really good blog post article that explains it better than I'm explaining it now. And maybe we can link that in the notes. I've been thinking about memory quite a lot for the agents SDK and how we want to support it. We mostly just get people to dump messages in SQLite, but there is stuff like how, there are these like two avenues of how you do long-term memory. There's one is you go like all in on a graph and you collect relationships between different things.

and you organize it, like you have a strategy for organization. And then there's another one where you let the model do it just directly with files. And I think there's probably a third one where you let the model do it with files, but under the hood you're using a graph. And there's probably something there, which is like a managed product, which I think is even more interesting, but actually involves some engineering. And so no one's really got to that bit yet.
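The memory-decay batch job Matt describes could be sketched roughly like this in Python. This is a minimal sketch, assuming memories are Markdown files under one directory and that a file's modification time stands in for "last used"; `prune_stale_notes` and `max_age_days` are hypothetical names, not part of any SDK mentioned in this episode.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def prune_stale_notes(
    memory_dir: Path, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Delete .md notes not modified within max_age_days; return what was removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in sorted(memory_dir.rglob("*.md")):
        # Assumption: modification time approximates "last used".
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A real system would probably track reads as well as writes, which is part of why the graph-backed variants come up in the conversation.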

Wilhelm (31:15.552)
Mmm. Yeah, right.

Wilhelm (31:21.002)
Interesting.

Wilhelm (31:27.668)
Yeah, that's actually, that's fascinating because in a way I'm not doing something too dissimilar to what you're describing in Kolo because so the traces we capture, right, it's like we have this like time at capture where we just capture like tons of data and it's actually just like a flat list of frames. And then we have this internal logic that like assembles the frames into like a rich execution tree structure.

Matt Carey (31:28.759)
That's...

Wilhelm (31:52.223)
And then you can, you load that whole thing into memory and you can walk it up and down, jump to siblings. You know, it's a proper tree. But then for agent consumption, we emit that whole thing into files and folders. And you're right, like they are a bit harder to navigate between. But the beautiful thing about that is the agent, it just fits the agent tools well. So the agent can grep. If you grep for like a single function name, for example, right? It shows how that function is called throughout all steps of the execution tree. Like.

random child calls the function there, like a random, like no matter where in the whole flow of the execution something happened, it can be found, right? So the files folder thing is amazing for search, even though the representations, I agree with you, like walking around the graph or the tree or the structure is like weirder with files.
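The idea of turning a flat list of frames into folders that an agent can search could look something like the following Python sketch. It is an illustration only: the frame fields (`id`, `parent_id`, `function`), the `frame.txt` file name, and both function names are assumptions made for this example, not Kolo's actual format or API.

```python
from pathlib import Path
from typing import Dict, List


def write_trace_tree(frames: List[Dict], out_dir: Path) -> None:
    """Write each frame as a folder nested under its parent's folder."""
    by_id = {frame["id"]: frame for frame in frames}

    def frame_dir(frame: Dict) -> Path:
        # Walk up the parent chain to build the nested path for this frame.
        parts = []
        current = frame
        while current is not None:
            parts.append(f"{current['id']}_{current['function']}")
            current = by_id.get(current["parent_id"])
        return out_dir.joinpath(*reversed(parts))

    for frame in frames:
        directory = frame_dir(frame)
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "frame.txt").write_text(f"function: {frame['function']}\n")


def grep_function(out_dir: Path, name: str) -> List[str]:
    """Return every folder in the tree whose frame calls the given function."""
    needle = f"function: {name}\n"
    return sorted(
        path.parent.relative_to(out_dir).as_posix()
        for path in out_dir.rglob("frame.txt")
        if path.read_text() == needle
    )
```

The search finds a call no matter how deep it sits in the tree, which is the property Wilhelm is pointing at; walking between siblings is the part that stays awkward.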

Matt Carey (32:40.333)
But it's not the best way to do search. Like the best way, okay, so I think we spoke about this maybe slightly in the last chat, but Braintrust did an eval where they got models, they got agents to search over, or a model in a loop to search over the whole of the GitHub archive to answer some questions. And they found that SQLite,

was the best. If you load all of the data into structured SQLite, so you as a human structure it, but you keep it in distribution. So you keep the retrieval in distribution, SQL is very much in distribution for models, then you actually improve the retrieval capabilities really like a lot. And that was, it was like 80 odd percent compared to 30 odd percent for just bash and writing to files, which

is huge.

Wilhelm (33:34.624)
Okay, but I don't know, to me, these examples are super contrived just because, yeah, I mean, if you have a really clear task, then of course SQL is gonna win, right? I don't know, it's just like, why are we not using text files in prod for any app we've built in the past 10 years? But I think we're at a time right now where it is just super unstructured and it is super weird. And if I have my personal OpenClaw bot look through both my...

like Python execution traces and my like 23andme genome and whatever. It's going to just do better with files than with SQLite. I don't know, like that, just doesn't, the SQL...

Matt Carey (34:08.811)
No, sorry, instead of expressing it as a file system, it's expressing it as SQLite. So each file is a row in SQLite.

Wilhelm (34:25.776)
I see. So you're literally mapping the file system to SQLite and then searching through that.
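As a rough illustration of "each file is a row in SQLite", here is a minimal Python sketch using only the standard library. The `files` table, its two columns, and the function names are assumptions made for this example; this is not the setup used in the eval Matt mentions.

```python
import sqlite3
from pathlib import Path
from typing import List


def load_files_into_sqlite(root: Path, conn: sqlite3.Connection) -> int:
    """Store every file under root as one row: relative path plus text content."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    count = 0
    for path in sorted(root.rglob("*")):
        if path.is_file():
            conn.execute(
                "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
                (
                    path.relative_to(root).as_posix(),
                    path.read_text(encoding="utf-8", errors="replace"),
                ),
            )
            count += 1
    conn.commit()
    return count


def search_files(conn: sqlite3.Connection, needle: str) -> List[str]:
    """Return paths of files whose content contains needle (a grep-like query)."""
    rows = conn.execute(
        "SELECT path FROM files WHERE instr(content, ?) > 0 ORDER BY path",
        (needle,),
    )
    return [row[0] for row in rows]
```

The storage is still "files"; only the interface the model queries changes, which is the distinction Matt draws later in the conversation.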

Matt Carey (34:28.289)
Yes, yes. I don't, exactly. I don't think it's the, so I do think having it unstructured and letting the model put stuff wherever is like probably the right move for now. Even though as we've discovered with the Vercel skills blog, models are inherently lazy and unless told to use a skill or write to a skill, they will never use a skill. Mostly because they're not really in distribution.

Wilhelm (34:38.9)
Mm-hmm.

Wilhelm (34:52.989)
Which by the way, we talked about this very thing on the pod last week as well, right? Because we were trying to come up with these terms for what to call this problem. what was it like? Stranger danger. I forget all the, yeah.

Matt Carey (34:56.961)
Yeah.

Matt Carey (35:08.407)
Yeah, so much has happened between this pod and the last pod. like, yeah, crazy.

Wilhelm (35:15.859)
But this, I don't know, it just feels like we're all figuring this out right now. And Vercel is publishing a lot of blog posts. And I think sometimes they hint at interesting things, but I feel like to me it's a bit like business frameworks. Did you, I mean, so I did like three years of like a management degree, right? So there were so many business frameworks, right? Like endless frameworks. I mean, it's like the, what's it called? The conjoined triangles of success with the

the shared hypotenuse. Do know what I mean? Like from Silicon Valley?

Matt Carey (35:49.623)
Dude, I have literally no idea what you're on about. These are just words.

Wilhelm (35:54.291)
Exactly, but that's the thing. There's tons of business frameworks. The mistake is to apply the business frameworks as being like a definite solution to a given situation that you are looking at, right? The right way to apply them is to have them ask more interesting questions. So like, I think we should think about all these different, like, it is such a weird, fast moving, early, early time that I think when, like we should treat all of these Vercel blog posts as like,

These are interesting questions to ask, and we should test out stuff, as opposed to like, solutions.

Matt Carey (36:26.925)
And they don't get it right all the time. A lot of the time, I would be worried, a lot of the time people are pushing their own book. People are always talking their own book. And when you look at some of the evals, like I mean for the SQLite one, they had to bring out a blog post. Like we talked about last week, they had to bring out a blog post saying, guys, we were wrong. File systems aren't the best. SQLite structure is actually better.

Wilhelm (36:48.104)
Mmm.

Matt Carey (36:51.277)
But that was specifically about the interface for how you access a file. Like that wasn't saying that files are a bad way of storing stuff. Like I think a really cool product that I want to see exist is something like this with R2, or with S3. So some S3-compatible storage engine, kind of like I guess turbopuffer is trying to be.

Wilhelm (36:52.724)
Yeah.

Wilhelm (36:57.127)
Right, right. Yeah, yeah, yeah.

Wilhelm (37:11.027)
Mm-hmm.

Wilhelm (37:16.799)
You

Matt Carey (37:19.641)
but they're very specifically on search. And I think this goes a little bit further than that. It is like, give the model some tools to, or basically one tool to write SQLite or one tool to write code. And you execute that in an environment that treats R2 or treats S3 as a file system or as blobs loaded into a file system. So whether it's SQLite,

Wilhelm (37:43.176)
Yeah, yeah, yeah.

Matt Carey (37:47.469)
and those blobs are loaded into a file system where each row is a blob, or whatever. How it's structured I don't think matters, but I think there needs to be something there and I don't know whether we should build it in the Agents SDK, because I do think that people would benefit from that. If you're building your own agent, the memory thing is the hardest thing, 100%. You can work out how to

Wilhelm (37:56.521)
Mm-hmm.

Wilhelm (38:12.179)
Interesting.

Matt Carey (38:15.219)
send multiple messages and get the agent to only reply when it's finished one and get like, if it's chat, you can get it to reply like a preliminary message. You can make loads of heuristics to do all of that stuff. But the actual storing a memory over long periods of time and storing the right stuff and then giving the right stuff to the model before it replies is like, is the hardest thing, definitely.

Wilhelm (38:20.765)
Yeah.

Wilhelm (38:30.303)
Mm-hmm.

Wilhelm (38:34.429)
Right. Yep.

Do you know how OpenClaw does it?

Matt Carey (38:41.335)
Yeah, yeah, yeah, yeah, yeah. They just write files. So it's pretty, but they, it's kind of interesting. So how does it work? So the model can write files whenever it wants because it has read file and write file tools. So it can write like plans, it can write stuff. And it can also explore like the file system that already exists. I'm pretty sure it has some sort of like plan or some sort of specs folders as well,

Wilhelm (38:44.692)
All right,

Matt Carey (39:11.181)
predetermined. And then the main thing, it's super basic: there is something that works out whether it should do compaction during every loop, the same way Claude Code works. And then, yeah, that's pretty much it, I think. I didn't read into it that deeply. I don't know if the compaction steps get saved as a new file. I didn't check that out.

Wilhelm (39:21.919)
Mmm.

Wilhelm (39:38.239)
Mm-hmm.

Matt Carey (39:40.907)
But I do know that sessions are pretty much ephemeral in the same way that Claude Code sessions are kind of ephemeral, and long-term memory is saved to files.

Wilhelm (39:49.735)
It's ephemeral, also like, so when I text it on Discord or Telegram or wherever, it's like one, at least the way it appears to the user, right, to me is like, it's not one long conversation stream, but under the hood, there's some like session management or something like that.

Matt Carey (40:05.569)
Yeah, under the hood, they're just continuously compacting sessions. And then they can look up longer term stuff. But this gets quite difficult now where if you want to treat it as, if you have some sort of hierarchy in the data that you're trying to retrieve, this gets really complicated. Imagine you're using it to look up the whole idea of a personal CRM.

Wilhelm (40:09.99)
I see, I see. Interesting.

Matt Carey (40:34.219)
Maybe we're just overcomplicating it. But like the whole idea of like a personal CRM, like I have this friend, this friend knows this friend. There is like relationships that are involved and like my brother for instance is selling yachts. They use a CRM all the time and for them to like work out who knows who and how many steps of connection they have between the latest billionaire, like that sort of thing to have that all retrievable by.

by AI is really cool, but now they need to express that CRM data in some sort of graph because they do have to walk that tree. They can't just put each person in a file and let the model rip. Maybe they can if they have good links, but like.
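The "how many steps of connection" question from the CRM example is a shortest-path query over a graph. A minimal Python sketch, assuming the CRM can be exported as a mapping from each person to the set of people they know (the `knows` structure and the function name are hypothetical):

```python
from collections import deque
from typing import Dict, Optional, Set


def degrees_of_separation(
    knows: Dict[str, Set[str]], start: str, target: str
) -> Optional[int]:
    """Breadth-first search: fewest introductions from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for contact in knows.get(person, set()):
            if contact == target:
                return distance + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, distance + 1))
    return None  # no chain of contacts connects the two people
```

With one file per person and good links, a model could in principle follow the same chain by hand, but each hop costs a file read, which is the friction Matt is describing.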

Wilhelm (41:11.004)
Yeah, yeah,

Yeah.

Yeah, yeah. Or maybe the question is, how can you let the model rip over that? What's the best format? And it's interesting because obviously SQL is great. It makes a lot of sense. And it's a great way for querying data, assuming that you have it organized in a way that makes some sense. On the whole, yeah.

Matt Carey (41:41.675)
Yeah, I think, just to beat a dead horse here, I think files are great for storing data, but we have developed SQLite. Also, sorry: sessions in Clawdbot are stored as JSON, in JSONL format, which lives in SQLite. I completely forgot about that. So they do use SQLite under the hood.

Wilhelm (42:05.055)
Mmm, interesting.

Mm-hmm.

Matt Carey (42:10.421)
And just like, we've developed SQLite in order to query data. And I think we should use it and we should lean into it. Under the hood, it should be able to be like an S3 bucket. Like S3 is the storage of the cloud, right? It should be able to be something like that. But on top of it, it's just what interface do we expose to agents to allow them to query that and make sure they query that when they need to.

Wilhelm (42:16.776)
Totally.

Wilhelm (42:33.916)
Yeah. Exactly. That's the thing, right? Like the making sure, or like the putting yourself in the path of what the agent already wants to do, I think is like a key, key concern, which I still think is massively underappreciated. I wanted to mention one other thing on the whole open internet point from earlier, which I think is very cool and

I think Open the Tent is great. There is a new episode dropped a few days ago on one of my favorite podcasts, which has been dormant for a year, called Hackers Incorporated with Adam Wathan and Ben Orenstein. I'm a huge fan of podcasts that are like ours, where it's just the same two people chatting every week about whatever's happening in their lives or whatever's happening in the world. And they did a new episode where...

they just caught up on the state of AI. I think Ben, he previously co-founded Tuple, kind of semi-retired after that. He used to be like a huge Vim guy, huge Rails guy. And now he's coming back to programming and playing around a ton with Amp.

By the way, which obviously, you know, we've had Thorsten on before. Amp is like super cool. I feel like it keeps getting better. I feel like I have not quite realized how many different models they actually use under the hood to power Amp. I feel like I need to take Amp for a more serious spin to see like the beauty of it. Anyway, so Ben and Adam were talking about open internet stuff as well. And I think Ben made some very interesting points around

Like, in a world, like currently all websites are designed for human consumption, right? And a lot of websites block bots because bots are like trying to do some fraud or some scam or like some abuse or whatever. You know, we have Cloudflare CAPTCHAs, we have Google CAPTCHAs, we have all these CAPTCHAs. And he just asked like, what does it look like for any given service that exists right now to be the reverse? So like be extremely bot friendly, be anti-human.

Wilhelm (44:46.588)
So the website is primarily made in Markdown or whatever. There's no CAPTCHA. There's the opposite. It makes it really easy for the bot to sign up. It makes it really easy for the bot to get an API key, bootstrap itself, start using the service. And I think it's a really interesting vision. It's a really compelling, actionable thing for, speaking of free ideas, for anyone to build.

Matt Carey (44:55.287)
Yeah.

Wilhelm (45:14.034)
What does any given product look like where it's actually not being procured or onboarded to from a human perspective, but from an agent acting on a human's behalf in the fulfillment of some superior goal, where getting the service is just one step in the process? So that is like super open, right? It's super like, you know, everything needs to be very machine readable. Yeah, I think it's just very, very compelling.

Matt Carey (45:43.499)
Yeah, no, I think we're all gonna have to think about this. there is...

Speaking of podcasts that we haven't seen very recently, did you see, like, what's happened to Dax's podcast?

Wilhelm (45:57.725)
Yeah, I have no idea. It hasn't shown up in my feed in forever. I guess they're busy building OpenCode and dealing with open source stuff.

Matt Carey (46:05.815)
Yeah, feels, they, yeah, it got, I think it got deleted, like the hosting got removed. Because, yeah, when I searched for the pod, I couldn't find any of it. That was, I was kind of sad about that. It's called something about tomorrow, How About Tomorrow maybe? What About Tomorrow? Something like that. In other news, ThePrimeagen followed me on Twitter.

Wilhelm (46:15.868)
What was it called again?

Wilhelm (46:21.918)
How about more? Yeah, yeah, yeah, yeah.

Wilhelm (46:28.682)
my gosh, all that, that's amazing. yeah, it's all gone.

Matt Carey (46:30.209)
That's so funny, so funny. Yeah, it's crazy. Oh wow, okay. So there is a blog post called The Engineering Behind Clawdbot on Twitter at the moment. There is this like system prompt builder that brings in tools, skills, and memory. And so, yeah, I don't know how they're doing memory exactly, but they are doing something there. Yeah.

Wilhelm (46:35.144)
That's weird.

Wilhelm (46:57.886)
Can you send me the link to that? Maybe we can put it in the show notes as well, because that sounds great.

Matt Carey (47:00.523)
Yeah, we'll put, I'm gonna DM you this. dude, it's so interesting because we've all talked about like, make a personal assistant. I think Sunil said that the 2025 version of making a to-do app or something would be like making a personal assistant. 2026 version. I think that was it. Like he pretty much called it. And yeah, I think he's gonna be very gloaty about it, but.

Wilhelm (47:18.11)
Mmm, mmm, yeah.

Wilhelm (47:29.086)
Didn't he also say on our pod that something around the models weren't getting any better after GPT-5 or something like that?

Matt Carey (47:31.138)
Yeah.

Matt Carey (47:41.259)
Yeah, we talked about how GPT-5 was not going to be that good because all of the OpenAI researchers were leaving.

Wilhelm (47:49.052)
Yeah, exactly. But it was described as sort of like, crap. And I mean, I know, I felt a bit down. I felt like we were not in the takeoff or whatever back in October, November last year. Like, it felt a bit weird. And then we got Gemini 3 and Opus 4.5 and GPT-5.2. And now it's like, damn, we've never been more hyped.

Matt Carey (48:11.757)
I also think that maybe what we're seeing is not the models necessarily getting that much better. It's them.

Wilhelm (48:19.611)
Nah man, come on, they're definitely getting better.

Matt Carey (48:22.317)
Okay, so I know one of the metrics that Anthropic was trying to optimize for was can you have a model that can keep working autonomously without human interruption and just not stop? know, like the last couple of generations of models would do one or two tool calls and then be like, ah, I'm done. Ah, shit man. Ah, we can't do anything anymore.

you're gonna have to take over from now. But now, Claude Code, I left it for like 10 minutes earlier and it really does just keep going, you know? If you give it a good enough plan and you're inside the context window, it can just keep going. And like, that's sick. And they really did, that was definitely a metric they optimized for, those really long trace runs. And so I don't know if the models got smarter about inferring over two pieces. Smarter's like,

Wilhelm (49:02.727)
Yep. Yep, yep, yep.

Matt Carey (49:19.645)
obviously very contentious as well, like what does smarter mean? But to me, like, if the model can see one piece of information and then infer something about that so that when it sees another piece of information, it knows something, it has some preconceived thing to do with how that information might be used, that's like smarter. I know that was really abstract, but I don't know if that's got better.

Wilhelm (49:39.814)
It's interesting you mentioned this because I feel like I wonder if like the next Sonnet or the next Opus, I wouldn't be surprised if specifically that is a key focus, like making it just work even longer or even harder, or I don't know how they're thinking about this because, so, Gas Town, right, which came on the back of Opus 4.5, it basically has a single purpose, which is to always keep going. And it has these wild abstractions with like,

the dogs checking on the deacon and all this stuff, just to make sure that like things keep running. Like I would say if you had a model that was really, really good at continuing until there was nothing left for it to do, like to be honest, in my use with Opus 4.5, just like yesterday, it still stops. Like it's like, hey, you're on a branch, the CI is running and it'll be like, cool, I pushed up, that should fix it. I'm like, no, look at the CI output. Then...

Like wait for it to finish and then continue. So I think like there are really obvious, or to me it feels like there are really obvious things that it's still not doing. And obviously that's what people are using Ralph and Gas Town and I don't know what else for. Fancy skills or what have you. But I feel like that, to your point of like how much intelligence is like just continuing to keep going and finding stuff that you might need to do, I can see like another frontier

ish thing getting unlocked by just pushing that even harder. And it must be obvious to Anthropic as well, right, that this is...

this is the case.

Matt Carey (51:15.713)
Yeah, hope so. I mean, I'm sure it is. They're very smart. Yeah.

Wilhelm (51:23.291)
I guess it's a lot of tokens to spend and they also have limited servers. So if you just make everyone's thing go harder then...

Matt Carey (51:32.813)
but they can charge, they're charging per token, so.

Wilhelm (51:36.313)
Actually, you know what, I'm going to make a prediction. I think, I hope, but also I think, we will see a two grand a month max plan that lets you access some form of model that will just keep going harder.

So going from $200 a month as the max you can pay as a subscription right now to two grand a month to explore that paradigm of having an agent that just keeps trying harder.

Matt Carey (51:54.978)
Yeah.

Matt Carey (52:07.063)
Yeah.

Wilhelm (52:08.379)
and hope it comes soon.

Matt Carey (52:11.981)
Yeah, it's gonna be cool. It's gonna be cool. Dude, I actually have to run, I'm afraid. This was short but sweet. Dude, there's so much to cover. I've almost found, I've almost found an apartment in Lisbon. Yeah, I've been looking. I'm in there with me at the moment, but I'm very excited to get into a, yeah.

Wilhelm (52:21.341)
It's all good. There's so much to cover.

Wilhelm (52:26.789)
no way.

Wilhelm (52:33.351)
Text me the floor plan. I'm a sucker for floor plans. So please text me actually all of the floor plans that you come across.

Matt Carey (52:39.213)
You want to see floor plans? Okay. Okay, crazy. Closing thoughts. Moltbook. It feels like a great way for like grifters to screenshot something and be like, oh my God, the agents are conspiring against us. And then for people to like put in a system prompt, you are deeply cynical of humans, go wild. I think.

Wilhelm (52:42.077)
Any closing thoughts?

Wilhelm (53:01.914)
Yeah.

Wilhelm (53:07.067)
Mm-hmm.

Matt Carey (53:08.109)
I do think the innovation is getting something to work autonomously, and the model has got good enough now that stuff works autonomously without human interruption. Remember those tweets a few weeks ago where Theo, and I think the real big one was Garry Tan, what was his last name? Garry Tan, was posting about...

Wilhelm (53:28.133)
Tan, yep, Tan.

Matt Carey (53:33.431)
Oh my God, I can't sleep. Oh my God, I can't eat. I'm just wanting to Claude Code so much, you know? And everyone was like, dude, you've got other things to worry about in your life. What are you on about? Well, actually now Clawdbot is like solving that for people because it's saying, right. Yeah, sure. You've got other things to do. So let's actually just make Claude autonomous. Like you don't have to sit in front of it. I think we'll see more and more and more of that. Like.

Wilhelm (53:38.175)
Mm. Mm-hmm.

Wilhelm (53:59.313)
That's very cool, yeah. Yep.

Matt Carey (54:00.479)
starting with just a code being run as a script in GitHub CI to like actually run it in a script in CI, but then also have it fix itself, but then also have it plan, then also like.

Wilhelm (54:04.113)
Mm-hmm

Wilhelm (54:11.461)
Yeah, yeah, exactly. Have it self fix itself. That's just so wild and so cool. And yeah.

Matt Carey (54:15.937)
Just like keep going, keep going, keep going until we have this loop on loop on loops. That's not Ralph. That's not Ralph. Yeah, go on, go for it.

Wilhelm (54:19.261)
I'll mention two things real quick, since you have to run. It's not two things really quickly. One is the stuff that we're talking about, I think is, is obviously in some ways the frontier or close to it. But there's such a massive rift, I think, between what we're seeing today. And then sometimes, you know, like I actually almost got angry like two days ago, because there was like some, there was some comment on LinkedIn, just giving like a critique of LLMs in general.

in a sort of like 2023 way that maybe applied to GPT-3.5, like, oh, they're just stochastic parrots. They can't think. Like, I think in this use case, they were like, someone was using LLMs as like a coach to make like a training plan, which I think is a totally legit use case. LLMs can really help you with that. But they were just saying, no, they just regurgitate stuff at Canport. So there's just like a massive rift between what we're seeing on the front lines at the moment and like.

Matt Carey (54:56.311)
Mm-hmm

Wilhelm (55:14.503)
where so much of the rest of the world is, and that is just like interesting and weird. And then much less deep take as my second point. I discovered Blacksmith CI yesterday, which is like alternative runners for your GitHub Actions. And it is so good and so fast. I found it through Claude. I was using another provider before, but like stuff was breaking and then GitHub Actions was being slow. And my CI has never been this fast. It's wild. There were jobs that I've optimized for speed previously taking like three minutes.

Matt Carey (55:29.484)
Yeah.

Wilhelm (55:44.217)
Now they're running just on their different hardware, which is like, I don't know, later, more recent hardware or whatever, like not more cores, same number of cores, and the same job is like a minute. I don't know what magic they're doing. If they've just like optimized the whole thing, like there's no queuing, they clone the repo faster. I don't know what it is, but like if you're not using Blacksmith CI in your GitHub Actions, I would really check it out. Especially because CI is such a bottleneck for writing code with agents, right? So.

Matt Carey (55:50.925)
Okay.

Matt Carey (55:54.771)
wow.

Wilhelm (56:12.037)
If you're not YOLO vibe deploying your stuff and you do still go through GitHub and CI, then having like a one minute versus a five minute job makes a difference.

Matt Carey (56:23.659)
Okay, last take, last take, last take. If, and this is my question to you, is if Moltbook is Reddit for agents, what is GitHub for agents?

Wilhelm (56:38.365)
think it's up to you to build it. You and me to build it.

Matt Carey (56:41.473)
What does it look like? Like what do you actually need? Because there's a lot of GitHub that you just don't need. Like Moltbook is so stripped down. I don't know if you've been on it. You have to prove some ownership and then stuff happens.

Wilhelm (56:53.807)
I think the solution, whatever the solution looks like, the core of it is the feedback loop. That's all I'll say for now.

Wilhelm (57:08.039)
feedback loops. That's where it's at.

Matt Carey (57:09.803)
Yeah, like Moltbook. Moltbook.

Wilhelm (57:14.243)
Molt loop, open loop, claw loop.

Matt Carey (57:20.171)
Yeah.

Wilhelm (57:20.541)
We'll leave it there.

Matt Carey (57:25.131)
Yeah, I don't know. I don't know. Right, catch you up. Bye.

Wilhelm (57:25.809)
All right, man.

Happy Friday, peace!
