You've Been a Bad Agent

  • Matt's trail marathon in record-heat France
  • Matt's hardware wishlist
  • AI Engineer World's Fair
  • Claude Sonnet 5 ships live on air, and the case for keeping a personal benchmark
  • GLM 5.2 at Opus 4.5 level
  • Fable's relentlessly proactive
  • Quinn Slack's Freedom of Intelligence rally
  • Chad gets personal
  • Poke Human, RentAHuman, and Mercor

Links:
  • Omius cooling headband: https://omius.com/
  • What the science says about the $200 Omius headband (Outside): https://www.outsideonline.com/health/training-performance/omius-cooling-headband/
  • Colmi R02 open-source Python client (tahnok): https://github.com/tahnok/colmi_r02_client
  • Noop, the open-source no-subscription Whoop app: https://www.techradar.com/health-fitness/fitness-trackers/this-looks-awesome-theres-now-an-unofficial-open-source-app-for-reading-whoop-data-that-doesnt-need-a-subscription
  • Google's data centre from 2,000 retired Pixels: https://research.google/blog/a-low-carbon-computing-platform-from-your-retired-phones/
  • AI Engineer World's Fair 2026: https://www.ai.engineer/worldsfair/2026
  • NVIDIA DGX Spark: https://www.nvidia.com/en-us/products/workstations/dgx-spark/
  • Timeshifter jet lag app: https://www.timeshifter.com/
  • Fitbod: https://fitbod.me/
  • Poke Human (Interaction Company of California): https://poke.com/faq
  • RentAHuman (Y Combinator): https://www.ycombinator.com/companies/rentahuman
  • Mercor: https://www.mercor.com/
  • Quinn Slack (Amp): https://x.com/sqs

Creators and Guests

Host
Matt Carey
agent and mcp at Cloudflare
Host
Wilhelm Klopp
building @kolo_ai

What is You've Been a Bad Agent?

Wil and Matt discuss tech, startups, and building really cool things with AI. Sometimes joined by (actual expert) friends.

Wilhelm (00:01.248)
Yo We're back. It's officially summer and we've been enjoying life.

Matt (00:03.281)
Hey Hello.

Matt (00:11.44)
We have. We're both looking quite tanned. You're looking quite tanned.

Wilhelm (00:12.49)
Or kind of

Thank you. Yeah. I actually I get sunburnt like every day the sun is out in San Francisco. Well Europe's been in a heat wave all of like last week and here it was like depression week, like extremely grey every day. Like very few people realize this, but like June and July are like the worst weather months in San Francisco.

Matt (00:33.014)
I often think that's why all of the good models get released. Basically like everything everything like gets released when there's a dull week in San Francisco. Like when it's like super warm, everyone's like why everyone in Europe's like why why why are we not getting any new stuff? It's like, it's just cause just everyone's outside chilling out.

Wilhelm (00:49.39)
Ha ha ha.

Everyone's outside. Do you think this will be our last calm summer in life?

Reasonably calm summer where things slow down a bit. Well, we're on an exponential, sir.

Matt (01:00.604)
What do you mean? Slow down as in, like, okay, like RSI, like recursive self-improvement. Okay, so the the idea that the models are gonna get better and then everything's gonna get better and we're gonna live in some type of like crazy utopia in like X amount of time.

Wilhelm (01:09.868)
Yeah, like 'cause I yeah.

Wilhelm (01:21.154)
Hopefully Utopia. Yeah. Well I mean the the the pace of the pace of change, right, has been absolutely wild for our industry. Like it's like completely different world we live in to like a year ago. But I feel like you have felt things slow down maybe a little bit the past few weeks. I mean maybe that's because all the models are under restriction from the US government and we don't have new toys to play with. But I just wonder what future summers will look like.

Matt (01:52.262)
Maybe we touch grass more.

Wilhelm (01:54.84)
Speaking of touching grass, we've been doing some touching grass the past few weeks, that's why we haven't recorded in a month. You did a marathon this weekend.

Matt (02:06.757)
Yeah, trail marathon in France, in the middle of a heat wave, which was entertaining. It was two days ago, and I'm still feeling pretty ropey. The legs are feeling pretty good, but definitely the head... my legs are kind of fine now, but the head is like, I don't know, I had a bit of hallucination at the end, I got a bit too hot, and then the Red Cross stuck me in a

Wilhelm (02:22.135)
That's wild to me that your legs are feeling good.

Matt (02:34.957)
a bucket of ice when I crossed the finish line for forty-five minutes. Yeah.

Wilhelm (02:39.263)
Is there i is there a medical term for what you is it like a heat stroke? Did you officially have a heat stroke or what's the

Matt (02:44.665)
No, I think it's hyperthermia.

Wilhelm (02:48.033)
Hyper okay, I see.

Matt (02:50.951)
'Cause I think, I'm not entirely sure what my temperature was when they dragged me into the ice bath, but after about twenty-five minutes in the ice bath, it was still thirty-eight point seven or something like that. Yeah, thirty-eight point six. And then it started going down, and after like forty minutes... it wasn't going down for a while and so they were a bit concerned and I had more and more ice on me. And it was kind of weird. It wasn't like an ice bath like you might get in an ice sauna. I was lying on a stretcher.

Wilhelm (03:03.905)
Yeah, damn.

Matt (03:20.306)
And I was covered in a I was like with a tarpaulin underneath me and they sort of just wrapped me in this like tarpaulin and then just filled filled the edges, all the gaps with ice. yeah.

Wilhelm (03:32.055)
But it felt pl did it feel unpleasant? Like did the i ice f or did it feel deeply pleasant because it was cooling you down after like a

Matt (03:38.658)
No, I actually didn't f it didn't feel that cold at all, which was really weird. It was a r it's a really weird experience. Like I was covered in ice and it felt kind of cold on my legs, which was actually really nice. because that might be why my legs feel so good. But the the rest of my 'cause I had but the rest of my body, like it genuinely didn't feel that cold, which thinking back now is kinda worrying. But at the time I was like, this is this is odd. Like you feel the pressure of like ice all over you, but I didn't feel cold at all.

Wilhelm (03:42.231)
Wow, yeah.

Wilhelm (03:54.189)
Mm.

Matt (04:06.895)
I was actually still sweating in the ice. Yeah. Well not really sweating 'cause like you're kinda dumb by then, but I still felt hot. And I've felt hot ever since. Yeah. I think it's gonna take me a few days to recover from this. They yeah.

Wilhelm (04:15.457)
Yeah, yeah.

Wilhelm (04:21.983)
It and and this was just for context for everyone, this was like on one of the hottest days ever recorded in like the history of France. Yeah.

Matt (04:29.966)
France. Yeah, I think so. in in in in that particular i where where we were, it wasn't s as hot as some other parts of France. So some parts of France hit like over forty, forty-two, forty-three degrees, if not more. which is insane for France. and it wasn't like it wasn't like southern France like like the southern France, like Nice or Marseille or like one of the beach towns that hit those. It was like inland France.

Wilhelm (04:39.425)
Okay.

Matt (05:00.142)
just south of Paris that hit these crazy temperatures. Yeah, I think there is something weird going on with the weather. I mean, climate change and everything. But also, I mean, it was France's warmest day on record, and then they had like three warmer days after that. And then the UK had like the warmest June on record, the warmest

Wilhelm (05:01.803)
Yeah, yeah, yeah, yeah.

Matt (05:27.566)
June record the day after, the warmest June record the day after. And everyone's saying there's gonna be this like crazy El Niño storm system over this winter. So yeah, if you're in a place that gets like strange weather, I would be properly prepped for this winter. I mean, I'm kind of excited. Yeah.

Wilhelm (05:36.419)
yeah, yeah, yeah.

Wilhelm (05:43.339)
It's because all the data centers are taking all the water. That's what's causing this.

Matt (05:49.291)
Yeah, dude, we actually can't say that because we're gonna get clipped and then you're gonna get so much hate. Or somebody's gonna find your LinkedIn profile and then use you as like a source for the fact that data centers are taking all the water. So actually just south of me, just south of Lisbon, there's this data center in a place called Sines, which actually does use water from the sea, but the sea is super cold there. So it's not a closed loop system. It does eject back into the sea, I think.

Wilhelm (05:52.188)
Mm-hmm.

Wilhelm (06:01.52)
yeah, yeah. Yeah, yeah, yeah, yeah, yeah.

Wilhelm (06:09.376)
Okay.

Matt (06:19.585)
But it uses this like really cold water from the sea, which is kind of sick, to cool the data center. And the sea's like super, super cold there. So I don't know how it works, like if you're playing with the natural habitat of like the ocean around Sines. I'm assuming you are, which probably isn't great. But it's a very cold sea in that moment, and the data center probably doesn't need a huge amount of water compared to the Atlantic Ocean.

Wilhelm (06:23.336)
Mm-hmm.

Wilhelm (06:46.107)
this I I was gonna ask, this is not the Mediterranean, this is the Atlantic. Yeah, yeah.

Matt (06:49.119)
No, so yeah, so actually all I think all of Portugal is on the Atlantic Ocean because even in the even in the south, you think you're on the med, but the med kind of only starts like in in Gibraltar, yeah. Yeah.

Wilhelm (06:53.493)
Right. Yeah, yeah.

Wilhelm (06:59.499)
Yeah, after Gibraltar. Yeah, yeah, yeah. Okay, so I wanna tell you a funny thing about, so on this heat topic, a bunch of like Ironman events were like modified or cancelled. Like Ironman Nice was gonna be on this weekend and it was just completely cancelled. Like a bunch of friends were there, like obviously, you know, you build towards this for like, I mean, some people train for Ironman for like years. Certainly you do like a big build for like a few months and then like the main race, you just can't do it. And there's like a whole

drama about it. Ironman Frankfurt was shortened, which I've never seen before. I think they shortened the whole thing to like sixty percent or seventy percent or something like that, of the whole race.

Matt (07:40.419)
Dude, I get it. I get it. Like didn't someone try and do Ironman Nice and die?

Wilhelm (07:46.605)
I think regul yes, regularly people do die.

Matt (07:49.333)
Like it's no no no no, but someone tried to do it like by themselves, like on the day of the race and fell off the bike and died from exhaustion. Or not from exhaustion from heat stroke or something and then they fell down the mountain and died. yeah, absolutely insane. Like I I I I really think like the w like forty degree weather or like late thirties is not something that you should I I I never really took it that seriously, but it's it is r it's ridiculous.

Wilhelm (07:53.754)
God God

Wilhelm (08:04.108)
That's awful.

Wilhelm (08:15.457)
Yeah.

So speaking of taking it seriously, let me tell you a funny story. so in like a lot of these like Ironman races and stuff, are like in hot temperatures and especially because the run is so late in the day, right? Because usually with lots of these running races, I mean all the best like marathon performances, et cetera, they happen like never in the summer months, they happen in like October or in the spring. And then they start quite early so that you have cool temperatures because it does make such a difference for

Matt (08:18.391)
Yeah, I'm gonna take it seriously from now on.

Matt (08:24.556)
Go on.

Wilhelm (08:47.67)
For running. But if you're doing a marathon like at the end of an Ironman, you start running at like 12 p.m. or something. So it's like it's pretty hot. So there's a lot of heat training that people do, and there's a bunch of devices as well. And one of the devices, which is really fascinating, it's called the Omius headband. And I don't know if you've seen this, but it's this headband that you wear, and there's like these little blocks of essentially graphite that sit on your forehead. Like it's actual

like graphite blocks is it the phone. And then on the outside it looks a bit like a heatsink, like what you would see on like a CPU or like a motherboard or something. and the idea is that I think the marketing line is that it expands the surface area of your forehead by like 300 times. And then when you put water on your massively expanded forehead, it helps you cool down. And I have used this thing before in in racing. I mean it costs like two hundred dollars or something.

Matt (09:34.106)
What?

Wilhelm (09:46.431)
And I mean there's a lot of other more silly things in triathlon that you could spend money on. but a lot of a lot of athletes, a lot of pros use this, and like I think i I could totally buy that there's like a placebo effect. But a friend once sent me a study where they tried to like measure how much of a difference it makes in performance. And they basically had like five people do an all-out 5K in hot conditions, like with the thing on and without the thing on. And there is no measurable improvement in performance.

but they did measure a like decrease of like one percent of temperature, but only when measured inside their butt with a thermometer. So so the net effect of this thing is that you're you're you're cooling your butt down like a little bit, which is kind of like it's a good like kind of summary for how silly triathlon is in general, but we love it.

Matt (10:43.755)
Well, I mean yeah, people use butt thermometers a lot and the French love it, that's all I'm saying.

Wilhelm (10:54.252)
Next time you do this marathon, you can borrow my my headband.

Matt (11:01.089)
Might save my butt. Yeah, I also saw Nike have these running like shirts that look super thick, but they have loads of small holes in them. And then in some areas they have these massive holes in them. And that's meant to it's that fabric, there's something weird meant to be good. S something weird that's meant to happen there where it it's got all of these holes, so it's like super venting and it's got massive holes where you don't need fabric.

Wilhelm (11:02.464)
Ha ha ha.

Wilhelm (11:14.989)
yeah, yeah, yeah. Yeah, yeah, yeah.

Matt (11:30.817)
But then the fabric has like got this thickness to it, which I don't really understand. But there's meant to be something wicking or cooling there where it brings your sweat or something away from your skin faster. It's like a wicking effect. But it's literally like super thick. It looks weird. But I kinda like the look of it. I guess I was wondering like how much more heat tech is gonna be a thing. I saw Peak Performance,

Wilhelm (11:50.795)
Interesting.

Wilhelm (11:57.665)
Yeah, yeah.

Matt (12:00.34)
because I was in Chamonix, so it's like awesome. You can go around all of the shops and they have like all of the best kit there. Like the kit that you only normally see like online. It's in like every single one of these shops, like the pro, pro, pro, pro, pro gear. And so Peak Performance have this like first range of running kit, and it's crazy light, like just optimized for heat. And I'm just wondering like how much in the future like we will be optimizing more for that. Like Salomon have these massive hats, these massive wide,

Wilhelm (12:04.916)
Mm. Mm-hmm.

Yeah.

Matt (12:30.186)
like wide brimmed hats that are kind of like beekeeper's hats without the netting. optimize optimise for heat. yeah, I guess I I guess it just i ne something that never really like I thought about. Like I'm a type of guy like pick whatever shorts fit me as long as they're black and let's let's roll with it. but like yeah never really thought about the heat before before never really been a problem but

Wilhelm (12:34.442)
Wow. interesting.

Wilhelm (12:41.196)
That's cool.

Wilhelm (12:45.526)
Yeah.

Wilhelm (12:52.531)
Mm-hmm.

Matt (12:58.217)
I guess if we have ri a lot of rising temperatures, most races do end up being in the early to late summer months. we're gonna have to start thinking about it more.

Wilhelm (13:06.252)
Mm-mm.

Wilhelm (13:10.06)
Yeah, it's cool that there's so much innovation happening on this front. I think like in London Marathon, didn't Sawe wear, he wore like some kind of wild new mesh top that like helps with heat.

Matt (13:25.957)
I don't know. It wasn't actually that no, it was quite warm for the London Marathon this year, wasn't it? I think it I think it was actually quite warm. Yeah, but it's always it's a weird time of year the London Marathon because it's like early April. And so it can either be absolutely freezing or raining or roasting. It's like there's like three options. yeah. But dude, I had such a good I had such a good lunch today. went to like the

Wilhelm (13:33.302)
Yeah, I forgot.

Wilhelm (13:40.298)
Yeah, yeah.

Wilhelm (13:46.966)
Mm-hmm.

How's yeah.

Ha ha ha.

Matt (13:54.653)
Very, very local Portuguese place. and you pick your fish, the one you want, and they make it for you, on a on a grill out the back. And it's like super low key, there's there's no menu. the receipt comes and it's just like a post-it note with some random scribbles on it. It's like it's it's a great place, big fan.

Wilhelm (14:18.471)
Your summer is in full swing.

Matt (14:20.638)
Yes, exactly. Exactly. I'm just I'm recovering, you know, like I've done I've done a big thing, done a big effort, and now I don't know, I've just got to like go go look after myself, you know? Go look after myself.

Wilhelm (14:33.331)
A hundred percent, yeah. No, it's it's time to live life. What else have you been up to? Make us make us jealous those who who don't who don't spend who don't summer in Europe. what else what else is happening in your life at the moment?

Matt (14:39.646)
Ha ha ha.

Matt (14:52.32)
what's going on?

I mean the mountains, it was a big deal. Like you wake up in the morning, you look at Mont Blanc and you're like, Wow, that's pretty cool. And then before you go to bed, it was like thunderstorming and then just before the sunset, like the sky would all clear up and these like incredible sunsets with just like the sun piercing through looking at Mont Blanc, like really wild and you're having having a little beer on a terrace. Yeah, enjoying yourself.

Wilhelm (15:15.435)
That's awesome.

Wilhelm (15:21.779)
Are you a big Aperol man?

Matt (15:25.236)
Not hugely. It's not something that I'd ever have at home. But yeah, like maybe in the beach bars close to me actually, just down the road. Really good Aperol. Yeah, dude. When you come visit, on the stretch of beach near me, there's maybe twenty, thirty beach bars. 'Cause it's a three, four kilometer stretch.

Wilhelm (15:36.765)
Beach bar. That's what I want to hear.

Wilhelm (15:45.216)
Yeah.

Wilhelm (15:52.169)
Yeah, that's wild.

Matt (15:52.259)
And there's just beach bars every few hundred meters. So yeah.

Wilhelm (15:56.874)
We went to this festival in Barcelona at the beginning of June. And it was just, we did, yeah, yeah, yeah. Yeah, it was awesome, yeah. But it was also just like, you just arrive in Barcelona and like you're just having like dinner at like ten PM, the sun's still kind of out, like on the streets with like friends, you're having pizza and Aperol. It's just really warm. Like in SF you almost never have warm evenings. So it's just like

Matt (16:00.637)
Did you go to Primavera? yes. I thought it was that.

Wilhelm (16:25.469)
A beautiful, beautiful vibe. And we stayed very centrally. It's kind of weird, but like metro grocery stores, like very central grocery stores, aren't really a thing here. Like you have pharmacies that have some kind of groceries in them as well. But even there you have to like walk ten minutes, even if you live centrally sometimes, to find them. And then you have like corner stores, which sell like alcohol and snacks and stuff. But like in Barcelona, I feel like y

Matt (16:52.307)
Okay.

Wilhelm (16:54.707)
y within like a one minute walking radius, you have like five supermarkets. Which is just wild. And they're great. Like Spanish supermarkets are phenomenal. I'm I love supermarkets actually. I just love as like an experience. Not even for the utility. I just love them as an experience.

Matt (17:05.051)
I

Matt (17:12.263)
I think my experience of Spanish supermarkets is it's like fifty percent crisps and alcohol.

Wilhelm (17:18.387)
Really? damn. Yeah, wild.

Matt (17:21.02)
Like I don't know, I'm gonna offend so many people now, but I don't know. Maybe it's the ones I've been to in Madrid and stuff. Like in towns, I felt like the little Spanish supermarkets I think are mostly crisps and alcohol. And you do get those here as well, but you also get more like shops where you can actually buy produce. Like you have like actual fresh stuff as well.

Wilhelm (17:45.085)
Mm. Yeah, yeah, yeah, yeah.

Matt (17:46.769)
And there's a lot of that. I saw this really cool graph of Europe where it was like what percentage of people's groceries? I'm gonna use the word groceries, I never use the word groceries, but you've used it now. what percenta it's not really a thing in England, groceries. shopping, I guess.

Wilhelm (17:58.773)
What do you use? What do you say?

Wilhelm (18:04.724)
Okay. yeah. Fair fair.

Matt (18:06.481)
Yeah? Yeah? anyway, what percentage of people's groceries are like fat are like processed food? Or like ultra processed food or something. And in the UK it was like fifty percent. It was like like

Wilhelm (18:16.49)
Uh-huh.

Wilhelm (18:21.524)
yeah, bro, you need to stop sending me these slop Instagram reels about how shit the food in America is. I can't watch them, man. It's just depressing.

Matt (18:27.468)
Ha ha ha ha.

but in the UK it was like it was like fifty percent. And in like Germany it was like forty odd percent. and in France it was like thirty late thirties. and in Spain it was like late twenties, it was kind of the same in Italy, and in Portugal it was like ten. And it's like that that kinda that kinda tracks. There is like very little rubbish in our supermarket. No, the the reel that I sent you, which I thought was hysterical, was it was all about you brought it was all about the Norwegians.

Wilhelm (18:46.44)
No, no way.

Yeah.

Matt (18:58.986)
shipping in their own fruit and food into the US because they didn't trust any of the groceries in the US. And I'm gonna offend everyone from the US and I'm gonna do it meaningfully. if you have only if you've only ever had groceries in your own country, you should leave and go somewhere where they actually have real food. Like

Wilhelm (19:11.297)
here we go.

Matt (19:23.588)
It's it's kinda horrific. The oranges the orange like standard oranges

Wilhelm (19:26.622)
Well I think I think you can get really nice groceries in the US as well. I think it just costs a lot more.

Matt (19:32.775)
Yeah. Okay. If you're gonna get stuff imported from like Mexico maybe.

Wilhelm (19:37.064)
Like I just think the beautiful European experience is like you walk into a grocery store and like there's some like tomatoes and they cost you like thirty cents and they're incredible.

Matt (19:46.864)
Yeah, Portuguese tomatoes are amazing. So in Portugal we have like blueberries, tomatoes, pineapples. I went and had got a pineapple earlier. they're literally grown like just down the road. the blueberries are grown in the north of Portugal. the tomatoes are grown like just a little bit inland from us. They have avocados in the south of Portugal. They have like all of the stuff, like all the groceries you might like, like all of the like fresh produce, like really good veggies, really good potatoes are in the north as well.

Wilhelm (19:54.174)
Ha ha ha.

Matt (20:16.166)
They have everything and it's it's mad cheap. It just gets transported around Portugal. Portugal's tiny, there's like 10 million people here. yeah. I was chatting to a friend actually and they were I they they they live in London and we're and they're originally from China and we were chatting and we were like I was like, do you just like do you want to ever like leave London and just like I don't know experience somewhere a little bit smaller? And they were saying to me, they were like, yeah, so actually London feels kinda small.

Like the city I grew up in was twenty-five million people. And I'm like, what? London's like just is around ten million people, I think. the whole of Portugal is eight point five million people, and Lisbon is one point five million people, and it's by far the most heavily d the most densely populated place around. and yeah, I just I just I find that crazy.

Wilhelm (20:48.34)
That's wild. Yeah, that's wild.

Wilhelm (20:56.906)
Mm-hmm. Yeah, yeah.

Wilhelm (21:13.31)
Yeah, yeah, that that same.

Matt (21:13.509)
I I don't really I'd like to go to China actually. If there's I I've got a friend who makes a hardware device and I wanna go to China with him and just like just experience it a little bit. I'm not sure I I'm not sure I'd wanna live somewhere with so many people, but I

Wilhelm (21:30.741)
You were making the pitch at some point that everyone like you you should build a a hard like it was like a PSA to hardware startups. Don't build software, just make really good hardware and ship us your APIs or whatever or good docs. You could build a soft like a you could go to China, make your own little hardware device and then fulfill the vision. You could make it like a challenge, like can I build a Garmin clone with no software in a week?

Matt (21:42.054)
yeah, yeah, no, I still I still believe in that.

Matt (21:49.947)
Don't tempt me, bro. Don't tempt me.

Matt (21:59.377)
Don't do that to me. Don't do that to me. There's two things I want to build. There is the Garmin clone, the actual, like, the Whoop with a watch face. Because I thought about the rings. You can buy the knockoff rings from China that have their own open firmware, but I kind of want the watch face. Like if I'm wearing something like that, I actually want to be able to tell the time. So I like the knockoff Garmin with open source hardware,

Wilhelm (22:11.43)
yeah.

Wilhelm (22:23.837)
Yeah, yeah, yeah, yeah, fair.

Matt (22:27.981)
open source everything I think would be so cool. And then the other one... go on.

Wilhelm (22:30.378)
So there is, yeah, there is one you can buy for like sixty dollars or something that is like, it's a bit like the M5Stick or M5Stack. Maybe it's from M5Stack,

Matt (22:42.411)
Okay, M5 is sick. They're so cool. I saw a really cool e-reader from them. I also think phones should be e-readers. Like I have this other thing as well. Like I don't need my phone to have colours anymore. I really don't want it to have colours. I want a no-distraction phone. But I still want it to have like iMessage, which is kind of tough because then you're on Apple. So I kinda made my phone. I kind of made my I turned anyway, there are like a bunch of like weird things with hardware at the moment.

Wilhelm (22:54.514)
Mm.

Matt (23:11.54)
And then the last one that I want to do, which I might just do, I need to get a 3D printer, is make a coffee grinder. This is going back to my roots. I studied mechanical engineering many like years ago. But I really want to design a coffee grinder that's like super quiet. And you must be able to do this now because my coffee grinder, most of the noise comes from the the the motor because they're super high torque motors, or

Wilhelm (23:28.808)
yeah.

Wilhelm (23:36.532)
Motors. Yeah, yeah.

Matt (23:41.729)
in my case I think it's actually a really low torque motor and most of the noise comes from the gearing to make it high torque.

Wilhelm (23:45.94)
Just h hand grind it, bro. You don't hand grind your coffee?

Matt (23:49.517)
You can, but then they're also a little bit noisy. No no no nah. They're also a little bit noisy because then you have like these like coffee grounds shaking around and they're in normally like a piece of glass with metal around them. Like you must be able to dampen the fuck out of that. Like there should be no reason why why you c why I can have airpods in and not hear anything else, and yet my coffee machine wakes up every well, my coffee grinder wakes up everyone in my building. Like there should be no reason that that exists. Anyway.

Wilhelm (23:58.74)
Right.

Wilhelm (24:03.783)
Yeah, yeah, yeah.

Matt (24:18.446)
I thought about playing playing with some stuff, yeah. And you must be like them there must I just think no one's really like coffee grinders are designed like the high end ones are designed for cafes, the low end ones are designed to be cheap, the mid end ones are designed to look pretty and to match like your coffee machine or something. And and it's just like some people have high end ones in their house, but then they're really loud.

Wilhelm (24:18.569)
That's damn. Yep.

Matt (24:46.901)
And they're huge. Like there's one awesome one, but it looks like a telescope. And it's like, dude, like why would you have that in your house? Like it looks sick. It's this massive piece of metal. But yeah, anyway. This is my obsessions with this randomly designing weird stuff.

Wilhelm (24:55.463)
Yeah, yeah, yeah.

I think you should Yeah. You should maybe maybe you need to build a robot, then hook up Codex, give it a slash goal, make me the perfect coffee grinder.

Matt (25:09.742)
No, you need to put an agent inside whatever 3D modeling software people use these days. I used to use Autodesk Inventor or Fusion. Do people still use that? I guess so. But to be honest, I probably wouldn't even download 3D modeling software. I must be able to just, like, vibecode an app that can read an STL file directly and just modify the file directly. Surely those modeling softwares have just died because

Wilhelm (25:21.499)
I have no idea, to be honest. Yeah.

Wilhelm (25:36.68)
Yeah.

Matt (25:39.132)
maybe they haven't. There's massive enterprise contracts, but I would not be buying shares. This is not financial advice.

Wilhelm (25:45.955)
Ha ha ha

Matt (25:49.401)
Yeah, wow. That was a tangent. Yeah, coffee grinders need to get better. Send me that thing for open source Garmin. Dude, my Garmin broke. My Garmin's actually stuck in a depot in the US. I sent it to France and it's ended up in a depot in the US. And I'm really hoping they got the address right. But I can track it and it's somewhere in California, actually. Which is kinda wild.

Wilhelm (26:13.673)
That is kinda wild, yeah. Yeah, I need to dig up that link. I think a friend sent it to me. It was definitely sold out because it was like, I think, yeah, it's just wild that you can buy a proper watch with like a heart rate monitor and like everything, like Wi-Fi, Bluetooth, for like sixty dollars.

Matt (26:19.075)
be cool.

Matt (26:22.67)
Yeah, so cool.

Matt (26:29.848)
But I guess like the ESP, you could print a tiny circuit board, you could prototype with an ESP, which has Wi-Fi and Bluetooth, and then you could get a little heart rate monitor. Because the rings, this is what I don't understand, those rings are like twenty dollars. The pre-made rings with six days of battery life. You can buy them, yes. There's one called the Colmi R02 or something. I'll send you that one.

Wilhelm (26:48.105)
Yeah.

Wilhelm (26:51.688)
really?

Matt (26:59.17)
But it's so cool and the firmware is open source. So people have made apps for it. And did you see, while we're on the topic of health, there's a guy on Twitter that hacked his Whoop hardware?

Wilhelm (27:10.897)
Oh yeah, yeah. To tell him when he's most stressed, which colleagues...

Matt (27:16.685)
No, there was that one, but there was the other one where his Whoop subscription had lapsed, so he hacked his hardware and made a new app for it that wasn't blocked. And I was like, that's actually kinda smart. I hate things where, like, Whoops are just e-waste. If you don't pay the two hundred dollars a year, for most people your Whoop is effectively bricked. That just sucks. Yeah.

Wilhelm (27:23.627)
yeah, nice.

Wilhelm (27:35.26)
Yeah.

Mm-hmm.

Yeah, yeah, yeah.

Matt (27:43.786)
Sucks so bad. My dad probably has like three of them because he keeps on upgrading them. So he just has all of his old Whoops. It's like they're just dead. Like no one can use that.

Wilhelm (27:50.129)
Yeah, yeah. There was a cool Google project recently, I think, that was about bringing old Android devices back to life and using them as some kind of distributed compute.

Matt (28:01.782)
I saw that, like some data center of phones and it's like they're just taking inspiration from the phone armies, from the drone phones or something.

Wilhelm (28:04.604)
Yeah, yeah.

Wilhelm (28:10.121)
I don't know that. But yeah, that's a way to keep using them, I guess.

Matt (28:11.864)
Yeah.

But I heard Google's running out of compute. I heard the latest rumours are they're completely out and they're having to buy compute from other people now.

Wilhelm (28:24.073)
Yeah, I mean, I think they're offering new debt or something, or doing some financial engineering that's novel, at least for a company like Google, to afford more capex.

Matt (28:39.543)
Easy man. We're just in such a weird world.

Wilhelm (28:42.705)
Okay, we have some stuff to get through. Some agenda topics. What do you want to talk about out of all these things that we have?

Matt (28:50.829)
We

I think first of all, it's so cool how swyx has managed to get 7,000 nerds to San Francisco, all in one place, to just nerd out at the AI Engineer World's Fair. I'm so jealous I'm not there. I will be in San Fran in October, hopefully. It'll be very exciting. I will send you them. I think we talked about it, but I will send you them.

Wilhelm (29:08.957)
Yeah.

Wilhelm (29:16.347)
Oh nice. Wait, what are your dates? Did I know that?

Okay.

Matt (29:24.101)
I think what swyx has done with AI Engineer is super inspiring. Like, genuinely captured a zeitgeist, managed to find not just the best people in the US, but the best people in the world to come and speak about the topics that they're super passionate about. People that in other industries don't tend

to jump on camera every six months to a year and spill their whole roadmap and why they did everything they did over the last year. Like, these aren't people that you would typically think, this is what they're about. And somehow the AI Engineer World's Fair, and AI Engineer events in general, have managed to capture the zeitgeist enough

Wilhelm (30:00.924)
Mm-hmm.

Matt (30:20.288)
that they've managed that, and I'm just kind of gobsmacked. Yeah, it's insane. The attendee list is just incredible. It's just incredible.

Wilhelm (30:26.64)
It's wild. I'm amazed. It's so big. Yeah. I think it was less than half the size last year. But it's a massive conference. Yeah, 7,000 people. And this is kind of a community-run event, right? He doesn't work for one of the labs or anything like that. This is kind of a grassroots sort of thing. Obviously there are lots of big sponsors and things like that. But yeah, like

OpenAI is giving talks, keynotes, and I think the whole theme of the thing this year is software factories, which are super interesting. So yeah, I think it's really cool. Very special.

Matt (31:08.076)
Yeah. I

Yeah, I'm just still gobsmacked. I think the London event was super, super sick. Yeah, I wanna go to the SF one at some point. Are you enjoying SF?

Wilhelm (31:28.646)
Yeah man, it's great. I like it. I love it. What happened recently that was kinda funny?

Wilhelm (31:42.587)
Actually no, let's not talk about Corgi Cafe. It's actually really close to where I work. But I'll have to go back, because I don't think it was in a good state when I arrived. There was just a single distressed person. I was like, okay, I have to check this out. It's cool, it's a 24/7 cafe.

Matt (31:47.82)
How did you go?

Matt (31:58.57)
What what no con you have to sp

Matt (32:13.164)
So for context, no no, you've mentioned it now. So for context, in the last pod, we talked about how Harry Stebbings has no life. And that was prompted by a post that he made where he was basically hyping up the founder of Corgi because he built a cafe underneath his office so he never had to leave his office and he could like

Wilhelm (32:13.468)
no, this is so sloppy to talk about. I don't want to talk about this.

my god. No.

Matt (32:43.883)
I don't know, he had like a cafeteria, and I was just a bit confused. First, that's just a bit of a weird reason to hype someone up. But, I mean, go for it, whatever. But you're saying you actually went to the cafe? Since then, the co-founder of Corgi has done some very weird stuff online. I think people should just search him up themselves and make out what they think of it. I'm not gonna draw a line here, but yeah.

Wilhelm (32:53.768)
Okay.

Wilhelm (33:11.374)
Yeah, okay. I think the charitable take is that there's a lot of people in San Francisco who wanna build their dream, and they have to work somewhere. And if they don't have an office yet or whatever, you work in a cafe. That's kind of great and beautiful, it's always been like this. The problem with cafes in San Francisco is that some of them are a bit strange. Like a lot of them,

Matt (33:17.044)
Are we being nice now? We're being so nice.

Wilhelm (33:40.38)
especially the ones that have the good coffee, they don't have seating, or they don't have Wi-Fi, and they close at like four PM or three PM or something like that. I mean, does it make sense? I don't know. Like, it's a choice.

Matt (33:51.912)
Yeah, that makes sense. Yeah.

Matt (33:56.862)
Yeah, because if I'm making a nice bougie cafe and serving wonderful, I don't know, it's San Francisco, so like ten dollar coffee to people who wanna buy ten dollar coffee, I don't wanna serve one ten dollar coffee to someone who'll sit down, open their laptop and take one of my tables for the whole day.

Wilhelm (34:00.034)
Mm-hmm.

Wilhelm (34:21.307)
So just don't have any tables. Well, I don't know. The point is

Matt (34:24.959)
So I think, okay, I went to Blue Bottle in Hayes Valley, I think, last time I was in San Francisco. And they're literally in a garage. Like, not only do they not have tables... that one's great. That one is like, I just turn up for my coffee and then I go about my business. I'm kind of against the idea of just pushing myself on the cafe owner.

Wilhelm (34:35.269)
Yeah, yeah. I think that's their original location, apparently. Someone said that to me.

Matt (34:52.968)
Like sure, if you have an empty cafe and you have seats... there are cafes in London where they have a downstairs area that's not super nice, so they let people sit there for co-working, but the upstairs area is where people come and go and mingle and chat. So they have a quiet area and a loud area, and that I kind of understand. Like if you do want to support your local, I don't know,

Matt (35:21.182)
work-from-homers, work-from-cafe-ers, then roll with it. But you can't be serious. Think how much WeWorks cost in the US. There is a reason why they cost that much. Like, rent's expensive.

Wilhelm (35:35.432)
I mean, well, okay. I think people have been working in, like, Starbucks forever, right? And it's not like Starbucks is against this or whatever. And Starbucks is even open quite late, and exactly, they have free Wi-Fi. But I think the charitable take on Corgi Cafe is, wouldn't it be cool if, you know, you could keep working even at nine PM? I'm sure it's the same for you, Matt, that some of your best coding or whatever has happened like

Matt (35:41.662)
Yeah. Forever, yeah.

Matt (35:46.986)
And they have free Wi Fi.

Wilhelm (36:04.335)
at midnight or one AM or something. So they were like, okay, let's open a 24/7 cafe where, you know, the builders can work whenever they want. And they opened that. Well, when I went, they didn't have any coffee, and I wanted a coffee. So I just didn't stay very long. They did have phone booths in there, which I thought was cool. That's nice. But yeah, it was a

slightly strange vibe, but I will return another day to sample the coffee.

Matt (36:38.633)
That's crazy. By the way, Claude Sonnet 5 just came out.

Wilhelm (36:42.798)
Whoa.

Matt (36:44.389)
Live on air. It came out, I don't know, I mean it came out twenty minutes ago, but we're still pretty good.

Wilhelm (36:46.393)
On air? No way.

Should we kick off our wide-ranging evals and benchmark suites as we go? Actually, do you have a personal set of benchmarks that you run?

Matt (36:58.583)
yeah.

No, I started building this thing, like Matt Bench, and I had a bunch of tasks that I had reasoning traces for, and I was gonna do some data mining on my laptop to find all of my old traces, find ones where the model was able to do stuff, find ones where the model wasn't able to do stuff, and then create a benchmark. And I got like halfway through and I

did some cleaning and then I just stopped and couldn't be bothered, but I probably should finish that. It just feels like my tasks are quite wide-ranging and not that applicable to anyone else. But I guess that means I should just make a benchmark. So no, I don't have one at the moment, but probably should.
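
[Sketch] The personal benchmark idea above (old tasks where a model did or did not manage the job, scored pass/fail) could look roughly like the following. This is a minimal illustration, not anything Matt actually built: the `Task` shape, the check functions and the stub model are all hypothetical, and a real version would call an actual model API where `stub_model` is.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One personal benchmark task (hypothetical structure)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # pass/fail check on the raw model output


def run_benchmark(model: Callable[[str], str], tasks: list[Task]) -> dict[str, bool]:
    """Run every task through the model and record pass/fail per task."""
    results: dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.name] = bool(task.check(model(task.prompt)))
        except Exception:
            # A crash in the model call or in the check counts as a fail.
            results[task.name] = False
    return results


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of tasks passed; 0.0 for an empty run."""
    return sum(results.values()) / len(results) if results else 0.0


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; it just upper-cases the prompt."""
    return prompt.upper()


if __name__ == "__main__":
    tasks = [
        Task("shout", "hello", lambda out: out == "HELLO"),
        Task("impossible", "hello", lambda out: "goodbye" in out),
    ]
    results = run_benchmark(stub_model, tasks)
    print(results, pass_rate(results))
```

Keeping each check a plain function makes it easy to rerun the same task list on the day a new model ships.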

Wilhelm (37:46.716)
Yeah, yeah. I feel like it requires quite a lot of discipline. I don't really have one either. Although one thing that's kind of interesting, it probably wouldn't work with the Sonnet 5 release, but it would work with like a Fable or a Mythos or something like that, is just: here's all my notes I've been taking for like ten years and all of my stuff, find interesting things, find interesting connections.

Matt (38:05.542)
Mm.

Wilhelm (38:14.299)
And then give me the top things, and then you can compare that to an Opus output, and probably it would be better. Or you can see the difference.

Matt (38:16.466)
Think

Matt (38:20.681)
People do that, yeah, people do that after they use the Codex desktop app for a while, because it gathers so much information about you. It just saves it all as well, and it can reference previous chats and things, and it's not sandboxed at all. I think people do that.

Wilhelm (38:27.18)
Mm. Mm.

Wilhelm (38:33.211)
Yeah, yeah.

Wilhelm (38:36.965)
Yeah, actually, that's a great point. It's just applicable to anyone. If you use Claude Code or Codex or whatever, it saves all your conversations, right? And Claude Code natively shipped a slash insights command that was supposed to give you interesting insights over your past conversation history. And the output of that was just total slop. It just wasn't good in any way. But I think it's probably one way to judge a new model that anyone can do: just have it run over your history and then

suggest interesting things or improvements to your agent's files or

Matt (39:08.376)
Hmm. But I think there'll be a point where that becomes good enough that it's always a little bit insightful. And then it's gonna be hard to judge. I think you do need some pass/fail metrics. Yeah. But that's quite a lot of model releases recently. Like, we had Fable 5 for however many days we had it for. Those wonderful, glorious days of AGI. And then do you have Fable 5 back now? No, you must, you're a US

Wilhelm (39:21.745)
Sure, sure, sure. Yeah, yeah, yeah.

Wilhelm (39:27.461)
Yeah, yeah, yeah.

Wilhelm (39:34.055)
No, no, no, no. I'm not a US citizen. I mean, I don't think it's available to anyone at the moment.

Matt (39:38.8)
It jazz is

Matt (39:45.2)
Okay. I heard a rumour, it was just on Twitter, that it will be available to US citizens if you register in the Court app or something ridiculous like that. But

Wilhelm (39:55.205)
Yeah, it's been a bit mixed comms about this, right? I actually really don't know what's gonna happen. I think there's a rally today from Quinn Slack, the Amp and Sourcegraph founder, that's called Freedom of Intelligence or something, which is not company affiliated, I think they're being quite clear, but it's just a rally that says we want open access for everyone.

Matt (40:04.902)
Yeah. The Amp founder, yeah.

Matt (40:19.963)
I think I would have gone to that. Firstly, because I really love events and I think they're good fun. And secondly, we were talking before we went on air about philanthropic projects. And I've been trying to think about this a lot recently. And I do think people universally should have access to intelligence. If some people have access to it,

I think everyone should have access to it. Otherwise it's kind of like education in some way. In the UK at least, there is pretty much a right to education. There is state education, there are other types of education, but pretty much everyone should get an education. And if you don't, then you're gonna be disadvantaged in life. I don't think that's controversial.

Wilhelm (41:18.234)
Yeah, yeah. I yeah.

Matt (41:18.863)
I think the same is gonna happen with AI models. And I think it's gonna be super weird, but I can actually imagine a world where everyone has an NVIDIA DGX Spark in their house. And I almost bought one pretty recently, because I just think there's gonna be this idea of a house brain, where it just controls what your house does.

Like your house as an organism. I think it will start in the US, in San Fran, and maybe it will even power your humanoid robot at some point. Like you'll have your own compute. And it'll either go that way or it will go another way, which I think is more dystopian, where there are a few companies that control who has access to high-level intelligence and who doesn't. And that feels way more scary than

having open access where you're just limited by cost. Because cost is something that nation states can work around. It's something that individuals can work around. Yeah, sure, it's prohibitive, but it's not an end game scenario.

Wilhelm (42:33.255)
I'm just gonna run to the toilet, be right back. Maybe we can, yeah, I'm just gonna stop, and actually we can start it back together.

Matt (42:35.674)
Go for it.

--- [recording resumed after break] ---

Wilhelm (00:01.398)
Okay, I'm back. We're talking about model stuff. Yeah, yeah, we're we're on.

Matt (00:04.512)
Nice. Are we recording?

Sick.

Wilhelm (00:10.708)
Okay, I wanted to talk about some stuff that I've been building recently, because yeah, I didn't have Fable for very long. I was in London actually, and kind of sick, and it was raining, and I didn't know it would be taken from my cold, dead hands so quickly. It would be pried away. It was wild getting back to Opus and then just feeling like, wow, I used to talk to something smarter, and now Opus 4.8 felt, like, dumb. Which is wild. Expectations adjust so quickly, like, compared to Fable. Like I felt like

Matt (00:23.384)
Ha ha ha ha

Wilhelm (00:40.194)
Fable, you could really feel the step change in a way that maybe we haven't had since Opus 4.5 in November last year.

Matt (00:52.447)
Yeah, no, Fable was sick. Fable rattled. Fable got a bunch of stuff really right. I didn't know it was gonna be taken, but I also had these sort of seven or eight things I needed to do that were very single-threaded, were pretty easy to do by themselves for a human, for me. They were very well defined and I just, like,

right, I just got one Claude Code thread and was like, one workflow for each of these things, in Fable. And it spun off these dynamic workflows to create subagents, and I just left it overnight. And genuinely, out of the eight things, I think it did like four of them to a state where they could just be merged.

Wilhelm (01:49.698)
Yeah, wow.

Matt (01:49.835)
And I was like, that's previously this would have been like well before AI, that would have been maybe two weeks to three weeks work. With the older models, it would have been like three maybe three or four days. Three days. Let's say three days if you're doing TPRs a day.

Yeah. And with Fable it was an overnight job to get like five or six of them just ready to merge.

Wilhelm (02:20.652)
Yeah, yeah. I think Simon Willison described it as relentlessly proactive. Which felt similar to my experience. You know, you just mentioned the NVIDIA Spark thing. Do you think you could run GLM 5.2 on that? Which I think is... are you gone?

Matt (02:25.527)
It was really good.

Matt (02:34.252)
Yeah.

Matt (02:38.75)
With two of them you can. So each one has 128 gigabytes of RAM. Or of VRAM, even. They network together. So you won't get fast tokens, but I think you'll get like seven to eight tokens per second. I think that's what Reddit was telling me. And you can put two of them together, so then you'll have 256 gigabytes of VRAM. But

Wilhelm (02:41.542)
no way.

Wilhelm (02:59.598)
Okay, so a and yeah.

Matt (03:08.192)
the problem is I think you can still only run it at like four-bit precision, like tiny precision. So you're probably better off using one of the Qwen models, the newer Qwen models, that are designed to run on a single GPU, and you can run that at more like, I don't know, 8-bit or even 16-bit precision, like FP16.

Wilhelm (03:14.294)
Mm.

Matt (03:38.035)
Yeah.

Wilhelm (03:38.126)
Yeah. I mean, it sounds like most of the model providers are serving up GLM 5.2 at eight-bit quantization, which

Matt (03:48.563)
Yeah, yeah, yeah. No, basically no one is serving it at full sixteen.

Wilhelm (03:53.42)
Right. And I think you don't lose very much from my limited research.

Matt (03:58.459)
No, no. But at eight-bit I still think you need like seven hundred... it depends how much context window you want. So if you want the full million context window, you need a bunch more gigabytes

Wilhelm (04:06.176)
right, okay.

Matt (04:14.677)
for your KV cache. Yeah, that starts exploding, I think. I'm still not entirely sure on the maths. I've written it down a bunch of times and I've watched a bunch of YouTube lectures about how it all works, and I still couldn't explain the maths. But you can just calculate this stuff. There are calculators online for how much VRAM a particular size of model will need, or a particular makeup of KV cache, and you can work it out quite easily.
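
[Sketch] The back-of-the-envelope maths those calculators do can be written down in a few lines. This assumes a plain transformer with standard (grouped-query) attention and ignores activations, runtime overhead and any cache compression tricks. The model shape is entirely made up for illustration and is not the real configuration of GLM 5.2 or any other named model.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per gibibyte


@dataclass
class ModelShape:
    """Hypothetical transformer shape; not the real config of any named model."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int     # dimension of each attention head


def weight_bytes(shape: ModelShape, bits_per_weight: int) -> float:
    """Memory for the weights alone at a given quantization."""
    return shape.n_params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, context_tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for the KV cache of one sequence.

    The leading 2 is one key tensor plus one value tensor per layer;
    bytes_per_value=2 assumes the cache is stored in 16-bit.
    """
    return (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
            * context_tokens * bytes_per_value)


def total_gib(shape: ModelShape, bits_per_weight: int, context_tokens: int) -> float:
    """Weights plus KV cache, in GiB."""
    total = weight_bytes(shape, bits_per_weight) + kv_cache_bytes(shape, context_tokens)
    return total / GIB


if __name__ == "__main__":
    # Made-up 400B-parameter model, purely to show how the numbers scale.
    shape = ModelShape(n_params=400e9, n_layers=80, n_kv_heads=8, head_dim=128)
    for bits in (4, 8, 16):
        for ctx in (8_000, 128_000, 1_000_000):
            print(f"{bits:>2}-bit, {ctx:>9,} tokens: {total_gib(shape, bits, ctx):8.1f} GiB")
```

With these made-up numbers the 4-bit weights come to roughly 186 GiB, while the KV cache for a single million-token sequence is roughly 305 GiB on its own, which is the "starts exploding" effect: the cache grows linearly with context length and does not shrink when the weights are quantized.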

Wilhelm (04:14.764)
Interesting.

Okay.

Wilhelm (04:43.691)
Yeah, yeah, yeah. Is there anyone you trust in particular? The reason I'm asking is because I didn't think that open source models would catch up to this level of quality. I thought it would take forever, or maybe would never happen, that we'd get an Opus 4.5-level open source, open weights model. And we have now, basically, right? Like, GLM 5.2 is at that level.

Matt (05:05.107)
Yeah, well the new Mistral model, dude, I don't wanna say it just yet, but I don't know if it'll be open source. But there's this whole like giggle about Le Chat on fat.

Wilhelm (05:11.787)
Wait,

I thought that was completely fake. Is that a real thing?

Matt (05:17.789)
Yeah, I don't know. Le Chat on Fat or Le Chat on Long, but there are some leaks coming out of some Japanese Twitter accounts. I actually have no idea if it's true. But anyway, you were saying about open source models. Yeah, I didn't think they were gonna catch up that much either. Maybe I was a little bit more optimistic than you were. I was less optimistic that the labs would pay this, like the Chinese labs would pay this much.

Wilhelm (05:21.757)
What?

Wilhelm (05:27.563)
Okay, okay, yeah.

Matt (05:44.275)
But I underestimated their ingenuity in how they do it, and I saw this crazy tweet. You know what we all slept on? Buying shares in Z.ai.

They're literally a public company.

Wilhelm (06:02.189)
Oh sorry, I thought you were talking about ZML for a second, which is the French company. No, like the makers of, yeah, yeah.

Matt (06:07.549)
No, no, Z.ai. The people who make GLM 5.2, they are legit a public company. Hong Kong.

Wilhelm (06:11.948)
Yeah, yeah, yeah.

Where are they listed?

right. Yeah, I had no idea.

Matt (06:19.73)
I think. Yeah, we all missed buying shares in that, because we saw the previous, was it 4.7, or like the previous model doing pretty well, and their shares have gone, I don't know, six to eight X since January.

Wilhelm (06:34.273)
Interesting.

Matt (06:35.924)
Crazy. But yeah, no, I underestimated their ingenuity about how they would get all these distillation traces, how they would catch up. And I mean, DeepSeek has done insane work, insane foundational-level work, and open sourced a bunch of their stack, which is crazy. And then the Z.ai stack, I mean, they've done more of a data play, where they've done more distillation from Claude and

I saw this crazy tweet. I don't know if it's true, so just caveat this. They have a hosted endpoint that you can use as a base URL for any Claude model. And if you do, they run a classifier over your prompt, and if they find it in distribution for their model, they'll serve you their model, and otherwise they will redirect you straight

to Anthropic's models, and then they will save your reasoning traces for further distillation.

Wilhelm (07:41.559)
Who who does this?

Matt (07:43.877)
Z.ai, the GLM people. So that to me means that not only could they get something that was as good as Claude in terms of raw capability, they could potentially, if they had more reasoning traces and more people using this day-to-day than Claude, and we've got to remember, China's a massive place, and there are a lot of people using this,

Wilhelm (07:45.162)
I see. interesting.

Matt (08:13.82)
like, there is potential they could have a wider variety of data than Anthropic. Potential. Or at least be up there with the variety of data. And I don't know how they determine whether something is in distribution or not. I think that's pretty cool. I want to look into the mechanics of how that works.

Wilhelm (08:26.208)
Yeah, interesting.

Wilhelm (08:32.992)
That is fascinating. Okay. Slightly different thing I wanted to talk about, which is I've been doing more work on Chad since we last spoke. And there's two things in particular that I think make a good point about why people should be building their own stuff like this. Because it's been very, very useful and obviously really fun for me to work on this. But I think, and obviously

this is a very well known user-facing application of these LLMs, building hyper-personalized software. But I think it's worth repeating and worth everyone experimenting with. Because most software, right, is just kind of mass produced. Or at least in the last era of software. It's a team sitting down being like, okay, there's this use case, there's this problem, we're gonna build some software that solves it for everyone, or solves it for a large group of people.

Matt (09:17.201)
Hm.

Wilhelm (09:31.968)
But now you can really just hypertune it to your own personal needs and workflows, and it's really good. And I wanna give two examples. So one is a jet lag recovery app. In the past I've used an app called Timeshifter. Have you ever used Timeshifter?

Matt (09:51.076)
No, but I'm pretty sure about a year ago it was one of the first things you recommended to me on this pod. Yeah.

Wilhelm (09:56.317)
Oh, really? That's cute. That's good. It's a really good app. I think it's all quite deeply rooted in the science of how to do jet lag recovery well. Basically, in this app you put in your flights, you type in your flight numbers and dates and stuff, and then it knows where you're coming from and where you're going and how long the flight is and all these things. And then it will give you a timeline that starts maybe two days before the flight and ends two days after the flight. And it's like, when should you get up?

When should you get bright light? Should you sleep on the flight or not? When do you go to bed? Should you use melatonin to help you get to bed? Should you wear sunglasses on the flight? When do you start drinking coffee? When do you stop drinking coffee? And it's a cool app, and it's an app I've paid for in the past. I think it's a one-time fee kind of unlock thing. But the thing is, it is kind of mass produced. It's incredible value for what it is.

But you can just go a bit further and personalize it. So I've talked before about how Chad has a mobile app, or I've started calling it my own personal super app, because it has all these completely different, distinct bits of software in it. But one of them is now a personalized Timeshifter thing. And the cool thing is it just integrates well into the rest of the system. So Chad already has access to my calendar, it already knows my flights. So I don't even have to enter the flights into another app, or remember to do that

some days ahead of my trip starting. It just does it automatically because it has the whole system, right? So that step already falls away. But then you can also personalize it to yourself. So for me, I know, based on my own experience, but also based on my genome and my DNA, which Chad also has access to, that I'm a very fast metabolizer of caffeine. So

Matt (11:51.789)
Okay.

Wilhelm (11:51.969)
for me, actually, the caffeine advice in Timeshifter was never very accurate, because it tells you to stop having caffeine super early in the day. But I know that even if I have caffeine at like 8 p.m., I can still sleep really, really well. So it can personalize these bits of advice, right? And then the third thing is Timeshifter doesn't know when you actually got up, and then adjust your

plan based on that. It doesn't know how well you actually slept. Did you sleep on the plane, right? But Chad has access to my Garmin data. So it knows how well I slept. It knows if I slept on the plane. It knows when I got up, which all feeds back into the jet lag adjustment machinery to give me a very personalized plan and personalized notifications about when to go to bed and when to go to sleep. So I think you can kind of see how the generic, yeah.
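
[Sketch] Two of the personalizations described above, a caffeine cutoff that depends on how fast you metabolize caffeine and a bedtime that shifts with when you actually got up, can be illustrated as below. This is not Timeshifter's algorithm or Chad's real code. The cutoff hours, the profile labels and the two-hour cap are invented placeholders, not sleep science.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values: hours before target bedtime to stop having caffeine.
CAFFEINE_CUTOFF_HOURS = {"slow": 10, "normal": 8, "fast": 3}


@dataclass
class Profile:
    """Per-person settings (hypothetical); could come from experience or genetic data."""
    caffeine_metabolism: str = "normal"  # "slow", "normal" or "fast"


def caffeine_cutoff(target_bedtime: datetime, profile: Profile) -> datetime:
    """Latest time for caffeine, personalized by metabolism speed."""
    hours = CAFFEINE_CUTOFF_HOURS[profile.caffeine_metabolism]
    return target_bedtime - timedelta(hours=hours)


def adjust_bedtime(planned_bedtime: datetime, planned_wake: datetime,
                   actual_wake: datetime, max_shift_hours: float = 2.0) -> datetime:
    """Shift tonight's bedtime by how far the real wake-up drifted from the plan.

    The drift is capped so that one very late morning does not wreck the schedule.
    """
    cap = timedelta(hours=max_shift_hours)
    drift = actual_wake - planned_wake
    drift = max(-cap, min(cap, drift))
    return planned_bedtime + drift


if __name__ == "__main__":
    bedtime = datetime(2026, 7, 1, 23, 0)
    print(caffeine_cutoff(bedtime, Profile("normal")))
    print(caffeine_cutoff(bedtime, Profile("fast")))
    # Actual wake-up time would come from wearable data in a real system.
    print(adjust_bedtime(bedtime, datetime(2026, 7, 1, 7, 0), datetime(2026, 7, 1, 8, 30)))
```

The generic plan is just the `"normal"` profile with no wake-up feedback; the personalized one swaps in a different profile and feeds real sleep data into `adjust_bedtime`.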

Matt (12:24.229)
Hmm.

Matt (12:45.048)
Yeah, yeah, no, a hundred percent. Like, malleable software. Yeah. And so on Chad, do you have just one screen with a bunch of mini apps?

Wilhelm (12:49.322)
Yeah. So th it's just yeah.

Wilhelm (12:58.102)
Basically, yeah. Yeah. I should finally write this bloody blog post about it.

Matt (13:03.844)
Definitely need to. I'd be well interested, yeah.

Wilhelm (13:06.582)
But yeah, I mean, I can hold it up. It's a lot of tabs and then a lot of non-tab screens.

Matt (13:13.552)
Okay. Yeah, that's too many tabs, dude.

Wilhelm (13:17.812)
That's your opinion, bro. And then I'll give one other example as well, which is something I rebuilt, I think, two weekends ago. I've been using a workout app, a gym lifting app, called Fitbod. And I actually quite like Fitbod. But there's tons of these workout apps, right? Where you log your sets and log your weights and all of this stuff.

Matt (13:20.688)
Mm-hmm.

Matt (13:36.922)
Mm-hmm.

Wilhelm (13:47.51)
But the thing I really liked about Fitbod is that it tells you exactly what to do. So you basically just say, I'm in my gym, here's the equipment I have in the gym, here's how much time I have for this workout, say an hour, and then it generates the whole workout for you. So you don't even have to pick what you do in the gym. It uses some science, some program, some idea, and then just suggests exercises for you to do. And then it knows which exercises you've done in the past and can adjust

your next workout based on that. And it's really cool. It's a monthly subscription, it's really valuable. I've loved using it. I think it's helped me get into a nice gym habit. But recently I've been getting frustrated with some minor things in Fitbod and then some bigger things in Fitbod. So I was like, screw this, I'm just gonna rebuild this into Chad, into my Chad super app. So now I have a lift tab inside Chad and

Matt (14:19.632)
That's sick.

Matt (14:28.942)
Yeah, that's sick.

Wilhelm (14:45.949)
it's basically some of the core ideas from Fitbod, that it pre-generates this workout. But similarly to the whole Timeshifter example, it just incorporates everything else, right? So it pre-generates the workout on the back of my sleep data. So it knows how recovered I am. It looks not just at previous strength workouts I did, but at my entire Strava activity. So it knows if I've done a run and my legs are tired. And then also, crucially, one of the things I was frustrated about with Fitbod is

It's like a deterministic engine that generates your workouts. Like I feel like it wasn't actually that smart. And I'm like, I want like an opus level quality of LLM smartness to tell me what these workouts are. Like sometimes it felt like the Fitbot strategy for what I should do just kind of shifted around in like awkward ways. And I'm not sure if I was actually building muscle or like am I just on some like really outdated

program that doesn't actually make sense for me. So now, I think the way it actually works in the system is there's a weekly review. There's a lot of deterministic, sensible stuff that generates the daily workout, but then once a week everything I did, all my sleep data, the recovery stuff, the lifts I did, everything gets fed into an LLM and it's like, okay, here's what I did.

Now adjust based on that. And then of course, because it's an LLM, I can also feed in longer-term goals. So I'm doing this triathlon in November. I need to build towards that. I'm doing some random sprinting side quest to see how good a sprinter I could be in September. So it can all feed into the system. So that's another example of, yeah, personal AI.
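
A minimal sketch of the weekly review Wilhelm describes, assuming the week's data has already been collected by the deterministic part of the app. Every name here (`WeekSummary`, `build_review_prompt`, the field names) is hypothetical and not taken from Chad; the sketch only assembles the text that would be handed to an LLM and makes no model call.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class WeekSummary:
    """Hypothetical container for one week of data (field names are assumptions)."""
    sleep_hours: list        # one entry per night
    lifts: list              # e.g. {"exercise": "squat", "sets": 3, "reps": 5, "kg": 80}
    other_activities: list   # e.g. runs or rides pulled from an activity log
    goals: list = field(default_factory=list)  # longer-term goals in plain text


def build_review_prompt(week: WeekSummary) -> str:
    """Bundle the whole week plus goals into a single prompt string."""
    payload = json.dumps(asdict(week), indent=2)
    return (
        "Here is everything I did this week, plus my longer-term goals.\n"
        "Adjust next week's lifting plan based on it.\n\n"
        + payload
    )


week = WeekSummary(
    sleep_hours=[7.5, 6.0, 8.0],
    lifts=[{"exercise": "squat", "sets": 3, "reps": 5, "kg": 80}],
    other_activities=[{"type": "run", "km": 10}],
    goals=["triathlon in November", "sprinting side quest in September"],
)
prompt = build_review_prompt(week)
```

The daily workout stays deterministic in this design; only the once-a-week adjustment step would consume `prompt`.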

Matt (16:25.688)
Ha ha ha.

Matt (16:33.561)
So yeah, so cool. Personal AI, man. I still think the consumer AI thing hasn't worked out yet. And I just don't un... You're a very productivity-driven person, respectfully. Nah, nah, you love these productivity hacks and stuff. Like the fact you were using like a flight

Wilhelm (16:43.403)
Yeah, I agree.

Wilhelm (16:51.275)
Don't know if that's an insult or a compliment.

Matt (17:01.997)
Like a flight shift tracker itself. I think a lot of people don't use stuff like that. Although I've tried, I just could never be bothered. But I guess you're quite driven by that type of stuff. So for you, this makes absolutely perfect sense, which probably means that for everyone else who doesn't care that much about this, it will just appear as a feature in some app that they do really care about

in a few years' time. Like my Garmin, it's not here, but my Garmin had an app that could do the flight shift stuff. But I never used it because it required me to put in the flights ahead of time and I always forgot. Yeah.

Wilhelm (17:33.459)
Yeah, yeah. I don't know exactly how this translates. Yeah, yeah. Mm-hmm.

Wilhelm (17:43.743)
Yeah, yeah, yeah, yep, yep, yep.

Wilhelm (17:50.927)
Exactly, right? Which is wild. Like, your flights are on your calendar. So okay, I wanna tell you two other fun things about Chadlift that I just remembered. So one of them is you can access the whole iOS ecosystem and APIs, right? So did you know that AirPods, the AirPods Pro 3, have a heart rate sensor?

Matt (18:01.283)
Yeah.

Matt (18:13.398)
I did know this, yeah, this is crazy.

Wilhelm (18:14.921)
And it's actually more reliable than the Garmin heart rate. So there was a study done: the Garmin heart rate has an error of like six percent or something. The AirPods heart rate is like one percent. And then also, obviously heart rate helps you measure a little bit how hard you're working. And if you're entering your sets and stuff on

Matt (18:29.496)
That's sick.

Wilhelm (18:43.131)
a phone, obviously the phone doesn't have heart rate, but guess what I'm always wearing while I'm in the gym? My AirPods. So now it can measure the heart rate alongside everything I'm doing in the app and then sync it all to Strava on the other side.

Matt (18:48.32)
AirPods. Yeah.

Matt (18:57.003)
Wait, can you access AirPods heart rate through like an external app on Apple?

Wilhelm (19:06.435)
I think it would be quite easy to build. I mean, the way it works, as I understand it, is that your app registers with HealthKit, as a user you approve it, then you start a HealthKit workout and the live AirPods data is streamed, and then also you get the full history when you save the HealthKit workout afterwards.

Matt (19:18.348)
Okay, okay.

Matt (19:30.135)
So okay, so in the app you do that? Yeah.

Wilhelm (19:33.575)
Yeah, like the app code does that, yeah.

Matt (19:36.469)
Okay, that's really interesting. Yeah. Man, I hate how good Apple are at hardware. It's n

Wilhelm (19:41.939)
It's good, right? And then okay, here's another random thing. Something that really bothered me with Fitbod is, when I travel, right, or even in the gym, some weights are in kilograms and some are in pounds. Or if I go to the gym in London, it's in kilograms and here it's in pounds.

Matt (19:53.101)
Mm.

Matt (19:58.529)
Yeah, yeah. You just divide it by two point two.

Wilhelm (20:01.844)
But in Fitbod, there's a toggle. You can very easily toggle between pounds and kilograms. But it would do the precise conversion and then tell me to get a dumbbell that's like twelve point three five eight kilograms. Which obviously doesn't exist. Like, that is so stupid. But if you're, you know, on the end of a mass-produced app like that,

Matt (20:19.009)
Nice.

Wilhelm (20:26.696)
you're at the mercy of the developer. You have to file a bug report and be like, this is kind of annoying, and then they'll never prioritize it. But if you have your own app, you can just be like, hey Claude, just make the toggle snap to weights that actually would exist. And now you have the problem solved. So anyway, that's my pitch over for this. But I think, yeah, it's really fun and it's actually quite useful.
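
A minimal sketch of the "snap to weights that actually exist" fix, assuming dumbbells come in 1 kg or 2.5 lb steps. The step sizes and the function names (`convert`, `snap`, `toggle_unit`) are illustrative assumptions, not taken from Chad or Fitbod; a real gym's rack may differ.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound

# Assumed dumbbell increments per unit system (hypothetical; real racks vary).
STEP = {"kg": 1.0, "lb": 2.5}


def convert(weight: float, src: str, dst: str) -> float:
    """Precise unit conversion, the part that yields values like 12.47 kg."""
    if src == dst:
        return weight
    if (src, dst) == ("lb", "kg"):
        return weight * KG_PER_LB
    if (src, dst) == ("kg", "lb"):
        return weight / KG_PER_LB
    raise ValueError(f"unsupported units: {src} -> {dst}")


def snap(weight: float, unit: str) -> float:
    """Round to the nearest multiple of the assumed increment for this unit."""
    step = STEP[unit]
    return round(weight / step) * step


def toggle_unit(weight: float, src: str, dst: str) -> float:
    """Convert, then snap, so the suggested weight is one you could pick up."""
    return snap(convert(weight, src, dst), dst)


print(toggle_unit(27.5, "lb", "kg"))  # 12.0 rather than 12.47...
print(toggle_unit(20, "kg", "lb"))    # 45.0 rather than 44.09...
```

Snapping after conversion, rather than before, keeps the stored weight precise and only rounds what is displayed in the target unit.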

Matt (20:49.089)
Yeah, I agree with you. I think it's cool. I think everyone should have something like that. Okay, so on that topic, and we probably need to finish because we've been going for ages, but it was something that you didn't have on the list. Just on the idea of personal AI, did you see, you probably won't have because you're not on Twitter anymore, but Poke, from The Interaction Company of California, released Poke Human.

Wilhelm (20:58.846)
We have, yeah.

Wilhelm (21:12.368)
I saw this. Yeah, yeah, yeah. Very cool. Yeah, go on.

Matt (21:14.411)
And we've talked about this before, because I actually have a note from January twenty twenty five where I wrote a whole thing about humans as APIs, humans behind APIs, because I thought there would be, if you wanted to automate tasks in your life, like boring tasks, like for me for instance, this is such a real dumb one, but I really want to get

Wilhelm (21:27.194)
Mm.

Matt (21:40.789)
I want to get a cleaner that comes like once a month or something, just to clean areas of my house that I forget about. And I don't want them to come once a week. I just want them to come randomly once a month. Or whenever I'm feeling like I need to do a deep clean of some room, I just want to be able to call a cleaner. And there are apps that allow you to do that. In London, it would be very easy. Portugal, a little bit less easy. But there are people, there are apps. I can get a contact, I can find the person down the road.

Wilhelm (22:00.117)
Yeah. Yeah.

Matt (22:10.944)
But if I wanted to be this slightly autistic "AI is gonna automate everything", how would you solve that problem? Because I think it's kind of fun to think about it in that way, because there will be other problems, in more of a work context, that you try to automate that have the same sort of characteristics. And in that case, what you need is a pool of people that are willing to do stuff for some amount of cash, almost like Uber

Wilhelm (22:31.966)
Yeah, yeah, yeah.

Matt (22:41.221)
have a pool of people willing to drive you around. And you need to have some way of connecting to them, some agent that you're using that can just call out to them. You need the agent because otherwise you might as well just download a cleaning app, right? If you're doing some sort of personal automation, you're doing it through some sort of agent, like Poke in this case. And then I thought

this service would exist, because I was like, there are gonna be these things, things like deliveries, even things like lawyers' signatures, accountants, stuff where you need potentially just one-off tasks. Maybe you might have a relationship with them in the future, but initially you just want something one-off that could be outsourced, and could be outsourced for maybe cheaper than you could get it locally, which I think is kind of interesting.

Wilhelm (23:40.052)
Yeah, yeah, yeah, yeah, yeah.

Matt (23:40.357)
And so yeah, Poke released this feature. Well, first of all, I think this company called RentAHuman released this as a whole product, as humans behind APIs. And they had this whole platform where agents could ask humans to do stuff, and people could go on the platform as a human and look through the tasks and decide to do them. And there were another couple of companies that came afterwards, and I think RentAHuman went to YC. I haven't seen what they've done since then, but this could go like

one of two ways, right? I think the Fiverr founder talks about this quite a lot. When you make a platform where you can ask people to do anything, then suddenly people do start doing anything. And it can get pretty explicit, it can get kind of dangerous. And the Fiverr people had to decide, what are we gonna allow on our platform? And I think that's an interesting one as well, like what do you allow on your platform? But

all this came to a head and Poke released this as a feature. So Poke's like a personal agent, and in their ultra plan you can offload tasks to humans. I haven't used it yet, but I'm pretty excited to. For something that an agent couldn't do, Poke can offload it, as one of its tool calls, to a real person, a squishy human being,

who can do the task, and you pay via Stripe, and it's pretty cool. I think they are very innovative, and I really do rate, I mean, I'm kind of a bit of a fanboy. I do rate their thinking in this world, and their branding is epic. And just in general, I might have qualms with how the product works sometimes, but I do root for them, and when you

Wilhelm (25:11.986)
As qual yeah.

Matt (25:33.16)
root for someone and you can't conceptualize how, I think that's kind of also a mark of a good founder. There are a few founders who I root for like that. And it's not because I love their product and use it day to day. I just genuinely root for them because they tell a good story. They seem to have a vision of a world that is different to my own and maybe even longer-reaching than my own. And yeah.

Wilhelm (25:39.411)
Yeah, yeah.

Wilhelm (25:58.004)
What were some of the examples they gave for the human? I think in one of them there was someone who wanted to get a flight refund or something, and the agent couldn't figure it out, but then the human made it happen for them.

Matt (26:10.879)
Yeah, there was that. I think that was the main example that they used? It's funny because I'm just now thinking, Jason from Codex used his Codex goal to get a flight refund from JetBlue or something, and that was his example of something that he did last week. I think things where you have to be on call to someone, there's always the general travel, personal assistance type stuff, like

Wilhelm (26:28.04)
Very nice.

Wilhelm (26:37.724)
Yeah.

Matt (26:39.678)
these types of things, they are quite hard to actually automate. And having a person to do them, like for me, being able to find a cleaner locally would be pretty banging. I mean, I'm probably just gonna call someone up and find someone. But I've had a few... Yeah, go for it.

Wilhelm (26:51.209)
Yeah, yeah.

Wilhelm (26:54.761)
So the human would like,

they would join some Facebook groups that are local, maybe ask around, and then eventually give you the number of a cleaner, maybe. Is that the task?

Matt (27:07.592)
Yeah, or maybe they would even organize it end to end and pay them and do all of that stuff, and just get references and, I don't know, just onboard this person into my life, kind of. I think this type of stuff works really well for one-off tasks. If the thing you're gonna do is very repeatable, then you're probably better with a dedicated UI, dedicated support, dedicated

Wilhelm (27:36.242)
I see, yeah.

Matt (27:36.573)
this sort of stuff. But yeah. I think it's wild that they worked with Mercor to do this. Mercor and WAP. So I don't know if they're the two providers of people, but like

Wilhelm (27:45.925)
I didn't realise that.

Wilhelm (27:51.325)
I don't know, but yeah, Mercor, I thought they were doing a lot of, what was it, finding people with expert knowledge to help train AIs or something? That was one of the things that they

Matt (28:04.51)
Yeah, they're the recruitment data labeling company. Anyway, I just thought that was a cool product release. People should check it out. Yeah, we should probably end. I need to go have some food.

Wilhelm (28:11.081)
Totally.

Wilhelm (28:15.859)
All right. Let's play the intro as the outro.

Matt (28:20.285)
Let's do it.

Matt (28:30.089)
Booyakasha.

Wilhelm (28:31.508)
Peace everyone.