Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://calendar.google.com/calendar/ical/c_318925f4185aa71c4524d0d6127f31058c9e21f29f017d48a0fca6f564969cd0%40group.calendar.google.com/public/basic.ics
Last week, he made us wait for, like, four minutes. Well, my man in the office says, Hello, Adam. Says, Bryan here. Hey, Bryan, how are you?
Bryan Cantrill:I'm doing well.
Adam Leventhal:How are you? I'm doing very well. And we've got all the Oxide Friends here. We've got David and Rain.
Bryan Cantrill:And Rain.
Bryan Cantrill:Great.
Bryan Cantrill:You know, our predictions episode was only a week ago, and yet at least one of our predictions already feels like such a lock. It's amazing that it was even considered a prediction as little as a week ago. I think this ennui, the software engineer ennui, and then your absolutely brilliant naming of Deep Blue for this sense of software engineer ennui: wondering what the real purpose of anything is when the LLM could just do everything for them. It feels like this has already taken root in the last week.
Bryan Cantrill:Is that my imagination? Feels like this has been
Adam Leventhal:I don't think it's just your imagination. I think it may also be my imagination. But when I saw someone, not even tag us, but just describe the feeling as Deep Blue, I was like, wow. This is really getting there. We've really made it.
Adam Leventhal:Yeah.
Bryan Cantrill:We made it. We've definitely arrived. You know, how ironic would it be if that cease and desist came from IBM, for sullying the good brand of Deep Blue into a kind of, like
Adam Leventhal:I'll tell you the predictions market does not have that one coming.
Bryan Cantrill:Exactly. Let's see the Deep Blue disambiguation on the Wikipedia page, where they actually need to clarify that we're not talking about the software engineering neo-depression, the LLM-based depression.
Adam Leventhal:Yeah. If we see the Polymarket on that spiking, we know that there's a C and D coming and some insiders are profiting on it.
Bryan Cantrill:Oh, it's my understanding that those insiders are supposed to be us. Right? Isn't that the way... isn't that what Polymarket... sorry. Isn't that who they serve? Isn't that... I think
Adam Leventhal:in this case, it would be the folks at IBM about to sue us. But, yeah, I mean, that's basically
Bryan Cantrill:Can we take out a position on getting a C and D? Who's gonna get a C and D over the course of the year? It's gonna be Oxide and Friends. I mean, old conventional wisdom: Oxide and Friends bingo card. New conventional wisdom: Oxide and Friends Polymarket.
Bryan Cantrill:Yeah. It's got a
Adam Leventhal:good hedge. Right?
Bryan Cantrill:Yeah. It should be. But it feels like this has really been, I mean, we knew this last week, but just the presence of LLMs: what do LLMs mean for software engineering?
Bryan Cantrill:I feel like I've seen, like, six different pieces a day talking about, what does this mean? What does it not mean? And I feel like there's quite a bit of noise out there. Noise, and a lot of consternation out there.
Bryan Cantrill:That is for sure. This is an issue that has a lot of people thinking about it, one way or the other, for sure. I mean, there is a demographic, and it's
Bryan Cantrill:hard to say
Bryan Cantrill:because, like... alright. Look. If you're in this demographic, you are gonna think that we are belittling you by making other people aware of this.
Adam Leventhal:I just wanna pause
Bryan Cantrill:right now. Yeah. All listeners,
Adam Leventhal:please write down what large group of people Bryan's about to alienate.
Bryan Cantrill:Excuse me. I'm being handed this folded piece of paper. Listen. I've staked the mortgage payment on a C and D, so I really need to goose this thing along. You know, we tried to get a C and D from the Republic of Germany last year after offending all of our German listeners, but nothing came of that.
Bryan Cantrill:So there's a virulently anti-LLM demographic out there. And, like, I get it. But that's not what we're gonna talk about. I guess we already are. Sorry.
Bryan Cantrill:Whoops. I
Adam Leventhal:was like, wait. Those folks? You already alienated all those folks. Like, those folks are
Bryan Cantrill:they are there. That hornet's nest you've already kicked many times.
Adam Leventhal:Like, do not follow yourself on Bluesky, because... right. Maybe you should.
Bryan Cantrill:I know. I also feel like I'm doing, you know, I had a boss who did this once, who'd be like, listen, we're gonna go into this meeting, and I want no one to mention insert-name-of-former-customer. Like, we're just not gonna talk about them. I'm like, I wasn't gonna bring them up.
Bryan Cantrill:And then, like, the very first thing: I wanna explain to you why former customer is no longer a customer. I'm like, okay. So, oh, I get it. On the car ride down, you weren't talking to us.
Bryan Cantrill:You were talking to you. Like, the part of your brain that tries to not screw everything up was trying to talk to the part of your brain that actually, in fact, screws everything up. And that part of the brain wasn't listening, as it turns out. So I kinda feel like it's the same thing for me here: nobody bring up the fact that there's a demographic that believes that LLM use is immoral. I will do that from the top.
Bryan Cantrill:So no. I'm sorry. Alright. Well, this is a hash. We can
Adam Leventhal:cut all this out. Right? This is Yeah. Yeah. Sure.
Adam Leventhal:Sure.
Bryan Cantrill:Sure. That's right. But to the contrary, what we wanna talk about today is, there's this false dichotomy out there: that you are either vibe coding, a term that, again, I believe is not gonna survive the year, a prediction that may not be faring that well in its first week, evaluating our predictions one week in. That's where you have a fully closed loop and an LLM is simply creating software of its own volition. That is one pole.
Bryan Cantrill:And then the other pole, of course, is like, no, no, you should never use these things. They shouldn't be used for anything. They're, you know, etcetera, etcetera, etcetera.
Adam Leventhal:Correct me if I'm wrong, but I feel like a shibboleth of vibe coding is this idea of, like, you just do it. And if you don't like the results, you do it again. And if there's a bug, you do it again. And you're never cracking that open, never, like, seeing what all the gooey middle is. You're just like, just go for it.
Adam Leventhal:Like, kind of a tautological lack of curiosity
Bryan Cantrill:Of curiosity.
Adam Leventhal:About what's going on inside.
Bryan Cantrill:Yes. Well, which is part of the reason I think that the term will die, because I think the term is gonna be associated with that lack of curiosity. But, yes, absolutely. And there are domains in which that lack of curiosity may be okay, but other domains in which it's not. So that's kind
Adam Leventhal:of the... those are the kind of
Bryan Cantrill:the two poles. And I think what we believe, what we've already seen, is that there is a big, big, big middle ground. And in particular, what we have seen is LLMs can actually be used to result in more rigorous engineering. And it's actually not even that hard. And I've got some specific and recent experience.
Bryan Cantrill:Adam, maybe I could lead off with that before we introduce our colleagues. So I have been exploring using Claude Code to do kernel work. Our host operating system is Helios; it's an illumos derivative. And I had what I thought was a good, you know, you always wanna have, like, a good first task for these things. Just like when I picked up Rust.
Bryan Cantrill:I wanted to find, like, what is the right thing to try Rust on? My first thought, a doubly linked list, ended up being the wrong idea. That was the worst thing. So okay.
Bryan Cantrill:Let's not do the worst thing. Let's do a different thing. And actually, you know, you kinda had the same experience with Rust of, like, picking a not-great first thing, although not deliberately. Right? I mean, because you did a Sudoku solver.
Adam Leventhal:Yeah. And grammar. Yeah.
Bryan Cantrill:And, I mean, how was that as a first Rust project? It was, like, both very early for Rust.
Adam Leventhal:It was early for Rust, and just the work that has gone in, in the intervening, like, ten-plus years, to making it approachable, and the error messages sort of, like, convergent rather than divergent. Like, I think my big frustration was, like, go try this.
Adam Leventhal:And it's like, oh, wow. That's much wronger. Like, don't like, who told you to do that? You told me to do that. What are you talking about?
Bryan Cantrill:I didn't tell you that. I don't know what you're talking about. Yeah. Exactly. So, I mean, in hindsight, would you, today... would that be a fine first project?
Adam Leventhal:There was nothing that... I think so.
Bryan Cantrill:The project itself. Yeah. Yeah. Right.
Adam Leventhal:Yeah. I mean, it was even simpler than the thing that ended up being your first Rust project.
Bryan Cantrill:Right. So you always wanna have a good kind of first thing for these things. And I've been kinda waiting for, like, what is a good thing to use Claude Code on? Because I just wanna see how it does, basically, on this stuff. And I had some relatively straightforward scalability work that needed to be done: a lock that needed to be broken up.
Bryan Cantrill:I knew how I wanted to do it. It was gonna be a little bit tedious, but I was just kinda curious to see how it did.
Adam Leventhal:And it should be said that the idea here also was, like, you're breaking up this lock in a way that many locks before it have been broken up. Is that fair to say?
Bryan Cantrill:Yes. Absolutely. What needs to be done here is really quite straightforward, and I can describe it pretty completely to Claude Code. Yeah. And I'll drop a link to the actual bug itself.
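What that kind of lock breakup looks like, as a rough sketch in Rust rather than the actual illumos change (all the names and the bucket count here are illustrative): one hot global lock becomes a set of hashed per-bucket locks, so operations on unrelated keys stop contending.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

const NBUCKETS: usize = 64;

/// Instead of one global Mutex around the whole table, shard the table
/// into independently locked buckets so that unrelated keys no longer
/// contend on a single lock.
struct ShardedMap<K, V> {
    buckets: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V> ShardedMap<K, V> {
    fn new() -> Self {
        Self {
            buckets: (0..NBUCKETS).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    /// Hash the key to pick a bucket; only that bucket's lock is taken.
    fn bucket(&self, key: &K) -> &Mutex<HashMap<K, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.buckets[h.finish() as usize % NBUCKETS]
    }

    fn insert(&self, key: K, value: V) {
        // Lock is scoped to one bucket; other buckets stay available.
        self.bucket(&key).lock().unwrap().insert(key, value);
    }

    fn get_cloned(&self, key: &K) -> Option<V>
    where
        V: Clone,
    {
        self.bucket(key).lock().unwrap().get(key).cloned()
    }
}
```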
Bryan Cantrill:It's illumos 17816. So I'll drop a link in for that, and you can see exactly what the problem was at hand. Pretty straightforward. Now, I was very deliberately not closing the loop, not vibe coding it, not one-shotting it. In particular, I'm not even gonna let it build anything.
Bryan Cantrill:Right? We're gonna go into the source base, and I just wanted to see how it did. And it really did remarkably well. One thing that was really interesting, and, I mean, definitely not perfect: it had some subtle issues that needed to be resolved, but we got those resolved pretty quickly. I would say it had two subtle issues, but it also made a subtle discovery as well.
Bryan Cantrill:And the thing that was really interesting to me about it is I was unleashing it on, like, a pretty big source base in illumos. And it was really interesting to watch it effectively read block comments to understand how subsystems worked. So reading not just code, but also comments. And all in all, it was really pretty impressive. I mean, it definitely understood.
Bryan Cantrill:I mean, we're talking illumos here. So anything that you have trained on that is the Linux kernel or the BSD kernel is, like, literally not gonna apply. It would be very easy for you to create arguments to functions that didn't exist. I'm talking about the kstat facility, which is a facility that doesn't exist in those. So you really cannot rely on something that you've trained on. You're gonna have to kinda look at this.
Bryan Cantrill:But it was good. And I would say, like, net-net, in terms of the actual time to implement this, it probably saved me, like, half the time. I spent about two hours on it to have something that I was pretty confident would work, and did work, versus, I think, probably about four hours otherwise. And someone's asking, interesting: well, are the LLMs trained on illumos?
Bryan Cantrill:It's like, yes. But the way it was iterating, and if folks haven't used Claude Code, it is really worth experimenting with, especially on an established source base. And so one of the things that I would just like to throw out there as, like, a first way that these things can help increase your rigor is by asking questions about a source base. And clearly, all of the caveats apply: you can get the wrong answers and so on. You need to verify these things. But it figured out a lot of what needed to be done surprisingly quickly.
Bryan Cantrill:So I will absolutely be using it again for other kernel projects, if only as a starting point. And one thing it did that was funny, Adam: it needed to add a field to a structure. And in the actual structure itself, none of the fields is commented. You know how, like, best practice would be to comment every structure member? And in this particular source file, none of the structure members are commented.
Bryan Cantrill:And its proposal was to actually comment the structure member, which, for bad reasons, we're not gonna do. Like, we're gonna be consistent with what's there by not commenting the new member that you just added. The code it wants to write is actually cleaner than what's there. But the other kind of thing that it brought to mind is, like, boy, there's so much technical debt kind of stuff.
Bryan Cantrill:And one thing I think would be interesting, that actually we're gonna see, is people going into existing source code and commenting it better using Claude, to just have better comments. And then obviously validating all of its work. Anyway, that's kind of my story from my experiment over the weekend doing illumos kernel work, and I came away pretty impressed.
Adam Leventhal:Awesome. And when you started that project, did you have a sense of what the code was probably going to look like?
Bryan Cantrill:Yes. Yeah. Definitely. I mean, this is one of these where, in many ways, I had biased it for maximal success.
Bryan Cantrill:I had a pretty good idea of what it was gonna look like. But there are also some fiddly bits; people can look at the, I actually will put a link to the diff in the actual bug. There are some fiddly bits to get right, actually. There's a little bit of math that you need to do correctly. But, yes, I definitely knew what the code was gonna look like.
Bryan Cantrill:And it doesn't span multiple files. We're not introducing a new subsystem. Like, this is pretty straightforward as it goes. So this is, I would say, a case that I really picked because it's kinda biased for success. Also picked because we need to do it, by the way.
Bryan Cantrill:Yeah. I mean, it's that's the other thing. It's like, this is like
Adam Leventhal:This was not a yak shave. This was, like, you're doing it in four hours or you're doing it in two hours, either way.
Bryan Cantrill:Either way, it had to be done. That's exactly right. I would say the other thing is that the four hours versus two hours ends up being really actionable, because I started this at 10:00 at night. And there's a pretty big difference between going to bed at midnight and going to bed at two in the morning. You know what I mean?
Bryan Cantrill:So, you know, sometimes that difference can be... so, yeah. It was pretty impressive, and it gave me the belief that we could actually use this in lots of other places. But that is my limited experience. So we've got two of our colleagues here. We've got David and Rain.
Bryan Cantrill:And both of you have used LLMs quite a bit and have discovered, I would say, new vistas of rigor. Rain, do you want to kick us off on some of the stuff that you've done where you found this to be useful?
Rain Paharia:Sure. So there's a couple of different things I can talk about here. One of them is kind of the first work that I did. That was around May. And then the other one is like the work I did around December with like reorganizing types and stuff.
Rain Paharia:Which one should I go with?
Bryan Cantrill:Let's actually start chronologically, because let's start as you're kinda getting into this stuff.
Rain Paharia:Yeah. So I guess, you know, as you pointed out, a lot of the memes around LLM-based coding are, you know, vibe coding.
Rain Paharia:Right? You don't pay attention to the code. You just, like, let yourself be in the flow or whatever. Right? I have to say, personally speaking, that is kind of exactly the opposite of the way I want to build software.
Rain Paharia:And, you know, for me, I want software to aim towards correctness. I really want a high degree of rigor in my software. So when I came to LLMs, I came in with a huge amount of trepidation. I was just kind of trying it out, right? And I was like, okay, I want to make sure that everything looks good and so on.
Rain Paharia:So the first use that I found really impactful: we were having a bunch of issues at work around, like, you know, how do we store keys and values in maps? And so on the side, around April or so, I started prototyping this approach which basically lets you store keys and values side by side, next to each other. And I spent a few weeks trying that out, right? I did a bunch of prototyping. I did a bunch of work.
Rain Paharia:And then it was an interesting experience, because that was all handwritten, right? So it was like three weeks of around 2,000 lines of code, carefully handwritten; there's a lot of unsafe code. And it was pretty challenging. But then I realized that if you define a map in Rust, there's a lot of things you need to add to that map in order to make it a functional API. So if you look at Rust's HashMap or BTreeMap or whatever, there's a ton of different APIs; some of them are syntactic sugar, some of them are more primitive.
Rain Paharia:An example is, say, the entry API, which, if you've used Rust maps, you might be familiar with. That's an API that lets you kind of say whether an entry is occupied or not, and it lets you insert an item. I think it's a beautiful design, but it is a very verbose design.
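For anyone who hasn't used it, this is the standard library's entry API on HashMap being described here: first the verbose occupied-or-vacant form, then the sugar built on top of it.

```rust
use std::collections::hash_map::{Entry, HashMap};

fn main() {
    let mut counts: HashMap<&str, u32> = HashMap::new();

    // The entry API hands you an enum: Occupied or Vacant. Matching on
    // it explicitly is expressive, but verbose...
    match counts.entry("rigor") {
        Entry::Occupied(mut e) => *e.get_mut() += 1,
        Entry::Vacant(e) => {
            e.insert(1);
        }
    }

    // ...which is why the sugar built on top of it exists.
    *counts.entry("rigor").or_insert(0) += 1;

    assert_eq!(counts["rigor"], 2);
}
```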
Adam Leventhal:And
Rain Paharia:And so the map library I was writing, and I'll just drop a link to it, it's called iddqd. This map library had four different maps. Right? And so one of the things I was dreading was, okay,
Rain Paharia:oh my god, I need to write all of these map APIs for four different types, right? And that is just terrifying. So it's like, okay, you have a prototype, and maybe you have one of those types, but then you have these three other things. And for each thing, you need to go in and update the map type.
Rain Paharia:It would be a couple of weeks of work at least. And it would be pretty hard for me to justify that work, as opposed to kind of just ambling along with the default maps. But I really also wanted to get this in the hands of my coworkers, because I was actually really excited about this pattern. So what I ended up doing was handwriting one of the maps. And then I told, I think back in the day it was, like, Sonnet 4.1 or something,
Rain Paharia:right? So this was, you know, a couple generations before. Right?
Bryan Cantrill:Back in the day, as in, like, eight months ago.
Rain Paharia:Yeah. Right. Right. And so I just told it to kind of replicate the same APIs across all of the other maps. Right?
Rain Paharia:Yeah. And it just nailed it. Right? Like, there were local differences, and it kind of adapted the map types to those differences. This was, I wanna say, a total of around 20,000 lines of code.
Rain Paharia:Then I asked it to generate doc tests. And, you know, one of the things you should do, if you look at, say, the Rust core types, you will see that every method has a doc test associated with it. Right? And so I wanted to get that kind of rigor, right, where every method has a doc test associated with it. And I don't know about you, but I hate writing 5,000 lines of doc tests.
Rain Paharia:Right? And I just told the LLM to do that. I gave it a couple of examples to start with, and I just told Sonnet 4.1, I think, to do that. And, you know, it just kind of replicated that; it wrote, like, thousands of lines of doc tests.
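What one of those doc tests looks like, sketched against a hypothetical map type (the mymap crate, OrderedMap type, and get method here are made up for illustration, not taken from iddqd):

````rust
/// Returns a reference to the value corresponding to the key.
///
/// # Examples
///
/// ```
/// use mymap::OrderedMap; // hypothetical crate and map type
///
/// let mut map = OrderedMap::new();
/// map.insert("a", 1);
/// assert_eq!(map.get("a"), Some(&1));
/// assert_eq!(map.get("b"), None);
/// ```
pub fn get(&self, key: &K) -> Option<&V> {
    // Body elided. The point is the example above: `cargo test`
    // compiles and runs every doc test, so a stale cut-and-paste
    // fails loudly instead of quietly documenting the wrong thing.
    todo!()
}
````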
Rain Paharia:And, you know, this work that I'd been dreading, because it would be, like, weeks of work, it took me, I wanna say, less than a day to get the whole thing ready. So it was three weeks of careful, deep analysis and work and thinking about unsafe and so on, and then one day of... I was talking to someone on Bluesky about this, and I think they described it as, like, a pattern amplification machine, where
Adam Leventhal:Interesting.
Rain Paharia:right? And so you give it a pattern, and it just kind of amplifies that pattern to, you know, whatever degree you want. Right? The thing is that before LLMs, I would have probably investigated, like, a code generation library. I would have tried out macros or whatever.
Rain Paharia:And all of them have some downsides. The LLM was kind of doing things and tweaking things locally as it went along, things like: for a B-tree map, it'll say ordered, and for a hash map, it won't say that. Just making sure that the documentation is all aligned and everything. That was my first experience, and it was a great experience: it wasn't a one-shot, but it was, I wanna say, maybe five or six prompts total, and it just nailed it. So that was my first experience.
Bryan Cantrill:Yeah, a bunch of follow-up questions. So that's really interesting. So one, I mean, this is the kind of tedium that, like you just said about the doc tests, we all know the doc tests are great.
Bryan Cantrill:As a user of something, you really appreciate them. Right. It just takes a lot of time to get that working correctly. And it's really easy when you as a human are, I mean, bluntly, cutting and pasting. Right?
Bryan Cantrill:When you are cutting and pasting, it's super easy to make a mistake where it's like, oh, that doc test, by the way. Have you looked at the doc test? Like, you actually just cut and pasted it. You changed it in two places, but not the third.
Bryan Cantrill:And so now what you have is kind of nonsense in the test. Like, well, that's not very good.
Rain Paharia:Like, or the test is testing the wrong thing. Right? Like, they're testing the wrong method or testing the wrong structure or whatever. Like, it's so easy to make mistakes here.
Bryan Cantrill:So easy to make a mistake. Yeah.
Rain Paharia:Yeah.
Bryan Cantrill:Okay. So another question I have for you. The other thing is that, as you say, it's: I've got the pattern, I want you to replicate it. It also makes for code that's pretty easy for you to review. This kinda reminds me of my experience.
Bryan Cantrill:Like, I pretty much know exactly what I'm expecting here, and I'm gonna be able to review this pretty quickly. Rain, one question I've got for you, because one thing that was super surprising for me, and look, hopefully I'm in a safe space here: I've got the brain that I engage when I'm writing my own software, and I struggle to engage that when I'm reviewing someone else's software.
Bryan Cantrill:You know, I try to. And the best reviewers, I think, are able to review code as if they themselves are writing it. But for me, I really have to work on that. And I definitely know when I'm in the, yeah,
Bryan Cantrill:this probably works mode of my brain, versus the, no, no, wait a minute mode, where I'm doing my checklist before takeoff, and I'm like, I'm gonna die in this airplane if I don't get the flaps down correctly.
Bryan Cantrill:And the thing that was super surprising to me is that when I was reviewing Claude's work, I was in that mode of, like, I'm writing this myself. A very heightened state of alert, really reviewing things closely, finding some subtle things in the script. Did you find the same when you were reviewing the code that it had written?
Rain Paharia:In this case, I think, so, I have the same struggle that you do. Right? Like, when I'm reviewing code, especially when I'm looking on github.com. I'm sure we all have our complaints about the GitHub, you know,
Bryan Cantrill:Yes. If
Rain Paharia:UX, I'm sure. Right?
Bryan Cantrill:Yeah. Like, oh, by the way, here, let me show you all the trivial stuff. The nontrivial stuff? I don't know. That's a lot of files. That's a lot of lines to render.
Bryan Cantrill:Let's not review that.
Adam Leventhal:Too big. Why bother?
Rain Paharia:Why bother? Right? So I had a bit of the same experience. I feel like I was kind of somewhere in between here, where I think much of this depends on, or at least for me depended on, how intensely you and the LLM were pairing with each other. Right?
Bryan Cantrill:yeah
Rain Paharia:So I've had experiences with an LLM, like, for this one, where it just was doing its thing and I was not paying a huge amount of attention. And then I ended up reviewing it, and, you know, it made maybe two or three mistakes. Right? But also, I felt pretty assured by the fact that all the hard bits were kind of handwritten, and the LLM was just wrapping those hard bits. Right? So it was doing relatively easy things.
Rain Paharia:There have been other things that I've used LLMs for, especially Opus 4.5 over the holidays. And for those ones, I ended up having this very intense, like, mind-meld pairing session. And that felt like, you know, I knew every single line of code and what it was doing. Right? And so I was carefully working through things, and that was, like, a wild time.
Rain Paharia:But I felt like it depends on the kind of mode I end up using it in. And so, you know, it depends. But even the current LLMs, and again, this can change, because I know things have advanced so quickly, but even current LLMs do get things wrong, or do things suboptimally, or do things in a way that's unmaintainable. And you do have to pay attention to that.
Rain Paharia:Right? And that is part of the rigor, which is, like, okay. Like, I feel like I have built up some muscles around this from having used it. Right? And so I think part of the rigor is also, like, getting some practice with, like, looking at LLM code and reviewing it.
Bryan Cantrill:Yeah. Interesting. So in this first use case, you've got a lot of just tedium that needs to be done. And what I think is really interesting about this case is you're doing something we do a lot, which is like, okay,
Bryan Cantrill:I've got this problem. I kinda wanna solve it in a way that's a little more generic, where my colleagues can use it and so on. But we always have that tension. On the one hand, we always encourage ourselves to, hey, this is a good opportunity to build a new abstraction. But we're also all kind of realistic.
Bryan Cantrill:Like, yeah, but we can't, like, not ship the next release or what have you, because we're kind of focused on, you know? And there's always that balance. And to reduce the amount of work involved in this by a factor of four, that may be the difference between doing it and not, you know, where it's
Rain Paharia:like straight up. Right? Yeah.
Bryan Cantrill:Right.
Rain Paharia:Yeah. I actually suspect David has a few things to say, because I know David and I have had some chats about this. But for me, there are, like, new vistas that open up, and I think that's the way David put it. Right? There are things that were simply not feasible to do given, you know, company priorities and personal life stuff going on and all the different things that are involved in a human's life, that I feel like have opened up.
Rain Paharia:Right? And so for me, with iddqd, the goal of this library was to increase the amount of rigor in our software. So I think it is very cool that, you know, an LLM is able to kind of work on this. Right? So this is a way you increase rigor: you build an abstraction that increases rigor, even if it is tedious.
Rain Paharia:Right? That is an increase in rigor, right, in the overall system.
Bryan Cantrill:Totally. Yeah. So, David, as Rain points out, you were among the earliest adopters at Oxide. I think you've really shone the light for a lot of us, you know, showing what these things can and can't do. Do you wanna talk a little bit about your experience of kinda getting into this?
David Crespo:Yeah. I mean, for a long time, I think until this year, really, when Claude Code took off, I was using LLMs as kind of like a fancy search, even before they were actually search engines, and, you know, everyone was like, it's not a search engine, because you're getting this very lossy picture of what's in the model weights. Even then, on things that they were trained very well on, which is, like, what I work on, web dev, they were great, you know, just for retrieval.
David Crespo:So I was using them a lot for that, or, you know, small snippets. This year, I think, is when it really took off that the models could do more complex autonomous things based on a very small description. And more importantly, I think, pull in, like what you were talking about, where when Claude Code is looking at the illumos code that you have on disk, it's pulling in context that it doesn't have. And that's very different from
Bryan Cantrill:Right.
David Crespo:Yeah. You know, the typical use is you ask it a one-sentence question, and there's only so much detail that you can get back out of it, because there's just not enough texture in the question to tell it what to tell you back. And so when I gave the talk about LLMs at OxCon in September, a lot of what I stressed was, the way to set up the problem for yourself is you wanna give it enough so that the answer is, in some sense, contained in what you give it. And what these agent tools do, by just living in a repo and pulling in whatever context they want, is they give themselves that texture and context. So that's really what's changed this year from the way I was using it a really long time ago.
David Crespo:I was trying to, you know, I wrote a CLI that lets you pass stuff on standard in, and you could dump files into it. But giving the things the ability to just do that stuff on their own makes things so much easier, because you don't have to manually select a list of files that's worth looking at.
Bryan Cantrill:And so where were you first really beginning to use this beyond just search or what have you, where you're beginning to, like, okay, I can actually pair with it, as Rain was saying?
David Crespo:Yeah. The early things, earlier this year, were things like stubbing out. Like, I would stub out a test. This was before they got good enough that you can tell it the shape of the set of tests that you want and it'll write 50 tests.
David Crespo:Before that, it was more like you would write the title of the test and maybe five comments saying the steps of the test, and it would fill in. You know, it would still feel great, because you'd be saving all this typing of the most tedious kind. You know? Make this request. Check this on the response.
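A hypothetical sketch of that stubbing pattern (the test name and the steps here are invented for illustration): the human writes the skeleton, and the model fills in the body.

```rust
// Hypothetical example of the stubbing pattern: the human writes the
// test name and the steps as comments; the model fills in the body.
#[tokio::test]
async fn instance_create_rejects_duplicate_names() {
    // 1. create an instance named "web-0"
    // 2. create a second instance with the same name
    // 3. assert the second request fails with 400 Bad Request
    // 4. assert the error message names the conflicting instance
}
```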
David Crespo:That was where it started to feel like it was really helpful, and I think I gave some demos of that kind of thing, where it's like, you know what you want, and you can tell it piece by piece, and it'll fill it in, and it would do a good job. This is kind of what people are talking about in examples where it can follow a pattern really well. Like, if you give it one example, you do this thing yourself once, and you need to do it five more times, it can follow that pretty well. But more recently, it seems like, with Opus 4.5, it's been able to figure that stuff out on its own, even without the stubbing out of all the details. One example, the thing that really impressed me when Opus first came out, was something quite different from you guys' examples, because it was an example where I'm not an expert.
David Crespo:It was specifically something where it was kind of a pure test of the thing's ability, because I didn't know anything about what I was doing and was still able to get to a surprisingly good result. And this was debugging crashes in the Ghostty terminal. So I ran into a couple of crashes. I've never written a line of Zig. I don't know anything about the Ghostty code base.
David Crespo:I've never looked at a crash dump, to my shame as an Oxide employee. But there were a few crashes that I wanted to investigate, and I couldn't find anybody talking about them on the Ghostty GitHub, so I figured they were pretty rare. So I looked into them, and I just had Opus essentially figure it out. You know, the only thing I really had to do was find the Rust port of minidump-stackwalk to look at the crash dump, and I knew where the crash dump was located on my disk. And then from there, it basically was able to look at the code, statically analyze it, and find the source of these. I found three different bugs this way.
David Crespo:And then it was able to write up the bug reports, and they were confirmed to be real bugs and fixed. And what really unsettled me was that this was an area where I really knew nothing, and just using my sort of sense of what sounds like it makes sense, to validate that I wasn't gonna be posting AI slop on the Ghostty GitHub, I was able to come to, you know, three real bug reports without really putting very much of myself into the process.
Bryan Cantrill:Yeah. That is wild. And so was it primarily operating on the stack backtrace, or was it the stack backtrace plus actually walking data structures, and was it actually, like, meaning
David Crespo:There was no live debugging. I think it was looking at the stack trace and then looking at the code and, you know, the error that came up, and then just sort of thinking about what could have happened in the code to cause the error.
Bryan Cantrill:Interesting. Yeah. That that is interesting. I want to be using them as debugging tools a lot more, and I'm I'm very curious about this use case. So that is I mean, and that is wild.
Bryan Cantrill:So when you submitted the, I mean, the great thing about Ghostty being open source, and it being Mitchell Hashimoto's project, is, you know, I would just like to say good on Mitchell. Like, he obviously does not need to work for the rest of his life, has made generational money, and he's writing a TTY emulator. I just think that's pretty great. I think that is every software engineer's dream. But then making it open source.
Bryan Cantrill:What was the reception to the... but you said these are bugs confirmed to be bugs. So it sounds like what you found was legit.
David Crespo:Yeah. You know, part of it was that the bug report itself was, even to some extent, out of my depth. Like, a couple of them I was really confident, and then one of them I was like, it sounds really good, but I just wasn't able to, you know, I didn't know enough about how Ghostty worked or how Zig worked to really evaluate. So I was nervous, but I was upfront, with a lot of humility, of, like, I'm really not sure about this, but it sounds so good that I cannot hold it back.
Bryan Cantrill:Okay. Alright. So let's actually talk about the first two, where you're like, okay, I don't know any Zig, but I'm a software engineer. I know many other programming languages. Where you were like, okay,
Bryan Cantrill:just based on its description and me looking at this code, I'm pretty sure I've got a legit bug here. Could you describe those first two a little bit, in terms of, like, where you had confidence? You go, like, not knowing very much Zig, or knowing only the Zig I've learned, I think I've got a legit bug here.
David Crespo:Yeah. One of them was very simple, because it was like a copy-paste error where they were just referring to the wrong variable, and you could tell, you know, it was supposed to be grapheme bytes, and it was hyperlink bytes. And you could tell that, so it was like, okay, that sounds pretty straightforward.
David Crespo:Another one. This was, like, two months ago, so I can't
Bryan Cantrill:Yeah. Yeah. I'm so sorry. Two months ago being, yeah, several eons ago, especially in LLM terms.
David Crespo:Yeah. The really complicated one was that there was a mutex lock that was not being taken at the right time, and so there was a conflict. And so, you know, reasoning about that was pretty tough for me, not understanding how the code worked. But it was pretty impressive that the model was able to see it. You know, like, this is where you should have taken a lock and you didn't.
Bryan Cantrill:Yeah. Okay. So another thing that I think is really interesting: Mitchell himself replied, and you've linked all of the issues in the chat, so obviously people can get those. I think one of the things that I really like about this, David, is that you lead off by saying, like, look,
Bryan Cantrill:I've been using Claude Code with Opus 4.5 to investigate. You're very upfront with, hey, an LLM has done the work here, as a way of saying, someone else is gonna need to look at this who's got greater domain expertise, which I think
David Crespo:Yeah. It's worth looking at Ghostty specifically. They have a quite clear LLM disclosure policy. So Mitchell has been pretty open that he uses LLM tooling, but they really want upfront disclosure. Yeah.
David Crespo:So they made it easy by telling me exactly what to do.
Bryan Cantrill:Yeah. That's right. I was more worried
David Crespo:about sort of the embarrassment if my issue was fake, for me to be like one of those guys posting issues that are fake that the LLM told them, you know, was a bug.
Bryan Cantrill:Yeah. Well, but, you know, you obviously quadruple-checked all this stuff, and it looks like you had... so, alright. This experience, as you said, or the way you said it, was, like, unsettling. When you describe it as unsettling, why was it unsettling?
David Crespo:Yeah. Well, I thought this was such a clear-cut case where it was obviously not my expertise that was operative here, because I didn't have any. You know, there was some high level where, like, I could tell that it felt legitimate. And I think there may have been one or two things where it came up with something that I felt like didn't sound real. But, you know, the amount of guidance that I actually provided in the process was a very small proportion of what actually took place.
David Crespo:That was what I think felt unsettling about it. And, you know, the guidance that I did provide also didn't feel like that ineffable human taste that people love to attribute to themselves. It really wasn't that. It was, like, finding the Rust port of the, you know, stack trace symbolicator or whatever.
Bryan Cantrill:Right. Right. I do love the fact that you're like, I'm actually even a little bit embarrassed. You say that in these issues. But it's also, like, what else am I supposed to do? This thing crashed.
Bryan Cantrill:Am I supposed to, like, not give someone the feedback that this thing has crashed? Or am I supposed to just sling an issue in there? I mean, it just feels like you're being actually helpful to the project.
David Crespo:Right. Well, if it had turned out to be fake, I wouldn't have been. If my diagnosis was wrong, then that was just creating work for them. So it hinges pretty tightly on the fact that they were legit.
Bryan Cantrill:Yeah. Okay, so I think it goes to, we've talked a bit about RFD 536, where we kinda talk about our own LLM thinking at Oxide, and it just goes to that: having empathy for the person that's gonna read this, and, in this case, contextualizing it. But also, it sounds like you're doing your own checking to the degree that you can.
Adam Leventhal:You know, it's interesting. A lot of these attributes or qualities that we ascribe to LLM-generated code are all things that, as we're talking about, I've associated with other colleagues. I'll provide an example. When you're doing a code review, Bryan, my guess is the degree of scrutiny that you feel yourself applying may change depending on where that code came from. It certainly does for me.
Adam Leventhal:And not so much at Oxide, but, like, when I was at Sun, there were some times I'd get a code review of, like, I really need to imagine what it would have been like to write this, so that I know what I'm looking for. In other cases, you're like, well, there's some code and there's some tests, and I'll look around. But, you know, a lot of the thinking has probably already been done.
Bryan Cantrill:You're exactly right. And, look, I'm ashamed to say it, but I'm gonna say it: the way I would review code from, like, a nemesis. You know, a nemesis integrates code, and you're like, I am gonna get mine. And one of the things I realized I needed to do, for my own self-review and for reviewing people that were not my nemesis, was to channel that dark part of my brain that's like, I'm gonna find the thing in here. I mean, it's embarrassing to say, but it's definitely true.
Adam Leventhal:Yeah. Well, I do that when I'm reviewing someone who I consider a friend, and I wanna do them the service of helping them with their code. But I guess we're just motivated differently. That's fine.
Bryan Cantrill:Yeah. Okay. It feels like you're just trying to explain away a bunch of the code review comments. The very good code review comments you give.
Bryan Cantrill:Feels like comments you'd give a nemesis. But
Adam Leventhal:That's right. Okay. Alright. One man's nemesis is another man's friend. There you go.
Adam Leventhal:Oh, and then, you know, David, as you're describing, it's like, you don't wanna file a crap bug report. Like, man, have I seen some crap bug reports, where people take you on this wild ride through a core file, and you end up just nowhere. You're like, okay, I'm following, but all of this is just blather. Like, you don't need an LLM to hallucinate.
Adam Leventhal:Like, we've been doing
Bryan Cantrill:that. Yes.
Adam Leventhal:And we've seen these bug reports where you're like, okay, there's certainly a lot of information here, but you've actually not contributed. So that empathy you're talking about is so at the core of engineering, full stop, irrespective of the tools we're using.
Bryan Cantrill:Yeah. No. That's a very good point.
David Crespo:It is really infuriating to see, like, a bad AI bug report. Like, I'm probably more optimistic than most people about LLMs, and I think part of that is just working at Oxide. I don't really see anybody doing the pathological things that we hear about online. You know, everybody's so careful and serious at Oxide. So I worry that I'm biased toward optimism, because I'm not seeing the median user of these tools.
David Crespo:But then, you know, I see one example. I get one bug report on a repo that I'm actually familiar with. I'm like, forget it. Throw these out. We're done.
Bryan Cantrill:Yeah. But to Adam's point, I don't like it when you get bogus bug reports without LLMs either. Or when you get, like, bogus PRs. Yeah.
Adam Leventhal:But it's harder to write off all of humanity. Like, you can write off LLMs.
Bryan Cantrill:I guess that's true. Yeah.
Adam Leventhal:It's more limiting if
Rain Paharia:you
Bryan Cantrill:Yeah. Absolutely. But I do think that, you know, we definitely have this happen, where we will make things that we are open sourcing, not making a big deal out of it. We're not trying to create a community out of it.
Bryan Cantrill:We're just open sourcing it kind of hygienically. And then someone will come along with a kind of like spurious PR to change things. We're like, no. Sorry. No.
Bryan Cantrill:This is not LLM-assisted; this is in the pre-LLM age. It was like, no. No. Sorry.
Bryan Cantrill:This is actually not helpful. So, Adam, just to your point, the lack of empathy is definitely not new. Drive-by PRing is not new, as someone's pointing out in the chat.
Rain Paharia:I think the difference, though, is that LLMs do amplify that problem. Right? I was talking about this with someone: you can get something that is not great in, like, a few minutes, as opposed to maybe a few
Bryan Cantrill:hours. Great.
Adam Leventhal:You're so spot on. And my experience has been, an unproductive, unempathetic colleague, like, that's fine. I can run faster than you. I can keep up.
Adam Leventhal:Like, you're not gonna outrun me. I don't need to worry about diverting you in the wrong places. A highly productive, unempathetic, careless colleague, that's dangerous. It takes 150% of my effort just to keep them from doing harm. And you're right, Rain, that it takes that formerly plodding colleague, or collaborator, who you had to keep on the rails,
Adam Leventhal:and makes it much harder to steer them.
Rain Paharia:Yep. Yep. It's like a Gish gallop, almost. I feel like that's how I think about it. Right?
Rain Paharia:Where it's like a Gish gallop for issues. I've luckily not faced too many crap bug reports. I've seen some AI bug reports, but they've all been very high quality, kind of at the standard that I would expect myself to write a bug report. So, again, I am biased towards optimism here, but it is something I'm worried about. I do look at people just putting up garbage, and it's like, okay, well, it's now harder to filter out garbage. And I have to say, on the flip side, a thing I've done is I've used Opus 4.5 and fed it a bug report and told it to tell me whether this bug report is real or not.
Bryan Cantrill:Yep.
Rain Paharia:So, yeah, maybe that's the way to keep up
Bryan Cantrill:It's just
Adam Leventhal:like the
Rain Paharia:LLM stuff. Yeah.
Adam Leventhal:It's like some open source Jevons paradox or whatever. Like, there's no money involved here, but I just mean the cost of creating PRs and projects and all of these things has dropped so much that the volume has just accelerated.
Bryan Cantrill:Well, I also do think that with these open source projects especially, I mean, you know, god bless small communities. I mean, I would be almost, like, intrigued by someone who's like, I'm gonna use an LLM to file a bunch of bugs against illumos. You're like, that's weird. I mean, that's not a... versus, like
Adam Leventhal:Talk to someone about Yeah.
Bryan Cantrill:Talk to someone. Yeah. I mean, I'm almost, like, not opposed. That's a very... okay. Versus, like, a project, I mean, certainly we saw this with Node, where, you know, I've been in very, very large projects with many, many contributors, and very small projects.
Bryan Cantrill:And there's a lot to be said about being in a small project, and a lot to be said about a project that doesn't attract as much attention, because it doesn't attract as much of that kind of negative attention either. I'm sure there are some high-profile repos for whom this problem is really, really acute, and, you know, maybe that was the way with Ghostty and Mitchell. But for a lot of the stuff, at least, that I work in, it's not really an acute problem. Yeah.
David Crespo:I've been surprised that the tooling for maintainers hasn't been able to keep up. I mean, you could expect some lag. Right? The volume of garbage has to balloon for a bit before it becomes such a big problem that people are incentivized to put some work into solving it. But I think one of the things we're gonna see in the next few months is maintainers more openly using LLM tooling to cut through that morass of AI bug reports and
Bryan Cantrill:And for code review too. I mean, honestly, my eye-opening moment with respect to LLMs and software engineering was on Oxide and Friends, when we had a listener who had access to GPT-4 when I did not have it. And, Adam, for some reason, I can't even, you know, you've got to figure out exactly when this was. I guess it's a little hard to ring the chime for an episode that I can't recall. Yeah.
Bryan Cantrill:Yeah. Just ring it. Exactly. Give the people on YouTube something to complain about.
Bryan Cantrill:And I'll go back and find the episode, but the thing that was really interesting is I had a PR that day that I linked to, and someone dropped in a GPT-4 code review of that PR. And I'm like, wow, this is not all wrong. Like, it's also not great, but this is definitely not, like, garbage, the comments that it has. And that was a long time ago with respect to LLMs.
Bryan Cantrill:And I mean, it just feels like the opportunity for code review is really rich. And, Dave, to your point, why don't maintainers have... and maybe they do, and I'm missing it, but it just doesn't feel like GitHub is providing it. Anyway, why am I doing this? Of course,
David Crespo:$20 a month a seat. You can write that in a script. So I wrote a script that pulls the diff, the comments, the PR body, and just feeds it into an LLM. Just says: review this. And, you know, you have downsides there where, a lot of times, there's additional context that's not in the diff.
David Crespo:Like, if you're using something that is imported in the code already, the import is not in the diff. So it's gonna say, are you sure you're importing this? It doesn't know that the tests pass. Doesn't know that CI passes. But you can get quite a lot that way.
David Crespo:You know, it can find a mismatch between your SQL migration and your main dbinit SQL. It can find inconsistencies really well, and inconsistencies between, like, your human-readable stuff, like your PR description, and the actual code. There's a lot you can do there, even without what we now have, which is the tools that can go and vacuum up anything that they need to validate their hypothesis about why the PR is broken. With Claude Code, it can write a new test that validates that the code as written doesn't do something.
David Crespo:And there's a lot of low-hanging fruit there that we're not really touching at all.
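A rough sketch of the kind of script David is describing, not his actual one: pull the PR body and diff with the GitHub CLI and pipe them into Claude Code's non-interactive print mode. The overall shape and the prompt wording here are assumptions.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Usage: review <pr-number>
    let pr = std::env::args().nth(1).expect("usage: review <pr-number>");

    // `gh pr diff N` and `gh pr view N --json body -q .body` are stock
    // GitHub CLI commands.
    let diff = Command::new("gh").args(["pr", "diff", &pr]).output()?;
    let body = Command::new("gh")
        .args(["pr", "view", &pr, "--json", "body", "-q", ".body"])
        .output()?;

    let input = format!(
        "PR description:\n{}\n\nDiff:\n{}",
        String::from_utf8_lossy(&body.stdout),
        String::from_utf8_lossy(&diff.stdout),
    );

    // `claude -p <prompt>` is Claude Code's non-interactive print mode;
    // piped stdin is treated as additional context. The prompt wording
    // is made up for illustration.
    let mut claude = Command::new("claude")
        .args(["-p", "Review this pull request: point out bugs and any \
                inconsistencies between the description and the code."])
        .stdin(Stdio::piped())
        .spawn()?;
    claude.stdin.take().unwrap().write_all(input.as_bytes())?;
    claude.wait()?;
    Ok(())
}
```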
Adam Leventhal:On the topic of low-hanging fruit, I think my nemesis on GitHub is stale bot. Right? Like, I hit some real bug. Oh.
Bryan Cantrill:Stale bot.
Adam Leventhal:I feel like LLMs could slay stale bot. You know, they're like, well, this bug is six weeks old, so I guess nobody really has it or whatever. It's like, nope. There's a crash dump and a core and a X trace and a bunch of information
Bryan Cantrill:Yes.
Adam Leventhal:And at least helping to weed through that, so that you could maybe neuter stale bot a little bit, or make sure you keep around the things that refer to real problems, with sufficient data to diagnose them, or maybe diagnose them autonomously. But, you know, I think there's a theme here: a lot of work that just doesn't get done, that could get done, and in some cases maybe doesn't need the highest level of sophistication to complete. You know, those are great tasks.
Bryan Cantrill:God, Adam, you are so right. And god, I hate stale bot with the white-hot passion of 10,000 suns. Stale bot is such an indictment. I mean, we don't talk about stale bot enough. For all of you decrying the future, look into the past.
Bryan Cantrill:Stale bot is everything wrong, because it is just like, oh, well, no one has seen this issue in six weeks, so we're closing it. Like, how does, what? How does that make sense? That doesn't make sense.
Rain Paharia:The weird thing is, it comes from Facebook, by the way. I know Facebook had an internal stale bot, and then someone built an external version. And, I mean, it's the Facebook culture. It was exactly the kind of thing that you would expect Facebook to make. So, you know?
Adam Leventhal:Move fast and leave broken things around?
Bryan Cantrill:Yeah. Move fast and close out bugs that haven't had any activity in the last forty-eight hours? Okay, then. Yeah. But it is also so gross, because, like, but no.
Bryan Cantrill:No, look: I made the dashboard green. It's like, yeah. So, Adam, you were right.
Bryan Cantrill:I hadn't even thought about what this means for stale bot. Stale bot, you are a marked bot. I hope LLMs get it, because it's a great point: if nothing else, if we get rid of stale bot, it was all worth it.
Adam Leventhal:I gotta say, I mean, you're gonna be up till, like, four in the morning, vibe coding, like, the Joker of stale bot. You know? I'm just gonna trail stale bot around, reopening and fixing bugs that it's trying to close.
Rain Paharia:I actually have an example of a bug that I feel like I would have just ignored in the past, but had a much better time with Opus 4.5. This is a bug on cargo-nextest, which is a personal project. And the bug is titled "SIGTTOU when test spawns interactive shell." Now, if you've spent any time on this stuff, you're like, my eyes glazed over, pretty much. Right?
Rain Paharia:And so this person actually did a nice investigation with Claude and posted it, saying, I've worked with Claude to get a good attribution and reproduction for this; it wrote the words below, but I stand behind them. Right? So it was a pretty well-written issue, but, you know, it's the sort of thing that I really wouldn't want to dive into.
Rain Paharia:It kind of gave an example of, like, this pattern is used by all these other projects, and so you should do this thing as well. And it is one of those things where, okay, you gotta spend a whole day investigating what the other projects do and how it fits, really getting to the root of the problem. Right? And I'm pretty lazy generally, and I don't wanna do that.
Rain Paharia:So I would either do a half-assed thing, or, honestly, in the past, I would just do what the suggested fix was. Right? It turns out that the suggested fix is actually woefully incomplete, which is where I kind of gave this to Opus 4.5. And one of the things the issue said is that less and vim and a few other projects follow this pattern.
Rain Paharia:Right? So I actually gave it the less source code, and I gave it the vim source code, and I gave it the source code to a bunch of other things. And I was like, okay, dig into this. What do these projects do?
Rain Paharia:And so this kinda comes back to asking questions of code bases you're unfamiliar with. And so I did that. Right? I had no idea about the less code base. I had no idea about the vim code base or anything.
Rain Paharia:And so it spent, like, ten minutes or so, and it actually wrote up a nice summary of, here is what all the projects do, and so on. Right? And I was like, okay, this makes sense. So I tried that, and the whole thing took me maybe two or three hours to do. And the final fix was pretty small.
Rain Paharia:It was, like, 130 lines of code or so. But it was great, because we tried the first thing, right? We tried the suggested fix. The LLM did the work, and the LLM wrote the test, which is annoying and janky in its own way. And then I tried that.
Rain Paharia:I, like, dogfooded it a bit. I found that, okay, this isn't complete in various ways. And then we iterated a few times. And so there are so many places along the path where, pre-LLMs, I would have just dropped out and been like, ah, this sucks. I don't wanna deal with this.
Rain Paharia:You know, I'm done for the day or something. Right?
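[For anyone who, unlike Adam, does want to dig in: a minimal sketch of the generic POSIX pattern the issue title gestures at, the one programs like less and vim follow. This is not the actual nextest fix; it assumes the `libc` crate and is purely illustrative.]

```rust
// Sketch: a process that wants to hand the terminal's foreground process
// group around must not be stopped by SIGTTOU while it does so.
fn main() {
    unsafe {
        // Ignore SIGTTOU so touching the tty from a background process
        // group doesn't stop us; remember the old disposition.
        let old = libc::signal(libc::SIGTTOU, libc::SIG_IGN);

        // Make our own process group the terminal's foreground group.
        libc::tcsetpgrp(libc::STDIN_FILENO, libc::getpgrp());

        // Restore whatever handler was there before.
        libc::signal(libc::SIGTTOU, old);
    }
}
```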
Bryan Cantrill:Well, and totally, like, when you do that, it's gonna be one of two things. It's gonna be like, oh, we'll just take this kind of mediocre fix. Right. Or it's gonna be, like, maybe I'll just let stale bot finish this one off.
Adam Leventhal:Right. Right. I don't have to kill it. I'll just, I mean, even this title, Rain, like, yeah. SIGTTOU.
Adam Leventhal:Okay. Fine. Sig-t-t-y-ou.
Bryan Cantrill:What? No. And you gotta, like, cue the, like, Antiques Roadshow. You gotta be like, okay, I'm now gonna go into, like, POSIX signal semantics from... I mean
Adam Leventhal:Well, and then you're like, "when test spawns interactive shell," it's like, well, here's a thought.
Rain Paharia:Don't do that. Don't do that. I mean, yeah. No. Seriously.
Adam Leventhal:Yeah. But, yeah, it makes it attainable, and it makes you get past the, like, "have you tried not doing that? I don't know, that sounds like a dumb test."
Bryan Cantrill:Yeah. Okay. And I do think this is a really important point, because then, okay: you pick this up now properly because it's easier. We've lowered the friction.
Bryan Cantrill:You actually get this completely fixed. Getting this fixed makes nextest more robust. It makes it more rigorous. I mean, on the one hand, it's like, oh, okay. Really?
Bryan Cantrill:I mean, as you say, Adam, like, maybe don't spawn an interactive shell from a test. But, hey, actually, no, now you can, though. You know what I mean? And I just see this in lots and lots of places where we are gonna make our infrastructure actually more robust, because we can now go pick up a bunch of work that we just weren't gonna get to realistically.
Bryan Cantrill:Work that we, the people who work on this lower-level infrastructure, were just not gonna get to.
Adam Leventhal:Yeah. So I have an example of some work that I finally got to. I mean, Rain described herself as lazy. Rain, I offer this counterexample. Like, I think you're kind of bringing a knife to a gunfight here.
Adam Leventhal:But
Bryan Cantrill:Yes. Out-lazy Adam, will you? Yeah.
Adam Leventhal:Oh, boy. Because and, Rain, you know this. Like, I have been wanting to do an OpenAPI diff library since before you joined. I'm sure I've been, like, talking up this vaporware of, like
Rain Paharia:Yeah.
Adam Leventhal:And I made multiple earnest attempts at starting it, and it's just one of these pieces of code where there's no good way to do it. All the ways to do it are gross and boring and stupid. And, actually, this is not, like, your case of code running in privileged mode or whatever, where if it segvs, it reboots the machine somehow. This is some code where, if it segvs, like, it's fine. Like, I don't know.
Adam Leventhal:Like, it's just not that high stakes. Very low stakes. Yeah. And the thing that got me across the line was, you know, I started using some of the OpenAPI, excuse me, OpenAI models in VS Code, mostly using it just through the lens of a very smart completion. And it allowed me to kind of repeat this pattern that I wanted to use to make sure I wasn't forgetting to compare certain things.
Adam Leventhal:And as Rain was saying, absent this, I would have written some code to write code. I would have written some Perl script or something stupid to output a bunch of code, or used a proc macro or something like that. So that was great, and I actually got the thing working, and it was really fun to build. My real breakthrough was then it was coming up on demo day, and I wanted to show it off.
Adam Leventhal:And this is a library, so there's not, like, a front end to this thing. So I was like, okay, I'll write a little CLI tool. And literally, all I wrote was function main, open a comment, said "parse the first two command line arguments," and literally the rest of the program I just tab-completed. It figured out, this is probably the program you wanna write.
Adam Leventhal:I had to fix a couple little things here and there, but it was very eye-opening. And then that became my demo, in addition to the actual library I built: the live, I guess, coding of just hitting tab and watching it do the thing that it inferred I wanted to do.
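[Roughly the shape of the main Adam describes tab-completing; the `openapi_diff` name below is a hypothetical stand-in, not his actual library.]

```rust
fn main() {
    // Parse the first two command line arguments: the old and new specs.
    let mut args = std::env::args().skip(1);
    let old_path = args.next().expect("usage: oadiff <old.json> <new.json>");
    let new_path = args.next().expect("usage: oadiff <old.json> <new.json>");

    // Read both documents; in Adam's telling, everything past the comment
    // above is what the completion model filled in on its own.
    let old = std::fs::read_to_string(&old_path).expect("reading old spec");
    let new = std::fs::read_to_string(&new_path).expect("reading new spec");

    // Hypothetical library entry point standing in for the real one:
    // let report = openapi_diff::diff(&old, &new);
    if old == new {
        println!("{old_path} and {new_path} are identical");
    } else {
        println!("{old_path} and {new_path} differ");
    }
}
```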
Bryan Cantrill:So I would argue, Adam, that you have embodied all three of Larry Wall's famous virtues of a programmer: you've shown your laziness, your impatience, and your hubris in a stroke. But this point of laziness is really important, because what we kind of speak about euphemistically as laziness, we all know that a hallmark of good software engineering is coming up with powerful abstractions. And when you are repeating code multiple times, that part of your brain is like, this is not the right abstraction. Because, Adam, both you and Rain mentioned, like, I would have made this a proc macro, or I would have done something. Yeah.
Bryan Cantrill:Because I think we over-index on that, this whole DRY thing, don't repeat yourself, where you become so over-indexed on it that you then do things that actually generate suboptimal artifacts. There are times where, actually, it's just not that big of a deal to have code that is similar but slightly different in three places. Like, it's okay. We're all gonna live. But we really resist doing that, and LLMs make it easier to kinda do that.
Rain Paharia:One of the things that annoys me the most, and, you know, I'm very, very grateful to the Rust open source community, right, and this is just to qualify that this is not meant to be negative at all toward the open source community. But one of the things that really annoys me is, in the rustdoc, you click on the source link. Right?
Rain Paharia:And the source link leads you to, like, a macro definition. Yeah. You see that even in the standard library; there's a few examples with the integer types. So say you wanna see, for example, the next_power_of_two implementations, which is, I mean, a bit-manipulation thing. I wanna look at that.
Rain Paharia:Right? You click on it, and it doesn't show you that. And it really sucks, and I hate it. So I have kind of made it a point in my libraries that I would much rather copy-paste code, just so that you can click through the source link and get it. Right?
Rain Paharia:And so macros just don't work with that. But the DRY, the don't-repeat-yourself answer, is to use a macro. So it's like, okay, well, LLMs actually do provide kind of a better solution to that.
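[A sketch of the shape Rain is describing, not the actual std source: methods stamped out by a macro over several integer types, so a docs source link lands on the macro body rather than a readable per-type implementation. The trait and method names are illustrative.]

```rust
// One macro generates the impl for every listed integer type.
macro_rules! impl_next_power_of_two {
    ($($t:ty),*) => {$(
        impl NextPow2 for $t {
            fn next_pow2(self) -> $t {
                // Classic bit trick: 2^(BITS - leading zeros of (n - 1)).
                // Sketch only: ignores overflow at the top of the range.
                if self <= 1 {
                    1
                } else {
                    1 << (<$t>::BITS - (self - 1).leading_zeros())
                }
            }
        }
    )*};
}

trait NextPow2 {
    fn next_pow2(self) -> Self;
}

impl_next_power_of_two!(u32, u64);

fn main() {
    assert_eq!(37u32.next_pow2(), 64);
    assert_eq!(64u64.next_pow2(), 64);
}
```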
David Crespo:Yeah. Totally. I've thought a lot about this, because if you think about where our intuitions come from about what is worth abstracting, or what is too much repetition, they're so tied up with the expected reader of the code, whether that's ourselves or the people that we know. They're tied in very, very tightly with our intuitions about what people can handle and what's reasonable to expect of other people. And what is reasonable to expect of an LLM is radically different from what is reasonable to expect of other people.
David Crespo:And so the amount of repetition that is tolerable or manageable in a code base seems to me way higher. And it's not just in a code base. We were thinking about API design, API response shapes, earlier. I was working with Adam on this, and we erred on the side of making a response shape more flat and less nested, even though we lost a little bit of type information that way, just because the flat one was a little easier to read, and the type information, we decided, was not really worth keeping in that particular case. But when you assume that future developers will have LLMs at their disposal, I think it must tilt the calculus toward encoding more type information at the expense of readability, because that type information is what is gonna keep LLMs on the rails in the future.
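[A sketch of the tradeoff David describes, using serde; the types are illustrative, not the actual API.]

```rust
use serde::Serialize;

// Nested: the variant is real type information, and each variant carries
// only the fields that make sense for that state.
#[derive(Serialize)]
#[serde(tag = "state", rename_all = "snake_case")]
enum InstanceStateNested {
    Running { started_at: String },
    Stopped { stopped_at: String },
}

// Flat: easier to scan as JSON, but the discriminant is now just a string,
// and impossible field combinations become representable.
#[derive(Serialize)]
struct InstanceStateFlat {
    state: String,              // "running" | "stopped", unchecked
    started_at: Option<String>,
    stopped_at: Option<String>,
}

fn main() {
    let nested = InstanceStateNested::Running {
        started_at: "2025-01-01T00:00:00Z".into(),
    };
    let flat = InstanceStateFlat {
        state: "running".into(),
        started_at: Some("2025-01-01T00:00:00Z".into()),
        stopped_at: None,
    };
    println!("{}", serde_json::to_string(&nested).unwrap());
    println!("{}", serde_json::to_string(&flat).unwrap());
}
```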
Rain Paharia:It's like, you know, all the good practices, like make invalid states unrepresentable and all of those things. It turns out that all of that actually helps LLMs a lot too.
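[The classic "make invalid states unrepresentable" move, as a tiny illustrative sketch: a state that is simultaneously failed and in progress simply cannot be constructed, so an LLM editing this code can't wander into it either. The types are hypothetical, not from any Oxide codebase.]

```rust
// Each state carries exactly the data that state needs, and nothing else.
enum UpdateState {
    Idle,
    InProgress { target_version: String },
    Failed { error: String },
}

fn describe(s: &UpdateState) -> String {
    // The compiler forces every caller to handle every state.
    match s {
        UpdateState::Idle => "idle".to_string(),
        UpdateState::InProgress { target_version } => {
            format!("updating to {target_version}")
        }
        UpdateState::Failed { error } => format!("failed: {error}"),
    }
}

fn main() {
    let s = UpdateState::InProgress { target_version: "1.2.3".into() };
    println!("{}", describe(&s));
}
```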
Adam Leventhal:Yeah. Do LLMs like Rust? I know that's a weird kind of question, but I've wondered whether the things that we appreciate about it, in terms of, like, not being able to represent invalid states and so forth, whether that is a useful property when LLMs are constructing code.
Bryan Cantrill:A hundred percent. I mean, and again, ringing the chime for an unknown episode, but I feel we said this when we first started talking about LLMs and Rust: actually, Rust is gonna be a really good fit for these things. Something I've said from the beginning is that Rust shifts the cognitive load to the developer in development, and it forces the developer in development to consider a lot of issues that historically you wouldn't see until some code is deployed into production. And I loved that shift. I think that shift is really important.
Bryan Cantrill:I think that tacks right into what LLMs can do, and I think they reinforce one another. So LLMs and Rust, I think, are a very good fit for one another, which I don't think is that hot a take. I don't think that's that spicy.
David Crespo:And just to say more: if a more elaborate type system lets you put more of that work in up front, to sort of constrain the program further, you could say that LLMs allow you to tolerate an even more elaborate type system. Like, maybe now dependent types are gonna be feasible for people to learn and work with. Maybe you can see interest in them
Adam Leventhal:Like like
David Crespo:you know, take off.
Adam Leventhal:Like, if there's a Diesel error, like, you'll be able to understand what it means.
Bryan Cantrill:Yeah. Easy. Easy. Easy. Easy.
Bryan Cantrill:We're not like, listen. We're not living in that kind of a future yet,
Adam Leventhal:pal. That's that's good. Yeah. Artificial superintelligence required for that.
Bryan Cantrill:No. I think the ASI is gonna be like, I actually don't know what this error message is. I can't make sense of this thing. It's, like, two K long. Yeah.
Bryan Cantrill:That is really interesting. And I just think, in general, having great type information. I mean, the code that would scare me the most would be just, like, pure JavaScript. I know it generates a lot of it, but we use TypeScript. I mean, Dave, correct me if I'm wrong.
Bryan Cantrill:We use TypeScript for more or less everything. Pure JavaScript would really terrify me to use, because it's just so easy to have an issue that doesn't show up until you get into runtime.
Adam Leventhal:So my blog uses a static site generator written in JavaScript, and I don't really know JavaScript. I mean, I can play a few bars, but that's kind of it. And I used LLMs a lot to sorta get things the way I wanted them. And part of it is like, I don't give a shit. Right?
Adam Leventhal:Like, it's a static site generator. Sure. Yes. It's gonna generate it statically. And there isn't some runtime edge condition I need to consider.
Adam Leventhal:So it's like, go for it. In some cases, depending on the context, I think that may be true, and it may be true in many JavaScript contexts. And that's why, in the cases where people are writing front-end code and they have additional rigor they wanna apply, they're using TypeScript or more robust languages. Yeah.
David Crespo:Yeah. Yeah. Just to defend the JavaScript world a little bit, I think the spectrum of rigor that you might need applies in a lot of different situations. Like, you might make a one-off Rust CLI as a debugging tool in that same situation. You can tell if it works by running it.
David Crespo:You know? The depth of static analysis, you don't really need that, because you run the thing and it does what you want, and you can tell that it worked. So there's a lot of situations like that. I think that's kind of an underrated point, because people assume it's all or nothing.
David Crespo:That the code needs to be perfect or it doesn't work at all, which is ridiculous. You know? Our most rigorously engineered code is still gonna have some bugs in it. So, obviously, there's a spectrum of the amount of bugs that we can tolerate, or the amount of leeway that we have.
Bryan Cantrill:Yeah. Yeah. No. That's fair. Rain, do you want to talk about some of your more recent experiments with LLMs?
Bryan Cantrill:Because you've really kind of gone nonlinear with some of the things that you've been doing. And in particular, getting past the, like, okay, these things are kind of experimental, and getting into the, no, actually, at some level, I don't wanna say we assume them, because we're not really assuming them, but we kind of acknowledge that these things can actually be used as part of software engineering. Do you wanna describe some of the things that you've done recently?
Rain Paharia:Yeah. So this is kind of a project that a bunch of us were discussing, and I decided to take it on sometime around early December. The project here is that, as I'm sure some listeners may have heard, we have done a lot of work in building automated update for our system. Right? So we have the self-service update thing now.
Rain Paharia:And one of the things it has to cope with is the fact that you are not gonna be able to update everything atomically. You're not taking the entire system down and back up again. So you need to deal with how you manage this kind of skew while an update is happening. Our colleague Dave Pacheco has done this genuinely brilliant design around the server-side APIs, where the idea is that this is an API that can talk to multiple versions of clients. So you update the server first, and you have this DAG of dependencies that you update.
Rain Paharia:It's just this really well-constructed system. It's pretty great. Right? So one of the issues we ran into, as we gained experience with the system, is that you have all these different versions. And so you have, like, a type, right?
Rain Paharia:And that type has the same name, but it has different fields, or maybe one of the subfields is different. So how do you actually store those in the repo? Right? It sounds like a simple problem, but it actually blows up and becomes this incredibly complicated problem with many, many different factors involved. So, again, this is one of those things that was this combination of human and LLM work, where I spent a bunch of time prototyping a bunch of things and coming up with an approach that works, and that satisfies the hard constraints and as many of the soft constraints as we can.
Rain Paharia:So this was a lot of work. And then one of the things I found really useful with LLMs is that I ended up essentially compiling the final state we wanna get to, and writing it out as a set of instructions that both a human and an LLM can follow. And so there's this guide that I dropped a link to, this is RFD 619, and in this guide it is section 5.1. And this 5.1 is kind of this initial migration, right?
Rain Paharia:And so, again, I spent a couple of weeks working on this whole RFD. And then what I did was, okay, I just fed this guide into the LLM, and I told it to migrate a small repo. Right?
Rain Paharia:One of our smaller APIs. And it just did that in one shot. This was not a very big API, but it just did that. Right?
Rain Paharia:I found that, okay, there were a few things I was unsatisfied with, so I went back and changed the guide, updated it, started from scratch, and iterated on it. I want to say that, overall, this guide went through maybe a couple dozen iterations of me looking at the LLM output and being like, okay, this is great, or this is not good, and so on. And we basically ended up converging on something that is this clear, very reproducible set of instructions that are simply way too complicated to capture in any deterministic algorithm, right? There's enough judgment here, and it's just this really complicated set of things, that there is no way I can write a migration tool to do this.
Rain Paharia:I mean, maybe someone smarter than me could do that; I don't think I can. But what the LLM let me do is, again, write this guide once and then apply it everywhere. So it was funny, because there was one morning where I just rapidly put up three PRs, where the first one was, like, a thousand lines of code, the second one was 2,000 lines of code, and the third one was 3,000 lines of code. And I got all three of those done in, like, an hour.
Rain Paharia:And that was just wild. It turns out that LLMs are really, really good at following instructions that are clearly written and are written in a way that the LLM works well with. This is, again, one of those things that sounds so low-priority, right? It's like, how are we going to migrate 40,000 lines of code and rearrange the types? This is the kind of thing that just falls by the wayside, that people just don't do, or we might do it in the future and there's this long migration period, or it's the kind of thing that we do in tech debt week, except this would be more like tech debt month.
Rain Paharia:But the LLM, as I said, just nailed three different APIs in one hour. And that just blew my mind. It's like, oh, you can spend two weeks carefully designing a thing and then just have the LLM repeat that pattern over and over again. It was also really helpful for the process of iterating on the guide itself, because if there's something I'm not satisfied with, or maybe one of our coworkers had some feedback on something, I could very quickly update the guide. Right?
Rain Paharia:And then I would be like, okay, run jj diff on the changes that I made, and replicate those changes into this prototype that we're working on. And it just did that. And it was amazing. It was one of those, like, wow.
Rain Paharia:You could just do that.
David Crespo:Something people have mentioned in the chat, and we haven't talked about too much, is that something that really helps the LLMs in these kinds of loops is having a signal, a verification signal, that can tell them when they're done and how far away they are from it. Type checks passing, tests passing, obviously, are some of those things. But I'm curious how you think of what the verification signal is to the LLM as it's doing this. Is it just, does it satisfy these plain natural-language requirements?
Rain Paharia:Yeah. So this is an interesting question. Right? In this case, we had a couple of hard verification signals.
Rain Paharia:The first one was just what you described, which is, the code compiles, right? That is kind of the most fundamental requirement, and then the tests pass. And we have a lot of deterministic validation. In fact, a bunch of this actually uses the work Adam was describing: being able to compare OpenAPI documents to make sure that if there are changes, those changes are only trivial ones. We put a lot of work into that.
Rain Paharia:And so having all that deterministic validation was really helpful. What I ended up doing for some of the fuzzier signals here was that, after it did this work, I would start a new context window, I would feed in the guide again, and I would ask it to carefully review the current PR for conformance with the guide. Right?
Bryan Cantrill:And Claude's like, who the hell wrote this? This is like
Adam Leventhal:Right.
Rain Paharia:Yeah. And so that ended up finding a bunch of degrees of freedom, some of which I wanted and some of which I didn't. But that was a good experience. I would just do that two or three times. Then, obviously, I would go through and manually review and make sure that everything aligned.
Rain Paharia:But again, that felt like a very quick process because I was just able to maybe spend five minutes doing the migration and then another five minutes reviewing it. And that was it, right? It's wild.
Bryan Cantrill:And this is just a much more elaborate example of your IDDQD example, where, like, look, I've done this once; I need you to do it in these subtly different but important ways that are kind of tedious. It is, in many ways, a much more elaborate version of that, where it's like, okay, we've designed this RFD very deliberately, and a lot of engineering has gone into the way we think about doing this. And that has come out from actually doing it by hand and so on, and now we need to kinda knock this out for a bunch of these different services.
Rain Paharia:Someone in chat described it as using English as a programming language. And, yeah, I mean, that's what it felt like: using English as a programming language for programs that are just too hard to write in a deterministic computer language. It's actually kind of remarkable. These are the kinds of things that you would absolutely have humans do before the advent of this stuff. And it feels like, okay.
Bryan Cantrill:Or not do them. Right? I mean, just to your point, the work is just not done. And someone's like, hey, I was in this service and it has a different, like, what's going on over here? It's like, oh, we just haven't gotten to that one yet; go look at this dashboard from two years ago, and we're waiting for the next tech debt week. I'm just like, oh my god.
Bryan Cantrill:I can feel the tech debt flu coming on for tech debt week. So, you know.
Rain Paharia:And for me, there's a way David put it that was really memorable: code that uses LLMs extensively had better be the best freaking code on the planet. Right? If you're doing this, all your code should be extremely tight. You should put all the work into refactoring, good documentation, all of these things that I think many of us feel have maybe slipped down our priority list. It is very helpful to think of these tools not as ways to improve the velocity of what you do, but as ways to improve the quality of what you do.
Rain Paharia:So if there is one thing that I want people to take away, it's: slow down. Right? Don't just spit out as much code as possible. Instead, use the LLM, which is a tool there, to be like, okay, maybe let's refactor this.
Rain Paharia:Maybe let's split this up. There are so many things you can do to improve code quality along the way that will lead you to higher code quality than you would have been able to get to in the same amount of time.
Bryan Cantrill:Rain, I just cannot emphasize enough how important that is. If you're listening to this as a podcast, please go back and re-listen to what Rain just said, because I think this is so important. And it is so important to realize that you've got this power now to go deliver a higher-quality artifact. Yes, the world emphasizes velocity, a term that I, again, don't like, because it makes us all sound like projectiles. But what this allows us to do is do things that we simply never would have gotten to before, things that allow for more rigorous artifacts. And I think you can make an argument that the world of software is gonna kind of bifurcate.
Bryan Cantrill:Adam, some of it is gonna be your JavaScript in your static site generator, which, to quote your own language back to you, you, quote, do not give a shit about. Am I understanding that correctly?
Adam Leventhal:I stand by that. Right.
Bryan Cantrill:But then underneath that are these rigorous artifacts, and in a world where we're doing much more software, we actually need those rigorous artifacts to work much better. And, you know, this is the Gen X fossil story time hour, where maybe we can knock down some things on people's bingo cards. But software in the nineties sucked. Operating systems had bugs that you would hit frequently. Compilers had bugs that you would hit frequently. I mean, the day I put C++ down is because I was dealing with two different compiler bugs simultaneously.
Bryan Cantrill:And getting basically random results. That was common in the nineties. And, man, go have a compiler bug to really take the wind out of your sails, let alone two of them. We needed to get to a world where we had open source artifacts that we could make much higher quality.
Bryan Cantrill:And the quality of software went way, way, way up. As a result, we could do more of it. And, boy, do I see that happening vividly here.
Adam Leventhal:Yeah. No. I think you're right that the reason why I don't care about the quality of my blog is, like, yeah, that's not a foundation on which I'm gonna build decades' worth of technical innovation. That's, like, one and done.
Adam Leventhal:And I think there's lots of software that kinda fits that model. And I think that's where you kinda get the slop, you know, slop being a pejorative term. But for some of this code, it's sort of fine. If you're building something that is a one-off, it is associated with some time and place and whatever. Fine.
Adam Leventhal:Like, whatever. I don't know. And, yeah, there's gonna be a lot more of it, and that's frustrating. But on the other hand, for the stuff that is foundational, that has always been rigorous and the rigor is increasing, this becomes a lever by which the rigor continues to increase.
Rain Paharia:Yeah. I mean, for me, there are so many things that I feel like I've been able to do with this to increase rigor. I've quoted a couple of things here and there, but my interest as a professional is really focusing on rigor. My background is in dev tools, where correctness is absolutely essential and nonnegotiable. And for me, it's like, okay, there are so many more tests that I'm writing now.
Rain Paharia:The other day, I was like, I wanna learn how to use Kani, which is this model checker for Rust. And I wanted to use that. Right? But there's always been this activation energy: you have to go read the documentation and stuff.
Rain Paharia:So what I instead ended up doing is, I took an existing project that I had, which I felt was a good fit for Kani, and I just asked Claude Opus 4.5 to, hey, come up with a few properties that we can verify that way. And it just did that. And I'm like, now I understand how this stuff works and what the limitations are.
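[A minimal Kani harness of the kind Rain describes asking for; the function under proof is illustrative, not from her project. Kani explores `kani::any()` symbolically, so the assertion is checked for every possible u32, not a sample.]

```rust
// The function under test: doubling modulo 2^32 (illustrative).
fn wrapping_double(x: u32) -> u32 {
    x.wrapping_mul(2)
}

// Compiled only when run under `cargo kani`.
#[cfg(kani)]
#[kani::proof]
fn double_is_always_even() {
    let x: u32 = kani::any(); // symbolic: stands for every u32 at once
    assert!(wrapping_double(x) % 2 == 0);
}

fn main() {
    // Ordinary builds still work; the proof harness rides along.
    println!("{}", wrapping_double(21));
}
```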
Rain Paharia:There are so many ways you can use this stuff to go increase the level of rigor in your software. And, honestly, it really bothers me that the dominant narrative is the whole slop thing. Right?
Rain Paharia:For us infrastructure engineers, there is so much more you can get out of it.
Adam Leventhal:Yeah. But hasn't that always been the case for the kinds of code that we care about, Rain? Like, one of the things that's beautiful about Oxide is we go to a demo day where, you know, Rain, you show off this 30,000-line change or whatever, or I show off this library that compares one thing to another thing, and people are hooting and hollering. As opposed to, you know, systems demos are traditionally seen as boring, and the thing that's whizzy is when you can demo something cool and graphical and whatever.
Adam Leventhal:And, you know, rigor doesn't have the same kind of sex appeal to everyone.
Bryan Cantrill:Yeah. Totally. Though, Rain, I think you're also right about the dominant narrative. And I was trying to think about it, because clearly it is truly a dominant narrative, in that it's dominating kind of everything. And, Adam, I was trying to think back in terms of our careers.
Bryan Cantrill:When have we had these kinds of big narratives where it feels like it's reductive? One thing I was thinking about was the rise of Java was that way. The rise of Java was really suffocating, because there was this idea, and it's very different, so I don't wanna be too reductive here. But with the rise of Java, there was this idea that it's the end of every other programming language.
Bryan Cantrill:Like, this is what we're gonna do. And this is kinda crazy to think about now, because, I mean,
Adam Leventhal:right,
Bryan Cantrill:it is. It's humorous now, but at the time there was this idea that everything's gonna be in Java. We're gonna do the operating system in Java. The microprocessors are gonna execute Java bytecode. And I mean, at Sun at the time, I was like, I know this is not right.
Bryan Cantrill:And I'm like, Java is really powerful and important, and it's gonna allow many more people to write software. And I remember thinking at the time, well, at least it's the death of C++. But it took a while for people, and some failed experiments. Right? It took NanoJava and PicoJava inside of Sun, and two different OSes inside of Sun written in Java.
Bryan Cantrill:So there were a bunch of those, and then people were like, okay, no, this thing is important and it has a role, but it's not everything.
Adam Leventhal:And it wasn't just all languages. It was operating systems and operating environments. Right? It was what write once, run anywhere meant. Right.
Adam Leventhal:You don't have to worry about the details of Mac and Windows and Unix and all the different flavors of Unix. No, you just write it once, and you're ready to run it anywhere. And it meant all of that other stuff was just gonna become meaningless, and the only thing that was gonna matter was Java. You're totally right that it took all the air out of the room for a big chunk of the late nineties, maybe early two thousands.
Bryan Cantrill:Yeah. Totally. And if you were implementing in C, it's like, well, I hope the past is working out for you. This is the whole idea of, you are actually a living fossil, and Java is actually gonna come to replace you. And in some ways, I really do think it was kind of worse, because if you were doing what we were doing, in the operating system, developing this thing in C, Java didn't really have anything for us.
Bryan Cantrill:You know? I mean, we did it around the margins, but not, like, our tooling. Even the value that Java legitimately delivered, we didn't really realize any of that. And ultimately we had a good relationship with Java, but it wasn't like, whereas I think with LLMs, it's like, no. No.
Bryan Cantrill:Everybody can actually kinda up their game with this thing in a way that's really exciting and uplifting.
Adam Leventhal:Yeah. Spot on. Well,
Bryan Cantrill:David, Rain, anything else? I know there's obviously a lot to talk about here.
David Crespo:I think we covered everything there is to say about LLMs.
Adam Leventhal:Finally, finally.
Rain Paharia:I mean, the thing I will say personally is that having a culture where writing things down is valued is a real multiplier here. And at Oxide, I'm very happy that, for all of this work that we do, we now have a new way to gain leverage from all this writing work that we culturally do. If you're at a place that maybe doesn't have as strong rigor requirements, or isn't as committed as Oxide, where we ship hardware or whatever, I would still consider doing the work to write things down and produce good documentation, good design documents, because at least the current generation of LLMs really like that. So, you know, get a little more disciplined with some of these things. Right?
Rain Paharia:So, yeah, that's what I would say: write things down.
Bryan Cantrill:That's great advice. And, Rain, let me ask you to expand on that just half a beat, because I do feel, as part of Deep Blue, uh-huh, and it's unclear to me, by the way, if this is truly young people, like undergraduates, versus a kind of more mid-career malaise, and maybe Deep Blue just cuts across all of it. But there are people who are wondering, what is my role in this new LLM age? Yeah.
Bryan Cantrill:What would be some advice that you would give to an engineer who's early in their career, looking at this stuff?
Rain Paharia:Honestly, this is kind of the advice I'd give: practice writing. For me, writing is not a natural skill; it's something that has taken me many years of work to get where I am now. I would say, if you're starting out, practice writing. Don't have the LLM write things; instead, feed your writing into the LLM and see how it behaves when you do that. That is the one bit of advice I'd give, and I think it's the kind of advice that is timeless, in the sense that we have always written things down, and we will always keep writing things down.
Rain Paharia:There's always a lot of value to that. But I think the in the LLM age, like, this is one of those ways where you can really multiply the amount of rigor you have.
Bryan Cantrill:Yeah. That's great advice. The advice I would add is: hey, you've now got the ability to pick up a new language, pick up a new system, much more quickly than before, and you should use that as a way of getting into something you maybe would have been intimidated by. I mean, look, kernel development feels intimidating to people.
Bryan Cantrill:Lots of people don't pick up kernel development because they're intimidated by it. And if you view an LLM as giving you the opportunity to jump-start yourself in kernel development, go for it. That's great. That has a very robust basis. So hop in there, hop into illumos or something you wouldn't do otherwise, maybe a database, what have you.
Bryan Cantrill:And use that LLM to get you jump started and to get you mastery over this thing.
Rain Paharia:LLMs don't judge. Like, ask all the questions that
Bryan Cantrill:you'd be embarrassed to
Rain Paharia:ask a human.
Bryan Cantrill:Like us.
Adam Leventhal:If anything, they could judge just a little bit more. Be like, that is kind of a bad question,
Bryan Cantrill:but, I know. Okay, this is why pair programming never really worked out for me: because you always have someone being like, you use what? You don't use Dvorak? Like, no.
Bryan Cantrill:I don't use... or, like, you're like, do you know that there's actually a faster key binding? It's like, no. Can we... aren't we, like, trying to work out this problem together? Like, why are you keeping me company? Like, you don't use syntax highlighting.
Bryan Cantrill:What am I even here for? You know, it's like, okay, we're just now having fights over things. And you don't have those with the LLM. Maybe I should confide to Claude Code that, by the way, I don't use syntax highlighting. What do you think about that?
Bryan Cantrill:Let's see what it says. But, yeah, it's free of judgment, which is really terrific. Well, thank you all. I know this is a hot topic, and I'm hoping that we can show that big moderate middle, and really show that there is a third path that, by the way, is the most likely path, which is that we actually use these things as tools. They're not coming to replace you, but they are actually gonna allow you to do a lot more. And the one that should be most worried about LLMs is stale bot.
Bryan Cantrill:Stale bot
Rain Paharia:Oh. For you.
Bryan Cantrill:Oh, I hope so. Death to stale bot, I say. So, Adam, thank you for stoking that rage. Thank you all. I think this is really great stuff.
Bryan Cantrill:Thank you for coming in on a hot topic. And thank you all in the chat, too. I think this is really important. This is not gonna be our last LLM episode this year, I don't think, Adam.
Adam Leventhal:So that's a great prediction.
Bryan Cantrill:Exactly. It feels like a lock. Alright. Thanks, Rain. Thanks, David.
Bryan Cantrill:Thanks, Adam. Take care.