Oxide and Friends

Andres Freund joined Bryan and Adam to talk about his discovery of the xz backdoor. It’s an incredible story… so great to get into the details with Andres. We started by ranting about the coverage in the New York Times… coverage that explicitly refused to dig into the details! It’s all the more shocking because the big story here is how Andres’ penchant for digging into the details is what saved us all from what would have been a pervasive and damaging attack!

In addition to Bryan Cantrill and Adam Leventhal, we were joined by special guest Andres Freund.

Our research for this episode:
If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Recorded April 8th, 2024

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Bryan Cantrill:

This New York Times piece on our hero, Andres, is not good. It's bad. No. It's terrible. It is really, really bad.

Bryan Cantrill:

And I'm like, who the hell wrote this thing? And I'm like, Kevin Roose. I mean, I've got higher expectations for Roose's work.

Bryan Cantrill:

I gotta tell you. Kevin, if you're listening to this, this thing is bad. Do it again.

Adam Leventhal:

I gotta, can I read aloud the, like, the part that I found was...

Bryan Cantrill:

Hold on. So are we gonna, like, the passage? Yeah. Right. Go for it.

Bryan Cantrill:

Read it aloud.

Adam Leventhal:

Well, I already tweeted it out. So: "His job involves developing a piece of open source database software known as PostgreSQL, whose details would probably bore you to tears if I could explain them correctly, which I can't."

Bryan Cantrill:

100% brain blew up on tech.

Adam Leventhal:

It's like, you can't? I mean, you are just a journalist. It's not like we expect you to understand and explain details. Journalism has very little to do with the comprehension and explanation of details, as far as I understand it.

Bryan Cantrill:

Also, like, "this would bore you to tears if I could explain the details to you, which I can't." What would you not apply that to? You could apply that to anything. It's like, the 2024 presidential election, which would bore you to tears. You understood that the total...

Adam Leventhal:

I mean, it's just, what, Kevin McCarthy ousted? The details sound really tedious, so let's just skip to the end of this whole thing. Right?

Bryan Cantrill:

Oh, man. That is absolutely atrocious. And I hope to God you had an editor demanding that you insert that parenthetical, Kevin Roose, because, you know what it is? It's funny that the same thing set you off, Adam. It is irresponsible.

Bryan Cantrill:

It is saying that there's something wrong with digging into the details. And that, by the way, things that you don't understand are boring. It's, like, no. No.

Bryan Cantrill:

They're not, actually. Things that you don't understand are actually really interesting, because you don't understand the details. Did you keep reading this article after? Because I have to say, when I read that, then it got worse if you keep reading.

Adam Leventhal:

I did. Yes, I read the whole thing, actually. I'm embarrassed to say, multiple times, because I needed to quote parts of it back to my mom, who was the editor of her college newspaper. She had read it in print on the front page of The New York Times and found it just shocking.

Bryan Cantrill:

That's great. God bless your mom.

Bryan Cantrill:

So you What's your question?

Bryan Cantrill:

Called you up so you too could read passages aloud.

Adam Leventhal:

Well, I talked to my folks about it because I thought it was a really interesting one and because I was excited to have Andres on the show. And my dad has listened to an episode of the show, so I wanted to give him the heads up that there was so much excitement coming down the pike.

Bryan Cantrill:

Oh, right. Exactly. And this led to, obviously, discussing the story, which, again, is not very good. And the other thing: they have "a twist fit for Hollywood." "Tech leaders and cybersecurity researchers are hailing Mr. Freund as a hero."

Bryan Cantrill:

It's, like, is the twist that they're hailing him as a hero? I mean, the writing is this bad. Andres, are you with us?

Andres Freund:

Yes. I am. I do not know.

Bryan Cantrill:

Welcome. Oh, Kevin Roose thanks you for showing up with working audio, because this was just not going to end. It was just Adam and me taking swings at this terrible New York Times piece, so I'm glad you're here.

Andres Freund:

I actually wasn't all that bothered by it. So...

Bryan Cantrill:

That's great, actually. That is great to hear. No. Honestly, that is great to hear. I'm glad, because this is, you know, it's big news.

Bryan Cantrill:

It is great to have you with us for what is a truly extraordinary story. There are folks who have likely heard aspects of the story, but there's a whole lot of depth to it. There are a bunch of interesting angles to cover. But, Andres, I wonder if you might just start with what your role is in Postgres and kind of the walk up to this, because I think your diligence here in investigating aberrant behavior is such an interesting part of the story, obviously. Describe your role with Postgres, what your background is with respect to Postgres, and what you were doing that kinda walked you up to this.

Adam Leventhal:

Sure.

Andres Freund:

I've been working on Postgres for, I don't know, about 15 years or so. Over time, my contributions got more and more deeply technically involved. I've done a lot of work around performance, replication, and scalability over the years, and a lot of, I don't know, just bug finding. At some point, I became a committer in the project, and I've been that for about 10 years, I would guess. So I've done a lot of reasonably sized pieces of work in Postgres, and I'm probably one of the people in Postgres that works most on performance, particularly around microbenchmarking, profiling, and seeing what's causing scalability issues and stuff like that. So I guess I had a good starting point for finding issues like this.

Bryan Cantrill:

Okay. So let me ask you, if you don't mind. When I first came to Sun years ago, I came into the Solaris Performance Group, which used to be called an oxymoron, but I don't think it was. I remember I was driving to lunch with our colleague Kevin Clark, who was optimizing every aspect of his route. And he had a line to me that's like, performance is not just a job, it's a lifestyle.

Bryan Cantrill:

Andres, is that a lifestyle for you in terms of the way you think about a system and optimizing it? What drew you to performance in particular?

Andres Freund:

Actually, no. I think I started out doing a lot of Postgres consulting and ended up doing consulting for a bunch of bigger users of Postgres. And they had issues because they were, like, the bigger users of Postgres out there, and it hadn't necessarily been optimized for very large machines. And at the time, the number of cores was exploding, and I started to optimize for those kinds of use cases and learn more about how to do that. I don't think I started out with a lot of knowledge around any of this.

Bryan Cantrill:

Interesting. So as people are hitting new levels of scale, I mean, that's a great way to discover performance problems, because the system is being pushed into a new domain. So what were some of the performance issues that you were hitting and resolving over the years?

Andres Freund:

Many. I think the first really scalability-oriented work was around locking, because there's an internal lock implementation that Postgres calls lightweight locks. They're not actually that lightweight. So the name is a bit of a...

Andres Freund:

It's like a basic read/write mutex lock, and it didn't scale that well to many concurrent readers, because there was a spinlock around the lock state, which obviously would not work well if you have a lot of concurrent accesses. I rewrote that to a smarter implementation, and did that in a reasonably portable way. Postgres tries to be portable, so we couldn't easily rely on futexes or anything like that.

Bryan Cantrill:

So you're effectively doing your own synchronization primitive in shared memory. I think what you're highlighting is a very important point: people think, oh, I'm gonna have, like, a couple of readers and only one writer, so they'll share this RW lock, at least as it would be in the illumos kernel. And it's, like, actually, if you need to modify that state, it's a read-to-own on that cache line. So it's really no different than a mutex if your hold times are really short. But it sounds like you wanted to build a data structure that would allow for true concurrent sharing?

Andres Freund:

In the end, it's still a cache line with state. The simple lock operations are a single xadd or, like, an atomic sub, for a lock release and stuff like that. But other than that, there are still operations that are just a compare-exchange on that cache line. So it doesn't completely remove lock contention.
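
The scheme described here, where the whole lock lives in one state word that is updated with fetch-add and compare-exchange, can be sketched roughly as follows. This is a minimal Python model and not Postgres code: Python has no hardware atomics, so the hypothetical `AtomicWord` class emulates them with a mutex, and the bit layout (an `EXCLUSIVE` flag plus a count of shared holders) is an assumption made for illustration.

```python
import threading

EXCLUSIVE = 1 << 24  # assumed flag bit; the low bits count shared holders


class AtomicWord:
    """Emulates one atomic integer; a mutex stands in for the hardware."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def fetch_add(self, delta):
        # Like a single xadd: add and return the previous value.
        with self._guard:
            old = self._value
            self._value += delta
            return old

    def compare_exchange(self, expected, new):
        # Store `new` only if the current value equals `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._guard:
            return self._value


class StateWordLock:
    """Non-blocking try-lock built on a single state word."""

    def __init__(self):
        self.state = AtomicWord()

    def try_acquire_shared(self):
        # One fetch-add; back out if a writer already holds the lock.
        old = self.state.fetch_add(1)
        if old & EXCLUSIVE:
            self.state.fetch_add(-1)
            return False
        return True

    def release_shared(self):
        self.state.fetch_add(-1)  # a single atomic subtract

    def try_acquire_exclusive(self):
        # Succeeds only if nobody holds the lock in any mode.
        return self.state.compare_exchange(0, EXCLUSIVE)

    def release_exclusive(self):
        self.state.fetch_add(-EXCLUSIVE)
```

Every operation still touches the same word, which on real hardware means the same cache line, so a design like this makes the common paths cheaper without removing contention.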

Andres Freund:

And we definitely could do better. There are a few projects in the works to make locks scale better for many waiters at the same time and stuff like that, but it's become much less of an issue over time compared to when we didn't have the optimized locking. Since then, there have been many other parts of performance work. I've done a lot of executor performance improvements, and there's, again, lots more to do, because it turns out that if you take a university project from the early nineties, or eighties, it's not necessarily written to be the fastest query executor.

Bryan Cantrill:

I feel like that's true of many software systems. It feels like with performance, there's always more you can do. And especially as the demands of the system shift over time, it's kind of an infinite supply of interesting problems.

Andres Freund:

Yes. And there was also just nobody else that was really focusing on that. So I think, actually, I migrated into a void in what people were caring about. And that allowed me to learn more because I was doing it, and I wasn't stepping on somebody else's toes and so on.

Bryan Cantrill:

Yeah. Interesting. You know, the other great thing about performance work is that it's work you can do totally asynchronously. It's just you and the system. Which can be really helpful, I think, if you're leading a project; you're on the kind of Postgres core team there.

Bryan Cantrill:

So, just to kinda walk us into this backdoor that you discovered: were you looking at your own work, or were you benchmarking a release, or was it someone else's work? What performance were you measuring here?

Andres Freund:

Just a patch of a colleague that in some cases improved performance, but in other cases we knew from theoretical analysis that it would likely reduce performance a bit. We wanted to see how large that regression would be, and the regression was in, like, a fully cached workload. And we didn't want to make that slower, because that's an important workload. So I was profiling before and after the change to see what the performance difference was. And the performance difference was very small by the time I was actually benchmarking.

Andres Freund:

Like, I think it was more in the 1 to 2% range.

Bryan Cantrill:

A 1 to 2% regression, or a 1 to 2% improvement?

Andres Freund:

Regression.

Bryan Cantrill:

Regression. Okay. And then

Andres Freund:

Yeah. Because that patch was making the IO path faster, we couldn't realistically improve the non-IO path, but we wanted to make sure that the non-IO path was not harmed measurably. So we set out to micro-optimize to make sure of that.
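
A before/after comparison of this kind boils down to a percent change between two sets of timings. The sketch below is one plain way to compute it, assuming each list holds wall-clock seconds from repeated runs of the same workload; the use of the median is a choice made here, not something stated in the conversation.

```python
import statistics


def percent_change(before_s, after_s):
    """Percent change in median run time; positive means slower."""
    before = statistics.median(before_s)
    after = statistics.median(after_s)
    return (after - before) / before * 100.0
```

With made-up timings of 10.0 s before and 10.2 s after, this reports a regression of about 2%, which is the size of effect being discussed.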

Bryan Cantrill:

Yeah. Okay. That's interesting. And what was the nature of the work that was improving the IO path? What was it doing in particular?

Andres Freund:

A large project that a few colleagues and I have been working on over the last couple of years is to add proper asynchronous IO and direct IO support to Postgres. One part of that is to make the system aware, when you read a table or something else, of the order in which blocks are required, so that we can do a smart read-ahead or something along those lines, and read the data into shared buffers before we are blocked waiting on it. We haven't gotten all the necessary prerequisites into this release, but what we had been working on at that point was infrastructure for a helper that makes it very easy to provide all the necessary information to do smart read-ahead from some reader, like some place needing all those blocks.
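
The shape of such a helper can be sketched as follows. This is an illustrative model only, not the Postgres interface: the class name `ReadAheadStream`, the two callbacks, and the fixed `distance` are all hypothetical. The caller supplies a callback that says which block it will need next, and the helper keeps a few reads issued ahead of the consumer.

```python
from collections import deque


class ReadAheadStream:
    """Keeps up to `distance` block reads issued ahead of the consumer."""

    def __init__(self, next_block, start_read, distance=4):
        self.next_block = next_block  # callback: next block number, or None
        self.start_read = start_read  # callback: begin an (async) read
        self.distance = distance
        self.in_flight = deque()
        self.exhausted = False

    def _top_up(self):
        while not self.exhausted and len(self.in_flight) < self.distance:
            block = self.next_block()
            if block is None:
                self.exhausted = True
            else:
                self.start_read(block)
                self.in_flight.append(block)

    def next(self):
        """Return the next block in order, or None when the stream ends."""
        self._top_up()
        if not self.in_flight:
            return None
        block = self.in_flight.popleft()
        self._top_up()  # refill so later reads are already under way
        return block
```

The extra bookkeeping on every call to `next` is the kind of overhead that has to stay cheap when all the data is already cached.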

Bryan Cantrill:

Yeah. Okay. Interesting. And that makes sense why you're really concerned about, like, okay. We wanna add these hooks that are gonna allow us to actually be much smarter with respect to IO, but we really don't wanna slow down the cached path.

Bryan Cantrill:

And this is where, I mean, I'm sure this is the kind of thing you're looking for, where it's like, boy, a couple of extra cache line misses... I mean, you can really easily perturb that cached path.

Andres Freund:

It's surprising, actually. Even when the number of cache line accesses was pretty much the same, even a few cycles showed up, which...

Bryan Cantrill:

Oh, wow.

Andres Freund:

...somewhat surprised me, because there are enough other expensive things in that path that I wouldn't have expected it to be quite so sensitive.

Bryan Cantrill:

Yeah. I kind of feel like my intuition about the effect of this stuff is, like, always wrong. When I'm convinced that it's, like, you know, this has gotta be a caching issue, it's, like, no, no, it's not that at all. And then when I'm not even thinking about it, I'm getting my butt kicked by these kinds of things.

Bryan Cantrill:

I don't know.

Andres Freund:

I've been there.

Bryan Cantrill:

And what are you using to... I guess you're starting out with a stopwatch, not by looking at hardware performance counters, as you're doing this?

Andres Freund:

Yeah. Once you start recording performance counters, particularly if you do it in a sampling-based way to get a profile, that already disturbs the workload enough that I find the timing is pretty much not usable at all anymore. So I started out by just running the benchmark and seeing what the total time is, and then using profiles to see what is causing that slowdown.

Bryan Cantrill:

Okay. And so is this where you begin to see, like, wait a minute, I'm definitely seeing some aberrant activity on this? Where do you get the first hint that something odd is going on?

Andres Freund:

In that moment, the first hint was that I was seeing repeated runs of the benchmark have a bit of noise in them. Results varied a bit too much for my liking. So I was checking whether the benchmark system was actually in an idle state or whether some background process was using too much CPU. For that, I just looked at top and saw occasional sshd processes using a fair bit of CPU, and it's, like, this is odd. Like, it was directly exposed to the Internet.

Andres Freund:

So you'd expect the usual probing, but that should all be rejected very quickly. It doesn't take...
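
Whether run-to-run noise is "too much" depends on the size of the effect being measured. The sketch below puts a number on that, under a rule of thumb that is an assumption made here: the relative spread of repeated runs should be smaller than the effect you hope to resolve.

```python
import statistics


def relative_spread(run_times_s):
    """Sample standard deviation as a fraction of the mean run time."""
    return statistics.stdev(run_times_s) / statistics.mean(run_times_s)


def can_resolve(run_times_s, effect_fraction):
    """Rough rule of thumb (an assumption): noise should be below the effect."""
    return relative_spread(run_times_s) < effect_fraction
```

For a 1 to 2% effect, `can_resolve(times, 0.01)` is a quick check; when it fails, looking for background activity on the machine is the natural next step.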

Adam Leventhal:

That's super fishy. That's really fishy even on its face.

Bryan Cantrill:

Yeah. But it also reminds me of this discussion we had last week, in terms of, like, this is the kind of thing where you're like, hey, what's going on with those sshd processes? Like, that can't be important right now.

Bryan Cantrill:

I assume, Andres, it would pop up and then, like, disappear. You're like, okay, is that related? Wait a minute. Is that related?

Bryan Cantrill:

I mean,

Adam Leventhal:

are you

Andres Freund:

You know, I don't like not understanding what's going on. So I think it's very easy to distract me with random things popping up that have performance effects. If it's something where I understand why it's using CPU, then it's fine. But if it's something I don't understand, it bothers me. And it wasn't the first time I was seeing it; the benchmarking was going on for several hours. But after a while, it's, okay, this is something I need to look at.

Adam Leventhal:

Yeah. You know, Andres, Bryan and I spent years going around giving DTrace demos. And Bryan, I don't know how often you do this, but often my demo would just be, like, what's going on right now on my box? In fact, I was giving one of those demos maybe a couple of years ago, maybe 3 years ago, at Oxide. And I found this bizarre behavior in our VPN software where it was, like, waking up and exec'ing every second.

Adam Leventhal:

Yeah. It turns out, actually, they said it was a known bug.

Adam Leventhal:

That's a known backdoor. Yeah, that's right. Working as intended. But yeah, when you start looking at the system at large, you see all kinds of weird stuff, and each one of these things can merit its own incredibly deep dive.

Bryan Cantrill:

Well, I'm so impressed, because I feel like you see these things and you're like, okay, this is very strange behavior. And you start to dig into it and it gets kinda stranger. And there's a point, like, okay, I've gotta go back to what I was trying to do here. And hitting that balance of, like, no, wait a minute.

Bryan Cantrill:

What is going on here? I mean, you must have had that moment of just, like, okay, is this what I wanna go investigate? And then realizing, you know, wait a minute, something actually very strange is going on here.

Andres Freund:

Yeah. Certainly. I think I stopped investigating and started investigating a couple of times.

Bryan Cantrill:

Right. Interesting.

Andres Freund:

And I'm not sure that I would actually have investigated it if I hadn't seen some weird symptoms a couple of weeks earlier in some automated testing of Postgres, where suddenly, after a few package updates, Valgrind was reporting some, I think, writes that were not permissible, to supposedly inaccessible memory, and I couldn't figure out what was going on at the time. There was definitely something wrong at the time, but I was traveling, and so I had to stop looking. And then, when doing the benchmarking, I was seeing that the profiles were showing time spent in liblzma, which should really not have any part in this workload. An SSH login should not involve any LZMA, because sshd does not use liblzma directly, and there was no meaningful journaling or logging going on. So there was also no reason for, like, the systemd journal, needing to log anything, to use compression.

Andres Freund:

So that was where, like, okay, there's definitely something wrong, because I'd seen the same symbols in the Valgrind reports that I was now seeing in the profile.
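
The discovery described here came from profiles, but a quick cross-check of which shared libraries a process has actually mapped is also possible on Linux by reading `/proc/<pid>/maps`. The sketch below is one way to do that; the helper names are hypothetical, and the parser assumes the usual six-column layout of that file (address, permissions, offset, device, inode, path).

```python
import os


def mapped_libraries(maps_text):
    """Return the set of shared-object basenames found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode [path]
        if len(fields) == 6:
            name = os.path.basename(fields[5].strip())
            if ".so" in name:
                libs.add(name)
    return libs


def maps_of(pid):
    """Read the memory map of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()
```

Calling `mapped_libraries(maps_of(pid))` for a process of interest shows whether a library such as liblzma is loaded at all, even when the program does not link to it directly.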

Bryan Cantrill:

Oh, interesting. Yeah. It's because there's always that moment when you're debugging a problem where you're, like, kind of drifting around a little bit and, you know, you're trying to, like, think about it from first principles and answer questions and so on, but then you get something where you're just, like, blood in the water. Fair.

Adam Leventhal:

You know

Bryan Cantrill:

what I mean? Where you're just, like, no, wait, what the heck? I don't know if this is even related to what I'm investigating, but this is definitely aberrant.

Bryan Cantrill:

And it sounds like, Andres, is it reasonable to say that seeing LZMA there was, like, wait a minute, that does not make sense?

Andres Freund:

Yeah. It did not make sense. And I was first thinking it could just be, like, I don't know, the profiler doing something weird.

Adam Leventhal:

Totally.

Andres Freund:

Or something with the debug symbols, I don't know. But all those didn't really seem to fit when I was attaching with the debugger, which... it ran just about long enough that I could script it to attach shortly after fork. And I would see, like, okay, this is somewhere with a corrupted stack, or there's no, whatever it's called, the debug frame information for the debugger to know where the stack frames are. It didn't make sense, and it was just showing a bunch of unknown call frames, and it would occasionally lose track of the stack entirely.

Andres Freund:

Like, it wouldn't even know where main is anymore and stuff like that. Or not main, because it would never reach main, but, like, _start. And that seemed just wrong. And then I just dug in deeper and deeper, and I definitely lost track a couple of times, because they injected the backdoor into the released tarball. But the first time, I incidentally didn't download the released tarball, but the generated tar.gz, or whatever it was, the source archive that GitHub generates. And that is just the normal source, the git checkout, and that doesn't have the problem.

Andres Freund:

So I was trying to reproduce it with my own build.

Bryan Cantrill:

At this point, what do you think you're trying to reproduce? Because, I mean, you're obviously going into this thinking this is just a software bug, a very weird bug of some flavor or another, where we're pulling in this thing that we shouldn't be pulling in.

Andres Freund:

But it didn't take too long to think, oh, is there something bad going on? Because...

Bryan Cantrill:

Really? That's a

Andres Freund:

It's very rare to see actual CPU usage before you reach main. You can have that if you're linking, I don't know, to libLLVM or something, which is ginormous, or, I don't know, LibreOffice. Okay, then you could perhaps expect meaningful time in the dynamic linker, but that couldn't be the case for sshd, because the libraries that it links to aren't that large.

Andres Freund:

And there wasn't really a good explanation for any of this. I thought for a while it could just be a bug. The symbol that is injected is called, I don't know, _get_cpuid, which I think is just named to be similar to some, I guess, glibc function, which is __get_cpuid. And that should never take long. And it was very quickly visible when single-stepping that it was doing way more than just a CPUID computation. And then it was clear that there's something weird going on.

Bryan Cantrill:

Okay. So, just to kinda replay that: you're seeing this heavy CPU utilization before main. I think I would still assume that you've got, like, an init section of some shared object that is being accidentally quadratic, as our colleague Matt Peters...

Andres Freund:

It absolutely could be, but it was actually before any of the init callbacks get called.

Bryan Cantrill:

It was before okay.

Andres Freund:

Yes. That took me a while to figure out, because I didn't actually know how that part of ELF works. But I set a breakpoint at the point where it calls the init callbacks, and those are actually called after it loads all the shared libraries into memory and has performed relocations, because you could have calls across all the shared libraries. And so you have to first perform all those relocations.

Andres Freund:

And the way the backdoor injected itself was via ifuncs, which are like a special kind of relocation, basically, where the target address is computed. And that was where it was doing the initial start of the injection. That was what was so suspicious.

Adam Leventhal:

Yeah. And when you say it was taking a while before we even got to main, like, how long was it taking?

Andres Freund:

That's those 500 milliseconds.

Bryan Cantrill:

Gotcha. Crazy.

Andres Freund:

A while I figured out that.

Bryan Cantrill:

Before main. Crazy.

Andres Freund:

It got a lot easier after I figured out that I could reproduce this without a running sshd.
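
When the cost sits in process startup, it can be measured without a long-running daemon by timing a command that starts and exits right away. The sketch below shows the general idea; the conversation does not say how sshd was invoked, so the command is left as a parameter and the helper name is hypothetical.

```python
import subprocess
import time


def startup_seconds(argv, runs=5):
    """Best-of-N wall time to start a command that exits right away."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of several runs filters out scheduling noise. An extra few hundred milliseconds, as described in this conversation, would dominate the startup time of an otherwise fast-starting binary.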

Bryan Cantrill:

Yeah. Interesting.

Andres Freund:

At some point, I forked my investigation to see why I couldn't reproduce the problem when running sshd from a console. Then everything was normal. The problem only occurred when I was starting it from systemd, which was also clearly suspicious.

Bryan Cantrill:

Okay. I was gonna ask. When is that kind of first moment of, like, I think I'm seeing something nefarious here, versus something that's just very odd and just a very strange bug? Is it when you were running it differently? Because that must have been a wild thought.

Andres Freund:

I don't think there was a single moment.

Bryan Cantrill:

Interesting.

Andres Freund:

It was, like, gradually increasing. To be honest, there's also some aspect of, once I figured out that it was nefarious, it was such a whirlwind that I kinda don't have the timing down anymore.

Bryan Cantrill:

Oh, absolutely. Time would stand still. At that point, once you are, like, I am looking at a backdoor, I can only imagine you'd be so locked in. What is the timeline here, from first starting to experiment with this patch and benchmark it, to, wait a minute, I've got this extra CPU utilization?

Bryan Cantrill:

I mean, I could imagine that happening over days or weeks or maybe hours, but it feels like this is an overall slightly longer period of time.

Andres Freund:

Yeah. So the first time I saw those Valgrind errors, that was about a month before I looked at the CPU utilization. And I had kind of forgotten about that. And even for the first half an hour or hour of looking into the CPU utilization, I was like, this all looks very familiar in some way, but I couldn't actually remember what, even though it was only, like, a month.

Andres Freund:

But after a while, like, it came back and luckily, I had, like, a shell history of what I had looked at. And

Bryan Cantrill:

Saved by the shell history. God, so many times my butt's been pulled out of the fire by my own shell history. Thank you, past Bryan. Interesting. So, okay.

Bryan Cantrill:

So you

Andres Freund:

I think it was after a couple of hours of investigation that I was sure that something was weird. I wasn't that sure where the problem was, like, that it was actually in liblzma; none of that was at all clear. And from figuring out that there was something weird to the report, that was, like, a few days.

Bryan Cantrill:

A few days. Those must have been some pretty extraordinary days, I would imagine.

Andres Freund:

Pretty focused days.

Bryan Cantrill:

You know, my kids tell me that if you die in your house, your cats will eat you within a matter of days. I don't know if that's true or not, but I've never exactly looked at the cats the same way again. Andres, I don't know if you live alone or with pets or anything, but it just feels like you'd be so locked in once you are realizing, like, I am on the path of something that's really big. I mean, and extraordinary.

Andres Freund:

Typically, I try to have a few days a week where I don't have any meetings. I think the first day I looked into it was a no-meeting day. So I spent most of that day on it, and then the next day was a meeting day. So I remember sitting in a bunch of meetings and not really being able to concentrate, because it felt like I need to continue looking into this. Also, I can't really say why at this point.

Adam Leventhal:

Yeah. I have this open can of plutonium in my kitchen that I really should be attending to.

Bryan Cantrill:

Yeah. I'm sorry. Exactly. I need to actually save humanity over here. But, no, this sounds like no.

Bryan Cantrill:

The state we have for the coming month, it sounds important, but... wow. So, your 500 milliseconds, it also should be said, this is an extraordinarily long amount of time. And this is, like, 500 milliseconds of compute time, it sounds like.

Andres Freund:

Yeah. That was, like, nearly completely CPU-bound.

Bryan Cantrill:

CPU-bound.

Andres Freund:

There were very few cache line misses; it was nearly all just hits. I don't remember the exact IPC. It was clearly bound by the number of instructions, nothing else. The reason that it was actually this expensive is that it's all, like, a job of obfuscation, not because it was doing expensive crypto or anything. They wanted to remove the...

Bryan Cantrill:

Yeah. Oh, the irony.

Andres Freund:

There are no strings visible in the binary in that injected object file. Instead, you have some weird obfuscated trie of various strings, and for string comparisons it basically walks... and this is where I'm starting to not really understand it; this is largely from all the people looking into it after I publicized the report. It matches the symbol names via these obfuscated strings. And that turns out to be really expensive the way they implemented it. And they do that for every symbol in, I don't know, the various shared libraries that are loaded.

Andres Freund:

That's where all the time is spent.

Bryan Cantrill:

That's weird.

Andres Freund:

If they had done less obfuscation, I probably would not have noticed that anything was wrong because, like, a few hundred microseconds, I wouldn't have been able to notice it.

Bryan Cantrill:

That's it. I mean, I think that, again, for software engineers, you basically know this, but for Kevin Roose, for whom the details are apparently too boring, that's an extraordinary amount of CPU time, and you almost certainly have something that is either compute intensive, like crypto or what have you, or you've got something that is just iterating over a lot of data. And it sounds like in this case, it's going over every symbol, obfuscated, doing, like, an expensive string comparison. It's like, yeah, that is gonna add up if you have a lot of symbols.

Andres Freund:

Yes. And it's not even that there's that crazy an amount of symbols in sshd compared to, I don't know, some larger binary, but this string matching routine takes a good few cycles for each individual character. Like, it does add up.
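
To make the cost argument concrete, here is a minimal sketch of matching every symbol name against a small trie of target strings while counting per-character steps. It is an illustration only: the real code was obfuscated and is not reproduced here, and the class name, function name, and symbol list below are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


class Trie:
    """Plain (non-obfuscated) trie over a fixed list of target strings."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root: Dict = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[""] = True  # end-of-word marker; "" can never be a character

    def match(self, s: str) -> Tuple[bool, int]:
        """Return (is_target, number_of_character_steps_taken)."""
        node = self.root
        steps = 0
        for ch in s:
            steps += 1
            if ch not in node:
                return False, steps
            node = node[ch]
        return "" in node, steps


def scan_symbols(trie: Trie, symbols: Iterable[str]) -> Tuple[List[str], int]:
    """Match every symbol against the trie and total the character steps."""
    hits: List[str] = []
    total_steps = 0
    for sym in symbols:
        found, steps = trie.match(sym)
        total_steps += steps
        if found:
            hits.append(sym)
    return hits, total_steps


# Hypothetical inputs: one target name and three symbols to scan.
targets = Trie(["RSA_public_decrypt"])
hits, steps = scan_symbols(targets, ["RSA_public_decrypt", "malloc", "RSA_free"])
```

The total work grows with the number of symbols times the characters examined per symbol, so even a few cycles per character becomes visible once every symbol of every loaded library is scanned.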

Bryan Cantrill:

So at what point do you so you're, like, this is something nefarious. I don't know what it's doing. And where do you kinda turn that dial of, like, hey. I need to it is time to let other people know about this versus, like, I need to check my own work. I mean, that that must have been a balance in there as well.

Bryan Cantrill:

Like, this is some this is, like, serious, and I need to, like, make security teams aware of of this. What what was that kind of inner dialogue like?

Andres Freund:

I think I was trying to like, initially, I was wondering whether I should report it just at that stage. It was so unclear where the problem was coming from that I felt like I didn't know where exactly to report to and who to let know.

Bryan Cantrill:

Yeah. Interesting.

Andres Freund:

So I waited until, like, I wanted to understand the source of the problem. And at some point, I figured out that it was liblzma and xz, I guess. And, initially, I thought the Debian package maintainer was the source of all of this. It was visible in the Debian package sources, like, the injected thing.

Andres Freund:

It wasn't visible in, like, the git checkout. So until I figured it out, I was like, okay, they just built it wrong.

Andres Freund:

To DB. Yeah. Right. Right.

Adam Leventhal:

Right. Because they must have built it wrong because when you build it from source, it's all fine.

Andres Freund:

I mean, I was seeing that there was something malicious in the tarball or in the Debian sources. I was seeing that it was executing this weird thing.

Bryan Cantrill:

So your thinking is, like, a Debian build server has been rooted, or their build process has been somehow corrupted. Is that kind of what you're thinking at that time?

Andres Freund:

It was in the, like, they have a git repo from the maintainer of xz. And it was visible in that git checkout because they import the tarballs. So it was clear that it wasn't just the build process. It was either the maintainer committed something intentionally malicious, or they didn't notice, or the system was hacked. And so I started writing up a report just to Debian because I thought, oh, I will need to let them know that something is going on.

Andres Freund:

And then at some point, I figured out, oh, this is not actually Debian specific. It's not in the Debian package alone. It's in the upstream. I think as soon as I realized that, I reported it.

Andres Freund:

And people were working on coordinating, I don't know, what to do about it. I was doing more investigation of what was happening. And at the initial report, I didn't yet know that it was redirecting any symbols or any of that. That was coming later.

Bryan Cantrill:

Oh, interesting. And so then when do you start to, because at some point, you start to think that autotools, that libtool, is to blame. Is that right? Or

Andres Freund:

There was a weird thing in the way they inject themselves into the build process: they add a few lines before the main invocation that triggers building the library via libtool. They're basically just adding something before it, with a semicolon, so it gets executed. And that then recursively invokes libtool again, because part of the command that it gets via environment variables is the normal libtool invocation that would also happen during a non-backdoored build process. So that was rebuilding some of the same files multiple times. It was confusing me because that does not normally happen. And in some parts of the build, they added, what's it, the @ at the beginning of a make rule, so it wasn't printing it out.

Andres Freund:

So I didn't even see that it was doing it twice. I was seeing, like, invocations for the same file with different PIDs. I was very confused for a while because I hadn't been able to figure out how to properly make make actually also output the silenced rules. Turns out that it prints more if you do make -n, and that happened to show the problem. And at some point I figured out that it was, like, a double invocation of the libtool rule coming from somewhere.

Andres Freund:

And, yeah, that made me realize what the modified part of the makefile was.
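
One way to spot the kind of double invocation described here is to capture the dry-run output (GNU make's -n prints recipe lines even when they are prefixed with @) and look for commands that appear more than once. The following is a minimal sketch under that assumption; the function name and the sample output are hypothetical, not taken from the actual xz build.

```python
from collections import Counter
from typing import List


def repeated_commands(dry_run_output: str) -> List[str]:
    """Return commands that occur more than once, in first-seen order."""
    lines = (line.strip() for line in dry_run_output.splitlines())
    counts = Counter(line for line in lines if line)
    return [cmd for cmd, n in counts.items() if n > 1]


# Hypothetical captured output of `make -n`; not from a real build log.
sample = """\
gcc -c foo.c
/bin/sh ./libtool --mode=compile gcc -c bar.c
gcc -c foo.c
"""
duplicates = repeated_commands(sample)
```

A repeated command is not proof of tampering on its own, since some builds legitimately run a step more than once, but it narrows down where to look in the makefile.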

Bryan Cantrill:

And, I mean, consuming 500 milliseconds in their auditing code, like, I'm not sure how clever that is. But this part of it feels really particularly clever. Is that a fair read on it? As I understand it, they have taken one of the corrupt data files that they have for testing, and then they've perturbed that and then used that as the payload.

Andres Freund:

Yeah. And the code to do that is not actually in the git sources. That's just in the release tarball. So if you just look at either the release tarball or the git sources on their own, you can't actually figure any of it out. The modification to the build system looks plausible-ish.

Andres Freund:

If you're used to looking at autoconf and libtool, right, it's all just, like, weird anyway. And only the test file really shows the problem.

Adam Leventhal:

And as part of the same thing, the test data, it's all this weird fixture data that doesn't look like anything anyway. So it is just a perfect place to bury this kind of stuff.
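
Since the injection code lived only in the release tarball and not in git, comparing the two trees is one way to surface it. Below is a minimal sketch that hashes every file in two unpacked directories and reports what exists only in the tarball or differs from git. The function names and directory layout are assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(git_root: Path, tarball_root: Path) -> Tuple[List[str], List[str]]:
    """Return (files only in the tarball, files whose contents differ)."""
    git = tree_digests(git_root)
    tar = tree_digests(tarball_root)
    only_in_tarball = sorted(set(tar) - set(git))
    changed = sorted(p for p in set(git) & set(tar) if git[p] != tar[p])
    return only_in_tarball, changed
```

Release tarballs built with autotools legitimately contain generated files (such as configure) that are absent from git, so the output is a list of things to review rather than a list of problems.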

Andres Freund:

One more thing that made me really suspicious relatively early on was that I was seeing that in 5.6.0, they added those test files, but there was not actually a test using them.

Bryan Cantrill:

Oh, interesting. It felt

Andres Freund:

Like, weird, in a way. I don't understand why. Like, they added the test that actually used those files, but only in 5.6.1.

Bryan Cantrill:

Whoops.

Andres Freund:

I was like, why would you add test files 20 hours before you release and then only add a user after the release? That's off.

Bryan Cantrill:

That is off.
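
The "test files with no test using them" signal can be checked mechanically. Here is a minimal sketch that, given fixture file names and the text of the test scripts, lists fixtures no script mentions. The function name and the file names in the example are hypothetical and are not the actual xz file names.

```python
from typing import Dict, Iterable, List


def unreferenced_fixtures(
    fixture_names: Iterable[str], script_texts: Dict[str, str]
) -> List[str]:
    """Return fixture names that appear in none of the given scripts."""
    return sorted(
        name
        for name in fixture_names
        if not any(name in text for text in script_texts.values())
    )


# Hypothetical example: two fixtures, one script that uses only one of them.
fixtures = ["fixture-a.bin", "fixture-b.bin"]
scripts = {"test_files.sh": "decode fixture-a.bin and compare\n"}
orphans = unreferenced_fixtures(fixtures, scripts)
```

A plain substring search will miss fixtures that tests pick up through wildcards or directory scans, so an empty or non-empty result is only a hint.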

Andres Freund:

I think they must have been, like, hurried because of the removal. Upstream systemd was removing the dependency on liblzma at, like, library load time. They were going to dynamically load the shared library only when they needed it, and they wouldn't have needed it in this context. And it would be much harder to do any redirection during dlopen. So I think they very hurriedly started to release it and do the injection.

Andres Freund:

I think they were not prepared to go live at that point.
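
The systemd change described here, loading a library only when it is first needed rather than at program start, can be sketched as follows. This is an analogy in Python, not systemd's implementation: on POSIX systems ctypes.CDLL calls dlopen under the hood, and the class name, the loader parameter, and the library name in the comment are assumptions for illustration.

```python
import ctypes
from typing import Callable, Optional


class LazyLibrary:
    """Defer loading a shared library until the first time it is used."""

    def __init__(
        self, name: str, loader: Callable[[str], object] = ctypes.CDLL
    ) -> None:
        self._name = name
        self._loader = loader  # injectable so the sketch can run without the library
        self._handle: Optional[object] = None

    @property
    def loaded(self) -> bool:
        return self._handle is not None

    def handle(self) -> object:
        """Load on first call, then reuse the same handle."""
        if self._handle is None:
            self._handle = self._loader(self._name)
        return self._handle


# Hypothetical usage: nothing is loaded until handle() is called.
# lzma = LazyLibrary("liblzma.so.5")
```

A process that never calls handle() never maps the library at all, which is why code hooked into library load time would not run in such a process.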

Adam Leventhal:

So this is that's a really interesting part

Andres Freund:

of the

Adam Leventhal:

story because this endeavor, and we'll link to some timelines in the notes, but this endeavor had been going on for multiple years. So then to see that, you know, both things were coming to fruition, and there was this closing window, as you're saying, Andres, of systemd changing, this PR being merged, and the opportunity being missed. I mean, seems like tough deadlines over in the state sponsored exploit land.

Bryan Cantrill:

State sponsored exploit land. It's like, oh, god. A schedule over quality again around here.

Andres Freund:

I also think, in the way the backdoor works, there's a lot of really well done work. At the same time, it's using the really good obfuscation techniques for way too many things where it actually isn't required, as far as I understand.

Andres Freund:

And it feels to me like somebody developed a lot of the techniques for, hey, how can you redirect something during shared library load time, where you can redirect symbols, because later the memory for the GOT (I don't know how to pronounce this), the global offset table, is mapped read-only, so you can't actually redirect anything anymore. You'd need, like, mprotect or something. And an mprotect coming from some injected code, that's pretty suspicious.

Andres Freund:

So they wanted to do it at that time, but that required some weird complicated hacks because when doing the ifunc execution, you can't actually do any function calls. None of the relocations have yet been performed completely. So you can't easily call, like, a libc function or anything like that.

Andres Freund:

And because of that, they needed to do, like, lots of complicated stuff. And if they had just done the minimum part, I think it would have been fine, because that would have added, like, tens, perhaps, of milliseconds. But then they looked more and more symbols up. And somebody wrote, like, an obfuscated list of strings, and the list of symbols it looks up at that time is very large. There was actually no need to do that.

Andres Freund:

They could have just used some heuristic at, like, SSH connect time after they injected the backdoor, and done more of the library initialization at that time, more of the symbol lookups, all the symbols they needed to call themselves, and the whole overhead would have been gone, or would not have been present. And it kinda feels like there were just some, yeah, less experienced people using the techniques that other people built for them, and they overused them. And it has a bit of a corporate product feel to it. It's like, oh,

Adam Leventhal:

people just go through the motions, like, I gotta use Bob's obfuscation tool because he's so proud of it. Now he's the VP of these exploits, and I just gotta do it his way, I guess.

Bryan Cantrill:

You know, I've said around this nation state, we need more mentoring. We need a better mentoring program, and just no one wants to listen to me. It's always, you know, gotta get this thing out today

Andres Freund:

because they're closed the door. Like, can we be

Adam Leventhal:

a little more agile folks? Well, like, what's with the waterfall schedule here?

Andres Freund:

Well, if you are not the one making the timing, but, like, the products control the timing, it's a bit harder to find any new And so you think

Bryan Cantrill:

so the the the this backdoor because it was it clearly was important to them to backdoor SSH in particular. This all was a very, a long chain to get through to SSH. And, I mean, pretty creatively to get to SSH through a library that it does not link to. I mean, that is, I,

Andres Freund:

I'm not the first to say this. Somebody on the team must have had a lot of Unix experience, or multiple people. That's, like

Bryan Cantrill:

Thanks.

Andres Freund:

They know about weird autoconf build scripts and how to plausibly, and they had, like, weird comments in the injected, like, the added code in build-to-host.m4 that were reasonable-ish, like, as good a justification as I've read in many other parts of autoconf. Let's not say it's good, but it sounded plausible. And then, the way the injection works, you needed to know about ifuncs. And then that was not actually enough to do anything because the relocation happens just when liblzma is loaded, but the main binary is not yet loaded, so you can't redirect anything for that. So they needed to know about the audit hook mechanism, which is itself, like, a copy from Solaris, I don't actually remember.

Andres Freund:

Like, it's old

Bryan Cantrill:

It is. A 100%. Yeah. No. No.

Bryan Cantrill:

It is. When I saw that, I'm like, oh, no. I'm so sorry. Because that is the link auditing that Rod and Mike added.

Adam Leventhal:

These are these are colleagues of ours. Yeah. From the Solaris team.

Bryan Cantrill:

Yeah. And there was kind of an era, and actually, honestly, I appreciate it, because this is not true for all Linux facilities. I kind of wish that more Linux facilities would literally, like, just look at people that came before them. And the GNU linker was happy to be, like, no, the Solaris linker is basically state of the art, so we're just gonna implement everything they've got, which I think was a good approach because there's a lot of good stuff that they did.

Bryan Cantrill:

And then we've got LD_AUDIT, which I actually don't think is bad. And, I mean, Adam, sotruss, I think, used LD_AUDIT, if I remember correctly.

Adam Leventhal:

I remember sotruss, but I don't remember the specific mechanism there.

Bryan Cantrill:

Yeah. It was kind of a slightly different idea about how you might debug a system. And, I mean, not that DTrace put it out of business, but it definitely took away a use case, because they were kind of thinking of a debugging use case. And Mike, if memory serves, had built this thing called sotruss, as in shared object truss, truss being the kind of old UNIX

Andres Freund:

Kind of.

Bryan Cantrill:

Yeah, the strace, but proper. Roger Faulkner's "trace system calls and signals" is kinda where truss comes from. But, yeah, I think you're right, Andres. I think GNU ld took this from Solaris. So I was like, I don't know.

Bryan Cantrill:

I'm sorry.

Andres Freund:

I think it might have made more sense back in the Solaris days, but the number of symbols is actually so large these days in very common use cases that the amount of information it gives you, unless you add a lot of filtering, is just kind of too large. So it's not something that a lot of people still need to look at, and the intrusion vector it provides seems like a problem to me. And I actually tried to look at whether there are other backdoors or exploits or whatever using LD_AUDIT, but I didn't find much. I was kind of surprised that there wasn't more because it seems kinda made to defeat a lot of the security mechanisms that were introduced to make exploitation harder.

Bryan Cantrill:

Yeah. And, I mean, I can see the genesis of the facility and why they've got it. We definitely used it. I've used it for things where you're able to interpose using a mechanism that isn't just LD_PRELOAD. The problem is you're also, like, allowing people to kinda monkey patch programs and there's you're not necessarily encouraging a sovereign sharing.

Bryan Cantrill:

So you've discovered this thing, and you now know that this is coming from upstream in xz, which is super surprising. Do you, at this point, realize that, like, there's actually a crooked committer here? When does that kinda light go on?

Andres Freund:

I think that was, as soon as I discovered it was coming from upstream

Bryan Cantrill:

Yeah. Oh, interesting.

Andres Freund:

I found that it referenced those test files. So I looked at who added the test files. At that point, it was pretty clear. It's, like, either the system was hijacked or that they were involved. Yeah.

Andres Freund:

And it seemed suspicious enough that I was leaning towards them being involved in some way, because it was not just a single day. There were multiple days. Looking later, I found that the time frame where they did weird things was a lot larger, but even at that point, it was clear that they were involved. So I couldn't really report it to the security contact, which was Jia at that point, because that was clearly not gonna be helpful for anybody. It was just going to, like, tell them that, hey.

Andres Freund:

Look, all the machines that you can find in a scan, you'd better exploit them right

Bryan Cantrill:

now. Right. Wow. Okay. Because, you know, that didn't really occur to me, but of course that makes sense that the security contact for the library is Jia.

Bryan Cantrill:

And, fortunately, you know enough to realize, like, I don't know if this person even exists, but this person is definitely involved in the nefarious activity. So, I mean, normally, the responsible protocol would be to obviously get the security folks involved for a particular project. And now, are you at this point working with Red Hat or Debian? I mean, has a team kinda grown at this point working on this, or are you still doing it while you're in meetings?

Andres Freund:

I'm not entirely sure what the exact timeline was, because I reported it to various security lists that are, like, private security lists, and then, as soon as I had sent those emails, I continued investigating, and then there were back-and-forth discussions. But the exact timeline there is a bit hard to piece

Bryan Cantrill:

Yeah. Of course.

Andres Freund:

together, but I think until the release of the public email, I had done most of the technical investigation of what was happening. I got to the point of, like, figuring out that it was detouring RSA_public_decrypt.

Bryan Cantrill:

Yeah. Wow. So, I mean, at this point, and presumably earlier, you've realized, like, this is a very big find. This is a very, very, very big story, a very big vulnerability. I mean, at this point, are you thinking, like, I'm definitely dealing with, I mean, I guess we don't know for certain that this is a state actor, but I think the consensus is that it is likely someone who is doing this for their job.

Bryan Cantrill:

They're doing it over a some number of years. It feels like it's an intelligence agency. I mean, are are you I don't think I got any of

Andres Freund:

that part. I didn't analyze what was happening upstream that carefully or anything. I was just seeing that there were clearly those changes, and they were introduced by one of the maintainers. And I didn't see at that point much need to look into anything further than that, because whether it was their system being compromised or them playing a long con or, I don't know, it didn't really play a role for the reaction.

Andres Freund:

So I pressed forward on getting all the necessary information to actually be able to make a public disclosure of what was happening.

Bryan Cantrill:

And so now the timeline really goes very, very quickly because you send that mail, and, I mean, it's more or less public when you send it. It seems like the latency between you sending the mail to oss-security and it being public, or maybe at that moment it is public, effectively.

Andres Freund:

I mean, that's a public list.

Bryan Cantrill:

Public list. Okay. So at that point, when it goes over to oss-security, that is the public announcement of this thing. And so this was not something that had been going on for months behind the scenes. I mean, you are

Andres Freund:

No. No. No. We're talking about a few days in total.

Bryan Cantrill:

A few days.

Andres Freund:

Yeah. Oh, okay.

Bryan Cantrill:

Well, this is important because I think that one of the things that was surprising, certainly to those of us, as I've talked about before, in terms of Spectre and Meltdown, is that Spectre and Meltdown had been going on for an extended period of time in an

Andres Freund:

because, like, one is a backdoor. So somebody clearly intentionally introduced it and knew how to utilize it. Whereas with the Spectre and Meltdown stuff, it was researchers finding something complicated, where the actual exposure of, like, how to exfiltrate data was really complicated, with a lot of noise associated with it. So I think it's a lot more defensible to take a long time.

Bryan Cantrill:

Oh, no. I

Andres Freund:

don't think that, like, the time frame that it was taking was good. But I think the trade-offs are very different from something that might actually be being exploited today.

Bryan Cantrill:

Or No. It's a very, very good point. In that, when you're finding, like, a potential vulnerability that, like, oh, this is a theoretical vulnerability, there's not a known exploit in the wild, this wasn't nefarious, and what we really wanna do is we wanna be sure that everyone has the opportunity to secure their systems and that you wanna be if anything, you wanna proceed kind of as slowly as you reasonably can without divulging the vulnerability such that someone nefarious could take advantage of it. And this is really the exact opposite. It's like, you've got no like, the nefarious thing has already happened.

Bryan Cantrill:

The backdoor is there. We really need to be biased towards getting this thing public very, very quickly.

Andres Freund:

The fix is much easier. Right? In the Spectre and Meltdown case, you needed to build very complicated compiler or operating system mitigations. Here, you just needed to go back to an already released version of a library. So I think the trade-offs were kinda clear.

Andres Freund:

And it's worth noting that some people were kinda unhappy even with the very short private-to-public disclosure window. I kinda understand that: once it's a backdoor that is in actual use, holding back just gives the attacker more chances and everyone else less of a chance to defend themselves.

Adam Leventhal:

So it's the nature of this where my understanding is that it was attackers with a particular private key. So this was not, like, opening the floodgates to anyone who knew the secret combination or whatever.

Andres Freund:

Nobody knew this at that time. It was about a week, or four days later or something, till anybody got to that point.

Andres Freund:

I think I was really the only one early on looking at what it was doing, and it was taking all my skills to figure out that it redirected, like, one symbol. I was like, I'm not gonna be able to reverse engineer some obfuscated code that's reasonably well done. That's just way outside of my skill set. It also was just not necessary, once you know that it's in the path of a normal authentication.

Andres Freund:

Yeah. You like, doesn't really matter what the technical details

Adam Leventhal:

of the

Andres Freund:

quotations are.

Adam Leventhal:

That's right.

Andres Freund:

That's right. Like, that's enough. I was very worried that it was, like, exploitable by everyone. And that actually weighed towards not disclosing immediately, towards the small disclosure window, to give the distributions a chance to have a package ready, because I didn't know whether some other bad people could quickly figure out what it was doing and just exploit everyone.

Bryan Cantrill:

And what is your kind of emotional state during this time? Because you gotta be like a combination of I mean, it's obviously excitement. You discovered something really important, really big. Maybe a little bit of, like, god, I what what do you a little bit of fear for the damage that could be done using this? I mean, what what what are you kind of thinking in the kind of the the moments leading up to the public disclosure?

Andres Freund:

That changed a lot over time. Early on, I was really annoyed because I didn't want to spend time on this. But once I started seeing what it was, there was no way I could responsibly say, hey, okay, I'm just going to defer this for 10 days.

Andres Freund:

That wasn't a responsible option. So partially, I was really impatient to get this over with. And the other part was, I don't know, real nervousness. Like, what is the right move?

Bryan Cantrill:

Right.

Andres Freund:

Yeah. And, some of, like, panic giggling. I don't know.

Adam Leventhal:

That's great.

Bryan Cantrill:

And did you, like, seek I mean, it's not like you know, you did you seek counsel from someone in terms of, like, what should I do? But I guess it's I mean, the path towards public disclosure does feel pretty clear. But, you know, did it feel clear, actually? Did it feel clear that, like, what I need to do is is disclose this as broadly as possible, as quickly as possible?

Andres Freund:

I think there's some of that I don't quite want to talk about. It felt very clear that it needed to go out very broadly.

Andres Freund:

The mechanics of that, I was, like, wildly unsure about.

Bryan Cantrill:

Interesting. Because

Andres Freund:

who do you inform? Like, at that point, I hadn't even figured out that it was just, like was it, like, in all the BSDs? Or was it, like Is

Bryan Cantrill:

it illumos? Come on. That's not your next question? I mean, like, what about us over here? You don't know.

Andres Freund:

Sorry. As much as I hate to say it to you, with illumos, I kinda don't care. It's fine.

Bryan Cantrill:

It's fine. Fine. We're used to it. It's fine. It's fine.

Bryan Cantrill:

It's fine. We were doing

Andres Freund:

Although we have had some problems where, like, the emulated system calls occasionally fail and stuff like that, from the Postgres side. That's mostly my contact.

Bryan Cantrill:

It's fine. Look. illumos is the latchkey kid of operating systems. Like, we'll look after ourselves. We'll let ourselves in.

Bryan Cantrill:

We're fine. We're good. We're gonna make ourselves dinner. We're used to it. Our no.

Bryan Cantrill:

Our parents haven't been here for weeks. It's fine. So yes. Okay. So you don't know, but it's a and I got it.

Bryan Cantrill:

I mean, you actually must have had a legit, like, god, if OpenBSD is vulnerable, that actually would be a bit scary, honestly.

Andres Freund:

I don't remember when exactly, but at some point it became clear that the path was via systemd. So at some point that reduced it to, like, Linuxes. Although I didn't know whether there was some other dependency on liblzma. Like, I don't know, it could have been patched in, or pulled in some other way

Andres Freund:

too. So I didn't want to wait till I knew all those details to report. But between the private and the public disclosure, there was enough time to figure out that this "if Linux" check actually did something. Initially, when I did the private report, I hadn't even figured out that the same script is called twice. So it has, like, an if that tests for, I don't know, the existence of some files, and then in that branch, there's, like, Linux conditions.

Andres Freund:

But then in the one that builds the injected backdoor, there's no such conditions. I hadn't figured any of that out initially.

Bryan Cantrill:

Yeah. Wow. Alright. So then you but now increasing confidence. Okay.

Bryan Cantrill:

This is Linux, and it's publicly disclosed. And, I mean, you must know, like, this is gonna be a big deal. How has the reaction matched your expectations of what you thought the reaction would be?

Andres Freund:

Didn't quite match it. I kinda thought the initial reaction would be faster. It would be noisier in the first couple hours, and it would be way less noisy after the first few days. But it got noisier and noisier and noisier over time. And I never expected it to reach, like, you know, general audiences or anything like that.

Andres Freund:

I was kinda thinking, okay. It'll be, you know, an LWN post, and some security sites will list it and stuff like that, but I didn't think that general media would write about it and all that.

Bryan Cantrill:

Right. Interesting. And, of course, part of the reason that people are writing about it is because it's tapping into a bunch of different bigger narratives. It's certainly tapping into a narrative around open source and how we, I mean, from a lot of different angles, how we maintain those projects. I mean, Adam, the sock puppet bullying that was happening.

Bryan Cantrill:

Oh, man. I mean, and and because Andres all that I assume you you all discovered along with us after the disclosure. I assume that you didn't see any of that before you disclosed.

Andres Freund:

It's a, what was your reaction?

Bryan Cantrill:

What was your reaction to seeing some of that?

Andres Freund:

Oh, pretty cruel in a way. Well, not just in a way. And I already suspected when I made the report that the original maintainer might not be involved.

Bryan Cantrill:

Yeah. Interesting.

Andres Freund:

And when I saw that, it was like, okay. Now I understand the path of how that Jia actor evolved and became a maintainer reasonably quickly. I was pained by that, and it felt pretty bad to be part of something that kinda caused a lot of hurt for Lasse. And I've been a maintainer for a long time now. And I know the feeling of, like, you're overwhelmed, you can't keep up, and people are, you know, hounding you for not doing enough in their eyes, not jumping through all the hoops that they want you to. It felt very familiar in a way.

Andres Freund:

Like, even though I don't think any of the people that have been hassling me have been, like, really big organized attacks, it still felt familiar.

Bryan Cantrill:

And it's super dark, as you're saying. For whatever reason, I don't know, Adam, if you felt the same way, the engagement of these sock puppets, where they do good cop, bad cop with the sock puppets. Totally. They've invented Jigar Kumar as this just, like, absolute asshole who

Adam Leventhal:

but but totally believable character by the way. Like

Andres Freund:

Oh, yes. Totally. So many of them.

Adam Leventhal:

I mean, I've encountered that kind of person in, you know, issues and PRs, you know, dozens of times. Right? So at no point are you raising red flags of, like, wow, this person seems over the top. Like, this person seems over the top in a normal kind of way.

Bryan Cantrill:

Yeah. You gotta wonder, like, inside of the nation state brainstorming session, they're like, isn't this one a little over the top? Like, no. Trust me.

Bryan Cantrill:

This one is not. Sadly, you'd think so. You'd think this would be

Andres Freund:

over the top,

Bryan Cantrill:

but this one is not even gonna move the needle. No, there are people like this. Really?

Adam Leventhal:

You're saying this person is yelling at this person who's not getting paid for their work and that is the norm? It's like, yeah, pretty much. Yeah. No. That this will fly.

Bryan Cantrill:

Well, And also like

Andres Freund:

Less nice interactions.

Adam Leventhal:

Yeah. You're saying there you're right. You're giving a lot of good notes, but one of them is too respectful. Okay. Good.

Adam Leventhal:

Right.

Bryan Cantrill:

Right. Exactly. This is too respectful to be be plausible. And I just

Andres Freund:

I wouldn't quite say that, but, like, I've seen people behave worse and, like They behave worse. With, like, long term email addresses and stuff, like, just big entities. Well, you if if one

Bryan Cantrill:

outcome that comes out of this is that, like, people are a little less inclined to be an asshole because people may assume that they're a nation state actor, like, that would be a good outcome. If, like, actually, you know, hey, why are you pressuring me to get this release out so badly? And because, like, the emails were also, like, devoid of technical content. It's just like, hey. Like, integrate these patches.

Bryan Cantrill:

It's like, what are we talking about here? Like, make it more concrete. It just felt very manipulative, and obviously it was very manipulative. And you can see

Adam Leventhal:

them latch on, like, the moment Lasse says, you know, I'm having these long-term mental health issues. Almost immediately, the pressure campaign ramps up.

Bryan Cantrill:

Yeah.

Adam Leventhal:

I mean, it makes me wonder, like, how many more irons are in the fire? Right? How many other, because, like, you know, xz can't be the only, you know, that's not a full-time job for this team, I assume. Right? Like, how many other irons are in the fire?

Adam Leventhal:

Are they waiting? Are they kind of gently putting pressure and then waiting to see where softness emerges or where cracks emerge?

Bryan Cantrill:

Absolutely. I I mean, I just you must have had the same thought of like, this is this can't be the only one out there like this. There's gotta be some other activity out there that is I mean, would did you have that same thought?

Andres Freund:

I think it seems like a big enough effort that you wouldn't try to find your way into only one project. So clearly, at the beginning, they must have had multiple let's-become-a-maintainer type efforts. Yeah. I don't know whether after doing that, after getting access on xz, they said, okay, we've achieved our goal, because once you have a backdoor in SSH, you don't really need many other backdoors at the same time, and particularly not from the same team. So could it have been that they just stopped the other efforts?

Andres Freund:

And, like, I think if they stopped the other efforts, I'm not sure it's realistic to ever find that out, because there are enough other people that are being assholes online. Like, how do we detect that? Right? Okay. Like, some Or some start.

Andres Freund:

Being very unfriendly that has no, email identity, that can't be a smaller pool. So

Bryan Cantrill:

Right.

Andres Freund:

I don't know whether they've got other concurrent efforts, but it's certainly something to be afraid of. Yeah. And I'm not sure that it's only small projects that are at risk here. Like, it was one of the early themes in the online reaction to all of this, that there are so many projects that have just a single maintainer or a quarter of a maintainer or something. It's not that hard to become a committer in some big projects either.

Andres Freund:

And if you look at the complex projects that are in important parts of the chain of, like, building an open source OS: injecting yourself into a compiler or something like that. Yeah. That's what traditional horrors are made of. Right?

Bryan Cantrill:

Or libtool, honestly. I mean, not to, sorry, I'm talking right now, but, like, that would be, because it's something that's kinda convoluted and scary and unloved and has a bunch of architectural issues. I mean, it just feels like, boy, you could do a lot of damage by infiltrating some of these things. And so then, you were surprised, I guess, as we all were, about, like, the story. Were you surprised as, like, the story kept getting kinda deeper and deeper?

Bryan Cantrill:

I think still. Like, Adam, I didn't realize the private key business. So, Adam, they've discovered, or it's fair to say they've discovered, that there's a single private key that can basically gain access to these compromised systems?

Andres Freund:

I think that took, like, 3, 4 days or something. Wow. Of investigation, until somebody found that there's, like, some elliptic curve key that can be embedded, I guess, in an RSA certificate. And that is how their backdoor decides, hey, am I going down the backdoor path or the normal path.

Bryan Cantrill:

And

Andres Freund:

Oh. Kinda the grossness of it all. Like, you send an RSA public key certificate that somehow contains, in some payload data, the curve key, and then another piece of payload actually contains the commands. There's been, like, a lot of investigation of how all of this stuff works in sshd, and it's, like, a bit tricky.

Bryan Cantrill:

Yeah. And then so and then obviously another narrative that people were tacking into. I mean, whatever you previously thought about open source, this confirmed your view of it. It's kind

Adam Leventhal:

of what you see.

Andres Freund:

It's a very common online discourse thing, I think.

Bryan Cantrill:

Yeah. Right? I just felt like I there were people who were just like, thank God it was open, which I think is my own probably. My own personal bias is like, no. It's actually the openness here was a huge win.

Bryan Cantrill:

I think it also I mean, Andres, I'd love to get your take on this, but I think the fact that it's all open required this attack to be way more sophisticated. I mean, like, my read on this is there was a lot of mental energy spent on making the repo not match the artifact, and it would be a lot easier to do that if that were not open. The fact that that repo is open requires you to spend a lot of energy on that and that obfuscation and maybe leads to the CPU footprint that you see. I don't know, Andres. What do you what what what do you kinda make of that?

Andres Freund:

I don't think the way they injected the backdoor into the build, like, had any impact on the runtime overhead.

Bryan Cantrill:

Oh, interesting. Okay.

Adam Leventhal:

Because we

Andres Freund:

need the backdoor code purely stuff. They just needed a way from liblzma to sshd, and that was the core of the challenge. Yeah. I think they could have actually committed it to the repo, and nobody would have noticed, to be honest.

Bryan Cantrill:

Yeah. Interesting. I mean, you look at the file, you're like, you're looking for a stray period here. So, you know I

Andres Freund:

mean, that, yeah. I don't think that actually played any role in the main path of exploitation. I'm kinda actually confused why they did any of that. It kinda feels like it's noisy, but not really useful. Because they can actually make the whole authentication path, like, legitimately allow access.

Andres Freund:

So all the sandboxing is not very interesting, as they can break out of the sandbox by just permitting access. I'm not sure that there's a good explanation yet for why they tried that, and whether that perhaps was just a parallel path towards exploitation: they didn't yet know how to utilize the liblzma access to trigger it, and that's why they tried to break out of the sandbox. Because it's actually not that easy to figure out how you can get to the RSA checking function without getting past some other checks first. It only happens in the SSL certificate based, not SSL, sorry, SSH certificate-based authentication, which is not, like, the most common way of doing it, because in the normal, just key-based authentication, sshd first checks whether the key is in authorized hosts.

Andres Freund:

So you could totally provide a fake key at that point, but you would need to know which keys are in authorized hosts, authorized keys. And that would have been much less attractive, obviously. And what they figured out at some point is that you can avoid all of that with the certificate-based authentication. I suspect that they just figured that out later or concurrently.

Bryan Cantrill:

Interesting. And so you think that some of the sandboxing is kinda like, they have some tech debt they didn't bother to pay down? It's like, oh, yeah. That was an old project that didn't go anywhere.

Andres Freund:

I don't know. It's it's just confusing me. I don't Yeah.

Bryan Cantrill:

I don't think

Andres Freund:

Not I I don't think we know yet.

Bryan Cantrill:

And so how so as I mean, this was I mean, it was only 10 days ago. I'm I'm sure this has been this has been a rather, intense, 10 days for you. And of course you're trying to get a release out as well. Congratulations on that. Did the release go out by the way?

Bryan Cantrill:

Did you get the release out there?

Andres Freund:

The feature freeze. It wasn't the release. Right.

Bryan Cantrill:

But so so you're

Andres Freund:

scrambling some stuff in, but not quite all that I wanted. It turns out that it's fairly distracting.

Bryan Cantrill:

Well, yeah. So and then, were you expecting the kind of noise after the fact, or the fact that it kept growing was surprising to you? I did think it was great that, I mean, Satya, you know, your boss's boss's boss's boss, however many levels up, the CEO of Microsoft.

Bryan Cantrill:

Or I

Andres Freund:

think something. Yeah. Exactly. Right? Something like that.

Bryan Cantrill:

15, 25, however many. That must have felt good, though, to see that. And I love the fact that he in particular praised your craftsmanship and curiosity, which felt like it was awfully accurate. I'm not sure. Oh, wait a minute. Did you have to write that tweet for yourself?

Bryan Cantrill:

Did he this is this is

Andres Freund:

like No. I didn't know anything about it. I actually I only learned about it a couple hours later. So I hadn't actually checked Twitter at that moment. And, like, I think somebody sent to me a picture, and that's how I, realized it.

Andres Freund:

Okay. Yeah. I think it was, already, like, I don't know, overwhelmed a bit. Like, overstimulated, I guess.

Bryan Cantrill:

Right.

Andres Freund:

Right. So is that good? What is happening? No. Yeah.

Andres Freund:

It was also nice to see that, to feel that, oh, working in open source teams at Microsoft, getting that recognition is just, like, a thing I couldn't have imagined 15, 20 years ago. That's Yeah. That's a very interesting feeling.

Bryan Cantrill:

Yeah. Yeah. That you have a company that had actually kind of explicitly, under the Ballmer era, pretty much positioned itself as diametrically opposed to open source. And now, you know, 20 years later, as you say, not only are they implicitly supporting this effort because they're employing you and part of your job is working on the Postgres core team, but to see them really being explicitly encouraging of it. I mean, I assume the reaction at Microsoft, with Satya's reaction, and people have been pretty positive.

Bryan Cantrill:

Is that a is that a fair guess?

Andres Freund:

Yeah, overwhelmingly. Fusion about it, exist, but I think, oh, well, I mean, yeah, like, I've kinda been hearing from lots of parts of Microsoft I've never had contact with, because, I mean, I've been doing open source stuff at Microsoft. Like, our team has grown over time, but it's still, like, an odd corner, I guess. And, like, suddenly hearing from parts of Microsoft that I never even knew existed.

Bryan Cantrill:

Well, hopefully, there are parts of Microsoft that you need things out of, because that's the, oh, hey. Look. Oh, thanks. Thank you for the high praise. You know, there's the email that I sent you 2 months ago that you've been sitting on.

Adam Leventhal:

So if

Bryan Cantrill:

you could actually please, now is the time, Andres. This is the this is the opportunity. Your window is open to get to get the things that you, that that that's great to hear. I mean, it's always, I think, because I I think, you know, people rightfully feel, and they should feel proud of your role and Microsoft's role in this because I think it's very important. One question I've got for you is the the because I think this is something that people are definitely curious about is, it feels lucky.

Bryan Cantrill:

And my reaction is, like, well, I'm not sure this is luck so much as it is someone who's really digging into aberrant behavior. What's your kind of view? I guess you had the Valgrind artifacts and their works. Obviously, it feels like you could have easily not investigated as deeply. What is your view of kind of the role of luck in this?

Bryan Cantrill:

And I think a question a lot of people have is like, would this have been discovered in the limit or do you think this could have hidden in perpetuity?

Andres Freund:

Luck certainly played a role. So in other releases, I had more of my own work, and, like, that's often then more, like, doing code review and stuff, polishing, I don't know, dotting the i's and stuff like that, and not performance work. So I might not have looked at the time.

Andres Freund:

And then, if I already had installed, or, actually, if the package maintainer had already integrated 5.6.1, I'm not sure I would have figured it out either. And the fact that Valgrind had warned me beforehand, I'm not sure that I would have figured it out without that. And I actually went back and looked at the report that we got from Valgrind. In hindsight, that would have been enough to figure it out, because there were clearly, like, some call stacks that went, like, from _start somewhere into liblzma. It didn't look right, and it could have been enough to figure out that the problem was there, but, clearly, they didn't.

Andres Freund:

And I don't know how many other people had, like, enough bounces to hit this multiple times.

Bryan Cantrill:

And Yeah. Interesting.

Andres Freund:

Also, like, this was a server CPU, where the individual cores are a bit slower, and I didn't have turbo boost enabled. So, like, the CPU usage was really exaggerated compared to my laptop, for example, because that has faster per-core performance, and I have turbo boost enabled, or whatever the AMD equivalent is. So there would've been a much shorter spike. So I think there was definitely luck involved. How long would it have not been caught?

Andres Freund:

I've, I go back and forth, I think.

Bryan Cantrill:

Interesting.

Andres Freund:

Lots of people have, like, lots of very high CPU usage that nobody seems to care a whole lot about. Also, SSH is something that I think people are more attuned to being weirded out by. But at the same time, it's also so normal to have all these random probing attacks that everyone, like, on a public IP suffers from. And I had disabled password authentication. But if I didn't have password authentication disabled, it would have totally been plausible that the sort of encryption is taking some time, like running a lot of cycles to decrypt or whatever.

Andres Freund:

And that that would have been source of the CPU usage or something.

Bryan Cantrill:

Yeah. Interesting.

Andres Freund:

So I really, like, I don't think it would have been years, because at some point somebody must have figured it out. But on the other hand, like, what if in 5.6.2 or 5.6.3 they had actually optimized the code to only take a few milliseconds? Then the story is very different. I'm not sure whether that would have been found anytime soon. Because, like, if somebody analyzed an actual intrusion, I think that's the most realistic path, is that they started to Yeah.

Andres Freund:

Utilize the backdoor, and then that's something that has to become noisy.

Bryan Cantrill:

Yeah. Yeah. Yeah. Yeah. So someone begins to see, so have you read The Cuckoo's Egg by Cliff Stoll?

Bryan Cantrill:

I,

Andres Freund:

Funnily enough, I saw a reference, like, a while ago, like, not weeks ago, where I, like, didn't read the original book, but I read about it a few weeks ago.

Bryan Cantrill:

So it's a great read, and I don't know if you read it. I guess you decided that you can't remember if you read The Cuckoo's Egg or not, you and I were talking about this earlier, but, you know, The Cuckoo's Egg merits a reread, in part because, just as you're saying, Andres, Stoll is an astronomer but is also a computer scientist, and I love his line about how the computer scientists thought he wasn't much of a computer scientist, he was a great astronomer; the astronomers thought he wasn't much of an astronomer,

Adam Leventhal:

he was

Bryan Cantrill:

a great computer scientist. But he is detecting on the intrusion side, because the intruder, like, just trips up a little bit. But then he's also following his own curiosity, and I think, Andres, this kinda dovetails into my next question about, like, how has this whole episode changed you as a technologist?

Andres Freund:

It might have changed me more as a person in the world than as a technologist.

Bryan Cantrill:

Understood.

Andres Freund:

I always thought it, like, important to understand the system and debug problems, even when they're not necessarily directly related to whatever I need to do. At the same time, like, I don't think as a technologist there's that much of a change, but it also depends on what you define a technologist as. I think it has made me change my thoughts about, like, project maintainership a bit, about what the threat model is for being a project maintainer. And it's not news to anybody who has done open source for a while that we need to do something more about supporting some projects that are very crucial, like a beta or a parent. I think it has I don't know.

Andres Freund:

I don't know whether it's really changed me that much, but what's been very, I don't know, overwhelming in a way is that suddenly, like, I don't know, school friends called me and said, I read about this. And, like, those school friends have zero connection to tech, and that has never happened in that way.

Bryan Cantrill:

Right. Right.

Andres Freund:

That's the, clearly being like somewhere in a organization where that is very far outside of what I ever thought I would be in. So I think what's, like, I wonder whether it's just changing me.

Bryan Cantrill:

Well, I'll tell you, from my perspective, I mean, I think that this shines light on a bunch of different things. I think that one of the things that it shines a light on, just strictly positively, is the value of digging into an aberrant system. And this is something that I've always believed strongly in, that when you've got these wisps of smoke, it's worth it to take the time and really investigate it. And take the no-meeting Wednesday or whatever you have, your meeting-free day, Andres, and really dig into this stuff, even though it's like, oh, you know, I've got a deadline and so on. But, like, you know, when you have aberrant behavior in front of you, it may be your last opportunity to really understand that.

Bryan Cantrill:

And I've always kinda thought of that as, like, it's your last opportunity to understand what is just misbehaving software rather than nefarious software. But I can tell you, Andres, that you've already had an impact on me, because I had a very strange bug over the weekend where we did a Rust toolchain update and it induced an error that it should not have induced. Just an I2C error talking to the serial presence detect on a DIMM, and I kind of, like, addressed the I2C issue, but I'm like, why did the toolchain induce this? And I was listening to some of the podcasts you've been on, too, in terms of talking about this. And I'm like, you know, what would Andres do right now?

Bryan Cantrill:

Andres would, like, it's time to dig into, like, what is going on? We gotta completely understand this. And I ultimately discovered that, yeah, the toolchain update caused some stack growth, which caused a stack overflow in an unrelated thing. Actually, it's all open source. I could drop a link to the ticket in the chat if people are curious on that.

Bryan Cantrill:

But it really was a very, like, tangible moment of, like, I think we engineers need encouragement from one another that, you know, it is worth digging into the aberrant behavior. We did the episode with Dave, Adam, on the CockroachDB data corruption issue. Yeah. And I think that that took a long, long time to debug. Obviously, not nefarious at all, very much in the much more garden variety, but a very gnarly data corruption bug.

Bryan Cantrill:

And it like, engineers need that encouragement that, like, no. No. You should like, it's a computer. It's supposed to be working the way we designed, and you should be investigating this aberrant behavior. And I think, Andres, I gotta tell you, I think you're a real inspiration to people who are at that decision point of, like, do I investigate this further or not?

Bryan Cantrill:

It's like, Andres would do my Andres would investigate. I think that that is, and I think that there's a real, potential to be an inspiration to other engineers in that regard.

Andres Freund:

Hopefully, that's I'll

Bryan Cantrill:

try that.

Andres Freund:

One thing that

Bryan Cantrill:

Tee you up for your speaking tour. You know? I'm proud of I wanna hear I wanna I wanna

Andres Freund:

book deal.

Adam Leventhal:

I wanna there's a movie here somewhere.

Bryan Cantrill:

It's a I'll just have a movie. Oh, for sure. We can get a movie.

Andres Freund:

One thing that, like, made me wonder is whether I would have figured that out if I had started later as an engineer, because there are so many low-level systems pieces to this, like knowing how the dynamic linker works, yeah, and, like, knowing some performance tools, that are really hard to get started with if you start out at the current complexity rather than at the complexity of 20 years ago. And it worries me, like, seeing how much more there is to learn these days to get to a very efficient,

Bryan Cantrill:

So I think it's never it's never too late to learn from first principles, I think.

Andres Freund:

Yeah. I I agree. It's just, like, it's just harder, I think, than it used to be.

Bryan Cantrill:

I think it's I'm glad you mentioned the tooling because you used perf to debug this. Obviously, Valgrind was very important. I mean, our tools are really important to help us understand our software. And they it feels like they played an important role here.

Andres Freund:

Yeah. And, like, interestingly enough, I had to use perf, not just normal perf, but, like, Intel PT, processor trace. Yeah.

Bryan Cantrill:

Yeah. Yeah. Yeah.

Andres Freund:

Because, actually, like, normal perf isn't going to be granular enough for this. And the problem was that a debugger didn't reliably work, because they have some anti-debugger measures. Yeah. So being able to use that processor trace feature was pretty powerful. And I've used it, like, a handful of times before.

Andres Freund:

Pretty sure that I'm gonna use it more now because it's to be able to, like, see the exact path through a program at very low overhead in execution time, that's really powerful. And I think I was a bit surprised how little tooling is around it.

Bryan Cantrill:

Well, this you know, we've discovered this though a lot of about a lot of different aspects of DTrace that sometimes when there are things that are a bit esoteric, but when you need them, you really need them, you you they're they're just not as broadly known as you might think. I think they should be. It sounds like this might be an example of that, but obviously critical to help you kinda crack this case and do it relatively quickly.

Andres Freund:

Back to the question earlier, where you asked whether it changed me as a technologist. I did think now of something. Like, I used to be way more suspicious of, like, soft security boundaries. Yeah. But the more things there are that could give you just, like, a bit of a heads up that something is going on, the more likely it is that stuff is getting noticed.

Andres Freund:

And given that we realistically won't prevent this kind of attack from happening, there's just too many projects and too many interdependencies, having these kinds of boundaries that aren't infinitely hard to get over, but, like, add the risk of noise, I think that's more powerful than I believed beforehand.

Bryan Cantrill:

And so you like what do you mean like what are some of those boundaries concretely?

Andres Freund:

For example, that we can tell the linker these days, the dynamic linker, to resolve all symbols at program start. And because of that, we can remap the global offset table read-only. Without that, they wouldn't have had to rely on any of this complicated machinery. And distros have started to default to -z now, -z relro for a while. And while that's fine for long-running programs.

Andres Freund:

It's actually not great for short-running stuff because, like, most of the time, they're not gonna use all the dynamic symbols that they're referencing. It actually adds overhead to the common case. Yeah. If they hadn't had to jump through all these hoops, it wouldn't have been noticed, I think. So, like, defaulting to the overhead of this protection, I was more suspicious of beforehand.

Andres Freund:

Now I'm less so. And I think there are other things like that. Like, on the kernel level, I've always been annoyed about, like, mitigation measures that cause a few percent of overhead, because that's Yeah. What I see in profiles. I know.

Andres Freund:

At the same time, it's, like, now a bit clearer why it's perhaps worth doing that by default.
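
A minimal sketch of the trade-off described above, written as a toy model in Python rather than as anything a real dynamic linker does. The class `ToyProcess`, its fields, and the helper `try_hook` are all hypothetical names invented for illustration. The model only captures two properties from the discussion: eager binding pays for every imported symbol at startup even if most are never called, and once the table is remapped read-only, a later attempt to redirect an entry fails loudly instead of silently succeeding.

```python
# Toy model only: real dynamic linkers and page protections work very
# differently. Every name here is hypothetical.


class ToyProcess:
    def __init__(self, imported_symbols, bind_now):
        self.got = {}             # stand-in for the global offset table
        self.got_writable = True  # stand-in for the page protection
        self.resolutions = 0      # symbol lookups paid for so far
        self.imported = list(imported_symbols)
        if bind_now:
            # Eager binding ("-z now" style): resolve everything up front,
            # then make the table read-only ("relro" style).
            for name in self.imported:
                self._resolve(name)
            self.got_writable = False

    def _resolve(self, name):
        if not self.got_writable:
            raise PermissionError("GOT is read-only")
        self.got[name] = f"addr_of_{name}"
        self.resolutions += 1

    def call(self, name):
        # Lazy binding: a symbol is resolved the first time it is used.
        if name not in self.got:
            self._resolve(name)
        return self.got[name]

    def overwrite_entry(self, name, target):
        # What code that wants to redirect a function would need to do.
        if not self.got_writable:
            raise PermissionError("GOT is read-only")
        self.got[name] = target


def try_hook(process):
    """Return True if redirecting an entry succeeded, False if it was blocked."""
    try:
        process.overwrite_entry("sym0", "addr_of_hook")
        return True
    except PermissionError:
        return False


symbols = [f"sym{i}" for i in range(100)]
lazy = ToyProcess(symbols, bind_now=False)
eager = ToyProcess(symbols, bind_now=True)

# A short-running program that only ever uses two of its 100 imports.
for proc in (lazy, eager):
    proc.call("sym0")
    proc.call("sym1")

print("lazy resolutions: ", lazy.resolutions)
print("eager resolutions:", eager.resolutions)
print("hook on lazy process succeeded: ", try_hook(lazy))
print("hook on eager process succeeded:", try_hook(eager))
```

In this model the short-running program pays for 100 lookups instead of 2 under eager binding, which is the overhead being weighed above, and in exchange the redirect attempt is blocked.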

Bryan Cantrill:

Yeah. I know. And I feel like on some of these Spectre mitigations in particular, it's like, what, are we not gonna speculatively execute at all? I mean, I know what you mean. And then what about the, someone asked in the chat about -fno-omit-frame-pointer, which did play a kind of a strange role in this.

Bryan Cantrill:

Right? Because Yeah. The, yeah. Can you describe the role that that played in in the second?

Andres Freund:

I don't think I understand the backdoor code well enough to precisely pinpoint it, but my understanding is that it assumed a stack layout that would have been the case without the frame pointer there. And it used that to, like, fish information from the stack somewhere. And because I was compiling with -fno-omit-frame-pointer, it, like, accessed the wrong offset. And because of that, there was the Valgrind warning. And I was seeing that because I have been running my tests with -fno-omit-frame-pointer, just because it makes profiling easier. And I've been doing that for a while. But the reason that it was noticed on Fedora was that they just changed the default to it.

Andres Freund:

So I think that also made it, like, easier for somebody else to discover this, because, like, 5.6.0 on new Fedoras was suddenly, like, failing. Yeah. From what you can see so far, nobody had picked up on that being a problem, and instead, like, the warning was silenced, just like I had silenced the warning initially. I think that this perhaps helped figure this out.
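
A minimal sketch of why a hard-coded stack offset can break when frame pointers are kept, again as a toy Python model. The slot names, `ASSUMED_OFFSET`, and the choice of the return address as the thing being read are assumptions made purely for illustration; the discussion above does not say what the backdoor was reading, and real stack layouts differ in detail.

```python
# Toy model only: a stack frame as a list of slots, index 0 = top of stack.
# All names are hypothetical.


def build_frame(local_slots, return_address, keep_frame_pointer):
    frame = list(local_slots)
    if keep_frame_pointer:
        # With frame pointers kept, the function prologue saves the
        # caller's frame pointer, adding one extra slot to the frame.
        frame.append("saved_frame_pointer")
    frame.append(return_address)
    return frame


# Offset baked in by code that assumes the layout without a frame pointer.
ASSUMED_OFFSET = 2


def peek_with_assumed_offset(frame):
    return frame[ASSUMED_OFFSET]


without_fp = build_frame(["local_a", "local_b"], "ret_to_caller", False)
with_fp = build_frame(["local_a", "local_b"], "ret_to_caller", True)

print("without frame pointer:", peek_with_assumed_offset(without_fp))
print("with frame pointer:   ", peek_with_assumed_offset(with_fp))
```

The same fixed offset lands on the intended slot in one layout and on the saved frame pointer in the other, which is the kind of wrong-offset access that a tool like Valgrind can flag.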

Bryan Cantrill:

Which is great. I mean, I I as even though I don't like the double negative of no omit frame pointer, I definitely Yes. I'm I I'm very, pro frame pointers. And I do think it's interesting, Andres, for people for people to hear from you that someone who takes performance very seriously and is looking at micro optimizations has frame pointers because it is impossible to understand your software without frame pointers. Please use your frame pointers or you need Yeah.

Andres Freund:

That you need No. That's necessarily use it for,

Bryan Cantrill:

you know,

Andres Freund:

optimized build. It's, like, noticeable, the overhead. I think it's totally worth the overhead by default in most programs. I'm not sure that every program needs to run with frame pointers in an optimized build because, like, just, like,

Bryan Cantrill:

This is Rust. If you don't wanna have a frame pointer, inline it, which is what Rust does. But I think it's kind of interesting that that played a little Rosencrantz and Guildenstern kind of a role in this, -fno-omit-frame-pointer kinda wandering by, having an impact on this story.

Andres Freund:

Yeah.

Bryan Cantrill:

Well, Andres, this has been amazing, and we really, really, really appreciate you coming here. I know you've had a lot, I mean, obviously, you're talking to the New York Times. You're a celebrity now, rightfully so. You've got Satya praising you, as you should be. And I know you've got a national speaking tour scheduled and, you know, with Oprah and, you know, I know you're gonna be here. So we really, really appreciate you coming here.

Bryan Cantrill:

It's been so great to get your perspective. This is such an extraordinary story in so many different dimensions, and, obviously, you know, your role here is a real inspiration to anybody wondering if they should dig into that aberrant behavior, try to understand what's going on. I think you can really serve to inspire other technologists. So thank you very much.

Andres Freund:

I hope I don't destroy too many artists in a curious.

Bryan Cantrill:

No. It was great. And thanks, everyone, for joining us. And then, Adam, I know you've got a national championship you need to run to, so we're gonna let you do it. Adam is a Connecticut native.

Adam Leventhal:

That's right. Big Huskies fan.

Bryan Cantrill:

So Big Huskies fan. Sure. You know, Connecticut having its day because, you know, you get A Connecticut Yankee in King Arthur's Court. I always think of it whenever there's a total eclipse, as there was today. So I feel this is a big Connecticut day.

Adam Leventhal:

And you need Big day for us nutmeggers. Exactly.

Bryan Cantrill:

Big day for the nutmeggers. We'll let you get to it. The just a reminder for folks, that, or actually maybe not a reminder because we may have not publicized the date for the book club discussion to be did I neglect to do that?

Adam Leventhal:

Not. That's right. This is this is gonna be the moment.

Bryan Cantrill:

Alright. So on May 13th, we will be discussing How Life Works, by, and the, this is, hopefully folks have started to read this or you found the audio book?

Adam Leventhal:

I've started. Yeah. I've started. No. There's no audio book I was always planning to read.

Adam Leventhal:

There is an audio book available only in the UK, I've discovered. I'm sure folks who need it can pirate it from their UK friends.

Bryan Cantrill:

But I'm really enjoying it. Adam, I hope you're enjoying it too.

Andres Freund:

It's definitely Yeah. Yeah.

Bryan Cantrill:

I think it's gonna be a totally different direction for a lot of folks, hopefully in an interesting way. So that is gonna be on May 13th, and my friend Greg Cost is gonna join us, and he's gonna read the book, which is great. He's a microbiologist, and he's, like, I gotta warn you, like, I've read the blurb. This book, like, this book might be upsetting to me. Is it okay if I'm like,

Adam Leventhal:

I have to like the book?

Bryan Cantrill:

I'm like, no. Definitely not. Like, that's even better. Like, come in hot. Like, we'll invite Kevin Roose.

Bryan Cantrill:

Treat him as a punching bag as well. So anyway, hopefully, folks, if you haven't started reading that, definitely read that. Join us for discussion, or you'll be able to obviously catch the recording there. And then I think we've got a wild episode coming up in a week, Adam. I'm gonna tease that a little bit.

Bryan Cantrill:

You'll look for more details. We've got a big, big crossover episode coming up next week.

Adam Leventhal:

Stay tuned. Yeah. Very excited.

Bryan Cantrill:

Stay tuned. I am extremely excited, probably a little too excited for the episode we're gonna have next week. Yep. Andres, thanks again. Amazing story.

Bryan Cantrill:

Thank you, everyone. And, until the next backdoor, take care.