Oxide and Friends

Bryan and Adam reminisce about the DTrace journey 20 years after first integrating the code into Solaris back in September 2003.

In addition to Bryan Cantrill and Adam Leventhal, we were joined by Josh Clulow.

Some of the topics we hit on, in the order that we hit them:
If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Bryan Cantrill:

Okay. Let's write down how the other sounds on a piece of paper and both flip them over at the same time.

Adam Leventhal:

Well, you sound terrific like a millennial podcaster.

Bryan Cantrill:

That's what I had on yours. That's what I had for you. Alright. We're good. Everyone's Wow.

Adam Leventhal:

Alright. Perfect. Nailed it.

Bryan Cantrill:

So here we are 20 years later, 20 years after we integrated DTrace. It's amazing, because, you know what's amazing? I got married shortly after we integrated, Adam.

Adam Leventhal:

Yes. That is amazing.

Bryan Cantrill:

It okay. Wait a minute. That wasn't I I don't think that's actually where

Adam Leventhal:

I was going. What I don't remember we're

Bryan Cantrill:

not actually after test.

Adam Leventhal:

What I remember is

Bryan Cantrill:

that I actually managed to, like, actually made and He's been

Josh Clulow:

hold he's been holding it in for 20 years, to be fair. Exactly.

Adam Leventhal:

You know, I think you had both of these deadlines looming, and both involved a bunch of work. I remember someone at the time kinda commenting on it, and you saying that the DTrace deadline was much more stressful. I think sort of because, like, the wedding was going to happen one way or the other. Like, it didn't matter what kind of shambles it was. Like, if everyone was getting takeout Hardee's, or the Australian equivalent, for dinner. Like, it was gonna happen, as opposed to DTrace integrating. Like, it was only gonna happen if it was done.

Bryan Cantrill:

First of all, I believe the Australian equivalent would be Hungry Jack's. Josh, correct me if I'm wrong.

Josh Clulow:

Now, look. Hungry Jack's was a spin-off of Burger King. Is spin-off the right word? Well

Bryan Cantrill:

Is that the word you're looking for?

Josh Clulow:

There was a franchise licensing thing that happened. Because eventually, like decades later, the Burger King people showed up and started trading under their own name, and I feel like that caused

Bryan Cantrill:

Did Hungry Jack's go away?

Josh Clulow:

I don't know that it went away, but it may have actually by now. Like, they may have figured out how to fix all that up. But for a long time, it was Hungry Jack's, certainly.

Bryan Cantrill:

And as an American in Australia, you're like, is Burger King aware of Hungry Jack's? This looks like

Josh Clulow:

No. They definitely were. Because it was our home of what you would call the Whopper. I believe we also called it the Whopper.

Bryan Cantrill:

Yeah. What did you call it? I think you called it the

Josh Clulow:

I think it was the Whopper. I don't think there was anything substantially different about the menu.

Bryan Cantrill:

I think the Wappo is actually I think the Wappo feels like a it's almost a name that that Australia would come up with. I think you'd be very proud of the Wappo.

Josh Clulow:

Now we'd call it the we'd call it the WAPA. No. WAPA.

Bryan Cantrill:

That's the Right. Exactly.

Josh Clulow:

Be less let's roach your cars in in the

Bryan Cantrill:

Fair enough. So, yeah, Adam, I thought, it's gonna be all Hungry Jack's at the wedding? Fine. Who cares? Meanwhile, DTrace is not integrated? It would be a real problem.

Josh Clulow:

So you're saying one one of these projects was was date driven?

Bryan Cantrill:

We pretty much. Did I verbalize this relative anxiety around Bridget? I can't remember if this was

Adam Leventhal:

No. But, like, around a lot of colleagues. And I think, actually, one of your one of your helpful colleagues, not me, did mention this to Bridget, and she was like, absolutely. Like, date driven. The other one was quality driven.

Adam Leventhal:

Look what I've ended up with.

Bryan Cantrill:

You know, I'd like to say, we've been married for 20 years in October, so it's we we're we're we're doing alright. We're doing alright.

Adam Leventhal:

Yeah. Exactly.

Bryan Cantrill:

The marriage. Both doing well, actually. Pretty amazing. So, I mean, I think it's amazing that it was 20 years ago because it still feels so recent. And I feel it's like we're really lucky to have software.

Bryan Cantrill:

I mean, how many people get to have 20 years of experience with their own software and get to, like, look back and, you know, what do we do right? What do we do wrong? You know, what were some of the things we regretted? I don't know how many of those there are, but there are a lot of things we got right, actually. That was that was fun.

Adam Leventhal:

Yeah. And remarkable to literally still be using it, and privileged to see it living on in systems that we didn't build. You know, every time I fire it up on my Mac, I feel very grateful that that has continued, in their, you know, mostly-true-to-the-principles fork of it, and it's been great.

Bryan Cantrill:

And do you feel slightly less gratitude every time you have to go into SIP or whatever to go, to look through

Adam Leventhal:

is that they're like, oh, good, a new update. Oh, like, a new problem. Oh, I have to do the SIP dance to be able to use this thing again. There's that.

Bryan Cantrill:

I am grateful. I am grateful.

Adam Leventhal:

Yeah. For those not familiar, Apple has DTrace and has had DTrace and we'll we'll get into that. But, it is locked down in ways where you have to break a bunch of warranty voiding, you know, glass in order to get your way there, but it's not so bad.

Bryan Cantrill:

It's not so bad. And I do think we feel very lucky to have a technology that people are willing to root their Macs to use.

Adam Leventhal:

Yeah. It's That's right.

Bryan Cantrill:

It's terrific. So, how did how did I think we wanted to, reminisce a bit about Yes. Some of those early days, for sure. And especially with, like, the things that kinda benefit from 20 years of hindsight, of, because there is a lot I mean, there it's actually we we have had to extend it. We had to rewrite remarkably little of it.

Bryan Cantrill:

Mhmm. Yeah. Josh, did you say beetroot in the chat? Is that a reference to... that's not a reference to beakroot? No.

Bryan Cantrill:

Never mind. I'm I'd

Josh Clulow:

no. I was we were we're still on Hungry Jack's.

Bryan Cantrill:

Okay. So on Hungry Jack's. I actually, okay, this is where my brain went. Because, you know, Bridget and I met online.

Bryan Cantrill:

This was it but it was and I now know that this is how everyone meets. But this is, and, in order to, to this was at a time when you couldn't see other profiles. Like, you could only see, you know, if you were a man seeking a woman, you could only see women, not men. So I wanted to scope out the competition, so I created a a fake profile. Did I tell you this, Adam?

Bryan Cantrill:

Yes. Oh, I did tell you this. On one of the lab machines, named beak, I had the account that I created. I'm sorry to do this. This has got nothing to do with DTrace, but,

Bryan Cantrill:

Right? It has it has a lot to do with getting married. So I'm sorry. We're just gonna we'll be here quickly, then I'll get back in the car. The, so I created this account.

Bryan Cantrill:

I used the root account on beak. I named this profile beakroot. It was very important to me that it get no interest, because I'm gonna create, basically, a fake woman to see what my own competition was, and I wanted to be sure that this got no actual interest. So I didn't have to upload a profile photo. I basically didn't answer the survey questions, or the questions I answered with just the minimal amount of detail.

Bryan Cantrill:

I did have to put a height and a weight in there, which was a nice item. Like, turns out it doesn't need any more than that to get an overwhelming amount of interest is what I learned. But I I did need to put a tagline in there. I I you it won't let you continue without an actual, like, you know, this little, like, one sentence synopsis. And, Adam, do you remember this?

Bryan Cantrill:

I remember do you remember

Adam Leventhal:

No. No.

Bryan Cantrill:

So this is the height of the dot-com boom. So I wanted, again, to make this revolting to any would-be suitors. So my tagline was "Uber-hip dot-commer seeks pre-IPO beau." And, I mean, of course, like, it all makes sense now. Like, I got buried in interest.

Bryan Cantrill:

Like, I got I got I got that that thing got so much more interest than the actual, like, real me. It was a little bit like, okay. Alright. Well, and so I had people who were pouring their hearts out to this thing that the only and they're explaining, like, what their vesting schedule was. It's like, oh my god.

Bryan Cantrill:

Oh my god. This is so embarrassing. Anyway, so yeah. Sorry. When you said beetroot, I thought you were making a beakroot reference, Josh.

Bryan Cantrill:

Josh, I'm sorry if I burdened you with oversharing detail.

Josh Clulow:

Actually, I don't think I've heard that story before. So let's

Bryan Cantrill:

You don't have to use the word actually. Actually. There we go. There's one.

Adam Leventhal:

Bryan, in terms of where we start on DTrace, not to distract you, but I don't know how broadly you've shared this, and I've always found it interesting, like, where the idea for DTrace came from, because that was well before your time at Sun. As I recall, some of these early ideas were from when you were an undergraduate.

Bryan Cantrill:

Yeah. That's right. Well, in particular, actually, it's funny, because I was just seeing someone on Hacker News experimenting with what you can do just with LD_PRELOAD and how you can, like, load your own shared object, and you can interpose on an arbitrary shared object. And I wrote this little goober that I called Sift that allowed me to basically do what shared object interposition now does, and sotruss. Remember sotruss?

Bryan Cantrill:

Mhmm. And it was just shining a bright light onto, like, oh my god, this is what the software is doing. And I just felt like, you know, I was trying to make sense of the M-to-N scheduling model. And what I realized is just, like, what the software was actually doing and what people thought it was doing were 2 very different things.

Bryan Cantrill:

And I'm like, why don't we have a better way of figuring out what the software is doing? The only way of figuring out what the software is doing is, like, putting a breakpoint in the debugger. Like, why do we do it that way? I just didn't understand why we don't dynamically instrument program text when we wanna understand what it's doing. And then once we understand what it's doing, uninstrument it and put it back, basically. Which doesn't feel like a super deep thought, but I did ask our OS professor at university about this, and he's like, yeah, you know, I don't know. I think there must be some reason that it can't be done, because, well, this is kind of important. He's like, this is a reasonable thing.

Bryan Cantrill:

I didn't even question this at all. So if you could do this, they would have done it by now. And I thought the same thing. I'm like, I think you're right. Yeah.
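Editor's sketch: the idea described above (instrument running code only while you want to observe it, then put it back) can be illustrated with a loose Python analogy. This is not how DTrace works internally; DTrace rewrites machine instructions, whereas this sketch merely swaps a function reference. The names `instrument`, `probe`, `program`, and `demo` are hypothetical and chosen for illustration only.

```python
import functools
import types
from collections import Counter


def instrument(target, name, counts):
    """Swap target.<name> for a counting wrapper and return an undo callable."""
    original = getattr(target, name)

    @functools.wraps(original)
    def probe(*args, **kwargs):
        counts[name] += 1  # the "data collection" step
        return original(*args, **kwargs)

    setattr(target, name, probe)  # instrument: calls now pass through probe

    def uninstrument():
        setattr(target, name, original)  # put the original back

    return uninstrument


def demo():
    # Hypothetical stand-in for "the program": an object with one function.
    program = types.SimpleNamespace(double=lambda x: 2 * x)
    original = program.double
    counts = Counter()

    undo = instrument(program, "double", counts)
    results = [program.double(i) for i in range(3)]  # observed calls
    undo()
    program.double(10)  # no longer observed

    return results, counts["double"], program.double is original
```

The property worth noticing is that the uninstrumented path carries no overhead at all: once `undo()` runs, `program.double` is the very same object it was before.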

Bryan Cantrill:

There must be some reason that it's, like, impossible to instrument program text. And when I was interviewing at Sun, I was talking to Jeff Bonwick and Bart Smaalders, and I remember going to lunch with everybody, and I was coming back from lunch and kind of, like, hey, there's this question I've been meaning to ask you. You know, why don't you do it this way? Why don't you allow program text, either in kernel or in userland, to be dynamically instrumented? And Bart is like, yeah.

Bryan Cantrill:

No. There's no good reason, like, you should do that. We should do that. I'm like, oh, really? I just was not I I was I was so oriented to getting an explanation about why this was impossible, and the there was this moment where he was, like, you should come here and do that.

Bryan Cantrill:

Like, yeah, you should you could come here and do that. That sounds great. And I try to remember that moment when dealing with anyone early in their career, asking why something isn't done some way. Because it it felt, like, so uplifting and empowering to be like, oh, I'm gonna come here and do this. And I'm that sounds great.

Bryan Cantrill:

And what I didn't realize was well, you are, but there's a lot of actually, like, junk you're gonna have to do at first. I just felt like so I from coming to Sun, I had an idea of, like, this is what I I where I wanna kinda orient myself towards. But there was also a lot that had to be done in the interim, and there was a a lot that was just so that was in 1996, and we didn't start DTrace until 2001.

Adam Leventhal:

Well, you started in earnest, but you're putting down a lot of foundation as I recall.

Bryan Cantrill:

That is

Adam Leventhal:

You know, in particular, CTF. Yeah. But, like, also, you're getting a lot of miles on the tires in terms of your experience with Solaris and debugging, really complex performance problems.

Bryan Cantrill:

Yeah. That is true. And so a couple of things happened that were important. So one, we had some really crushing performance issues, especially on these big systems, the E10Ks. And Jeff had done this really interesting tool in Solaris 2.6 called lockstat.

Bryan Cantrill:

And lockstat actually did use dynamic text instrumentation of just the synchronization primitives, and allowed us to understand, in particular, when we were blocking, why, where, for how long.

Adam Leventhal:

And just to pause, lockstat is so fucking cool. And especially before we had built DTrace, when you start looking at the locking primitives and see that these handwritten assembly routines were built in such a way that anticipated this kind of dynamic instrumentation that you're referring to. They were built in a way where they say, replace this instruction with this other instruction, and then it'll fall through into this Xanadu of data collection. Yeah. And then

Adam Leventhal:

And, yeah, put the system back to fully functional, or pardon me, full speed, not fully functional. It was always fully functional, but rather back to optimized performance.

Bryan Cantrill:

That's right. And Jeff had actually done that. So when the synchronization primitives were originally written, that was not the case. And Jeff had rewritten them in 2.6 as part of doing the lockstat work, if memory serves. So that's why it looks like it was always designed with instrumentation in mind: because it was rewritten with instrumentation in mind.

Bryan Cantrill:

Oh, yeah. Yeah. Yeah.

Adam Leventhal:

I know. Yeah. Of course.

Bryan Cantrill:

And that ended up being really, really important, because it was our only way of understanding what's going on inside the kernel. But it was really limited visibility. I mean, you can only instrument this kind of, like, one very important, but kind of thin, layer of the system, namely the synchronization primitives. And you would immediately have follow-up questions that you couldn't answer or couldn't answer easily. And in particular, and I remember this very vividly because it was my birthday, 1997, dealing with this very large E10K system, a benchmarking system. I hadn't really even seen an E10K.

Bryan Cantrill:

We didn't have one in software. It was way too valuable to give one to software. So this was an E10K that was running a gigantic SAP benchmark that General Motors was interested in, and it required actually 4 other E10Ks to feed it, which was

Adam Leventhal:

Holy smokes.

Bryan Cantrill:

Yeah. Huge. And this machine took, like, 2 and a half hours to boot. And we knew that the system would enter prolonged states of deep, deep sadness, and we did not know why. And the only tool we had was lockstat, so you could see during the periods of sadness that, like, we were in the networking stack.

Bryan Cantrill:

And I was writing modules. I was I was basically doing handwritten unsafe DTrace. I was basically writing custom kernel modules and loading them, and and then, like, effectively hot patching the system to jump into them. And it mostly worked, but occasionally didn't, and the system would panic. And then I would, like, go have 2 and a half hours to reflect on on what about.

Bryan Cantrill:

Yeah. It was bad. Nice. And the and through that, the whole time, I was thinking, like, this is a networking issue. And that wasn't totally wrong in that we were ultimately like, we we were super sad because we were, like, order of n cubed in in the networking stack.

Bryan Cantrill:

But, like, why were we there, and why was it so transient? Well, as it turns out, what was actually happening is the system had been misconfigured, and it'd been misconfigured to act as a router. And

Adam Leventhal:

I I forgot this. This is hilarious.

Bryan Cantrill:

And there was a router elsewhere in the lab that was popping occasionally, and that thing would pop, and the E10K would be like, yoo-hoo. I am actually the world's worst router. If you would like to route while this other lab router's out, I can

Adam Leventhal:

I I like Do you want a DHCP, please? I can I can handle?

Bryan Cantrill:

I I don't think I don't think I don't think I'm too busy. It's like, no. No. No. Are you busy with anything?

Bryan Cantrill:

Any benchmarks you should be doing right now? Right. Don't you have an if you think really hard, like, don't you remember the SAP workload I just gave you? It's like, oh, that is ringing a bell. Maybe.

Bryan Cantrill:

You know what? I'm gonna furiously route these packets in this terrible order n cubed algorithm. Which also explained why it would suddenly right itself. And I remember that moment was, like, very eye-opening for me about the idea that these things deep in the stack could actually be manifestations of very high level issues, where, you know, a system has been misconfigured or misadministered, or there's an application running that's a surprise. And the only way we kinda get to it is with these super deep symptoms.

Bryan Cantrill:

I mean, Adam, this is what I keep trying to get traction on: Leventhal's conundrum.

Adam Leventhal:

That's right. The butterfly flapping its wings that you need to find that has caused the hurricane.

Bryan Cantrill:

That has caused the hurricane. You have the hurricane, you need to find the butterfly. Which is still really hard. I mean, really, really hard. But I think that whole thing became very eye-opening.

Bryan Cantrill:

Like, we absolutely have to do this. And then the foundation that you're referring to was so I, I had gone

Adam Leventhal:

Hold on. Before you take the next step: so that's 1997, which is before I got to Sun. Now, by the time I got to Sun in 2001, DTrace, I think you and Mike, Mike Shapiro, who worked with us on DTrace also, you guys had a notebook, like, a DTrace notebook. Yeah.

Adam Leventhal:

And when I got to Sun, DTrace had this sort of vaporware Pidgeot. Like like, you know, you you guys would wander into problems, and I don't know if this is actually true, but seemingly and say, well, you know, DTrace would solve that. And you had done that enough where people were like, alright. Well, could you please just build it rather than telling me that you would? The thing you have not built would solve my problem?

Bryan Cantrill:

That's right. It was it had a a Duke Nukem Forever kind of vibe. Or is it Halo 3? Am I using that right, Josh? Is that right?

Bryan Cantrill:

No. I the Do do

Josh Clulow:

you mean Half Life 3?

Bryan Cantrill:

Half life 3.

Josh Clulow:

Oh my god.

Bryan Cantrill:

Look. I'm only I I'm I'm sorry. I'm I'm not a gamer. I don't And

Adam Leventhal:

so all that fumbling is gonna look great in post. We'll clean that all the way up.

Josh Clulow:

Here on Discord.

Bryan Cantrill:

Right? Here on disc okay. Look. Half Life 3. Right?

Bryan Cantrill:

I knew it was one of those things that the kids talk about. The, it had these kind of vibes of, like, the thing that is always coming that is that you actually want right now, but does not actually, does not actually exist, and you're being told that it will be don't worry. Penicillin will solve this. Like, well, what are the penicillins? Like, wow, we haven't really started about it.

Bryan Cantrill:

It's like, okay. Well, yeah. And Adam, you could, I'm sure, visualize this, because I caught Tim Marsland, a former colleague, a very distinguished English guy. And Tim was trying to debug a problem that had been really painful for him. And I did cheerfully volunteer.

Bryan Cantrill:

I'm like, you know, DTrace solves that problem. And he was so pissed off. "DTrace solves that problem. Why don't you go start giving me a DTrace?" Like, okay.

Bryan Cantrill:

Right. Maybe we need to stop talking about the problems it's going to solve. Maybe That's

Adam Leventhal:

right. Enough with the hype. Right?

Bryan Cantrill:

Enough with the hype. And I did think that, you know, this is also kind of another interesting object lesson in the way we structured the early approach. First of all, we knew that we wanted to get to DTrace. And I think the thing that you're referring to that was exceptional was that Mike had figured out a bunch of the foundational stuff that was gonna be required. Like, I didn't really fully appreciate the importance of CTF for DTrace, which is our ability to get type information in the kernel. And Mike felt very strongly, justifiably so, that type information should be available in the kernel itself, that you should not need to get some auxiliary file, that the system itself should have its own type information.

Bryan Cantrill:

And we needed that for MDB, but Mike also saw the need for DTrace, which I really did not appreciate at the time, honestly. And there were other things like this, where he had a terrific kind of connected graph of all the things that were gonna be required to get us to the point where we could even earnestly start on DTrace. And so we did a bunch of that work in the kind of 98, 99 time frame, which I think is the

Adam Leventhal:

this is, like, Solaris 8, Solaris 9

Bryan Cantrill:

kind of 8, Solaris 9. Yeah. Yeah. And but looking really for an opportunity to to actually really start on DTrace. And I and I think that a lot of engineers feel this way that especially when you become valuable in organization, you feel like you're constantly being kind of caught caught up in the next crisis, and it can be hard to get out from underneath that crisis disposition.

Bryan Cantrill:

And you and I got caught up. Well, I got caught up in a crisis, and as I recall, I dragged you under with me: the Cheetah Plus fiasco. So, Cheetah was a microprocessor, a very cruelly named microprocessor at Sun, because the thing was not fast and had a lot of problems.

Bryan Cantrill:

It was hot. It was late. It was slow. It was expensive. It had been I mean, other than that Other than that.

Bryan Cantrill:

It had a lot of problems. UltraSPARC III. And in particular, they had made a really grievous error. So, recall the TLB issue with Cheetah Plus, or with Cheetah. So UltraSPARC II, Blackbird, and so on had a fully set-associative 64-entry TLB. And then some number of those pages would be locked, but fully set-associative for all pages.

Bryan Cantrill:

And what they had done is taken a bunch of instruction-level traces from Sybase running on Solaris 2.4. Now, 2.4 comes out in, like, 94, maybe 93, and now it is 2000, 2001. It's early 2001. And so this is a trace that is, like, 8 years old. But it shows you how long these things take; when they made this decision, it was arguably only, like, 4 or 5 years old.

Bryan Cantrill:

But the decision they made was: the operating system isn't using large pages, so we should have a large TLB for small pages and a very small TLB for large pages. And the problem was we hadn't implemented large page support yet, and that support was implemented in the operating system after Solaris 2.4. So the data that they gathered was all basically wrong. I mean, this is one of these things where being more data-intensive was actually the wrong thing to do. And they should have realized, like, no.

Bryan Cantrill:

Of course we are gonna add large page support in the operating system, and of course you should support large pages from a large TLB. But what they did is they moved to a 16-entry, fully set-associative TLB for large pages, which is down from 64, and then a bunch of those ended up being locked. So you end up with, like, 5 of those being locked. So you end up with, like, this 9-entry TLB for large pages. Small pages, meanwhile, live out of a 512-entry TLB, but only 2-way set-associative, which is also just death.

Bryan Cantrill:

So when your set associativity gets that so I don't know. Adam, do you remember these workloads? Because it's on both the instruction side and the data side. Mhmm. So you you

Adam Leventhal:

must be thrashing. Right? And you'd be thrashing from compilation to compilation depending on how the compiler laid out your code.

Bryan Cantrill:

That's right. That's right. So you could easily, easily, easily have a workload that is hitting, like, an instruction going through a PLT and into a shared object. Like, that's 3 different bodies of text you're gonna execute in a pretty hot loop. And it's like, yeah, you get unlucky, and those map to the same set.

Bryan Cantrill:

And, like, literally, by the time you got into the shared object, you had evicted the program text that called it, and it was bad. You're like, why, every once in a while when I compile this thing, is it 55x slower? Oh god. And so I was very much called in to deal with that, which is, like, really super tough, because it's like, well, we've made a bunch of decisions already, and it was a big mess. But my condition of that was, like, I'm gonna do this, but then I really need to be given time.
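Editor's sketch: the set-conflict thrashing described above (three hot bodies of text landing in the same set of a 2-way set-associative TLB) can be reproduced with a toy simulation. Everything here is an assumption made for illustration: the class name `SetAssociativeTLB`, the LRU replacement policy, and indexing by page number modulo the number of sets are not claims about the real UltraSPARC III hardware.

```python
from collections import OrderedDict


class SetAssociativeTLB:
    """Toy TLB model: LRU replacement, set chosen by page number modulo set count."""

    def __init__(self, entries, ways):
        if entries % ways != 0:
            raise ValueError("entries must be a multiple of ways")
        self.ways = ways
        self.num_sets = entries // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, page):
        entry_set = self.sets[page % self.num_sets]
        if page in entry_set:
            entry_set.move_to_end(page)  # hit: mark as most recently used
            return
        self.misses += 1
        if len(entry_set) >= self.ways:
            entry_set.popitem(last=False)  # evict the least recently used entry
        entry_set[page] = True


def misses_for(entries, ways, pages, iterations):
    """Run a hot loop touching `pages` in order and return the total miss count."""
    tlb = SetAssociativeTLB(entries, ways)
    for _ in range(iterations):
        for page in pages:
            tlb.access(page)
    return tlb.misses


# 512 entries, 2-way: 256 sets, so pages 0, 256 and 512 all collide in set 0.
unlucky = misses_for(512, 2, [0, 256, 512], 1000)
# Same TLB, but the three pages happen to fall in three different sets.
lucky = misses_for(512, 2, [0, 1, 2], 1000)
# A 64-entry fully associative TLB (one set, 64 ways) has no such conflicts.
fully_associative = misses_for(64, 64, [0, 256, 512], 1000)
```

Under these assumptions the unlucky layout misses on every single access while the other two miss only on the first touch of each page, which is the compile-to-compile lottery the speakers describe.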

Bryan Cantrill:

I need 6 months to focus on DTrace. And so I've kind of been talking about this enough, and I'm gonna kinda go into, like, this one last firefight, this one last crisis. But in exchange for that, I need 6 months of focus on DTrace. And I wanted Mike and me to be able to focus on nothing but DTrace for 6 months.

Josh Clulow:

And I have this plan for a fire engine?

Bryan Cantrill:

That's right. For the next fire. Exactly. I wanna build a fire engine. And Matt Ahrens, in the chat, is pointing out that, yes, the TLB misses, if they missed in what's called the TSB, would trap into the operating system.

Bryan Cantrill:

Yeah. There's no hardware page table walk, and it was a big mess. It was really, really bad. And, I mean, this is part of the reason why SPARC was losing the plot to x86. But I wanted to get a thing called trapstat integrated into the system, which is actually kinda on point.

Bryan Cantrill:

trapstat would instrument the trap table so we could actually say how much time we were spending in the TLB miss handler. What we discovered is, Matt, I don't know if you remember, like, lint pass 2. We learned that lint pass 2, which took forever when you're running on the kernel, was basically spending 60% of its time, which basically meant all of its time, filling the TLB, because the resident set was just larger than the reach of the TLB. And it's basically random, and we were just absolutely thrashing on pass 2, which was definitely eye-opening in terms of how brutal it can be. But once trapstat integrated, then I was like, okay.
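Editor's sketch: "reach of the TLB" above is the amount of memory the TLB can map at once, that is, entries times page size. The arithmetic below is a rough illustration only. The 8 KiB base page size, the 64 MiB working set, and the uniform-random access model are assumptions introduced here, and `tlb_reach` and `random_access_miss_rate` are hypothetical helper names.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of address space the TLB can map at one time."""
    return entries * page_size


def random_access_miss_rate(working_set, reach):
    """Crude estimate: with uniformly random access over the working set,
    the chance that a touched page is currently mapped is reach / working_set."""
    if working_set <= reach:
        return 0.0
    return 1.0 - reach / working_set


# Assumed 8 KiB base pages in the 512-entry small-page TLB.
small_page_reach = tlb_reach(512, 8 * KIB)

# Hypothetical 64 MiB working set touched at random.
estimated_miss_rate = random_access_miss_rate(64 * MIB, small_page_reach)
```

With those assumed numbers the small-page TLB reaches only 4 MiB, so a randomly accessed 64 MiB working set would miss on roughly 94% of accesses. The model ignores set conflicts and locality, so treat it as an order-of-magnitude intuition rather than a measurement.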

Bryan Cantrill:

I can finally go focus on DTrace, which happens sometime in the fall of 2001. I wanna say November of 2001 is when we really got to start focusing on it, and got to focus exclusively on it.

Adam Leventhal:

And Solaris 10 had just shipped, or excuse me, Solaris 9 had just shipped. 9 had just been. Yep. And I don't know if this is too far, but I felt like Solaris 9 was kind of the completeness of the SVR 5, or the Solaris 2.0 multithreaded, multicore, multiprocessor vision. That is to say I think

Bryan Cantrill:

it's good right now.

Adam Leventhal:

All of the loose ends were sort of tied off. Everything sort of made sense. There were no awful, like, obvious glaring omissions or places where two things didn't intersect appropriately. And I would also say that's why it wasn't just DTrace that got started then. It was ZFS.

Adam Leventhal:

We got Matt Ahrens here in the chat. And zones, and some other projects that all were building. SMF, there you go. And building on that foundation, which was now, you know, not as rickety as it had been.

Bryan Cantrill:

That's right. Yeah. I mean, I think we everyone felt like I can go off. And it wasn't really planned for everyone to do this at the same time, but it in hindsight, it did kind of happen for the same there was a a shared reason underneath all of them, which was, okay. This system works well enough, And and if the abstractions now work, and now we can actually maybe we can think about new abstractions.

Bryan Cantrill:

And, yeah, that is Matt and Jeff on ZFS, aka Pacific, that's Fire Engine, that is FMA, that is SMF, and that is DTrace. And DTrace, I think, was probably the first one, because it had been a chip on my shoulder for probably... I mean, I wanna say that DTrace was kinda the first one in the chute. I mean, Matt, maybe Pacific was starting... I guess Pacific was starting at basically the same time. But

Adam Leventhal:

And and so did you get to so you got to put your head down and actually the, you know, the organization didn't bug you too much for 6 months?

Bryan Cantrill:

That's right. It did definitely. And, you know, in hindsight, either the organization did a good job of not bothering me, or I did a very good job of ignoring the organization. I'm not actually aware. But, I mean, I really did focus during that period of time.

Bryan Cantrill:

I actually think that, you know, as I've kind of been thinking back on that period of time, this is the first time that I was really working from home, I think. And that we'd kinda all started to work from home. You know? Because I'd not been working from home through '96, '97, '98, '99. It's really only in 2000, 2001 that I started working from home, and that really did afford, like, new levels of focus.

Bryan Cantrill:

I think that people forget that, you know, if you really need to go and do all this kind of return to the office nonsense. People do forget that, like, boy, when you know what you need to go solve, being able to really put your head down and have zero distraction by working from home is really powerful. And I guess it was like a sweet spot. Right? Because it's like working from home before, like, Slack and DMs and, like, messages and, like, yeah.

Bryan Cantrill:

I don't know. If you want my attention, email me, but I don't know. I read my email, like, once a day, I guess, at this point. I'm just like

Adam Leventhal:

Call my home phone or whatever.

Bryan Cantrill:

Call my home phone. Yeah. Seriously. I'm just, like, super heads down, and it was a great it was a beautiful time because I look at how much, you know, we were able to actually do in a relatively short period of time. So we had started this idea of a kernel technical discussion, and I just remember having this really big milestone around presenting the work that we had done to the whole kernel group.

Bryan Cantrill:

And my view on that was, like, when we did that, this was kind of the referendum on DTrace, and we had to be far enough along to show that it merited additional investment. That I'd kinda won myself the opportunity to work on this for some period of time, but not an indefinite period of time. And I really needed to show very tangible wins. And one of the things that was important to me is the ability to actually debug the system and be able to actually get some real bugs and show that we could. And I think that, you know, part of the kind of genesis of DTrace, wanting to show itself by debugging actual bugs on the actual system, I think was really important. I don't know what your kind of take is on this, but I think that we Absolutely.

Bryan Cantrill:

I

Adam Leventhal:

I mean, both for credibility, but also to know that you're on the right path and to know what to build. And and that that was thematic and almost everything we added to DTrace.

Bryan Cantrill:

Almost everything we... in fact, everything we added to DTrace was because we needed it. And we needed it from a very early period of time. So the first thing that kinda got working was this ability to instrument the functions in the kernel. To instrument with what we call FBT, function boundary tracing. A term that I invented because I couldn't find a term for it.

Bryan Cantrill:

I'm like, what is the term for this? So I'm sorry. If FBT is a snicker, I'm sorry. You play that.

Adam Leventhal:

It's amazing how much that is, you know, people have latched onto that. It's I mean They

Bryan Cantrill:

have. Yeah.

Adam Leventhal:

Maybe I shouldn't actually hear you on that one, but that turned out to be a good term. Or at least nobody could think of a better one.

Bryan Cantrill:

No one could think of a better one, really. I think that that's actually much more accurate. And so I got FBT working, and I think the other thing that was actually really helpful is that one of the things I definitely wanted to do was replace lockstat with a DTrace provider. So, I mean, in other words, I wanted to separate out the way we instrumented the system from the framework that consumed that instrumentation. And I think that was a really important idea.

Bryan Cantrill:

I was thinking I think an idea was more important than I realized, actually, that that because I think other frameworks hadn't done that. Other frameworks had tied together the way I instrument the system from the thing that that actually processes the data. And, in a way that was really hard to kinda unglue, and it was very important to have a very crisp boundary, what we call the provider boundary there.
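As an aside for readers, the provider boundary Bryan describes here can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not DTrace's actual interface: the `Framework` and `LockProvider` classes and the `lock_blocked` hook are hypothetical names invented for this example, and the probe name is used purely as a label.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

ProbeName = Tuple[str, str]  # (provider, probe); the names used below are illustrative
Consumer = Callable[[ProbeName, dict], None]


class Framework:
    """Hypothetical framework: owns enablings and data processing.

    It knows nothing about how or where probes are planted.
    """

    def __init__(self) -> None:
        self._enabled: Dict[ProbeName, List[Consumer]] = {}

    def enable(self, probe: ProbeName, consumer: Consumer) -> None:
        self._enabled.setdefault(probe, []).append(consumer)

    def fire(self, probe: ProbeName, args: dict) -> None:
        # Providers call this; a probe nobody enabled costs only this lookup.
        for consumer in self._enabled.get(probe, ()):
            consumer(probe, args)


class LockProvider:
    """Hypothetical provider: knows where to instrument, not what happens to the data."""

    def __init__(self, framework: Framework) -> None:
        self._framework = framework

    def lock_blocked(self, lock: str, waited_ns: int) -> None:
        self._framework.fire(("lockstat", "adaptive-block"),
                             {"lock": lock, "ns": waited_ns})


framework = Framework()
blocked_by_lock: Counter = Counter()
framework.enable(("lockstat", "adaptive-block"),
                 lambda probe, args: blocked_by_lock.update([args["lock"]]))

provider = LockProvider(framework)
provider.lock_blocked("vnode_lock", 1200)
provider.lock_blocked("vnode_lock", 800)
provider.lock_blocked("proc_lock", 50)
print(blocked_by_lock.most_common())
```

The point of the split is that a second provider could be added without touching `Framework`, and a different consumer could be attached without touching `LockProvider`.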

Adam Leventhal:

Yeah. Another important principle that you nailed in that first prototype, or in that first presentation, was the ability to run it in production, meaning the system was going full tilt, fully optimized, you know, no weird flags or extra code generated, and then dropping into an instrumented system where the penalty was proportional to the question being asked. And I know that we've probably said that 10,000 times in 10,000 presentations, but it didn't feel obvious at the time, and other tracers at the time certainly weren't doing that.

Bryan Cantrill:

They definitely didn't have that approach. Yeah. I mean, that was a very important constraint when we actually set out, is that this has to be available on customers, the work we had done on these benchmarking systems. And then I think it also must be said, the work that we had done, and one of the things that was really important that we had done, and a decision that definitely predated me, and due to Roger Faulkner, the late Roger Faulkner, and others, was the fact that we were running the operating system ourselves in building 17. So we had an NFS server that had the whole building hanging off Jurassic.

Bryan Cantrill:

And this ended up inspiring, the other groups inside of Sun did did similar things, and we ended up with, like, prototype hardware. We ended up using this again and again and again. We've used it here at Oxide. And, boy, running yourself on yourself is both very empowering and very eye opening in terms of the constraints of the problem. You become your own customer.

Bryan Cantrill:

And the like, we had to have because we had we definitely had issues with Jurassic, and we needed to be able to debug Jurassic. And we couldn't afford to put a debug kernel on there, or a different kernel, or a kernel that actually, you know, had this kind of instrumentation enabled. We couldn't we couldn't take a reboot to go, actually debug a problem. We needed to debug it in situ. It had to be available in production.

Bryan Cantrill:

That meant it had to be absolutely safe. So that was definitely a constraint from the get go. It's like, we have to be safe. And as a result, when we instrument things, the kind of contract that a provider has with DTrace itself is that I'm gonna instrument the system in a way that is safe, the way that we will not be able to actually roll the system. So it's not going to instrument the system in a way that it doesn't understand or can't reason about, and DTrace will always err on the side of, like, sorry, I can't instrument this, because I actually don't have confidence about the context the thing is in.

Bryan Cantrill:

But because we could figure out every function entry and return in the system, and we had a way of instrumenting that pretty cheaply on SPARC without a trap, actually, by using an unconditional branch, I was able to get something working pretty quickly with FBT. And I just remember running it on my desktop and being able to see all the things that were going on in hme, which was the Happy Meal Ethernet, which has its own story with Tom Lyon here to tell the Happy Meal Ethernet story, but it was the NIC that we had at the time. And being able to see everything that was going on in the driver as it was happening, being able to do that totally safely, Control-C it, and have the system restored, and being able to do all that on my own desktop, that was, like, an early moment of, like, this thing actually has legs. This is important, and we're gonna be able to do things with this that we couldn't do with any other system.
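For readers who want a concrete picture of "instrument entry and return, Control-C it, and have the system restored," here is a toy Python sketch. It is an analogy only: real FBT patches kernel text, whereas this wraps entries in a dictionary of functions. The `ToyFBT` class and the `hme_intr` entry are hypothetical names made up for the example.

```python
from typing import Callable, Dict, List


class ToyFBT:
    """Toy stand-in for function boundary tracing over a table of functions."""

    def __init__(self, table: Dict[str, Callable]) -> None:
        self._table = table                     # stands in for kernel text
        self._saved: Dict[str, Callable] = {}   # originals, kept for restoration
        self.log: List[str] = []

    def enable(self, name: str) -> None:
        if name in self._saved:
            return  # already instrumented
        original = self._table[name]
        self._saved[name] = original

        def instrumented(*args, **kwargs):
            self.log.append(f"{name}:entry")
            try:
                return original(*args, **kwargs)
            finally:
                self.log.append(f"{name}:return")

        self._table[name] = instrumented

    def disable_all(self) -> None:
        # The equivalent of hitting Control-C: put every original back.
        for name, original in self._saved.items():
            self._table[name] = original
        self._saved.clear()


functions = {"hme_intr": lambda packets: packets + 1}  # hypothetical driver routine
pristine = functions["hme_intr"]

fbt = ToyFBT(functions)
fbt.enable("hme_intr")
functions["hme_intr"](41)
print(fbt.log)

fbt.disable_all()
print(functions["hme_intr"] is pristine)
```

After `disable_all()` the table holds the very same function object it started with, which is the toy version of an uninstrumented system carrying no leftover cost.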

Adam Leventhal:

That must have been an incredible moment. I mean, just to have been thinking about this problem for, you know, 5, 6, 7 years. And then I don't know. Just to I still get a kick out of seeing, like, what is going on on my desktop right now. And for to see that for the first time, like, must have been pretty wild.

Bryan Cantrill:

It was wild. Yeah. And I think it was it was wild and it was also like okay. It was it it was I mean, vindicating is almost the wrong word. It was energizing because it was like, okay.

Bryan Cantrill:

We've got a lot of work to go do, but but it was one of these things that that you you when you're engineering something big, you get these, like the omens become really important. You know what I mean? It's like Yeah. It's the seabirds showing that you're close to land or the dolphins, you know, to go to a a nautical metaphor. I mean, those omens are really important.

Bryan Cantrill:

Like, you're on the right track. This is and you you really especially when you're early in something big, you need to take those things really seriously, because they they can be really inspiring, and you need that energy to get through all the things you need to go do to get this thing done. So, yeah, that was really that was that was great. And I just remember being, like, wow. Okay.

Bryan Cantrill:

This is we can actually there's a lot we need to go do, but there's a lot we can go do with this.

Adam Leventhal:

And So you guys gave that presentation. And as I recall, I mean, it it brought down the house. Like, people went bananas for both what you were showing and the the vision that you were painting.

Bryan Cantrill:

Yeah. And I I I just dropped the deck in the in the chat for that, which actually is funny. I don't know I don't know if you looked at that deck since we basically I've interviewed got back. Yeah.

Adam Leventhal:

Yeah. Yeah. And I I I are you gonna talk about, like, the, the Sun IT naming convention? Is that where you're going? Because that Yeah.

Adam Leventhal:

I I I was on slide 1 on that deck, and I was like, oh, they dropped the joke. And then and then I went to slide 2. I was like, oh, there it is. I remember that.

Bryan Cantrill:

Right.

Adam Leventhal:

At the time, IT ops was determined that everyone should conform to this first initials, last initial, badge ID as their as, like, their identifier. We're all As your user tattoos or something. Yeah.

Bryan Cantrill:

Right. As your username, and you were no longer and I you know, this is one of these interesting cultural things where, like, you know, like, the way he was kinda listened to the wrong person on this one and didn't really understand the issue. And then Well, and

Adam Leventhal:

he was gonna be sm3, so what did he care?

Bryan Cantrill:

Yeah. Exactly. I I don't know. Why is everyone so upset? Well, I do you remember him, like, being like, okay.

Bryan Cantrill:

I guess it's, like, really important that everyone has their call sign or whatever. Like, named your child Maverick after Top Gun, dude. Like, you were on the thinnest conceivable ice here. Okay? Like, let's rename your kid.

Bryan Cantrill:

I mean, let's... yeah. Okay. Let's get rid of everyone's call sign, pal. But, I mean, to his credit, I think this was right when this was being kinda handed down. And, you know, to his credit, they realized that this was a bad idea, and they actually needed to go.

Bryan Cantrill:

And so but but this was this was rescinded shortly thereafter, and we got a we we we were allowed to keep our our identifiers and didn't realize that, like, these usernames are like, alright. These these these engineers take their usernames really seriously. Don't touch their usernames. They freak out. But so, yeah, that that's that's in this deck.

Bryan Cantrill:

But the yeah. So I recall it going on for a long time. It was being I think it's the first one that we videoed. I've got the video somewhere. I don't know where, right now I need

Adam Leventhal:

to VHS. I just wanna point out. Like, even if you found it, like, what would you do with it?

Bryan Cantrill:

But I actually got it transferred to D... we had it transferred to DVD. Oh. DVD.

Adam Leventhal:

Alright.

Bryan Cantrill:

So it does exist on DVD. And I I should make some effort to find it because I you know, it's kinda I mean, it would be this is, like, people people think

Adam Leventhal:

I'm looking at Marvel. Even then, like, the DVD player. I don't know. But, I mean, fine. Could.

Adam Leventhal:

That's better.

Bryan Cantrill:

Yeah. I know. It is better, but, yeah, quite a reward as well on the DVD player. But the and yeah. I mean, it was it it was a lot of I mean, I Adam, so it's kind of interesting because, like, you were effectively in the audience.

Bryan Cantrill:

You were a you had just

Adam Leventhal:

100% of the audience. I I had nothing to do with at that point. I was I mean, I I was cheering it on, but, like, I hadn't used it. I hadn't worked on it.

Bryan Cantrill:

And so I I think it was I don't think people realized how kinda far along we were down south. I mean, I do love the fact that we were and actually, I was kinda surprised when they're like, man, we actually were we had done a lot in not very much time. And in particular, there were I mean, we probably that we spent a little bit too much time on our own implementation because that's what we were right in the in the the weeds of. But, you know, we had done some actually important things, like, pretty early. And, you know, we we talk about, like, the safety in there and using the type system and, you know, being able to aggregations are there and predicates are there.

Bryan Cantrill:

And then also like anonymous tracing. I kind of forgot that we'd done that that early, but anonymous tracing is one of these things I think to this day, most people don't appreciate because, like, why would you? You know, who cares? And anonymous tracing is the ability to instrument during boot. And this is one of these things that, you know, most people don't care about because most people don't care.

Bryan Cantrill:

I don't know, the system boots. Like, I don't care about how it performs or what. But for us, it was really important to know, like, what's happening when the system boots. And it was brutal to instrument the system while it boots, and it was always by hand. And so the ability to have DTrace be able to instrument the system while it booted, and then be able to go get that data, was really important.

Bryan Cantrill:

And I think in this deck, I even refer to a bug that we found of, like, you know, we found, obviously, lots and lots and lots of issues by shining a bright light on it. We were able to, you know, shave whatever it was, 8 seconds off of the boot time on some ridiculous little SPARC. Well, and then as you alluded to, we

Adam Leventhal:

were about to kick off a project that reimagined how boot happened. So it turned out to be incredibly useful for Yeah. For examining that.

Bryan Cantrill:

Totally. Where you would have these things where it's like, actually, you you could do a big elaborate project. Let's actually go get gather some data on the system. Like, oh, look at this. We can actually go fix this, like, three line bug.

Bryan Cantrill:

And, actually, turns out there's as much performance there as there is in your total rewrite. And I feel like that happened quite a bit. And so there's a lot that's in here. And then, oh, you know, the one thing about this presentation, I don't think it does a terrific job of delineating the things we had done versus the things that we were kind of envisioning to do.

Adam Leventhal:

Using the optative voice. Yeah.

Bryan Cantrill:

Using the optative voice. Exactly. So there are some things here that were definitely still in the future. Some things we never did at all, like basic block tracing. But this nonetheless was like a and I do.

Bryan Cantrill:

I think other folks saw that, like, okay. Wow. This is good. This is important, and, you know, the purpose of that kernel technical discussion was really to get folks excited about the work, so we could keep doing it. I mean, that's ultimately and this is not from, like, a budget perspective, but really more from a cultural perspective of, like, hey.

Bryan Cantrill:

This is, you know, we think this is important what someone else think. And, yeah. The reaction was real was really positive, which was great. And then, Adam, your reaction in particular was was really, really positive, and you were in particular thinking about, like, hey, but thinking about, like, how we would go instrument applications.

Adam Leventhal:

And

Bryan Cantrill:

Right. That was great. But Yeah. But be but before that happened so that was in March, we presented that. This is 2002.

Bryan Cantrill:

And I was going heli skiing in April of 2002. And and how much did I confide in you, Adam? My fear of death from that, that I Zero. Yeah. So the same.

Bryan Cantrill:

So I feel as if, like, Matt, if you remember this, or anyone else who's there at the time remembers this. And maybe I was just, like, keeping this. So I've been an avid skier my whole life. I'm an aggressive skier. I love to ski.

Bryan Cantrill:

And it would be, you know, unlike me to get really, you know, nervous about skiing, and I had this kind of truly once in a lifetime opportunity, because I've done it exactly once, to go heli-skiing in Canada, and it's a reservation. It was one of these things. We made this reservation in 1999 when it was, like, the height of the dot-com boom. It is now 2002. It's kinda like the depth of the bust and, like, definitely, economically, it made a lot less sense.

Bryan Cantrill:

You're like, why not? Like, I've got the money. Like

Adam Leventhal:

Right. Sunstock can only go in one direction.

Bryan Cantrill:

Sun and and as it turns out, Sunstock can only go in one direction. That part was true. I got the direction wrong. So, yeah, it made, like, no economic sense by the time we were actually doing it. So I kinda had the economic apprehension of, like, this does not make sense economically.

Bryan Cantrill:

But I'm, like, you know, kind of on brand. I'm beginning to kinda do my research on this, like, only a week before we're going or something like this. And I'm beginning to realize, and this is early days of the Internet, information is not totally available, but I'm beginning to realize, like, this is more dangerous than I realized. And in particular, the avalanche risk is really pretty real here. And, you know, we were running with a very like, the absolute best of the best outfit, Canadian mountain holiday outfit.

Bryan Cantrill:

But what I discovered is that this outfit had had this, like, mass casualty event, like, a couple years prior, and where something like 11 people had died. And Oh my goodness. Right. And you're just and, of course, like, the the the kind of person who organizes this trip is, like, it's fine. It's totally safe.

Bryan Cantrill:

I'm like, I just are you aware of this incident that happened, you know, a couple years ago where they they lost an entire party, basically? He's like, I was not aware of that. I'm like, yeah. You may wanna look into this this incident because they have, like, their most senior guide out there. I mean, it was just one of the the one of these things of highlights.

Bryan Cantrill:

You know, when you are in the high country in skiing terrain that has not been ski like, avalanche risk, you can know a lot about an avalanche and still get caught in an avalanche. So I'm like, I'm gonna die in an avalanche. And I am gonna die in an avalanche, and DTrace is gonna be unfinished. Because, you know, I was working with Mike on this, but, to his credit, Mike was not just working on DTrace. Mike was also working on FMA, the fault management architecture, and was also working on SMF, the service management facility.

Bryan Cantrill:

So he's kinda, like, working on more or less three things at the same time. And, yeah, someone was asking in the chat: is this before airbags and rescue beacons? It is 100% before airbags. And in fact, I ended up skiing in this thing called the AvaLung, which at the time, I'm like, this is I'm gonna, like, mitigate my risk here.

Bryan Cantrill:

And the idea of the AvaLung is that when you die in an avalanche, sorry. We're here. When you die in an avalanche, you die of asphyxiation because you create this ice layer right around your mouth. And if you could avoid creating that ice layer, you'd be able to breathe through the snowpack. And the way to do that is to breathe in air through the pack and then exhale through your back.

Bryan Cantrill:

So the AvaLung would be this thing that you would bite down on, and it was a valve that sat on your chest. When you breathe in, you'd breathe in through this kind of membrane on your chest. When you breathe out, it would go out your back. And I think these things were found to, like, not really work that well because in an actual avalanche, this thing would get knocked out of your mouth, and you wouldn't be able to get it. So as it turns out, it wouldn't have done anything for me.

Bryan Cantrill:

I can tell you that, like, socially, it definitely had a cost because Did you

Josh Clulow:

look a little bit like Bane?

Bryan Cantrill:

Okay. So one, you look different. Also, it's a little bit awkward when you are taking a safety precaution that no one else in your party is taking. You know what I mean? Where everyone's like, what's that thing?

Bryan Cantrill:

I was like, oh, I don't wanna die. But okay. But you you it's good. Whatever. You'll be fine.

Bryan Cantrill:

I don't know. I mean, it's like you know what I mean? It was definitely a a little a little awkward, made for some awkward conversation. But I

Adam Leventhal:

And this did not dispel your fear of death despite breathing out your back?

Bryan Cantrill:

No. This is not no. Well, I was just like, I think that this is a risky activity, and I think I might die. And if I die, because Mike has all these other priorities, DTrace won't be finished. And I remember actually sitting in our apartment because we also lived together.

Bryan Cantrill:

I remember sitting in our apartment being, like, having this, like, heart to heart with him. I'm like, I think that, like, if I die, like, you've got these other priorities. You're not gonna get this thing done.

Adam Leventhal:

And Right. Right. Right. Right. But what about your stereo?

Bryan Cantrill:

No. No. Totally. I just like, the answer I got out of him was just, like, this is not I'm I yeah. I don't know, man.

Bryan Cantrill:

Like, I'm coming in with apprehensions, and you're not, like, talking me out of them. So I had this idea that I, unfortunately, do not have a photo of. I wrote my last will and testament on the whiteboard at Sun. So: I'm going heli-skiing, and in case I die, here are the 6 things that need to be done to DTrace. And you do not... do you remember this, Adam?

Adam Leventhal:

No. No. So,

Bryan Cantrill:

either I just felt like

Josh Clulow:

really, really macabre version of a kanban board or whatever it is. I feel like, like the death post-it notes or whatever.

Bryan Cantrill:

Yeah. Well, I felt like if I do die, I mean, god forbid, but this will now become like sacred. And this is just to tell you, like, how far around the bend I was. I'm like, it will be erased by, like, the janitorial staff will accidentally erase it. So I, like, had, like, save, save, save written all over it so they couldn't actually erase it, and then wrote down my last will

Bryan Cantrill:

and testament, thinking, like, people will be like, no. We must honor his memory by implementing these. I know, so you just wonder, like, would that have actually happened, or would they have been like, I'm gonna implement. That. That sounds terrible. I'm not doing that.

Bryan Cantrill:

I'm not doing that guy. Yeah. Like, it was fine. Like, I'm sad about his death. Okay?

Bryan Cantrill:

Like, like,

Adam Leventhal:

what, are we gonna be haunted by the ghost of Bryan for not, like, I don't know, implementing tracing the way he wanted?

Bryan Cantrill:

That you would have tracing? Like, really? That's right. Wait. Who actually needs this?

Bryan Cantrill:

Nah.

Adam Leventhal:

It's like, who needs this thing? Right?

Bryan Cantrill:

It's very esoteric. So, yeah, I had my last will and testament, but apparently, like, I mean, I guess you don't remember it. So I don't know. I guess it wasn't that effective. Are you man, the number of times you'd go into

Adam Leventhal:

your office and write some manifesto or another no. I'm

Bryan Cantrill:

just kidding. If I kept track of every last will and testament you wrote in your whiteboard, pal. I just think I I think the one's gonna be occupied office right now. It's like you

Josh Clulow:

you Lost last for now. Right? That's right.

Bryan Cantrill:

So I think that it was and then after that, Adam, you and I kept talking, and in particular, because you were working on something in the fault management architecture. I just remember, like, us sharing, like, a shuttle ride or something. You're kinda like, you know, I'm working on this, and all I can think about is DTrace. Like, that's actually what I wanna go work on. And this is another one of these, like, interesting object lessons where you really should work on the things you wanna work on.

Bryan Cantrill:

And if you have something that, like, I can't stop thinking about this, you should go work on that thing. You know, whatever that is, even if it doesn't make sense.

Adam Leventhal:

Yeah. And I was working on the fault management architecture, which, like, in theory, I was interested in, but just in practice, I could not get fired up for a lot of the mechanics of what we were building. And in the meantime, you know, the summer before I had interned at Sun, working on this thing that we were calling libdis, which was about structured disassembly, you know, focused on SPARC at the time, but trying to understand, you know, kind of a baby Ghidra of understanding how these programs were operating. And it just got me really strongly connected with, you know, machine code and assembly. And in particular, I started thinking about what userland tracing would look like and about, you know, this split program counter, next program counter, and could we replace a particular instruction with a trap?

Adam Leventhal:

You know, the the trick that we were using in the kernel of branching to a new location wasn't gonna work in user land, but could we use a, you know, 32 bit trap instruction, get into the kernel, kinda move that instruction somewhere else, and resume in user land. I've been thinking about a lot of aspects of that. And, you know, true to what you're saying, you know, I kinda I whether it was on the shuttle or wandering by your office, kinda pitching you this thing, you're like, yeah. Why don't you do that? I don't know.
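The replace-with-a-trap idea Adam describes can be illustrated with a toy interpreter in Python. This is a sketch under heavy assumptions, not how the pid provider is implemented: the three-opcode instruction set, the `ToyProcess` class, and the `plant_trap` method are all invented for the example, and the "kernel" simply emulates the displaced instruction rather than relocating it.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); toy opcodes: "add", "trap", "halt"


class ToyProcess:
    """Toy user process whose instructions can be overwritten with traps."""

    def __init__(self, text: List[Instr]) -> None:
        self.text = list(text)
        self.acc = 0
        self.pc = 0
        self.displaced: Dict[int, Instr] = {}  # "kernel" copy of replaced instructions
        self.fired: List[int] = []             # addresses at which a probe fired

    def plant_trap(self, addr: int) -> None:
        # Save the original instruction, then overwrite it with a trap.
        self.displaced[addr] = self.text[addr]
        self.text[addr] = ("trap", 0)

    def _execute(self, instr: Instr) -> None:
        op, operand = instr
        if op == "add":
            self.acc += operand
        else:
            raise ValueError(f"cannot execute {op!r}")

    def run(self) -> int:
        while True:
            op, _ = self.text[self.pc]
            if op == "halt":
                return self.acc
            if op == "trap":
                # "Kernel" side: fire the probe, then run the displaced
                # instruction so the program behaves as if never patched.
                self.fired.append(self.pc)
                self._execute(self.displaced[self.pc])
            else:
                self._execute(self.text[self.pc])
            self.pc += 1


program = [("add", 1), ("add", 2), ("add", 3), ("halt", 0)]

plain = ToyProcess(program)
traced = ToyProcess(program)
traced.plant_trap(1)

print(plain.run(), traced.run(), traced.fired)
```

The property worth noticing is that the traced run computes the same result as the plain run; the only visible difference is the record of where the probe fired.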

Adam Leventhal:

That sounds like way more interesting than what you're working on with fault management.

Bryan Cantrill:

Yo. Let's go do that. And then you, as I recall, like, Mike was out of town when that happens. I'm like, quick. I realized out of town.

Bryan Cantrill:

We're gonna... he's going from one of his projects to a different project. It's fine. They'll be fine. Yeah. But that was really important.

Bryan Cantrill:

So many different dimensions. One, I mean, obviously, you brought, like, this particular aspect of the problem that ended up being extremely important, user-level tracing. But also just, like, it just brought so much energy to have I mean, it's really what we needed honestly. We needed a third person. And so I had survived as it turns out, you know, TRBR.

Bryan Cantrill:

I survived heli-skiing. And it was really important.

Adam Leventhal:

The first thing I worked on was, you know, we've been talking about this, but you and Mike had realized that understanding where we were in user space turned out to be really important. So the first thing I worked on was ustack, and I swear I was not slow walking into DTrace fish. But that's when you and I independently came up with this maybe clever way of pulling out registers through register windows, which we've talked about in, I think, every episode. So I won't go into the details.

Bryan Cantrill:

I do. I feel we have. But but but not a bit yet and yet still somehow not enough. Yeah. It was.

Bryan Cantrill:

Yeah. Yeah.

Adam Leventhal:

But that was the first thing I did. And

Bryan Cantrill:

What's your second Interesting. Okay. Yeah.

Adam Leventhal:

Yeah. Because that was a totally new lens because we thought, you know, we were looking at all of this kernel stuff. And then by then, I think we even had a syscall provider. But then to tie that into where you were in user space, that was another, like, crazy moment of insight. Just, you know, a new lens onto all of these problems.

Bryan Cantrill:

A huge new lens. Very important. And then so I I was trying to remember that the origin of the name mister Sparkle. What does mister Sparkle become?

Adam Leventhal:

So, not surprisingly, we would communicate by and large through Simpsons metaphor.

Bryan Cantrill:

Something that's not.

Adam Leventhal:

That's right. And the pid provider had kinda two pieces to it. One was the raw instrumentation of, you know, when an instruction was supposed to execute, how we arranged for it to appear as though it had after firing the probe. But the other part was identifying the location of probes. And we'd been talking about this provider as mister fasttrap for a long time.

Adam Leventhal:

I'm not 100% sure why. But then, what we did in userland is sort of disassembled all the functions to understand, you know, where function entry and return were, and then where the discrete instructions were. Something that, you know, we sort of got differently, through the kernel linker, for FBT. But for userland, we needed to kind of pick our way through because we didn't necessarily have the same preamble and suffix as we did for all these functions in the kernel. So we started calling that thing mister Sparkle.

Adam Leventhal:

And, actually, to this day, Bryan, if you turn on debugging for DTrace, it will vomit out messages prefixed with mister sparkle as it... Mister sparkle lives. As it stumbles. In particular, it would get really freaked out if it found what it thought was a jump table. A jump table is, you know, what would happen is you would have data effectively sitting within the range of a function's symbol. The data was used to inform where sort of a densely packed switch statement would redirect. And without this identification, mister fasttrap and/or DTrace would identify a bunch of data as though it were instructions, and that didn't go so well.

Adam Leventhal:

When you started replacing data with trap instructions, then it meant you would fly off into outer space with very little, you know, to help you figure out how you got there.
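To make the jump-table hazard concrete, here is a toy Python sketch. Everything in it is hypothetical: the one-byte "instructions", the table layout, and the helper names are invented for illustration and are not how DTrace or the pid provider actually represents anything. It only shows why treating table data as instructions sends a switch statement somewhere it should never go.

```python
TRAP = 0xCC  # stand-in trap byte; real encodings are architecture-specific

def make_function():
    # Hypothetical layout: 8 one-byte pretend instructions, followed by a
    # 4-entry jump table (data) that sits inside the function's symbol range.
    code = [0x90] * 8
    table = [2, 4, 6, 7]          # offsets a switch statement would jump to
    return code + table, range(8, 12)

def patch_naively(text):
    # Treats every byte in the symbol range as an instruction to instrument.
    return [TRAP for _ in text]

def patch_carefully(text, table_region):
    # Leaves anything identified as jump-table data untouched.
    return [b if i in table_region else TRAP for i, b in enumerate(text)]

def switch_target(text, table_region, case):
    # What the switch statement does: read a table entry and jump there.
    return text[table_region.start + case]

text, region = make_function()
naive = patch_naively(text)
careful = patch_carefully(text, region)

# With the table preserved, case 1 still jumps to offset 4.
print(switch_target(careful, region, 1))
# With the table overwritten, case 1 "jumps" to the trap byte's value,
# which is far outside the 12-byte function.
print(switch_target(naive, region, 1))
```

In the naive case the process does not fail at the patched table entry; it fails wherever the bogus jump lands, which matches the description of dying some distance away from the actual cause.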

Bryan Cantrill:

And your application would die.

Adam Leventhal:

And your application would die spectacularly. Often, like, several instructions away, you know, maybe dozens of instructions away from where the actual incident occurred, which made it even trickier to figure out how you got there.

Bryan Cantrill:

And so, alright. So this thing we started calling Mr. Sparkle after the sort of, like, Simpsons episode, the fish, the bulb, the

Adam Leventhal:

That's right. Right. Mr. Sparkle being the confluence of two Japanese industries whose logo looked exactly like Homer. See notes for complete details.

Bryan Cantrill:

An episode that we love so much that we implicitly named Fishworks after it.

Adam Leventhal:

That's right.

Bryan Cantrill:

It really is a great episode. Actually, we had a coworker that, studied Japanese in that's his that. That joined? Yeah. Yeah.

Bryan Cantrill:

Yeah. Elijah, and I had Elijah watch that episode too, because I just wanted to, like, understand more about the... the Japanese seems so plausible. Yeah. The Japanese is very good.

Bryan Cantrill:

He's like, it's very formal Japanese, but it's very good. This is really interesting. It's great. Nice. But so, and then I remember another big milestone was when you could instrument every instruction in Firefox.

Bryan Cantrill:

Am I remembering that correctly?

Adam Leventhal:

Yeah. Yeah. Yeah. So there were a bunch of, you know, failed attempts at that, which involved, like, driving off into a ditch. Things like how you deal with asynchronous signals being delivered while you're in the midst of executing, you know, one of these trace points.

Adam Leventhal:

So there are lots of corner cases to consider along those lines. But yeah. Then we'd go from, you know, Firefox that we downloaded off the shelf, you know, turned on millions of trace points. The thing would slow down a ton, but it was still usable, and that was pretty wild.

Bryan Cantrill:

Yeah. I mean, was

Josh Clulow:

it Firefox or was it Netscape at the time?

Bryan Cantrill:

It was Firefox. I am pretty sure it was... no. No. Excuse me. It was Firebird.

Bryan Cantrill:

Firebird. Right? I think this is before would that make sense, Josh?

Josh Clulow:

That was definitely a thing before Firefox.

Bryan Cantrill:

Yeah. So we actually, if you look at the features It was Mozilla. Right? It was Mozilla.

Adam Leventhal:

Was it called Mozilla at the time?

Bryan Cantrill:

I think it was Firebird. In the DTrace documentation, there are lots of examples. I actually kind of deliberately did this when we were doing DTrace examples. I'm like, I want to capture, like, little time capsules in the DTrace documentation. So I did this over and over.

Bryan Cantrill:

I don't know if anyone ever gets this, like, my little Easter eggs, you know, where I would deliberately, like, capture the date or capture the applications that we were running. So I'm like, I wanna capture, like, the applications of the day, and I definitely remember, like, there's a lot of Firebird in there. Now that may have been Mozilla before it was Firebird, but it was on SPARC, importantly, because this is not on x86. I remember, Adam, when you were talking about Mr. Sparkle. And by the way, if you go into the source base, it is great.

Bryan Cantrill:

Like, if you do a search for Mr. Sparkle, there are, like, six different points that still have Mr. Sparkle in there. Oh, yeah. I just hope that future civilizations... I think, choke on that one, AGI. Like, hey. Go make sense of that one.

Bryan Cantrill:

You're so smart, you, Mr. AGI, rewriting us all. Like, what does that mean, Mr. Sparkle? Of course, it's listening to this episode right now, so that's just giving it away. I was just kidding.

Bryan Cantrill:

I should have thought about my service, the Lycoming, before I did that. But I remember thinking vividly, like, this will never work on x86. Do you remember that? Like, we're like, okay. This is gonna work on SPARC, but on x86, we're just, like, hosed.

Bryan Cantrill:

We're just gonna have to do something way different.

Adam Leventhal:

Yeah. Definitely.

Bryan Cantrill:

When we integrated DTrace, we did not have fasttrap... we did not have pid provider support for x86, I think. Right? That's right. And but

Adam Leventhal:

did you have FBT support for x86 at the time? Because FBT, like, I don't think you could take the branch. You had to take a trap there as well.

Bryan Cantrill:

I did take a trap. I couldn't take the branch, and I did that via... I wanted to avoid the fact that, with the trap instruction, 0xcc, like, the debugger gets confused. So to prevent the debugger from being confused, I generated an illegal instruction with a lock prefix.

Adam Leventhal:

Oh, that's right.

Bryan Cantrill:

That's right. Questionable decision. But, so yeah. I can't remember if we had that when we integrated or not. I think we had FBT on x86.

Adam Leventhal:

To be clear, I mean, x86 was supported, sort of, but, like, clearly a second-class citizen at the time, which is... Yes. Insane. And, you know, perpetually, like, being announced that it was being killed, but then we would rally to unkill it. So it was in limbo for a very long time.

Bryan Cantrill:

Well, and I think the great darkness for x86 is, I believe, like, January 2002 to October 2002. That, I believe, is the era in which it was killed. Do you remember this?

Adam Leventhal:

Yeah. Yeah. We went to, like, an all-hands for the engineering organization, and they're, like, x86 Solaris is dead. And then we all went back to building 17 and talked about how we'd keep it alive.

Bryan Cantrill:

That's right. And we're like, we are gonna keep it alive because this is so obviously the wrong decision. And we will perish if we... I mean, it was a very bad decision. And the person who made that decision, you know who you are, and we know who you are.

Bryan Cantrill:

So we'll just leave it at that. But it was a very bad decision that was ultimately revisited, importantly, and it was resurrected. But the operating system had only just been resurrected. Again, I wanna say it was October of 2002 that it had been resurrected. So I think we maybe had FBT for x86, but definitely not the pid provider, when we actually integrated.

Bryan Cantrill:

But then I think you did the pid provider. I think that was, like, the next thing that we did... That's right.

Adam Leventhal:

So for a long time, I had a workspace, like, a branch, called fasttrap minus x. Do you remember this branch? And, like, everybody's laptop was running fasttrap minus x for a while.

Bryan Cantrill:

Fasttrap minus x. I love that workspace. Yeah. So... It was, like, a workspace being the Git branch of its day.

Adam Leventhal:

That's right.

Bryan Cantrill:

And, yeah, I'd forgotten. Right. Fasttrap minus x. Oh, so another thing that we thought was interesting, and I know, Madison, I'm not sure if he wants to hop on stage to speak, but one of the things that was really neat was, before we had integrated, projects had decided that DTrace was so important to them that they would actually be a child of the DTrace gate. I mean, again, this is in TeamWare parlance, but we had effectively created a fork of the operating system that we were staying in sync with. And so people had decided that instead of syncing up with the operating system, they were gonna actually sync up with the DTrace gate, because we were synced up with the operating system.

Bryan Cantrill:

And then they would kind of get DTrace for free. And I know that SMF did that. I think ZFS did that as well, but a couple of folks

Adam Leventhal:

I'm pretty sure zones did that too. Anyway, there were a bunch of folks backed up, and we really needed to stick that landing before, you know, you died heli-skiing.

Bryan Cantrill:

We needed to stick the landing. And we had a very exciting integration that we talked about recently. So that was weird. Don't need to belabor that one. But that was a very exciting day.

Bryan Cantrill:

And, ultimately, it landed. And when is AADEBUG, Adam? That must have been

Adam Leventhal:

That was before. That was that was that was earlier, I think. Right?

Bryan Cantrill:

Was that earlier in 2003? Because I had kind of... you're right. I think it must have been. So we... That's right. You and I had gone to Ghent.

Bryan Cantrill:

That was in... oh, that was right afterwards. That was right after we had... no wonder we felt like we were up against it. We had a narrow window. That was September 8th, 9th, and of course it was.

Adam Leventhal:

Oh, that's right. I forgot. Right. That's right.

Bryan Cantrill:

Of course it was. We should have definitely remembered that. And so we should also acknowledge that today is 9/11.

Adam Leventhal:

It is.

Bryan Cantrill:

And, of course, we were in Belgium, and this is only the second anniversary of 9/11. And the first anniversary... for those of you who, like, were not alive or were not aware, 9/11 was awful, obviously. But it was, like, rip-your-heart-out awful. I remember... and, Adam, you had just joined Sun.

Adam Leventhal:

I just joined, you know, sort of a newly minted adult sort

Bryan Cantrill:

of.

Adam Leventhal:

And, you know, the whole world changed.

Bryan Cantrill:

It was just devastating. And, you know, many, many people have connections to New York. Sun lost an employee on one of the planes, and it was just, like, devastating. And then I remember the next year being so angry that that was not a holiday, that we worked on September 11, 2002. I remember being, like, this should not be a workday.

Bryan Cantrill:

No one could work. It was just like, that should have been a national holiday. It should have been a national day of mourning. It was really awful. But by the time we hit, like, September 11th, 2003, the scar tissue was beginning to form.

Bryan Cantrill:

It didn't have that kind of recency to it. And you and I are at AADEBUG, and, like, everyone is kind of feeling a little bit calmer. And then Osama bin Laden... it's on September 10th. You and I are both traveling on September 11th.

Adam Leventhal:

That's right. I think you're, like, flying back to DC or something.

Bryan Cantrill:

I am flying from Brussels to DC. I am flying from the head of the EU to the head of America, on September 11, 2003.

Adam Leventhal:

Right.

Bryan Cantrill:

And, as it turns out, that was a pretty empty flight. I had booked that because it was, like, really easy to get a ticket without realizing, like, wow.

Adam Leventhal:

This is so cheap.

Bryan Cantrill:

Almost speed of everything. I'm

Adam Leventhal:

saving the company so much money.

Bryan Cantrill:

I'm looking to company's quite funny. You're like... I'm like thinking, like, I'm going back on Thursday. And I'm like, no. I'm going back on 9/11.

Bryan Cantrill:

I'm such an idiot. And then I was feeling somewhat apprehensive. And then on September 10th, if I recall correctly, you and I are in Belgium together, which was, I mean, a lot of fun for a bunch of different reasons. We'll get to the DTrace consequences of that trip in a second. But Osama bin Laden is like, I'm gonna do something even more terrible on this 9/11.

Bryan Cantrill:

Like, I'm gonna do something. I'm really gonna blow your brains on this. Like, this is gonna be really amazing on this. And I remember being like, shit. And I remember, Adam, do you remember your counterargument to this?

Adam Leventhal:

Yeah. I remember thinking that that would be bad marketing. Thinking, if someone says 9/11, you know, it already means a thing. So then it just gets confusing.

Bryan Cantrill:

It is like, no. This is the one day you wouldn't do anything on. You'd be like, oh, that 9/11 was the most spectacular of all. Like, which 9/11 are you talking about? First 9/11 or second?

Bryan Cantrill:

It's like it's going to be a marketing mess. And and this is gonna be like

Josh Clulow:

Like Coke Zero?

Adam Leventhal:

Is that

Bryan Cantrill:

Exactly. It's gonna be Coke Zero, Coke Classic. This one dies in committee at Al Qaeda. And I remember feeling, like, total solace at that. I'm like, you're right.

Bryan Cantrill:

This is good. But the DTrace ramifications of that... by the way, where I was feeling a lot less confident was actually on the plane, where it's, like, very clear this plane is basically three-quarters empty, and everyone else seems to be, like, an air marshal. So I'm not sure if I picked the right date of travel or not. But the DTrace ramifications: I don't know if you remember, like, we were like, we have to write a DTrace paper.

Adam Leventhal:

Do you

Bryan Cantrill:

remember us talking about that?

Adam Leventhal:

Yeah. Yeah. Yeah. Right after hanging with these academics. Yeah.

Bryan Cantrill:

Yeah. And I think we felt like it is very, very important that we write a DTrace paper, and that became a top priority. I mean, there was technical work to do after the integration, but writing the paper that became the USENIX paper that we would present the next year... I'm not sure when, that must have been submitted in, like, November or so of that year. And I feel very fortunate. I mean, the acceptance rate was getting extremely low for USENIX ATC, the Annual Technical Conference.

Bryan Cantrill:

But that paper ended up being really important, and I'm really glad we did that. I'm glad that we forced ourselves to do that. And, you know, another kind of big lesson of DTrace is it is really worth forcing yourself to write this stuff down. I don't think an academic conference is the right thing for most practitioner-authored stuff. But I think, you know, whether it's a blog entry or an ACM Queue article, getting that kind of written vessel is really, really important to describe work.

Adam Leventhal:

Totally agree.

Bryan Cantrill:

So then we, I mean, I do kind of wanna move forward in time here. Yeah. So DTrace becomes... we believe it's a big deal, but that's in part because we're using it. Other people start using it. It's pretty clear that it was a lot of fun to go demo, which is great.

Bryan Cantrill:

When does blogs.sun.com happen? Because I feel like that happens, like, almost around the same time, like, 2004.

Adam Leventhal:

You know? Yeah. Or, yeah, maybe right around that time, because I'm not sure when we open sourced it. But I remember DTrace being the first thing I remember some of our early blog posts talking about. No.

Adam Leventhal:

No. No. It was maybe even... yeah. It was 2004, because we were talking about the launch of Solaris 10, open sourcing DTrace.

Bryan Cantrill:

That's right. It

Adam Leventhal:

was right around that time. You're right.

Bryan Cantrill:

Yeah. And huge credit to Tim Bray for encouraging Sun engineers to blog. And so we started blogging in 2004, and we were really able to be much more transparent about what we were doing, which was really important, I think, for us to be able to, like, really talk about DTrace. And, I mean, I feel like DTrace was kind of, like, born on social media in that regard, albeit primordially, because people were not relying on kind of official Sun documentation for this. They were getting this kind of from us, and us talking about what we were doing.

Bryan Cantrill:

And the blogs were really important for us, I think, for DTrace. I mean, am I... No. That's right.

Adam Leventhal:

It's both getting it out there and then, hearing people's use of it. I think that was a really important vessel.

Bryan Cantrill:

It was really important. And then, like, I do love going, although it's a bit hard, admittedly, on dtrace.org, which has got a down... yeah. Yes. A down-rev WordPress installation that desperately needs some attention.

Adam Leventhal:

Like, our

Bryan Cantrill:

we have to fight

Adam Leventhal:

Like, WordPress at 20. Yes.

Bryan Cantrill:

What is WordPress? At 20.

Josh Clulow:

Why would you say that out loud? Oh

Adam Leventhal:

my god.

Bryan Cantrill:

I know. It's so stupid. I know. I know. Please, the, but the comments were really good.

Bryan Cantrill:

And so, you know, this is like, we're so old, Adam, that we remember when comments on blogs were productive.

Adam Leventhal:

Yeah. You'd like respond to people on blogs and have like a whole discussion without any Nazis.

Bryan Cantrill:

Without any Nazis. It was amazing. Like, the biggest problem was, like, people who were just trying to, like, you know, sell you things, as opposed to, like, actually... so those were definitely halcyon days, but that ended up being really important. And that's when we started to get, like, real community. And then, as you say, we open sourced it in January of 2005, and, you know, DTrace was the first thing out of the chute. And we wanted that in part because DTrace was clean from an IP perspective.

Bryan Cantrill:

DTrace had been wholly developed by Sun. We were not waiting for anybody to relicense anything. So we needed something that we could open source to show that we were serious about it. And, because we wanted to

Adam Leventhal:

in particular show that we weren't holding back the crown jewels. I think that was one of the things, as we started talking about opening up Solaris, that people were convinced that we would give away, you know, the 20-year-old crap, but not, like, the newest hotness. And so we started with what we thought was some new hotness.

Bryan Cantrill:

Which was great. I mean, it was kind of an honor to be in that position, and it was really important to us to open source it. That's when we transitioned to the DTrace fish, and its shared ancestry. Oh, okay. The mirrored ancestry. But it was really important for us to get that out there and to get that open.

Bryan Cantrill:

And I remember at the time being, like, another part of the reason I wanna get this open is I want this technology to survive the company that it's in. Which turned out to be, you know, a bit too on the nose, but it was really important to me that, like, this become a contribution writ large. And in particular, I remember thinking, the thing I desperately wanted to avoid was, some years later in my career, pining for what we had built and not being able to use it. It would just feel so gutting, you know, to not be able to use this stuff. Because DTrace was so important to our kind of everyday use.

Bryan Cantrill:

And, you know, Jason in the chat says, you still remember the first issue that you used DTrace on. And you remember, Adam, we used to talk about this with people, like, you could see the change in disposition between people who are like, yes, I've looked at DTrace, and it seems neat, versus, like, DTrace just pulled me out of the fire.

Adam Leventhal:

And Okay. You you know what? I still I love that I still get to see that. And I and and we still get to see that with our colleagues at Oxide where, you know, we show it, they've seen it, whatever. But then you get that DM, and I'm sure you've gotten lots

Bryan Cantrill:

of them because I have.

Adam Leventhal:

It's like, hey. No. I actually used it, and holy smokes. Like, how would I have done it without it? How do other people live?

Adam Leventhal:

Yeah. I know. It's it's great.

Bryan Cantrill:

It is great. It is really, really, really exciting to be able to do. And it's exciting to kind of be with someone when they realize, like, all these things are now possible. And, yeah, honestly, it has been a singular source of kind of pride for us that this technology still is able to deliver this kind of delight to people. And, like, wow, there's so much I can go do.

Bryan Cantrill:

It's really great. And as you say, this is up until, like, present day, because we've been able to do all sorts of things with it.

Adam Leventhal:

So so 2 things I

Bryan Cantrill:

wanna touch on. So we, I don't wanna throw out

Adam Leventhal:

my shoulder patting myself on the back on this, but we did get this Wall Street Journal award. And the reason I wanna bring it up is that we beat out inhalable insulin. We beat out inhalable insulin for the top prize. Turns out inhalable insulin had a bunch of problems, whatever, but it meant, for the first time, I could both explain it in terms that my folks would understand, where they could say, oh, inhalable insulin. That sounds really fancy.

Adam Leventhal:

And you beat that? And more importantly, they could brag to their friends. So that's the reason I wanna thank Michael Totty and the Wall Street Journal.

Bryan Cantrill:

Yeah. Totally. Yeah. It was great. It was funny because, yeah, we had all of a sudden this, kind of a spotlight that we didn't imagine.

Bryan Cantrill:

That we would be kind of featured in this way. We did beat out inhalable insulin, which really seemed to be the headline for many people. Like, many people were like, what do you think? Like, diabetes isn't important? Like, I know, we're not denigrating inhalable insulin.

Bryan Cantrill:

But, Adam, I did like that you point out, like, for the record, inhalable insulin had many... That's right. Problems. Where are you, inhaled insulin, now? That's right. You're not on a podcast doing inhalable insulin at 20, are you?

Bryan Cantrill:

No. That was to that one. Go learn and go find that podcast. That's a dead letter, inhalable insulin. Very important.

Josh Clulow:

Definitely, like, one step up the ladder at least from, like, a local boy makes good, like, local town newspaper photo that your mother puts on the fridge or whatever.

Adam Leventhal:

Exactly. It was.

Bryan Cantrill:

It was, yeah. And huge kudos, actually, to the folks at Sun. I think Claire Giordano in particular had really reached out. It's a nominating process you've gotta go through for that, and they had done all that stuff, which is... Yeah.

Bryan Cantrill:

Great, and really appreciate that. Because, yeah, it was nice to get that kind of attention on the technology for sure. That was in 2006. When was that?

Adam Leventhal:

2006. Yeah.

Bryan Cantrill:

Yeah. Yeah. Which is great.

Adam Leventhal:

Yeah. And the other one that I wanna talk about is, you mentioned lockstat at the top as being one of these kind of early dynamic instrumentation, but kind of statically scoped, tools, and that you converted it then to operate in DTrace. We had the same idea in userland. And we thought, you know, what's good for kernel locks would be great for userland locks, which are a source of even crazier, weird pathologies. And I don't know if you remember this, but I had built this implementation of plockstat, userland lockstat, that was terrible.

Adam Leventhal:

It was just awful, with all these, like, implicit symbols strewn all over the place. And we looked at it, and we said, nah. Like, we can't ship this. But that was the birth of USDT, of userland statically defined tracing. Where I

Bryan Cantrill:

didn't realize that that's kind of where that was coming from, from the... Yeah. The birth of plockstat.

Adam Leventhal:

Yeah. So that was the first USDT provider: the plockstat provider, built into libc as a way of sort of decorating these locking primitives with the information we'd need to instrument them. But then, totally, like, caught by surprise, that turned out to be an incredibly powerful tool for dealing with dynamic languages, something that we did not anticipate when thinking of that thing.

Bryan Cantrill:

Yeah. And so we should also talk about ustack helpers at some point. Yeah. And so a bunch of that was done for

Adam Leventhal:

you know, with the Java folks, because we're at Sun. That's right. So, it was Sun, everything was Java. So they sent the Java powers that be over to hang out with us. And we thought, how can we do meaningful instrumentation for Java?

Adam Leventhal:

So half of that was a USDT provider, the hotspot provider, for examining, you know, various aspects of Java execution. But the other one was, as Bryan was saying, jstack. So building, like, a bunch of instrumentation into the binary itself, which would help DTrace in the kernel figure out how to assemble a stack trace. And that was pretty wild.

Bryan Cantrill:

It was absolutely wild. Because in particular, like, the JVM basically downloads a program that allows us to answer the question in situ, like, in a context in which you cannot block, you cannot execute user-level code. It's not actually Turing complete, by the way. And we need to know, for this symbol, what is the string that it corresponds to. And, man, writing a DTrace helper, because it is not Turing complete, I mean, that is programming your calculator.

Bryan Cantrill:

It is super weird to go do. And then do you remember, like, the tracing framework I did for helper providers?

Adam Leventhal:

Oh, yeah. There's, like, this this this kernel switch you can turn on for it to, like, emit, like, failures, basically.

Bryan Cantrill:

Yeah. So you can, like, have some, like, some way of debugging these things because they're just brutal. Yeah. To and and I remember thinking, like, is this the simplest way? But I kinda came to I mean, I think we all kinda came to the conclusion.

Bryan Cantrill:

Like, this is enormously complicated, and yet it is still the simplest way to do this.

Josh Clulow:

You're basically running a program in a virtual machine that's an entirely safe context to examine the process memory. Right? I mean, that's like I don't know how you could make that simpler and also make it cover all the

Bryan Cantrill:

assets. Did you put program in air quotes as you were saying it, Josh, to express the No. I mean, it is

Josh Clulow:

a it's a program.

Bryan Cantrill:

It has no backwards branches. So it is not a program.

Josh Clulow:

But that's but that's why it that's why it's safe.

Bryan Cantrill:

You are getting a series of statements. Yes. It is why it's safe. And I do think that, like, one thing, and I think we talked about this at the top, but one thing that is really important to understand about DTrace is that safety is at its core. Safety, production systems, and used pragmatically on debugging problems.
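The "no backwards branches" point can be illustrated with a small Python sketch. This is not DTrace's actual instruction set or validator; the three-opcode toy machine and the function names below are assumptions made up for the example. The idea it shows is general: if every branch must go strictly forward, the program counter only ever increases, so a program of N instructions can execute at most N steps and cannot loop.

```python
def validate(program):
    """Reject any branch whose target is not strictly ahead of it."""
    for pc, (op, arg) in enumerate(program):
        if op in ("jmp", "jz") and not (pc < arg <= len(program)):
            raise ValueError(f"instruction {pc}: branch to {arg} is not forward")
    return program

def run(program, acc=0):
    """Run a validated program on a toy accumulator machine.

    Returns (accumulator, steps). Because the program counter strictly
    increases on every step, steps can never exceed len(program).
    """
    validate(program)
    pc = steps = 0
    while pc < len(program):
        op, arg = program[pc]
        steps += 1
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        elif op == "jz":
            pc = arg if acc == 0 else pc + 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc, steps

straight_line = [("add", 1), ("jz", 3), ("add", 10), ("add", 100)]
looping = [("add", 1), ("jmp", 0)]  # a backward branch: rejected up front
```

Calling `run(straight_line)` terminates in at most four steps, while `run(looping)` raises `ValueError` before executing anything. The cost of this guarantee is the one joked about above: without loops, such a program is closer to a series of statements than to a general-purpose program.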

Bryan Cantrill:

And everything that we have done comes from that. And as a result, like, some people have asked, like, I don't know, what's the difference between this and eBPF? What's the difference between this and SystemTap? Or what's the difference between this and whatever?

Bryan Cantrill:

And it's like, you can go find all of these surface differences, of which there are many, but there's also just a values difference at the core. And the core of DTrace is allowing you to safely understand what your system is doing. And in everything that we do, we're not trying to augment the system. It is not a vector for delivering arbitrary software into your kernel. And it's not designed to be, and it won't be, because that would violate the safety principles of DTrace.

Bryan Cantrill:

And And

Adam Leventhal:

and you can't underscore this enough, because, like, that safety built so much, where you could bring this to the most critical customer system and know with confidence there was nothing you were gonna do that was gonna fuck things up. Totally. This meant that, you know, there wasn't nervousness around,

Bryan Cantrill:

you know, when you'd use it, where you'd use it, what kinds of problems would

Adam Leventhal:

be applicable for, what kinds of customer systems. It it bred this confidence. Now it also had associated limitations, but that confidence, like, far outweighed

Bryan Cantrill:

it.

Josh Clulow:

We would deploy instrumentation to, you know, thousands of machines at Joyent without thinking about it, really. When you're looking back, there was very little concern given to the potential downside because there basically wasn't one. Like, I mean,

Bryan Cantrill:

you would,

Josh Clulow:

you know, you would at worst get back unhelpful data or something. But, you know, no customers would be calling up saying, like, you just destroyed a hundred of my instances all at once.

Bryan Cantrill:

Like, that... Well, yeah. So DTrace is safe by design. It's designed to be safe, and we did a bunch of things that are important in the implementation to make it safe. But even as safe as it was in the implementation... and in fact, Adam, maybe it was with the ustack helpers.

Bryan Cantrill:

Do you remember the issue that Jarod hit with the ustack helpers? So, I mean, Jarod Jenson, the earliest user of DTrace outside of Sun, was just going to town with DTrace. I mean, DTrace and Jarod were a great fit for one another. And so he was deploying it wherever he could, to be able to get these wins on these kind of financial applications. And in particular, some of these things are in Java, so he's using jstack a lot.

Bryan Cantrill:

And I'm pretty sure Jarod was the one who hit this, because a big ustack could take a long time to run. Like, hundreds of microseconds pretty easily, even up to, like, a millisecond to run all this. Because, I mean, for every single frame, you're gonna ask the question, like, what is the string I need for this thing? And you start doing that when, you know, you're 500 frames deep, and it's 2003 at this point. And I remember Jarod, like, ran this, and the system just became unresponsive.

Bryan Cantrill:

And that so, like, the system was up by some definition.

Adam Leventhal:

Technically. Right.

Bryan Cantrill:

And that was a big eye opener of, like, wow. Even as safe as we have designed this thing to be, simply doing no harm to the system is actually not enough. You actually have to keep the system alive. And there's a bunch of liveness criteria, and you can go on some scary vacations, because you end up with an enabling that's got a really heavy probe effect. And, Patrick, I know you and I have both sent the system on some very scary vacations.

Bryan Cantrill:

But importantly, like, DTrace will realize the system is not making forward progress. And DTrace's assumption is, if the system is not making forward progress, I, DTrace, am to blame. So I'm actually gonna kill the enabling, and I will let the user know that this enabling is dead. And that, I think, was a very important decision, to be able to go do that and to give us the assurance that, at worst, you're gonna get this kind of black hole moment that you actually come back from, which is very, very important.
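The policy described here, "if the system stops making forward progress, assume tracing is to blame and kill the enabling", can be sketched as a simple watchdog. The Python below is a hypothetical model only: the class names, the explicit `now` timestamps, and the timeout value are assumptions for illustration, and the real mechanism inside DTrace is not shown in this conversation.

```python
class Enabling:
    """A stand-in for one active set of enabled probes."""
    def __init__(self, name):
        self.name = name
        self.alive = True

class Deadman:
    """Hypothetical watchdog: if no heartbeat has been recorded for longer
    than `timeout`, blame tracing and kill every active enabling."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = 0.0
        self.enablings = []
        self.messages = []

    def heartbeat(self, now):
        # Called whenever the system demonstrably made forward progress.
        self.last_beat = now

    def check(self, now):
        # Called from a context that still runs when the system is struggling.
        if now - self.last_beat > self.timeout:
            for e in self.enablings:
                if e.alive:
                    e.alive = False
                    self.messages.append(
                        f"{e.name}: aborted due to systemic unresponsiveness")

deadman = Deadman(timeout=10.0)
heavy = Enabling("heavy-ustack")
deadman.enablings.append(heavy)

deadman.heartbeat(now=1.0)
deadman.check(now=5.0)     # progress seen recently: enabling survives
deadman.check(now=20.0)    # no progress for 19 time units: enabling is killed
```

The design choice worth noting is the asymmetry: the watchdog never tries to work out whether tracing really caused the stall. It gives up the trace data rather than risk the system, which is the "you come back from the black hole" guarantee described above.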

Josh Clulow:

Like handfuls of seconds, and then things come back.

Bryan Cantrill:

Yes. Have you hit the I I assume you

Josh Clulow:

The systemic unresponsiveness? Systemic unresponsiveness.

Bryan Cantrill:

Which aborted? Error message

Josh Clulow:

once or twice.

Bryan Cantrill:

Yes. I have saved you from death, or you have been saved from death, actually. Because you're, you've saved you I mean, you caught you, like I think you said that you were almost causing my death. Yeah. Well, I mean,

Josh Clulow:

I always call it. I mean, you were the only you can prevent system, like, unresponsiveness, I guess. But yeah.

Bryan Cantrill:

Yeah. But the jstack highlighted the need for that, because we just felt that if you're running that on a profile provider, what have you, it was just very easy to, like, really burden the system, and it was very important that we had that kind of escape valve, to make sure that the system would, because, Adam, I think, you know, our disposition was, rightly, we had zero opportunities to screw that up. Like, DTrace takes out your system once, you're never gonna run it again when it matters.

Adam Leventhal:

That's right.

Bryan Cantrill:

You just do not have a do over on that one. And I think in that regard, it very much mirrors Matt and Jeff's disposition towards ZFS, where it's like, no. The number of opportunities you have for data corruption is actually zero. Like, the thing corrupts your data, and you're gonna be talking about it in Hacker News comments for the rest of your life. And, you know, with DTrace, the system did not toss, and with ZFS, we did not have corrupt data.

Bryan Cantrill:

It was very important for both those things to operate from those constraints on out. Everything we did was abiding by those constraints.

Adam Leventhal:

Should we talk about the, post Sun DTrace journey?

Bryan Cantrill:

Yes. I think so. Because I do wanna, so in 2008, we did the first dtrace.conf. That was not post Sun, but that was kind of post, we had left the operating system group, and we were now in this group, Fishworks, yet another reference to the Simpsons episode. And so we had a DTrace conference, which sounds like, why would you have a DTrace

Bryan Cantrill:

I mean, really? Enough people for a conference here? Sort of. Yeah. Yeah.

Bryan Cantrill:

Sort of. How many how many people showed

Josh Clulow:

up to the first one?

Bryan Cantrill:

Like, a 100? To the yeah. It's I mean, same as the second and the third, about a 100. Yeah. That's good.

Bryan Cantrill:

It's been about a 100 every time. Yeah. I'll take that. And yeah. Especially because Stephen O'Grady was there.

Bryan Cantrill:

It was very, Stephen O'Grady is like, I'll come out. Like, alright. Great. And Stephen was super skeptical. He's like, oh my god. What are these people?

Bryan Cantrill:

I think Stephen, and I'll be interested to know what he'd say now, but I think he's like, this is by far the most technical conference I've ever been at, where it's like all kernel level software is viewed as, like, high level around here. And we had, like, Steve Peters there from Apple, who I love, love, love, love Steve, and talking about, in particular the way, mhmm. Do you remember he talked about how they'd used DTrace at Apple on No.

Adam Leventhal:

I don't remember this.

Bryan Cantrill:

Oh my god. Okay. So, no, he had, like, a q and a, and they were talking about how they had found this really nasty performance problem using DTrace. And, you know, Steve is such a great technologist, and so earnest. It was really good.

Bryan Cantrill:

A bunch of those videos are kind of, like, out there a bit, but they're terrible quality. Have you seen any of those videos, Adam? No. They're bad. But we did play the ping pong variant of our own invention that is still an omnipresent theme in our lives, fishpong.

Bryan Cantrill:

We got burritos. It was a lot of fun. And everyone was great. And we had John Birrell there. John Birrell, who actually died suddenly, unfortunately, a couple years after that, but John Birrell had done the FreeBSD port.

Adam Leventhal:

Remember that?

Bryan Cantrill:

Right. And I had done, so that was a lot of fun. I really enjoyed getting everyone together. And it was there that Stephen was like, yeah. This is great, but, I don't know, man.

Bryan Cantrill:

Like, what do you, you don't know? I was like, well, it's like, what are you gonna do next year? Like, I don't know. What do you what do you mean you get a next year? It's like, if you do, like, there's no way, like, it's it's not gonna be this good next year.

Bryan Cantrill:

It's gonna be it'll just be a downer next year. Like, you just you can't recreate this. Like, this is so incredible. You'll never be able to recreate it. I remember being like, can you just, like, let us have this guy?

Bryan Cantrill:

Can we

Adam Leventhal:

talk to someone? Please, Rick.

Bryan Cantrill:

Yeah. Can we just, like, enjoy this right now? Like, maybe we could, like, be a downer tomorrow. Like, today, this this is pretty great. But I really feel like, god, he's right.

Bryan Cantrill:

Right? And we're like, okay. We need to not do dtrace.conf every year. And in fact, we ended up doing it on an Olympiad cadence. So we did dtrace.conf in 2008, and then we did another one in 2012, and that was at the Children's History Museum, I think, in San Francisco, and then another one that one?

Bryan Cantrill:

You were at that one. Yeah. Yeah. And then we did another one in 2016,

Adam Leventhal:

And

Bryan Cantrill:

I think wearing

Adam Leventhal:

that shirt right now. Yeah.

Bryan Cantrill:

I am also wearing that shirt right now. I was I Oh, yeah. The you know, I saw in our earlier meetings, but I'm like, you know what? I'm I'm I'm also gonna wear that shirt today.

Adam Leventhal:

There you go.

Bryan Cantrill:

And, that was a lot of fun. And I think

Adam Leventhal:

it'd be And then I remember when I when I joined Oxide, I said, you know, 2020, Olympiad year, we can start to sponsor it. But little did I know.

Bryan Cantrill:

Oh, little did you know, 2020 canceled. Right. And then, you know, the pandemic wore on, but Adam, I am here to pledge. dtrace.conf 2024, baby.

Adam Leventhal:

That's right. You heard it here first.

Bryan Cantrill:

Unplanned. We are gonna do a dtrace.conf for 2024. We're gonna do it at Oxide, and it's gonna be fun. That's what it's

Adam Leventhal:

gonna be. And there'll be fishpong. We we promise all of these things.

Bryan Cantrill:

There'll be fishpong. There'll be food. We'll have a t shirt. It's gonna be great. Yeah.

Bryan Cantrill:

How many oh god. We so 2024, we've got, okay, we we we We got some time.

Adam Leventhal:

We got a minute. Yeah. Yeah. We got, like, a year.

Bryan Cantrill:

We got a year. But we're gonna do it, and it's gonna be a lot of fun, and we're gonna get the band back together. Because there are a couple of things I wanna talk about that are recent innovations. So do you wanna talk about, Adam, this is a good segue to get to USDT on Rust? Because I think this is where you get to, like, our modern day.

Adam Leventhal:

Absolutely. I mean, I think that it's been great using DTrace for all these years, but then you find yourself wanting it in these places where you don't necessarily have easy access. So one of the things, is it, like, 2 years ago now that Ben Naecker and I worked on this?

Bryan Cantrill:

Or yeah. Maybe more, but Yeah. But we built this

Adam Leventhal:

USDT crate for Rust so that we could have statically defined probes within the programs that we've been building at Oxide, because we're, you know, mostly building Rust stuff. And it's been fantastic. And I did, however, and Josh, Patrick, I don't know if you guys have seen this. I sent this to Bryan earlier. In the earliest days of, like, Rust ideation, Graydon Hoare has this bug that says, wouldn't it be great to have DTrace probes built into the language?

Adam Leventhal:

Of course, that didn't come to be, you know, not natively. But we've got this USDT crate and, you know, are using DTrace probes in all of the components that we're building, and it's been incredibly useful.

Bryan Cantrill:

It has been, it's been huge for us. And the ability to add new probes to this thing, it is so and you and Ben did such a good job, Adam. It is so easy to add probes to this thing. And and you've

Adam Leventhal:

been using it a ton in, like, looking at storage performance, and I don't know, for for understanding the systems, it's been super useful.

Bryan Cantrill:

It's been really useful. And I think that we

Josh Clulow:

I think in particular with Rust, because the symbols are mangled and with the deep inlining, often you don't end up with frame pointer frames for a lot of the source visible frames. So, like, being able to put the USDT probes in with unmangled names

Bryan Cantrill:

Yeah.

Josh Clulow:

Is very is very helpful.

Adam Leventhal:

That's right.

Bryan Cantrill:

It is very helpful. And I think that that's gonna be, I mean, there's a lot of work still to do, and I think a lot of it is around making it, I mean, it is possible. In fact, actually, Adam, we just used this on this really gnarly data corruption problem that we had, that as it turns out was due to, well, a bug that we had, compounded by a disagreement about what correct microprocessor behavior is. I think that's the most generous way to phrase it. But really, in order to be able to nail that, we used DTrace a lot.

Bryan Cantrill:

It was really fun with Rain, Rain Paharia here at Oxide, and we're using DTrace to debug it together. And I think she was like, wow. This is amazing with all the stuff you can do, but we really were using the ability to instrument an arbitrary instruction at user level. We've kind of forgotten about it because we so frequently use USDT. And then when we're not using USDT, we're kind of using the function entry and return.

Bryan Cantrill:

It's like, no. No. You actually can instrument any instruction in a in a process.

Adam Leventhal:

You can even instrument things that aren't instructions if you're not careful.

Bryan Cantrill:

Not including the instability to distribute drop drop tables. But that was actually really clutch for us in that problem because, as Josh says, you do get, like, really rampant and aggressive inlining. And, you know, this is one of these things where it's, like, DTrace makes these things possible that were literally impossible. And then, I mean, it's like, once it's possible, you can make it easier to do, but, boy, that delta between impossible and possible is a big one as it turns out. Yeah.

Bryan Cantrill:

I guess another, like, favorite feature that I feel does not get, that was absolutely clutch for us at Joyent. I kinda hope it's never clutch for us at Oxide because you know that you're in deep when you need this, is postmortem tracing, Adam. I'm not sure how frequently or if you need

Adam Leventhal:

I've definitely used it, but it's been a minute, probably back to the Fishworks days, or maybe some of the work I was doing on ZFS after that. But postmortem tracing is pretty wild. It's like this flight data recorder, this customizable flight data recorder that you dump in and then let your kernel execute until it crashes, and then you pluck it out and you can see the events from that flight data recorder. It's very cool.

Bryan Cantrill:

Yeah. So this is using ::dtrace_state in MDB, and you use the ring buffer policy to actually just constantly roll over your buffer as opposed to kinda pulling them out to userland. And then you can instrument a system. And so in particular, we had this nasty, nasty, nasty data corruption problem at Joyent. I'm knocking on wood right now, because I wanna show adequate reverence to the gods. Please do not punish me with another one of these. I mean, this was a preexisting bug in the operating system where it would effectively steal pages from underneath you.

Bryan Cantrill:

And so it would incorrectly consolidate pages, effectively. It's

Josh Clulow:

OS OS 1028 as I recall.

Bryan Cantrill:

OS 1028. Yeah. OS 1028 was our internal ticket. You know you can retain the ticket numbers for the rest of your life with the firmware revision or the OS revision or whatever it is that caused the misery. But, yes, OS 1028, and this was absolutely brutal.

Bryan Cantrill:

So, Adam, this is one of these things where it's like, you know, you're you're panicking because, like, the internal data is is corrupt, and you look at the corrupt data and you're like, this is a customer's Apache log. You're like, okay. Time to, like

Adam Leventhal:

Where did you get that?

Bryan Cantrill:

Oh, yeah. Be right back. I'm gonna throw myself in the traffic. Like, I am actually, like and this time, I'm not writing anything on my whiteboard. I'm I'm actually just kidding.

Bryan Cantrill:

I think it's just for the best that I just push myself onto the ice floe. I need to be done. Super scary. And we use that to be able to, and we get it, which is kinda happening randomly, effectively. And being able to just bifurcate the search space by instrumenting the kernel and then leaving this instrumentation running, and it is actually great when you have this kind of instrumentation running out there.

Bryan Cantrill:

In our case, the Joyent public cloud, when a system would roll on it, it was good news, not bad news. It's like, alright. We're gonna get, like, more data about this. And, ultimately, we absolutely needed that to crack that case, and it was chilling. And so that's one of those features, I think, Adam, that is just, like, not well known and not something you're gonna find in any of the system.

Bryan Cantrill:

Probably doesn't matter to that many people, probably doesn't matter that frequently, but when you need it, you absolutely need it. Yeah.
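
[Editor's sketch] The "flight data recorder" behavior of the ring buffer policy described above can be illustrated in a few lines. This is a toy model, not DTrace's buffer implementation: the `FlightRecorder` class and its capacity are assumptions made for illustration. It shows only the core property: the buffer keeps rolling over in place, and what you pluck out afterwards is the most recent history before the crash.

```python
from collections import deque


class FlightRecorder:
    """Toy ring buffer: keeps only the most recent `capacity` events."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry on overflow,
        # which is the "constantly roll over your buffer" behavior.
        self.buf = deque(maxlen=capacity)

    def record(self, event):
        self.buf.append(event)

    def dump(self):
        # What a postmortem debugger would recover: the oldest surviving
        # event first, the last event before the crash at the end.
        return list(self.buf)


recorder = FlightRecorder(capacity=4)
for i in range(10):
    recorder.record(f"event-{i}")
```

Nothing is consumed while the system runs; the record is only read back after the fact, which is why leaving such instrumentation running across a fleet can turn a crash into useful data.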

Adam Leventhal:

No. Totally agree.

Bryan Cantrill:

I kinda put speculative tracing also in that bucket, by the way. I love Yeah.

Adam Leventhal:

For sure. So there's

Bryan Cantrill:

Has there was

Josh Clulow:

anyone has anyone ever really used that?

Adam Leventhal:

That's become a meme for like the last 15 years, but, the feature that Bryan wrote, so you can record data into a speculative buffer and decide later on whether that was an interesting sequence of events and you wanna trace it out, or a boring sequence of events and you wanna toss it. So when you're looking at some very low probability kind of pathology, it lets you not have to, postmortem or after the fact, sift through tons and tons of data, but rather get the concise answer, like, right away.

Bryan Cantrill:

And, Adam, I'll tell you, I still use speculative tracing for, debugging, performance outliers, latency outliers. Yeah. Yeah. So Yeah. Yeah.

Bryan Cantrill:

Where you have, like, I wanna actually, like, I'm gonna instrument the beginning of this operation and the end of this operation, and I'm gonna instrument some things that I'm very suspicious of in the middle of the operation that may be inducing these p99s. And then if our latency is longer than this, I wanna commit that buffer and see if we're hitting this. And it can be very useful to kinda quickly explore one of these hypotheses. Yeah. So, yeah, Adam, do you have any others?
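
[Editor's sketch] The latency-outlier workflow described here (record into a speculative buffer, then commit or discard once the latency is known) can be modeled as below. In D this is done with the `speculation()`, `speculate()`, `commit()`, and `discard()` functions; the Python here is only an illustration of the pattern, and the `SpeculativeTracer` class, the event tuples, and the threshold are hypothetical.

```python
class SpeculativeTracer:
    """Toy model of speculative tracing for latency outliers."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms
        self.committed = []  # records that reached the main trace buffer

    def run(self, op_id, start_ms, end_ms, suspects):
        # Everything about this operation goes into a private
        # speculative buffer first, not the main trace buffer.
        spec = [("start", op_id, start_ms)]
        for name, at_ms in suspects:
            spec.append((name, op_id, at_ms))
        spec.append(("end", op_id, end_ms))

        if end_ms - start_ms > self.threshold_ms:
            # Interesting: commit the speculation so it gets traced out.
            self.committed.extend(spec)
            return True
        # Boring: discard the speculation; nothing is ever reported.
        return False
```

Only the slow operations leave any trace, so the output is already the concise answer rather than a pile of data to sift through afterwards.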

Adam Leventhal:

Well, so, I mean,

Bryan Cantrill:

there's first, I'd say, like,

Adam Leventhal:

I think we've been talking a bunch about, you know, you and me and Mike and some of the others, but, you know, there's, it's been ported to we mentioned macOS. I've seen a demo in the back of a dark auditorium on the iPhone. It's on the PlayStation Portable. It's on Linux. There's a Windows port.

Adam Leventhal:

I don't know if you've played with that at all, but I got excited about the Windows port and kicked the tires on that. And then, there are tons of people like Josh and Patrick, who was on earlier, Josh is still with us, Robert Mustacchi, Dave Pacheco, who have built, like, tons onto DTrace, extended the capabilities. Patrick has fixed a bunch of critical issues.

Adam Leventhal:

So it's it's taken a whole village. Yeah.

Bryan Cantrill:

It has. The, Josh, in particular, you added the the JSON action, which I thought was really you would do you wanna talk about that a second?

Josh Clulow:

Yeah. So we had an early structured logging thing, Bunyan, back in the Node era at Joyent. And we would emit log records that could contain quite a lot of JSON properties, some of them nested. And then we had produced a USDT provider where we would emit all of the log records, even the trace and the debug ones. If the probe was enabled, we would emit those log records as probes.

Josh Clulow:

And one of the parameters to the probe was the object that contained all of the structured properties from the log and the message and stuff. And we wanted to be able to pick out, and in order to do an aggregation in DTrace,

Bryan Cantrill:

you have

Josh Clulow:

to have the string or the number or whatever it is available in the probe context. So you need to be able to pick it apart in the kernel before it eventually makes its way out to userland for post processing. So I had added a subroutine in the D language to do that with, like, a CSS selector style or, like, miniature JSON XPath sort of thing, where you could name a particular property in a JSON blob string that we pulled out of a probe and it would pick out just that value. And so we could pick apart, like, a, you know, 2 kilobyte JSON object and lift out, you know, the 8 characters or something that we were interested in.

Bryan Cantrill:

Well, and in particular, it means that you can actually put that in a predicate.

Josh Clulow:

Right. Yes. And and aggregate on it. And and, I I think I also added at the time there was no like string to number routine because we wanted to pick out we wanted to be able to take numbers that were really strings and turn them into something that we could, like, quantize, which was was handy as well. I think they both went in together.
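
[Editor's sketch] The idea of naming one property inside a JSON log record, lifting out just that value, and aggregating on it can be shown as follows. This illustrates the concept only and is not the D subroutine itself: the `pick` and `aggregate` helpers, the dotted selector syntax, and the sample records are assumptions made for illustration.

```python
import json
from collections import Counter


def pick(blob, selector):
    """Return the value named by a dotted selector, as a string, or None."""
    value = json.loads(blob)
    for key in selector.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # Strings come back as-is; anything else is rendered as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


def aggregate(records, selector):
    """Count records by the selected property, like a count() aggregation."""
    counts = Counter()
    for blob in records:
        key = pick(blob, selector)
        if key is not None:
            counts[key] += 1
    return counts


records = [
    '{"level": 30, "req": {"method": "GET", "latency": "17"}, "msg": "ok"}',
    '{"level": 30, "req": {"method": "PUT", "latency": "230"}, "msg": "ok"}',
    '{"level": 20, "req": {"method": "GET", "latency": "4"}, "msg": "ok"}',
]
by_method = aggregate(records, "req.method")
# Numbers that arrive as strings need converting before they can be bucketed.
latencies = [int(pick(r, "req.latency")) for r in records]
```

The second step mirrors the string-to-number routine mentioned in the conversation: a value such as `"17"` is only useful for something like a quantized distribution once it has been turned into an integer.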

Bryan Cantrill:

Yeah. And some of the chat is asking about the Oracle port to Linux, which actually is, you know, I know it's off brand here for me to say anything even vaguely positive about Oracle, but the Oracle port is actually good, Adam. I think Yeah.

Adam Leventhal:

It got better. I don't remember this, but I kicked the tires on the first version, and I was tracing a bunch of stuff, and then I couldn't SSH to the box while the tracing was enabled. So, but it has definitely improved over time. But same thing with, you know, Apple had some problems early where, you know, Apple, like, didn't want you to be able to DTrace things like iTunes, you know, because of DRM stuff.

Adam Leventhal:

So a lot of the ports have had some, some, you know, foibles,

Bryan Cantrill:

but they they've got ironed out.

Josh Clulow:

Feel like you had a customer early on that did not want their software to be visible to DTrace.

Bryan Cantrill:

Yes. We did. I did not deal with that well. I think everyone after the fact can agree that Bryan dealt with that situation poorly.

Bryan Cantrill:

I was surprised. I was surprised it was not Veritas. It was Reuters, actually. And if you are a Reuters customer, you may be like, I knew it. I knew it.

Bryan Cantrill:

Reuters was not Reuters does not, Reuters makes financial services software. Oh, Reuters does a lot of things. And this is at least back. I'm not sure. I'm sure they've divested themselves with this business, but I'm actually maybe not.

Bryan Cantrill:

But, yes, Reuters insisted that we must disable DTrace for their application, or they would not certify on Solaris. And this was immediately after, by the way, us doing a bake off of Solaris versus Linux, which was actually SPARC versus x86, and then to really add insult to injury, as it turns out it was unoptimized on SPARC and optimized on x86. Like, we really cannot win this one. We can't win even SPARC versus x86. We definitely cannot win this if, like, this is unoptimized.

Bryan Cantrill:

Anyway, it was bad. Like, no. No. We can't recompile it. That's changing the rules.

Bryan Cantrill:

Sorry. I'm like, oh god. And then, of course, and by the way, then we showed them DTrace, and they were upset about DTrace, which was the first time that that had ever happened. I think, Adam, I don't think that I'd Certainly first I knew of.

Adam Leventhal:

Yeah. No. That was that was early days of DTrace and early days of folks getting upset. I think there was some upset that I hear about from time to time at Apple, about, you know, it being too, transparent, perhaps. But they've kept it in which I'm, I'm delighted and a little bit surprised.

Adam Leventhal:

That it's stuck for as long as

Josh Clulow:

it has. Feel like you have

Bryan Cantrill:

to turn on

Adam Leventhal:

Yeah. Yeah. The If it disable safety is

Josh Clulow:

If it's disable safety. That's right. That's the one.

Bryan Cantrill:

Yeah. Yeah. Right. And then the other side of it, DTrace definitely got some attention from other folks that wanted to partner with us. And there's this IBM meeting that I feel we should get on the record.

Bryan Cantrill:

This is a legendary IBM meeting that I was not in.

Adam Leventhal:

Yeah. I mean, this was, you know, I think it was like the IBM tools group. Like Rational, was that an IBM product anyway?

Bryan Cantrill:

Rose. This is like the Rose action.

Adam Leventhal:

Yeah. And so some of these folks come to building 17, you know, now Facebook, Meta, but then Sun. And want to, you

Bryan Cantrill:

you you ever been in

Adam Leventhal:

these meetings where you're not really sure what the outcome that anyone has in mind was? So anyway, we were there listening. And the most memorable thing about this meeting was a guy from IBM, you know, they rolled about six or seven deep, but a guy from IBM, as our colleague Mike was presenting, slowly and obviously, he's falling asleep and snoring. And Mike and me, and absolutely snoring. Oh, yeah.

Adam Leventhal:

Like, loudly snoring to and then Mike and me doing everything we could not to make eye contact with each other knowing that we would come completely unglued if we did. And then a the program manager in charge on the IBM side, you know, interrupting and saying, hey, Tim, could you, wake up John, please? And John wake him up.

Bryan Cantrill:

Like, oh, god. It always falls to me to wake these guys always startle when I wake him up.

Adam Leventhal:

Get him up, folk. He startles, and I swear to God wakes up and shouts Solaris 9. And Mike and I could not get out of that room fast enough and just had giggle fits for about 30 minutes after that.

Bryan Cantrill:

So and, unfortunately, I was on and this this meeting only exists in my mind in your retelling, but it's very vivid. I I I I really feel like I was in the room. I just I I just love the the the the Solaris 9 being

Adam Leventhal:

Oh my

Bryan Cantrill:

gosh. I

Adam Leventhal:

don't know what dream he was having. What beautiful dream about the completeness of the SVR 4 vision. But

Bryan Cantrill:

Well, I'm honestly talker too, so I get it. It's where you definitely, like don't don't let me go, like, if I if I drift off in a meeting, just just feel me straight. Would you mind? And just let that point. Yeah.

Bryan Cantrill:

Do us all a favor. Yeah. That'll be one for DTrace at 40. DTrace at 70. Well, and so then, I know we've gone super long here, and I wanna be mindful of time, and of your time especially. Any, like, things that we got especially right or especially wrong?

Bryan Cantrill:

I don't know that we actually You know what I'm saying? Yeah. You saw

Adam Leventhal:

I I mean, on especially wrong, I don't think especially wrong, but, you know, there there were some CVEs here and there. But I I do think you, either die the hero or live long enough to see the CVEs. So

Bryan Cantrill:

I'm grateful for those CVEs, actually. Those CVEs were extremely helpful for me because they were eye opening with respect to C and the integer unsafety of C.

Adam Leventhal:

That's right. That's right.

Josh Clulow:

Really appreciate it. Ben Ben, what what was his surname? Ben, the the guy that Murphy? Found Murphy. Ben Murphy.

Josh Clulow:

That's right. Yeah. Found the eight integer things or whatever at the time. He was he was, he was fun to work with.

Bryan Cantrill:

He was. I think he came to dtrace.conf 2016. And I think we had him, actually. I think

Josh Clulow:

so. Yeah.

Bryan Cantrill:

And I think I didn't really appreciate how much of finding those kind of vulnerabilities is a lot more perspiration. It was not like, he was just, like, banging on the thing over and over and over again in weird ways.

Josh Clulow:

That we asked him a question about some aspects of like, it seemed like it would be obvious that he would have understood this fully in order to be able to find the problems that he found. He's like, the what now? I

Bryan Cantrill:

don't know. I don't know. I don't know what you're talking about.

Josh Clulow:

I just found the bugs. I don't I don't know how it works. It's like, that's amazing.

Bryan Cantrill:

It was amazing.

Josh Clulow:

CVE locating Savant. Like yep.

Adam Leventhal:

So there's a bunch of work that we didn't do that I'm grateful for. Like, we didn't kind of create a JIT, execution virtual machine for D code. There's a bunch of stuff that, like, seemed like obvious next steps that we didn't take, out of skepticism or laziness or waiting to see if it was an actual problem. I'm grateful for a lot of those, but I think mostly USDT turned out to be an extremely happy accident where, you know, solving plockstat turned out to be the gateway to let all languages participate in this statically defined tracing. I feel like that was just total happenstance and very fortunate.

Bryan Cantrill:

Also, is-enabled probes were a huge breakthrough.

Adam Leventhal:

Oh, yeah. Is-enabled probes. This is one where we generated some code to say, well, if the probe's enabled, then slap down a one. Otherwise, leave it as a 0. So kind of very hacky, but it allowed us to do tons of stuff with otherwise prohibitively expensive operations.

Adam Leventhal:

You know, we wouldn't design the tracing for it, but this allowed us to pull all of that out of line. That was very fortunate as well. Agreed, Bryan.
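
[Editor's sketch] The is-enabled pattern described above comes down to a cheap check guarding an expensive argument computation. The following is a language-neutral illustration in Python, not the generated code discussed in the conversation: the `Probe` and `Server` classes, the probe name, and the summary format are all hypothetical.

```python
class Probe:
    """Toy probe: disabled by default, records arguments when enabled."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.fired = []

    def is_enabled(self):
        # The cheap check: conceptually a 0 or a 1 at the call site.
        return self.enabled

    def fire(self, arg):
        if self.enabled:
            self.fired.append(arg)


class Server:
    def __init__(self, probe):
        self.probe = probe
        self.summaries_built = 0

    def _summarize(self, request):
        # Stand-in for a prohibitively expensive operation, such as
        # serializing a large object just to pass it to a probe.
        self.summaries_built += 1
        return ",".join(f"{k}={request[k]}" for k in sorted(request))

    def handle(self, request):
        # Only pay for the expensive argument when someone is tracing.
        if self.probe.is_enabled():
            self.probe.fire(self._summarize(request))
        return request["id"]
```

With the probe disabled, `handle` never calls `_summarize`, so the untraced path pays only for the check; the cost moves entirely to the case where tracing has been turned on.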

Bryan Cantrill:

But how how about for you?

Adam Leventhal:

Any any any missteps or any any ones that you feel like we really nailed?

Bryan Cantrill:

Well, so I think, I mean, is-enabled probes, I definitely, you know, I've always told people, like, back when we were wanting to patent ideas, and I think that, like, we're kind of past software patents completely. But I always tell people that, like, if you can't remember where you were when you thought of it or when someone told you about it, it's not patentable, because it was obvious, effectively, to your doppelganger. And is-enabled probes were not that. Like, I know exactly where I was when you and I were talking on

Adam Leventhal:

You know

Bryan Cantrill:

what I'm right?

Adam Leventhal:

Yeah. You know where my regret on those is that the name sucks, and I have never thought of a better name for that thing. And

Bryan Cantrill:

I think that I

Josh Clulow:

I feel like that means it doesn't suck. Like

Bryan Cantrill:

Yeah. I feel like society is place.

Josh Clulow:

20 20 years and you haven't been able to find a better word. I think it's probably a good word.

Adam Leventhal:

Fine. Nailed it.

Bryan Cantrill:

Yeah. And so no. I think is-enabled probes are amazing. And, again, Adam, I remember the phone conversation in which that one came out. I think it was, like, I feel like that was both of us.

Bryan Cantrill:

Is that correct? I'm I mean, I don't know. Am I am I overstating or is it

Adam Leventhal:

is it give you No. No. No. I I think if I I'm gonna take credit for USDT, and I remember where I was, which was the People's Republic of China. But, I'll I'll give you, at at, the 2,003, networking guidance.

Adam Leventhal:

97th, like, the subnetwork or whatever.

Bryan Cantrill:

When we were sharing a hotel room, strangely, not to make it weird.

Adam Leventhal:

Yeah. No. I mean, sure.

Bryan Cantrill:

Out of our own sense of, like, beautiful thriftiness to our corporate overlords, we were sharing a hotel room. And I remember your line when, it was just like, 2003 Shanghai is just exploding. And, you know, they had, for a period of time, had the tallest building in the world. And you had that great line. You were like, man, it looks like they had a proposal and they had 3 different architecture firms with 3 different skyscraper proposals, and they're like, you know what?

Bryan Cantrill:

Just build all of them.

Adam Leventhal:

Yeah. That's what it felt like. The time every direction you looked was like construction cranes everywhere as far as the eye can see. Totally.

Bryan Cantrill:

Yeah. No. So I remember that, but I also remember when we were grappling with a really inside baseball DTrace problem. So one of the things that we did that I thought was really interesting, that the world does not care about, is this idea of interface stability on probes.

Adam Leventhal:

That's right. That's right.

Bryan Cantrill:

And We really thought

Adam Leventhal:

that was gonna be a big deal in part because at

Bryan Cantrill:

Sun big deal. Goddamn it.

Adam Leventhal:

It it wasn't huge

Bryan Cantrill:

deal. Wrong.

Adam Leventhal:

No. It's at Sun because, you know, we were running binaries that had been compiled, you know, 85 years ago for the Jacquard loom or whatever. It was really important to us to define interface stability. So, and you're right that, like, we made it so that your programs could articulate their associated stability. You know, FBT probes were lower stability.

Adam Leventhal:

Syscall probes were more stable, that kind of thing.

Bryan Cantrill:

Nope. It's worth I feel like

Josh Clulow:

this is still actually important today.

Bryan Cantrill:

I know that we don't It's important.

Josh Clulow:

But, like, it's important to, you know, 8 people or whatever. But, like, but it is,

Bryan Cantrill:

it is important. People. 6 of whom are here.

Josh Clulow:

Well, like,

Bryan Cantrill:

something that I think are trying to deal with their audio right now. Something I think we need

Josh Clulow:

more of, right, is library interfaces into DTrace to build tools like lockstat.

Adam Leventhal:

Totally.

Josh Clulow:

Like, rather than have to post-process quite so much text to fish things back out. Right? Like, we could have more typed programmatic access to the swapped out buffers or whatever. And I feel like at that point, the stability stuff would become extremely critical to, like, those programs that are trying to run against

Bryan Cantrill:

Totally.

Josh Clulow:

A set of probes and providers and stuff.

Bryan Cantrill:

So a couple of things on the interface stability. One is, Adam, just leading up to something we did right, and maybe you're forgetting about this: we were grappling with how to deal with stability on a probe-by-probe basis.

Adam Leventhal:

I do remember this. Yeah.

Bryan Cantrill:

And it was just thorny. And It was

Adam Leventhal:

thorny, and, I mean, this was before the pid provider, when there were like bajillions of probes. But there were already like tens of thousands of probes, and that felt, yeah, extremely unwieldy.

Bryan Cantrill:

And your question slash observation was: maybe we should force the interface stability to be associated with the provider, not the probe.

Adam Leventhal:

See, I associated that moment with the moment that you and Mike both looked at me and thought, you know what? This kid isn't useless.

Bryan Cantrill:

This kid might have You

Adam Leventhal:

know, I I sort of asked

Bryan Cantrill:

what I thought was I

Adam Leventhal:

thought was a sort of, I don't know, maybe this is a good question. And you guys just riffed on it. And I think I barely kept up with where the conversation went from there.

Bryan Cantrill:

But I think I've said something important and valuable, but I'm not really

Adam Leventhal:

Right. Based on the thought

Bryan Cantrill:

I'm gonna go

Adam Leventhal:

back to sleep. Celera's night.

Bryan Cantrill:

No. That was a super important observation because then it was like, oh, yes. That's how we can go do this. I definitely remember that. And I also remember we had a couple of things where it's like, alright, we're gonna do this for now, and then we'll have to go revisit this later. But then we had a bunch of things that we never revisited, like the jetting. I mean, we never revisited because we didn't really have to. So that was very important. And then I think it also has to be said, the way we went to the architectural review committee.

And I know we're long here, but we've gotta regale with some quick PSARC tales. So PSARC was the architectural review committee inside of Sun. And this is where you would kinda sit before, you know, this kinda council of elders and present the work that you were doing. And there were valuable things to PSARC, but PSARC was also comprised of folks whose job was PSARC, which was a real problem. And I don't know what your take is on this, but I think when you have people whose only job is the review of other software, it's really tough to stay, I think, current and relevant, and it's easy to lose that.

Adam Leventhal:

Totally. I mean, and then the incentive becomes: the only way you add value is by extracting your pound of flesh. Otherwise, you're just rubber stamping everything.

Bryan Cantrill:

Right. And so there was this issue of, like, how are you all gonna deal with PSARC? You've got this large thing that you've clearly thought a lot about. And I think that this is really due to Mike. This was kind of a master stroke of getting through PSARC. Like, okay. What we're gonna do is we're gonna split this into multiple cases. And the DTrace case is going to be all of DTrace except for the providers. Because what we predicted, rightly, was that everybody was gonna want to really talk about how the system was instrumented, and not what we felt was actually, in many ways, the more important thing from an interface perspective, which is this broader system that's consuming the actual tracing information. So we split this, and we had all of the providers in a separate case.

And, I mean, Adam, do you remember this? Like, this strategy worked to perfection because what we knew is, ultimately, you wanna run out the shot clock with PSARC. That's ultimately what you're trying to do. Because, ultimately, if you run out the shot clock, they're gonna approve you. And do you remember bringing the DTrace documentation into PSARC? Were you, like

Adam Leventhal:

No. I don't.

Bryan Cantrill:

Oh, god. It was so great. I feel it's, like, one of these moments from, like, Erin Brockovich or whatever. You know what I mean? Where they're kinda coming in with this stack of depositions or whatever. So we're like, here's our PSARC case. Drop this box filled with manuals that we'd written on the table. We're dispensing these manuals where we've documented this entire facility that we have done, all without PSARC preapproval. And sure enough, in the discussions, all they wanna talk about are the providers. And what we would do, just brilliantly, I remember, was letting people kinda pontificate for a while and letting them expand on this and be like, okay. Well, that's very interesting, but that's actually not in this case. That's in the next case. And I now see that you've used 3 minutes and 13 seconds on that. Next question. And they kept doing this.

And, of course, ultimately, there was too much for them to meaningfully understand, and none of them were actually using features themselves. They didn't really appreciate it. And then they approve it. And we also were encouraging them when they wanted to waste time. And do you remember, like, they wanted to talk about the definition of standard C? Do you remember this?

Adam Leventhal:

Yes. I do. But in particular, you mean, like, the header file definition of standard C? Because, like, we were sort of, like, in header files in some cases. I mean, the other brilliant move that Mike did on this case and many others was to ask a question whose answer we did not give a shit about. Yes, to give them an opportunity to bikeshed, you know, ad nauseam, and then give us an answer that either was obvious on its face and we knew how it was gonna go, or it's like, fine. I don't care.

Bryan Cantrill:

You tell me: you would think that they would see through this, but no. As it turns out, this is just such an easy kind of con. Like, if you take somebody who has got some predilection for bikeshedding and you put a bikesheddable question in front of them, they're basically gonna take the bait, and they basically did. And they spent the entire time. I remember thinking, like, do you not understand what we built at all? And he's like, let them cook. Because as soon as they start feuding among themselves, this is what we need. This is the way you go from taking a minute off the shot clock to taking, like, you know, 35 minutes off the shot clock, by having them go at it. So they pass the DTrace case without so much as a sentence of discussion on this, like, massive stuff.

And in particular, we had made interface stability, kind of PSARC's interface stability, we had kind of elevated it in this case in a way that we thought was gonna be really controversial, but no one had paid any attention to it. So then we get to the 1st provider case, and now they've kind of exhausted themselves talking about the way the system is instrumented when they weren't supposed to be talking about it. And remember, all those provider cases would have the matrix of interface stability for the provider. And they're, like, what the hell is this? Like, that is the interface stability of the provider. And it's like, well, okay, you can't programmatically define interface stability. It's like, actually, if you go back to the case that you

Adam Leventhal:

You already have.

Bryan Cantrill:

You already have. And there was definitely a possibility.

Josh Clulow:

So you can't you can't take it back.

Bryan Cantrill:

It was stable beginning back. It really was. It did feel like we had defeated PSARC, and we had

Josh Clulow:

we had dominoes fell like a house of cards.

Bryan Cantrill:

Exactly. It felt like kind of a Keyser Söze moment. Anyway, sorry, Adam. I just had to. Oh, that's great. I think that the interface stability per provider was great. I think interface stability is, as Josh was kind of alluding to, one of these things that's really important that people don't understand. Anonymous tracing has been huge. I think postmortem tracing is, I mean, I just love the fact that this is still something that we're still using, and it's still evolving, still bringing in new folks, still lighting up parts of the system that were previously unseen. And, boy, how lucky are we to have been able to participate in something that has that kind of staying power. It's really terrific. Right?

Adam Leventhal:

I mean, I think when we were working on it, before we had shipped to any customers, I really thought this was gonna be something that was relevant only within those four walls, that, you know, we were gonna use in the kernel group. Maybe some folks at Sun were gonna use it, and that was gonna be worth it. Like, that was gonna be an unquestionable value. But the fact that the customers, you know, got into it, that it's been ported, I didn't see it coming. And as you say, incredibly grateful that all those things happened.

Bryan Cantrill:

And I also have to say, grateful to Sun, and honestly, Jonathan Schwartz, for open sourcing it. I think if it had gone down with the ship, we wouldn't be using it today. That's what it boils down to. So, absolutely, grateful for that. And I'm just grateful that we're here. DTrace at 20, man.

Adam Leventhal:

Who would have thought it?

Bryan Cantrill:

Yeah. Who would have thought it, and still in daily use? So, yeah. Josh, thank you for joining us.

Josh Clulow:

You're most welcome.

Bryan Cantrill:

For, that Hungry Jack's was not just a fever dream, that it actually happened. Yeah. And thank you all. I, again, really appreciate everyone and the community, and look forward to an awesome dtrace.conf for 2024. It's gonna be great. Alright, Adam. Thank you very much. I'm glad we did this. And I thank you all for joining us, and we'll see you next time. As Janet says in the chat, here's to 20 more. Thanks, everybody.

Adam Leventhal:

Thanks.