Oxide and Friends

Lots of engineering decisions get made on vibes. Popularity, anecdotes—they can lead to expedient decisions rather than rigorous ones. At Oxide, our choice to go with CockroachDB was hardly hasty! Dave Pacheco joins Bryan and Adam to talk about why we chose CRDB… and how Cockroach Labs' recent switch to a proprietary license impacts that.

In addition to Bryan Cantrill and Adam Leventhal, our special guest was Dave Pacheco.
Some of the topics we hit on, in the order that we hit them:
If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Bryan Cantrill:

Dave's here.

Adam Leventhal:

Dave's here.

Dave Pacheco:

Hello. I'm back. You're back?

Bryan Cantrill:

Back to back on the podcast here, Dave. It's exciting.

Dave Pacheco:

It's an honor.

Bryan Cantrill:

The honor's all ours, Dave. Yes. Although this one, I think it's fair to say, you know, we've been talking about our episode on sagas. We've been thinking about it for a long time, Adam.

Adam Leventhal:

Yes.

Bryan Cantrill:

True. This one, a little more getting back into our groove of not running more than a week in advance, for sure.

Adam Leventhal:

This is the hurry up offense.

Bryan Cantrill:

This is the no huddle. This is the run and gun that Oxide and Friends is famous for. And so, I also loved the comment. There was a comment from last week's episode. It's like, hey.

Bryan Cantrill:

I'm really glad to see that Oxide and Friends is back into the the groove of not telling me what this thing is about for the first 17 minutes.

Adam Leventhal:

That, you know, that was just building suspense. Like, we did mention what it was about. We did.

Bryan Cantrill:

On a re-listen, I definitely got some "hate-read the tweet, fellas" vibes on that one. I felt like, hey, could someone define what a saga is? Unfortunately, we did, but it was good. We're building.

Adam Leventhal:

Awesome. We're building to it. You know? Setting it up. Storytelling.

Bryan Cantrill:

Yeah. Exactly. You can't just rush in and say what we're talking about. That would be far too orienting. But also on this, so to give the context, we are users of CockroachDB.

Bryan Cantrill:

So, one of the first things that we did, or early on in the company, is we knew we were building a control plane. We wanted to have a data repository for that, and we embarked on an entire process. Dave embarked on a process, led the charge on evaluating different options for that, and we decided on CockroachDB, for a bunch of reasons. And we haven't really talked about it much, I don't think. I mean, I think the most we talked about it, and correct me if I'm wrong, Dave, is in your debugging odyssey episode.

Dave Pacheco:

Yeah, you're probably right.

Bryan Cantrill:

And I don't know that we've talked about it publicly very much, just based on the number of RFDs we made public for this. So we hadn't talked about it very much, but not deliberately. I mean, it's not that we were, you know, we had mentioned it obviously, but then fate kinda forced our hand because Cockroach Labs, which famously or infamously had taken an open source project and started to relicense aspects of it under the BSL, the Business Source License, decided that they were gonna go all proprietary on Thursday morning. Right, Dave? I think it was

Dave Pacheco:

Yeah. That's right.

Bryan Cantrill:

So, obviously, this is an issue of great interest for us, and we had a conversation about it on Thursday morning, and then I wrote up an RFD that we made public on Friday morning, and included in that, Dave, we made public all of the RFDs you'd done, 53 and 110, and then your Cockroach evaluation repo. That kinda took people through the entire process of why we decided what we decided, and then also what we were deciding to do, with the move to a strictly proprietary CockroachDB, in RFD 508. And I mean, obviously, it's like, you know, we are a podcasting company that LARPs as a computer company. Did you see that Hacker News comment over the weekend?

Adam Leventhal:

I did, actually. Good for them. I mean, it is our show. But I

Bryan Cantrill:

First of all, I LOLed at that. I thought that was a funny comment. I thought that was a very funny comment. It was on the Hacker News thread about this. Or maybe it wasn't on the Hacker News thread, actually, but it was about the RFD.

Bryan Cantrill:

Anyway, I thought it was very funny. One of our defenders, which I totally admire, was calling them out on it, but for whatever it's worth, I loved it.

Dave Pacheco:

I thought it

Bryan Cantrill:

was funny. But that's... I think our defender is probably right, that this is someone who is probably one of our detractors, one of the members of the Haters' Club, who knows. But we got the RFD out there. The RFD got more attention than I actually thought it was going to. And, is that naive?

Adam Leventhal:

No. I mean, assisted in no small part by friend of the pod, Kelsey Hightower, who had just such a delightful repost of what you had done. Like, you know, quote tweet, you know, just heaping praise. Like, almost too much praise. Like, suspicious.

Adam Leventhal:

But I

Bryan Cantrill:

I feel that you were suspicious. I feel like your first thought was Bryan has hacked Kelsey's account.

Dave Pacheco:

Well, I think I

Adam Leventhal:

to see if Kelsey has

Bryan Cantrill:

been to the office. Left his laptop logged in around the Oxide office, and that little mischief maker has gone in and tweeted out very fulsome praise from Kelsey. Really, I mean, honestly, it's like, and obviously, you know, it's always very meaningful to get earnest praise from people that you hold in high regard, and obviously we hold Kelsey in the highest regard. And that was really unexpected. It was delightful, but very unexpected, to get.

Bryan Cantrill:

And I think also, I mean, RFD 508 is, you know, I think it did a good job of kind of explaining our options there and our decision. But I actually think that the bigger piece of this, I mean, obviously the bigger piece of this is, Dave, the work that you had done in RFD 53 to define the rubric, and RFD 110 to describe why we selected CockroachDB. So I do wanna hit on all of that. I feel that, I mean, I don't want to, like, overly fixate on the action of Cockroach Labs here, but I also feel that I would not be being true to myself. And there was one comment online.

Bryan Cantrill:

It's like, oh, I love this RFD because, like, you know, there's no, like... wait, what was the line? There was, you know, no...

Adam Leventhal:

The garment. Yeah. Right.

Bryan Cantrill:

The rending of the garments. That's right. And you're like, yeah, wait. Well, tune into the podcast.

Bryan Cantrill:

That

Adam Leventhal:

Yeah. I mean but does it make sense to sorta go into the time machine and, like, dial it back to whenever it was 2020? I

Bryan Cantrill:

do wanna go to the time machine. I think before we go to the time machine, I just wanna get it out a little bit. I know. This is where, I know, it's like, this is Odysseus having asked to be tied to the mast here earlier.

Bryan Cantrill:

You're like, okay, now untie me. I know, I know all those things I said about the mast. Now untie me. Just so... because I do think that, I mean, yeah, I described this, you know, and I had a P99 CONF talk, that we then had a podcast episode on, of the kind of update to corporate open source anti-patterns.

Bryan Cantrill:

And, you know, I re-listened to that talk over the weekend just to kinda remind myself what I'd said, because we did call out Cockroach there in a positive way for the Business Source License, the way that they defined it being relatively crisp, or crisp, actually, not even relatively crisp, like crisp. I do think that the concern on all these relicensings, and this one also, is not that Cockroach has the indisputable right to do it, and, you know, certainly we want them to have a thriving business, and they've got a right to be part of it, so I definitely understand that. I think that you always have to be mindful of the social contract. So that's kind of the question: like, where are the social contracts here? And I don't think they've got much of a social contract with Oxide.

Bryan Cantrill:

Because, and Dave, you've talked about this a lot, just internally, but they have developed an engineering artifact, a software artifact, that's been really good, and they don't even owe us anything. So I think that the social contract with Oxide and with others is pretty minimal. I think there's some other social contracts here that are probably more important that I can't really speak to. So, I mean, in particular, I do think the most important social contract, and maybe the answer is just, like, there's no one in this category, but are contributors to... Mhmm. To CockroachDB.

Bryan Cantrill:

I think that the outside contributors to CockroachDB are in many ways the most important social constituency here, because their work was made available under one set of terms and now it's being made strictly proprietary.

Adam Leventhal:

Yeah. I think there's another question it raises about, you know, what the direction of the company is. Like, who is the company? And this is not to answer for other people with regard to Cockroach Labs, but what does it say to us, and how does it affect our use of it? And in particular, in this case, for us, you know, some of the terms around it, like unlocking at $10,000,000 ARR and some stuff like that, just were signaling in such a way that, to me at least, indicated potential future greater incompatibility between values, or between our use case and what they intended.

Bryan Cantrill:

Yeah, no, that's a good point, Adam. And something to elaborate on just before we step into the time machine, because we talked about, like, you know, let's take a look at what's kind of available. And they do have a source-available CockroachDB that is free, but there are two requirements on there: you mentioned the $10,000,000 of annual revenue, that you have to be less than $10,000,000 of annual revenue, and then also you have to have mandatorily enabled telemetry, and that's an absolute nonstarter for us, like, a 100% nonstarter, because our customers are deploying in potentially air-gapped, secure facilities. It's like, that's a hard nope.

Adam Leventhal:

Like, we don't even get telemetry from our customers.

Bryan Cantrill:

We don't get telemetry from our customers. Exactly. So, like, you guys want it? You definitely don't get telemetry. Sorry. So it's like, okay.

Bryan Cantrill:

What then? Why not license it? And I think that, you know, a question that came up online is, like, alright. So just, like, license the thing. And, we cannot encumber this product at all from a software perspective.

Bryan Cantrill:

So, every bit of software that we have in this product, we have to have a perpetual, royalty-free license for. And it's not impossible that you could negotiate that with Cockroach Labs, but, yeah, I would say it would get them out of their sweet spot and probably out of where they wanna be. I mean,

Adam Leventhal:

I think it's impossible I think it's impossible in terms of, like, there's just too big of a gap between what we would conceivably pay and what would make sense for them to bother doing. Like and we we just like, I don't think it's a use case that they particularly care about. Right? It's not what they're trying to preclude. It's not what they're trying to enable.

Adam Leventhal:

It's not big bucks

Bryan Cantrill:

for... or should care about. Like, yeah, please, let us be in our own filter. You do not want... and so we got other, like, aspects of the problem. Like, I mean, our software is gonna go out to a customer, and it's gonna run for potentially a long time.

Bryan Cantrill:

And we can't make real guarantees; like, that can be pretty down-rev with respect to CockroachDB. And so you can't really tell us, like, oh, by the way, this is out of the support window. It's like, sorry, bad news. And so just a bunch of other things like that where we're gonna take you out of your sweet spot.

Bryan Cantrill:

So it almost doesn't make sense. But it's very important for us, because we need to make sure that, one, we take the responsibility for being able to understand any misbehavior on the rack. But then also, it's really important because we need to have control over our own pricing, and we need to make sure that... a big part of the Oxide value is all the software that's built in. So that's kind of why that's a non-starter, kinda where we landed on that. But, so with that, I think we can step into the time machine.

Adam Leventhal:

There we go.

Bryan Cantrill:

I think, yeah, exactly. I've been coaxed into it, like, rended the garments. Dragged onto the plane. Yeah.

Bryan Cantrill:

Right. Exactly. We rended the garments. So, Dave, I think it is worth... so there were a couple of reactions online, and Dave, I don't know how much you were looking at this. There is definitely a "why didn't they just use Postgres?"

Dave Pacheco:

Oh, really? I hadn't quite seen that.

Bryan Cantrill:

Yeah. I kinda figured you hadn't. This feels

Adam Leventhal:

like blindsiding, but this is

Dave Pacheco:

How much time do we have?

Bryan Cantrill:

I am biased. Well, no, no, no. It dovetails into a question that Dave had asked the two of us, Adam, which is like, may I spend the first 45 minutes talking about Postgres?

Dave Pacheco:

I think I said 90 minutes.

Adam Leventhal:

You said 90 minutes for sure.

Bryan Cantrill:

Right? He's saying 90 minutes. Yeah. And Adam's like, of course, because Bryan thinks that 90 minutes is 45 minutes. This is why I've got so many domestic problems. This is why it's all making sense to everyone.

Bryan Cantrill:

That's right. It's all making sense. Bryan thinks this podcast is 45 minutes long. Like, that's the problem. I think we actually found it.

Bryan Cantrill:

So, I mean, Dave, I think it is worth talking about our experiences with Postgres, and you've got, I mean, a line in RFD 53 that is, well, whatever a subtweet is when it exists as a line in an RFD, about Postgres. But you wanna talk about our experience with Postgres a little bit, and positives and findings?

Dave Pacheco:

To be clear,

Adam Leventhal:

this is your experience at Joyent.

Dave Pacheco:

At Joyent. Yeah. Yeah. And so I'm not sure how far back in the time machine we want to go. But I wanna give at least a little bit of context.

Dave Pacheco:

So, certainly, the whole process of figuring out what we were gonna use for the database here was heavily informed by having just previously come off of several years spent fighting issues productionizing Postgres databases.

Adam Leventhal:

If I can interject: yeah, every time I would go over to your office for lunch with Bryan, no joke, you'd be in some Postgres firefight. And it

Dave Pacheco:

it just got to come

Bryan Cantrill:

to lunch every day. It would have been I mean, yeah. What what Okay. Good. I just Every time I visited you guys in your office, you were in your office.

Bryan Cantrill:

I yes. Yeah. It was cool. Yeah.

Adam Leventhal:

Yes. Yes. We were doing the work that we do every day. That's what anyway.

Bryan Cantrill:

Yeah. That's what we were doing. I mean There

Dave Pacheco:

was, yeah, there was a lot of that. So, okay. So this started, you know, back at Joyent, we built a system called Manta, which, the sort of short version of it is, it's very similar to Amazon S3 in spirit. It's an HTTP object store, you know, put, get, delete of large blobs.

Dave Pacheco:

And some things that were different, that turned out to be pretty relevant for many reasons: it was strongly consistent at a time when S3 wasn't. So if you put something into Manta, you would get it back out, which required a strongly consistent metadata tier that we'll get to in a second. And it also imposed a directory structure, so that you could list the contents of things efficiently, essentially. And so this thing, you know, I'm going super fast, but the implementation is basically divided into storage and metadata. Right?

Dave Pacheco:

The storage is durably storing these large blobs of data, which you would not use something like Postgres for. And then metadata is this gigantic database mapping user facing names to the underlying storage objects that, that contain that data. Right? And, of course, it's a cloud service, so we need high availability. How do we build high availability?

Dave Pacheco:

Well, let me get back to that. The other thing we need is horizontal scalability. And so we ended up going with Postgres largely on the strength of its reputation around data integrity and performance and just general rigor. Right? Is that your recollection, Bryan?

Bryan Cantrill:

Do I need my lawyer present?

Dave Pacheco:

It feels like I need I'm

Bryan Cantrill:

just a little bit pointed. And unfortunately, I fired Adam as my postmortem lawyer after the last postmortem in which I was hung out. So I actually need a lawyer here. I'm gonna have to go with the public defender. Actually, we need to do the public defender.

Dave Pacheco:

I think that's right, though. I mean, I think we didn't do a lot of, like, testing, or, like, I don't know how much we did, like, a serious survey of alternatives, but it seemed like the thing to beat. I was like, well, what

Bryan Cantrill:

What was driving it? It was...

Dave Pacheco:

No, no. What else would we use? And

Bryan Cantrill:

It was. We did it on vibes. Absolutely. It kills me to say it, but we did it on vibes and reputation. And we did not dig into that reputation at all, really.

Adam Leventhal:

And I would say that's how most people make technology decisions, though. Like, it's expensive, or perceived as expensive, to do, like, a rigorous analysis, and going with the consensus approach makes sense in a lot of cases. And, I mean, until it doesn't.

Bryan Cantrill:

Yes. And I feel that, like, if you had frozen time and said, hey. Wait a minute. You guys are doing it the wrong way. Like, don't you wanna do something more rigorous here?

Bryan Cantrill:

I don't know, Dave. I mean, I've got way more faith in you than me. I think I would have been like, no, but, like, we know all these people that are using it. And, of course, we'd ask them, like, do you know it?

Bryan Cantrill:

Do you know it, know it? Like, what do you mean you know it? What do you think you know? Well, what do you think you know? Because, you know, I think that you're right, Adam, that this is the way lots of people make decisions, and I don't know that we thought it to be as flawed as we now know it to be.

Dave Pacheco:

You know, even in the end, it's, it's kind of a mixed bag. I mean, we made it pretty far on those, on that decision and those vibes. Right? I don't know. I I feel like

Bryan Cantrill:

Well, no, and I think you're right. I mean, I don't know that we would have come out with a different decision necessarily, but I think we would have gone in with eyes much wider open about some of the perils. I mean,

Dave Pacheco:

it is nice to say that when we did this again, almost 10 years later, we went about the decision-making process differently, which is why we're here today talking about it. Right?

Bryan Cantrill:

Yes. And so in RFD 53, you have explicit non-requirements, and people were wondering, it's like, well, this is kind of a loaded paragraph: "Based on past experience, the reputation of a system or stories about it being used by other organizations are weak data points. We will want to independently verify any properties we care about." That is a Postgres subtweet, such as it is.

Bryan Cantrill:

And that's more a criticism of us, not Postgres. It's not Postgres's fault that we did not evaluate it rigorously. So, Dave, what did we find? Just to give... I mean, you linked to the postmortem, one of the postmortems.

Dave Pacheco:

Yeah. So just for a little tiny bit more context, you know, what we ended up doing for this metadata tier, for high availability and horizontal scalability, is sharded Postgres. So in our initial production deployment, we had, like, three shards of Postgres, and each of them was using synchronous replication with an async peer. So it was a primary, a synchronous peer, and then an asynchronous peer. And so for the high availability part, you are using the synchronous replication, and then you do a failover.

Dave Pacheco:

And now the former sync is the primary. And then for horizontal scalability, we built this whole consistent hashing system on top of it that would figure out, for a given user key, what shard its metadata should be on. But it's important to say that, like, that was a lot of engineering effort. Like, a whole lot. We spent a lot on Manatee, which is the component for managing the high availability, and Electric Moray, which was the consistent hashing thing, and all the stuff around setting up synchronous replication.

Dave Pacheco:

Which is relevant because you don't have to do that; well, as we'll get to with CockroachDB, like, this is sort of a solved problem in more modern distributed databases. And that was an important factor. We didn't really wanna go and do all that work.
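
For readers who want a concrete picture of the metadata routing Dave is describing, here is a minimal sketch of consistent hashing in Python: a user-facing key is hashed onto a ring of virtual nodes and routed to one of a fixed set of shards. The shard names and virtual-node count are made up for illustration; Electric Moray itself did considerably more than this.

```python
# Minimal consistent-hashing sketch: map a user-facing key to a metadata shard.
# Shard names and the virtual-node count are illustrative, not Electric Moray's.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, vnodes=128):
        self._ring = []  # sorted list of (hash, shard) points on the ring
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def shard_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        i = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("/dave/stor/some/object"))
```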

Bryan Cantrill:

That's definitely right. I also think, I mean, my conclusion was, like, Postgres is just built in an era when people go home at night. And there are lots of design decisions that have been made in Postgres that are relying on the database getting some breathing room at some point and being able to catch up.

Dave Pacheco:

And we Yeah.

Bryan Cantrill:

So it was just fine for years for us, and then it was not fine.

Dave Pacheco:

Yeah. So we were in production for several years, and we had some minor outages that were basically due to insufficient autovacuum in the compute tier, which we didn't really talk about. But we were also using the same primitives for the compute tier, and we would run into these weird cases of, like, pathological performance. Like, queries over, like, literally a 10-row table would take, like, 70 seconds, because it had all this trash in it that the autovacuum hadn't cleaned up, which is, like, you know, that's a whole complicated thing. And then we ended up having this six-hour outage.

Dave Pacheco:

I wanna say July 27th. I don't know if I'm gonna get the year right. Oh my god. Yeah. You're right.

Dave Pacheco:

Was it July 2017? 2014?

Adam Leventhal:

2015. You you nailed that. I mean, obviously, this is a date. You know what?

Bryan Cantrill:

It's not a good date. Yeah. It's a bad one. That was born on this one though.

Dave Pacheco:

That one was kinda crazy. So this was wraparound autovacuum, for people who may already be familiar with it. But what it boiled down to is it was a sort of latent bug in, I don't know, the configuration or the setup of the system or something, that you'd only experience on the 200 millionth transaction of the database. Like, let that sink in for a second, because every time you hear about people debugging outages, it's always like, well, what changed in the last hour?

Dave Pacheco:

And this is always my canonical example of, like, yeah, the thing that changed was we ran our 200 millionth transaction since we had launched. Like, that was not something that we had...

Adam Leventhal:

Dropped from the ceiling.

Bryan Cantrill:

And balloons, balloons all made of explosives.

Dave Pacheco:

And the behavior was a hard hang for six hours. And, like, we don't have to go into the whole thing, but basically we had a combination of things that were taking a combination of locks that was basically fine most of the time, but the wraparound autovacuum takes a particular type of lock and holds it for six hours, or at least in this case it was six hours. And that prevented all the read locks from being takeable by anything in the system, and so the whole thing was basically stuck behind that. But, okay, so these were the kinds of problems we had in the early years, but I don't think they called into question, like, are we doing the wrong thing? We were kinda like, alright, these are kind of big things, but let's go spend some time understanding them and figure out how to fix that.
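
As a hedged aside for anyone who wants to watch for this failure mode today: Postgres exposes the per-database transaction-ID age, and comparing it against autovacuum_freeze_max_age (200 million by default, the "200 millionth transaction" Dave mentions) shows how close you are to an anti-wraparound autovacuum. A sketch, with a placeholder connection string:

```python
# Sketch: how close is each database to the anti-wraparound autovacuum threshold?
# Connection string is a placeholder; assumes psycopg2 is installed.
import psycopg2

conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("SHOW autovacuum_freeze_max_age")
    freeze_max_age = int(cur.fetchone()[0])  # 200 million by default
    cur.execute("SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC")
    for datname, xid_age in cur.fetchall():
        pct = 100.0 * xid_age / freeze_max_age
        print(f"{datname}: xid age {xid_age} ({pct:.0f}% of the wraparound-autovacuum threshold)")
```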

Dave Pacheco:

I don't know. This was small compared to the stuff that we ended up hitting later. So I was gonna dive into the Samsung era.

Bryan Cantrill:

Yes.

Dave Pacheco:

So then Samsung buys Joyent, in part to be able to deploy several much, much larger Mantas. They were about, you know, two orders of magnitude larger than our production one. That was sort of what we were going for. And unlike our production one, these really were, like, a 100% duty cycle. Like, I mean, it was like that in our production one, but it was less latency sensitive, whereas with what we were trying to do afterwards, we were trying to do tons and tons of writes constantly for, you know, days, weeks, months on end.

Bryan Cantrill:

Yeah.

Dave Pacheco:

And that's where we started running into a whole lot of problems. And we got hammered on so many fronts, especially synchronous replication, more autovacuum, you know. And, like, this is where, like, do we want to go in for 90 minutes? Because we found a lot of behavior around synchronous replication that was either undocumented or, like, very poorly documented, maybe not widely understood, and it contributed to feeling like this thing is really hard to operationalize. And even if you know about these things, they're very hard to work around or fix.

Bryan Cantrill:

When you have this moment, you realize like, oh, god. We are pushing this harder than anyone else.

Dave Pacheco:

Yeah. That

Bryan Cantrill:

And to give you kind of a very concrete example, I mean, Postgres had this idea that, like, I understand where they came from, but, you know, it feels like an elegant idea of, like, you know, what is replication if not crash recovery? Like, where are we going with this exactly? It's like, well, it's the same logic to recover from a crash as it is to, like, replay a log that you've just been shipped. So let's use the same logic for that. And the problem is that the WAL replay logic on the secondary is single-threaded.

Bryan Cantrill:

And the database is massively multithreaded, multi-process. And it's like, okay, so if you're throwing as much work as you can at this thing, it obviously can't keep up. Right? And replication would be, like, I mean, days behind. The secondary is days behind the primary, and there's no real, like, blaring alarm telling you, by the way, you think you have a replicated database, and that is true in the most academic sense.

Bryan Cantrill:

Maybe only the academic sense. Because if you lose your primary, your database is gonna be down for hours and hours and hours and hours.

Dave Pacheco:

We did. We had takeovers that took days.

Bryan Cantrill:

Days. Days. And the thing that was really frustrating, and I think this is the part that, like, broke me a little bit on vibes as engineering methodology: when we talked to folks who deployed a lot of Postgres, they'd be like, oh, yeah.

Bryan Cantrill:

Yeah. And we're like, what do you mean oh, yeah? Like, oh, yeah. Yeah. No.

Bryan Cantrill:

Like, this is yeah. This is a thing. It's like, okay. I guess validating. But what do you mean?

Bryan Cantrill:

Okay. But when we talked to you before, you were like, Postgres is great. I didn't hear any of these experiences. Like, you know, there's just such a... it felt like, and maybe this is overly critical, but it kinda felt like it's an open secret in Postgres that, like, there are these ridiculously sharp edges and steep cliffs, and they, you know, kinda undermine the narrative of Postgres being so reliable, so people don't talk about it. And then, I mean, I do remember, and you know who you are out there in the world, so I'm not gonna name you.

Bryan Cantrill:

We had someone, you know where I'm going with this, Dave, we had someone really pinned on this, and it's like, well, look, if you really want that, you need to go pay Oracle. It's like, no. That's not... no. Sorry.

Bryan Cantrill:

That's, like... no. That's not the right answer. You can't both say that this is... I mean, there's this, like, major gap, and, like, on all four major gaps, it's just, like, we need to be upfront about it. But that's, again, on us. We didn't do the testing.

Bryan Cantrill:

We didn't find those gaps until it was too late.

Adam Leventhal:

Do you think, do you think those are findable?

Bryan Cantrill:

So I had this sort of surreal moment. Yeah. Go ahead, Dave.

Dave Pacheco:

I had this surreal moment at one of the Postgres confs that was in San Francisco. I think I went with Jan, and there was a talk, I think about WAL replication, or maybe it was about all the different supported kinds of replication. And I asked about this problem at the end, which we haven't really gone into the technical details of, but it's basically, you know, you have the synchronous replication apply lag that just builds up, and it can't catch up because it's single-threaded. And the speaker, like, agreed and referred me to the thing that we had built to work around this problem. Like, you should check out what the Joyent folks had done.

Bryan Cantrill:

You're like, I am the Joyent folks. I built that.

Dave Pacheco:

I mean, obviously, I appreciate that, like, that was known and, like, people were now starting to talk about it. But it was also, like, what? How was this...? I mean, I do think it is worth a little more texture on this problem. So this is WAL replication in Postgres; it's basically shipping the write-ahead log from the primary to the secondary. And there's two steps there.

Dave Pacheco:

One is, well, three. There's sending the data, there's writing it durably to disk with fsync, and then there's actually applying that to the live database that's on the sync peer. And it's synchronous with respect to the sending and writing to disk, but not with respect to the application. And so what would happen is you would think you were synchronously replicating, and all the data would be there technically, but you have what's called apply lag, which is this lag between what you have actually written to disk and what you've actually applied to the live database. But in order for that thing to become the primary, it has to have finished applying all those changes.

Dave Pacheco:

And so when you would do a takeover, it'd be like, oh, by the way, I actually have three days' worth of stuff to catch up on, and it's very easy to not know. Like, you were doing synchronous replication; like, how can you have a whole bunch more work to do? And it was just accumulating for a long time. We called that secret lag. And then we discovered many months later something that we call double secret lag.

Bryan Cantrill:

So to speak.

Dave Pacheco:

Which is that there's a separate process called checkpointing, where it takes whatever has been applied via the WAL and, like, kind of serializes a snapshot of all of the state to disk so that it doesn't have to replay the WAL again. But our checkpointer also would start falling behind, because it is also single-threaded. And so you could be fully up to date on the apply lag, and then restart as part of a takeover, but you hadn't checkpointed in two days, and again, you have to go reapply all that stuff. And I mean, these were real gut punches each time we discovered this.

Dave Pacheco:

And that one, like, there's not even a way to really monitor that. At least at the time, there wasn't a way to ask Postgres, how far behind are you on this? You could get it, but you had to, like, run a command-line thing. It wasn't built into any of the other monitoring tables they have built in. So it definitely felt like, I mean, I guess it felt like a couple of things.

Dave Pacheco:

There wasn't a lot of documentation around this, and maybe we were the first people to hit it seriously, or to hit it this hard and hit these problems in the ways that we did. But the takeaway for us was certainly that... I mean, you have to test it. That's what it boils down to. And you have to test it at high load, at high throughput, whatever it is you're trying to do. You have to do the fault testing under those conditions and see what happens.

Dave Pacheco:

There's no substitute.
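
A hedged note on the two lags described above: on reasonably modern Postgres (10 and later), the primary's pg_stat_replication view and the pg_control_checkpoint() function expose enough to estimate both the apply lag and the time since the last checkpoint. A sketch with a placeholder connection string; this is not the tooling Joyent built, and column names vary by version:

```python
# Sketch: estimate standby apply lag and time since the last checkpoint (Postgres 10+).
# DSN is a placeholder; run the replication query against the primary.
import psycopg2

conn = psycopg2.connect("dbname=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    # Bytes of WAL sent to each standby that it has not yet applied ("apply lag").
    cur.execute("""
        SELECT application_name,
               pg_wal_lsn_diff(sent_lsn, replay_lsn) AS apply_lag_bytes,
               replay_lag
          FROM pg_stat_replication
    """)
    for name, lag_bytes, replay_lag in cur.fetchall():
        print(f"standby {name}: {lag_bytes} bytes unapplied, replay_lag={replay_lag}")

    # How long since this node's last checkpoint completed ("double secret lag" territory).
    cur.execute("SELECT now() - checkpoint_time FROM pg_control_checkpoint()")
    print("time since last checkpoint:", cur.fetchone()[0])
```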

Bryan Cantrill:

That's exactly right. Like, no one cares about your workload the way you do, and Yeah.

Dave Pacheco:

That's a better way to say it.

Bryan Cantrill:

And this is why, like, I just don't care about anecdotal information on people running software. I mean, it can kinda point you in the right direction, like, okay, I'll add that to the list of things to evaluate, or I'll go decide that for myself. But even, like, people's good... I know Dave and I are both scarred in the same way. Like, good experiences with software don't really tell me anything. Bad experiences, I'm curious about.

Bryan Cantrill:

Like, that's interesting.

Dave Pacheco:

Yes.

Bryan Cantrill:

Because that actually happened.

Dave Pacheco:

The bad experiences are great because they give you a better sense of of how far people probed and where they found the limits. And so, you know, if you're like, well you hit that problem then that means you didn't hit these other problems or something. It gives you a better sense of what was working and also obviously what wasn't. But, yeah, when people have a good experience, it's very hard to draw conclusions about that.

Bryan Cantrill:

Yeah. And on the one hand, it felt like, you know, we got Adam in the chat. He's basically blaming us for having deployed Postgres and not having realized that the replication and HA are terrible. And it's like, don't worry. Like, you're not the first, you won't be the last.

Bryan Cantrill:

We know. To be

Adam Leventhal:

clear, other Adam. Not Adam, not this Adam. I just wanna... no, I can't be accused myself of victim blaming you.

Bryan Cantrill:

I thought it was more interesting to kinda leave it open ended. Just, like, is Bryan referring to Adam in the third person? No. I did ask my friends at Velocity, Adam, other Adam. I did ask my friends at Velocity.

Bryan Cantrill:

I asked all of my friends at Velocity. So that I did do. And, no, it's like, no, that's wrong. Actually, the conclusion, like, oh, you should've asked more people.

Bryan Cantrill:

No. Wrong. Wrong. What we should've done is tested it ourselves. That's what we should've done.

Bryan Cantrill:

And, like, what my friends say at Velocity is only interesting inasmuch as it's pointing us to things to actually go verify ourselves. And I do not care about vibes. So that, I would say, was our disposition, Dave, coming into this. Because this was the other thing that was also a little bit... I mean, this is the only time I've been screamed at in my life by a colleague not at Joyent, was over this issue. The, like, stress levels were, like, through the roof. People were extremely upset about this thing, and, you know, being told, like, well, you should've done... it felt very frustrating, and it left a mark.

Bryan Cantrill:

It's, like, definitely not gonna do it that way. Very educational, in in that regard.

Dave Pacheco:

Yeah. And on the specific question of, like, what about using Postgres again for something like this? I mean, it's true that a lot of this stuff has been improved, and, I mean, I haven't followed all the stuff in the Postgres world. There's logical replication now and all kinds of other stuff. But there were so many other ways that the thing... how do you describe it?

Dave Pacheco:

I mean, it was built in an era, like you said, where you have downtime, where you can go do a bunch of these maintenance operations, and you've also got, like, a team of DBAs that are basically monitoring this thing, or at least making sure that it seems healthy most of the time. There's vacuum; I mean, upgrades are offline, for example. That's not a thing you can do online. If you're doing replication synchronously, you have to update all of them atomically, which you can't do, so you have to take that down. The protocol is not guaranteed to be backwards compatible across major revisions.

Dave Pacheco:

I mean, there's all these things, and it's not to criticize Postgres for what it is. It was built in an era in which those things were a lot less important, I think. And

Bryan Cantrill:

the That's right.

Dave Pacheco:

The zero downtime, you know, easy horizontal scalability, high availability, all this stuff, it just wasn't as much of a thing. But more modern databases have been built, you know, with those constraints in mind, and that's kinda why we started looking around more broadly.

Bryan Cantrill:

Yeah. And again, I think we're very happy for Postgres, and we want all the success in the world for Postgres. But it's like what I have said about C++, where it's like, I want all the success in the world for C++, but C++ dragged my shit into the street and lit it on fire. And I don't care that it was 20 years ago. Like, I'm happy for C++, but that was 25 years ago.

Bryan Cantrill:

I'm glad that C++ has, like, you know, managed to straighten its life out, maybe. Like, we're not dating. Sorry. So I feel the same way about Postgres. It's like, yeah.

Bryan Cantrill:

I'm happy for you, Postgres, but nope. We're just not... or it would be under very, very limited conditions where I knew exactly what we were getting ourselves into.

Dave Pacheco:

So one thing, I can't remember if I even talked to you about this, Bryan, but I talked to other peers in the industry about stuff like this. I mean, I kind of needed to while we were going through this. Like, therapy sessions. But people were telling me too that, like, they'd used other things like Cassandra and other databases, and they were never nearly as bad as what we'd experienced,

Bryan Cantrill:

but they

Dave Pacheco:

would totally start kicking off some background operation, and then things would go to hell, and all kinds of horrible things would happen. And so that was another reason for me to feel like, okay, maybe all these databases have this potential problem, and the trick is going to be figuring out where that is before it's a problem for us. And we had our own data nightmare.

Bryan Cantrill:

We're like... so we had a wholly separate nightmare, which ultimately ended up being a configuration problem, which Elijah and I debugged; it was extraordinary. Elijah and I worked on this for a long time together, and I learned a lot about Cassandra. And one of the things, you know, we had the preeminent Cassandra tuning consultant in, and we're looking at GC times, and we had, like, stop-the-world GC times of, like, 110 milliseconds. I'm like, well, that feels like that's a problem. He's like, actually, that's pretty good. I'm like, that's pretty good?

Bryan Cantrill:

And he's like, I know. Right? Who would think of writing a database in Java? And I'm like, okay. Yeah.

Bryan Cantrill:

Okay. You know? Right. But, yes, you just realize that, like, the things that you care about in something may or may not be what the other people who are running it care about. It may not be what the people who wrote it care about.

Bryan Cantrill:

And that's the kind of thing, like, you wanna find that values match as much as possible.

Dave Pacheco:

Totally. And so we get to Oxide.

Bryan Cantrill:

Yeah. We get to Oxide. And it's a new world at Oxide. And we've got a control plane that we need to go build.

Dave Pacheco:

And it's it's several years later in terms of, like, the industry development of distributed databases. Right? Yep. There's this whole group of databases called NewSQL based on Spanner basically. They're like born in the cloud era.

Dave Pacheco:

At least some of them with hands-off operability in mind. And they have things like high availability, online schema changes, rolling upgrades, automatic sharding, online cluster contraction and expansion, just, like, built in. And it's like, that sounds pretty good. Let's not go rebuild all those things that we had to go build on top of primitives that weren't really intended to be able to support that.

Dave Pacheco:

So we looked around at a bunch of those, and I don't know how much we wanna go into that, but CockroachDB was basically, I would say, the most appealing of those, for various reasons. The big one was that it had a really strong Jepsen report. So for folks that don't know, Jepsen is, well, it was originally a project. I think it's now a consulting company that will basically... Kingsbury. Yeah.

Dave Pacheco:

Stress test... yeah, Kyle Kingsbury. It'll stress test all kinds of distributed-database-like things to verify their CAP properties, their properties with respect to the CAP theorem: consistency and availability in the face of a partition.

Dave Pacheco:

And a lot, I mean, a lot of them are horror stories, a lot of the reports, around, like, all the terrible things that went wrong. And for some of the databases we looked at, some of these NewSQL ones, not only were there terrible inconsistencies in the data, but, like, the thing would keel over partway through the test and, like, not be able to be recovered. And those were obviously pretty worrisome. And CockroachDB's report was really strong here.

Dave Pacheco:

So that was a big reason, I think, we were like, okay, that sounds pretty plausible. But then it also had a bunch of these features that just felt like it was actually pretty well aligned technically with what we wanted in terms of hands-off operability. So, like, with Cockroach you basically just, like, run cockroach start, and then you point it at the other nodes of the cluster and it just kinda figures it out. Whereas with some of the other ones, you would have to go deploy, like, four or five different components in a bunch of different places and get everything pointed at each other, which is fine, but, like, that's just a bunch more software you have to go build to make sure that all gets set up correctly.

Dave Pacheco:

And we just kinda didn't have to worry about that with CockroachDB. So that was pretty appealing too.
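
To make "point it at the other nodes" concrete, here is a hedged sketch of standing up a throwaway local three-node cluster: every node runs the same cockroach start command with a --join list, and a single cockroach init call bootstraps the cluster. Ports, store paths, and the use of --insecure are illustrative and for local experimentation only.

```python
# Sketch: stand up a throwaway three-node local CockroachDB cluster.
# Ports and paths are illustrative; --insecure is for local experimentation only.
import subprocess
import time

join = "localhost:26257,localhost:26258,localhost:26259"
nodes = []
for i in range(3):
    nodes.append(subprocess.Popen([
        "cockroach", "start", "--insecure",
        f"--store=/tmp/crdb-node{i}",
        f"--listen-addr=localhost:{26257 + i}",
        f"--http-addr=localhost:{8080 + i}",
        f"--join={join}",
    ]))

time.sleep(3)  # give the nodes a moment to start listening
# One-time bootstrap; any node in the --join list will do.
subprocess.run(["cockroach", "init", "--insecure", "--host=localhost:26257"], check=True)
```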

Bryan Cantrill:

Yes. This is a really, really, really important point because, I mean, surely we're not the first to do it obviously, but I think it is unusual to ship a distributed system as a product. And it's also unusual to ship a distributed system as a product that works. So we are shipping a distributed system as a product that's going to exist over an air gap where we can't get...

Adam Leventhal:

Critically, no remote maintenance, no remote hands. Right? Like, it's gotta just work. We're sending it into space.

Bryan Cantrill:

It's gotta just work. Yes. Which doesn't feel like you're splitting the atom with that requirement, but as you take it apart, it's, like, actually really, really hard, because a lot of these systems have an implicit dependency on an operator. And I think in the cloud SaaS era, that's kind of more true than ever, honestly. And that is something that, I mean, Dave, that was very, very, very important.

Bryan Cantrill:

The ability for these things to operate autonomously and automatically, and for us to be able to, through software, control our actions and not rely on an operator.

Dave Pacheco:

Yeah. Yeah. Yes. Huge. Very huge.

Dave Pacheco:

And I'd say that has worked out pretty well

Bryan Cantrill:

for us. Yes. Well, I think that is also where a lot of these other things fell down. Yeah. And, you know, folks have been asking, well, you know, what about this?

Bryan Cantrill:

What about that? You know, and folks were asking about Yugabyte, and, you know, Yugabyte has some strengths, for sure, in that it is purely open source, completely open source, which is great. And we kinda liked the Postgres compatibility. But the Jepsen report was a problem for Yugabyte, and we were concerned about some of the operability of it. Yeah.

Dave Pacheco:

I think that was one of the ones where it crashed during the test and required intervention to bring it back up, if I remember it right. And they fixed a bunch of those issues too. So it's not it's not like these are just, like, permanent problems, but it was it was worrisome.

Bryan Cantrill:

It was worrisome. And we looked at TiKV and TiDB. We had some similar kinds of problems there. And then we looked at FoundationDB; people were asking about FoundationDB as well, and, you know, FoundationDB we didn't dig into too deeply, because I think we concluded that it was gonna require us to build quite a bit. Is that right, Dave?

Dave Pacheco:

Yeah. That's right. So at that point, I don't think it even supported secondary indexes out of the box. That was something that people would build on top of it, which was fine. I mean, yeah, the takeaway was that it seemed like people had really great experiences and could build the thing that they needed, but had to invest quite a lot of engineering to take it from what it ships with in the box to what they needed.

Dave Pacheco:

It it really seemed like what the name says. It's a foundation for your database. It's not the complete thing.

Bryan Cantrill:

And our constraint on this was that it was open source. So we

Dave Pacheco:

I saw that. I mean, I saw that again today for the first time since writing this RFD, like, four years ago. Yeah. And I'm like a little bit, like, I mean, it was BSL. Right?

Dave Pacheco:

I mean, it's not

Adam Leventhal:

Not open source. Right. Right?

Bryan Cantrill:

Not open source. Yeah. Right.

Dave Pacheco:

Close enough. It's close enough. It was just interesting to me rereading that that, like, oh, yeah.

Bryan Cantrill:

Oh, rereading one's own writing... yeah. I mean, it's prescient.

Dave Pacheco:

Yeah.

Bryan Cantrill:

So, so, yeah, we'll we'll talk about how kinda how Cockroach did on, like, its strengths, and this is, you know, it's outlined in 110. And then what were some of the weaknesses that we saw out of Cockroach?

Dave Pacheco:

Yeah. So I can pull this up to remind myself. But basically, you know, we did we did a whole bunch of testing. We did online expansion. We did online contraction.

Dave Pacheco:

We did an online schema change. I did, like, you know, basic sort of pkill, like, send it a SIGSEGV, rebooted the operating system out from under it. I also panicked the operating system, which a lot of people don't realize is different than rebooting the operating system, because TCP connections don't get cleaned up. So it looks more like a partition for a little while. And in all of these cases, I mean, not a high bar, but there was no data loss.

Dave Pacheco:

That's pretty huge.

Bryan Cantrill:

Yeah. Go ahead.

Dave Pacheco:

Not a high bar, but, like, pretty critical. I wrote: no unexpected crashes. CockroachDB does like to crash when the clocks get out of sync. And this was something that I ran into a lot at first because our stock NTPD was not very good at keeping them in sync. And I gather this has been a pain point for other folks.

Dave Pacheco:

But since we started using chrony, that's just, like, a complete non-problem for us. Chrony keeps them well enough in sync that I don't think I've seen a single one of those crashes, knock on wood, since we started doing that. So no unexpected crashes, you know, in this testing.
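
For flavor, a hedged sketch of the shape of that fault testing: commit writes continuously, SIGSEGV one node mid-stream, and then verify that every acknowledged write is still there. The table name, ports, and the use of psycopg2 (which works because CockroachDB speaks the Postgres wire protocol) are all illustrative; Dave's actual harness is in the published evaluation repo.

```python
# Sketch: kill one node mid-write and check that no acknowledged write was lost.
# Assumes a local insecure test cluster like the one sketched earlier.
import os
import signal
import subprocess
import psycopg2

conn = psycopg2.connect(host="localhost", port=26258, dbname="defaultdb", user="root")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS kv (id INT PRIMARY KEY)")

acked = 0
for i in range(10_000):
    if i == 5_000:
        # Simulate a crash: SIGSEGV the node on 26257 (we're connected to 26258).
        pid = int(subprocess.check_output(
            ["pgrep", "-f", "listen-addr=localhost:26257"]).split()[0])
        os.kill(pid, signal.SIGSEGV)
    try:
        cur.execute("INSERT INTO kv (id) VALUES (%s)", (i,))
        acked += 1
    except psycopg2.Error:
        pass  # the write wasn't acknowledged; it may or may not have committed

cur.execute("SELECT count(*) FROM kv")
assert cur.fetchone()[0] >= acked, "an acknowledged write went missing"
```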

Bryan Cantrill:

And Dave, I'd like to point out that, I mean, you're such a model for us all in terms of the way you conduct your engineering, and one of the things that I loved about the way you did this, I mean, so much rigor with respect to not just the RFDs, but then the software you wrote for the evaluation, you've got it, like, in a repo that had been closed. And in fact, you had archived it, and we opened it up as part of opening up 110 and 53. And I mean, this is delightful, and, you know, the three of us were physically in the office together, and I was going into this repo, and I mean, I'd been in there, you know, four years ago, but definitely not recently. And Adam was kinda roasting me for being so surprised about how complete it is. And, you know, Adam was basically like, how dare you insult Dave by being surprised by the depth of and the polish on this evaluation repo. And then, Dave, you walked over and you were looking at it, and it actually gave me great solace that you yourself were surprised at your own level of rigor in this work.

Dave Pacheco:

Well, yeah. In the write-up, I was like, oh, there's a lot of graphs here. I gotta think it does take a little while to, like, you know, get that data out. But, you know, I

Adam Leventhal:

thought it was really poor. ChatGPT was doing that for us.

Bryan Cantrill:

Yeah. Exactly. That's right. This is this is all a handcrafted human generated analysis.

Dave Pacheco:

Yeah. Man, I wonder how ChatGPT would have changed that. Draw graphs of this random data. Go parse this. Figure out how to parse this data and draw me some graphs.

Dave Pacheco:

Yeah. I mean, so the report was very detailed, and it's like, yeah, it has a lot. But, I mean, that came out of the pain of having experienced this at Joyent. And in particular, you know, we mentioned some of the really horrible problems at Joyent, but some of them were not...

Dave Pacheco:

They don't sound so bad. It was like, okay, this autovacuum was running for three days. Actually, autovacuum is running for, like, a month on some shards, and latency is degraded, like, 40%. And you're like, okay.

Dave Pacheco:

But, like, this thing that was taking 100 milliseconds is now taking 140 milliseconds. It's not so bad. And it's like, well, if your goal is to move, like, petabytes of data from one place to another, actually, that's a huge decrease in throughput, and it's a huge problem. So it made me appreciate, like, how important it was to actually look at those graphs of latency. And also, like, the small spikes and dips and stuff like that too.

Dave Pacheco:

Like, those can have a pretty big impact. And I just wanted to see if we were gonna see some of those same effects. And we did see latency degrade in some cases; like, you know, I'd have to go back to the data to look at which of these operations it was. It wasn't for, like, failures, transient failures and stuff like that. But I think some of the expansion and contraction use cases would see these latency degradations.

Dave Pacheco:

But, it wasn't it wasn't enough of a problem at the time.

Bryan Cantrill:

Well, and we spent some time trying to investigate that. Right? I think that those are the outliers. Yeah.

Dave Pacheco:

Yeah. We had two rounds of this testing, and the first round actually didn't go that well. Things would basically just stop for, like, two minutes or something like that while these operations were going on. And I think we concluded that they were CPU-starved, and maybe I/O-starved also. I can't remember.

Dave Pacheco:

And fortunately, we were able to just, like, bump up a bunch of the resource limits, and that helped quite a bit. This is also when I discovered the crazy, what I feel like is crazy, behavior on AWS, which is, and I think, Adam, you were, like, completely unsurprised by this, but it was like, when you provision a new VM, when you reboot a VM on AWS, you get, like, a whole bunch of IOPS for free, like super fast IOPS. Yeah. And the idea is basically that they wanna make booting fast.

Dave Pacheco:

Like, whatever level of IOPS you're going to get from a disk of a given size, if you were to boot a typical VM at that IOPS rate, it would be pretty slow to boot. So they just give you a whole bunch of fast IOPS upfront. But they don't really know when you've booted, so you just get, like, a fixed number. It's, like, a whole lot of them. And so we had all these tests where it's like, things are great for, like, four hours, and then all of a sudden it just craters.

Dave Pacheco:

And it was because I just hit the limit; like, however many IOPS they thought I needed for booting, I hit it four hours into the test, and then things cratered. I can't remember if that's actually... it must be in the write-up from the first testing notes. But that was crazy to me. I mean, is that something everyone just knows?

Adam Leventhal:

No. I think that's one of those things that you only learn the hard way. Like, even even as you're saying it, even now that it's on the podcast, now that we've told people, people are gonna hit it the hard way and say, oh, yeah. I remember Dave told me, but I had to learn it the hard way.

Dave Pacheco:

Yeah. And it's also very subtle, because, like, there's nothing that goes off and says, you hit this limit and now it's just gonna be slow.

Bryan Cantrill:

Oh, yeah. No. Right. You can tell,

Dave Pacheco:

like, if you go into the cloud metrics thing and, like, look up the throttle metric or whatever, you can confirm that this is the case, and I did confirm that. But thank you for pointing me to that, Adam. But anyway, yes. So we did have to run a couple rounds of tests. We did find a couple of problems, but we were able to get to a pretty good point once we understood that problem and had the right amount of resources in hand.
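
A hedged sketch of what "look up the throttle metric" can look like for gp2-style EBS volumes, whose remaining burst credit is reported as the BurstBalance CloudWatch metric; the region and volume ID are placeholders, and the relevant metric depends on volume and instance type.

```python
# Sketch: check how much EBS burst credit a gp2 volume has left.
# Region and volume ID are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="BurstBalance",  # percent of burst credits remaining
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"{point['Average']:.1f}% burst balance remaining")
```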

Bryan Cantrill:

So the one question coming from the chat is that, in RFD 53, it doesn't call for SQL per se, but it mentions that we need transactions, and we need ACID semantics: atomicity, consistency, isolation, durability. And obviously, Jepsen would indicate that consistency is important to us. And the question is kinda curious about the reasoning there. The, like, do you,

Dave Pacheco:

Yeah. So the consistency is sort of easier to think about. Like, if you were using something like EC2, or, I mean, any API where you're gonna go provision a VM, it would be pretty weird if you could create a VM and then list your VMs and it wasn't there. Or, like, create a VM and then get its state. Is it booted yet?

Dave Pacheco:

And it's like, actually, I don't even know about it. So strong consistency was important for just the user experience of using our API and being able to configure things and getting a consistent view of whatever it is you've done. A similar example would be firewall rules. Right? If you imagined an eventually consistent API for firewall rules where you're, like, putting a bunch of firewall rules, and then the networking in your VMs doesn't do what you expect.

Dave Pacheco:

And you go check the state in the API, and it's showing you something that's old or might be old and you don't know. That would be a pretty awful experience, I think. So that's why I would say consistency was really important. It's just what we've done.

Bryan Cantrill:

Consistency is really, yeah. And especially, it's just that there are not so many elements in this database that strong consistency is unreasonable. This is not, you know, this is not exabytes of data. So That's true.

Dave Pacheco:

That that was a huge help. It's certainly, relative to the Joyent use case, a much smaller database, and it's probably much smaller throughput. It's certainly much smaller throughput now, but I think even for a long time, it probably will be.

Bryan Cantrill:

But we care a lot about it. We we care about its consistency. We care about its availability. Yeah. And we because if it's not consistent, there's just or it's eventually consistent, that just casts such a long shadow, up the control plane.

Bryan Cantrill:

And just end up with and then it's, like, also, I mean, how do you when oh, if it's not there, just, like, wait a little bit. It's, like, well, okay. So at what point is that kind of pathological? It's, like, I don't know. I just gotta wait a day this time.

Bryan Cantrill:

I don't know. They're not sure. And it didn't mean then you are kind of, like, training yourself to not really explore provisioning failures or long tails of provisioning. We we want provisioning to be really quick, really robust, and to always succeed or fail very explicitly. Like, we don't want to have that kind of gooey property that it's like, nah, it's kind of in a bad mood today, so maybe, like, you know, go take a walk and see if it's done.

Bryan Cantrill:

It's like, nah. No. No. Thank you.

Dave Pacheco:

Totally. The the sort of transactions are it's a little bit more abstract. I'd have to think about a better example. But, like, some examples would be, you know, if you provision a VM, we want to find the resources for it, which, you know, most basically includes, like, the sled we're gonna put it on, and we wanna allocate those resources. And that's a complicated decision.

Dave Pacheco:

Actually, disks are even more complicated because we need to find 3 sleds that have enough storage on them. And enough and that's that storage needs to be on disks that are currently in service and believed to be working and blah blah blah. So it's like a complicated set of constraints. And then there's like an optimization around, like, which of the sets of ones do you want to actually pick, and then you wanna commit to that. So like that looks like a CTE in SQL where you're basically fetching a bunch of state and then picking a couple of them and then, like, committing that state in the database.

Dave Pacheco:

So that makes sense, I think, why that would be in a transaction.

Bryan Cantrill:

Another example. Actually, can you explain what a CTE is in SQL?

Dave Pacheco:

Yeah. CTE is a common table expression, and it's, basically a way of doing a but well, I think of it as a way of doing a bunch of things in sequence in SQL. Because SQL is sort of declarative, sort of. Where you're basically like, you know, if you're writing a select, you're describing what you want the output to look like. But it is useful sometimes to be able to give names to some of the intermediate values.

Dave Pacheco:

And so what a CTE lets you do is, like, with some name as some query, go execute some other query that uses that name in it. It's like kind of a basic thing just to be able to do a bunch of things in a row, basically. And they all happen in a transaction. Maybe a better maybe I don't know if this is a better or worse example of using transactions is that we're I'm working on a system called reconfigurator where we the system basic it's kinda like the reconciler pattern. We talked about this a little bit last week where the system is looking at the configured like, what the operator has configured, what they want the system to look like, and what it actually looks like, and making a plan that we call a blueprint that's like moving forward to the next step.
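
For readers following along, here is a minimal sketch of the shape Dave is describing: a CTE that names an intermediate result (the candidate sleds), picks from it, and records the choice in the same statement, so the whole thing executes atomically. The table and column names here are made up for illustration; this is not Oxide's actual schema, just the general pattern, carried as a SQL string the way a Rust client might hold it.

```rust
// A hypothetical CTE of the kind described above: name the candidates,
// pick one, and record the reservation, all in one atomic statement.
// Table and column names are illustrative only.
const RESERVE_SLED_SQL: &str = r#"
WITH candidate_sleds AS (
    SELECT id
      FROM sled
     WHERE in_service
       AND reserved_gib + $2 <= total_gib
     ORDER BY reserved_gib
     LIMIT 1
)
INSERT INTO sled_resource (sled_id, instance_id, reserved_gib)
SELECT id, $1, $2
  FROM candidate_sleds
RETURNING sled_id
"#;

fn main() {
    // In a real control plane this would be sent through the database
    // client with the instance id and size bound to $1 and $2; here we
    // just print the statement to show its structure.
    println!("{RESERVE_SLED_SQL}");
}
```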

Dave Pacheco:

And so the Nexus instances that the control plane instances are looking like, potentially independently looking at that, and then making a decision about what should happen and trying to create a next blueprint. And then they will try to make that the next blueprint that the system is trying to work towards. But it's conditional on other things not having invalidated that those choices. So that's where we also want to use a transaction to say, basically, make this the new target blueprint if this other one is the current target because that means nothing has changed since I made all these decisions. So that hopefully, that makes some sense in terms of, like, why we're using transactions.

Dave Pacheco:

There's just a lot of decisions like that.
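
For the target-blueprint case Dave just described, the shape is closer to a compare-and-set: install the new target only if the blueprint you planned against is still the current one, and re-plan otherwise. Again, this is a hedged sketch with hypothetical table and column names, not Oxide's actual schema.

```rust
// Compare-and-set flavor: $1 is the proposed new target blueprint, $2 is
// the parent blueprint the plan was derived from. Zero rows back means
// someone else changed the target first, and the caller should re-plan
// rather than clobber that decision. Names are hypothetical.
const SET_TARGET_BLUEPRINT_SQL: &str = r#"
UPDATE blueprint_target
   SET blueprint_id = $1
 WHERE blueprint_id = $2
RETURNING blueprint_id
"#;

fn main() {
    println!("{SET_TARGET_BLUEPRINT_SQL}");
}
```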

Bryan Cantrill:

Right. Right. And and another I mean, an example where consistency is really important, actually. And we talk I mean, you know, cue our episode from last week in terms of sagas where it'd be really hard to do that

Dave Pacheco:

without transactions and consistency. Yeah. So so

Bryan Cantrill:

what has been kind of our experience? So we decided to go Cockroach. What were some of the actual drawbacks? We got the thing working.

Dave Pacheco:

I if I'm gonna ask

Bryan Cantrill:

them yeah.

Dave Pacheco:

One or two more things to the the testing that we did that I think were really huge.

Bryan Cantrill:

Yes.

Dave Pacheco:

The cluster always converged to the right level of replication and came back online without operator intervention. No matter what I did to it. That was really huge for me. I mean, having just come off of a couple of years of fighting these issues. That was really huge.

Dave Pacheco:

And it also clearly communicated when data was under replicated and what it was doing to fix it and, like, what the progress was like on that. And it was it was just it was a breath of fresh air compared to what I had been doing. So that was huge.

Bryan Cantrill:

That tells you that we're in kind of the design center for this thing. Someone just cared about that kind of tooling and that kind of visibility.

Dave Pacheco:

That's right. That's exactly right. It made me feel like the people who built it understood some of the complexities that at least we didn't understand when we chose Postgres for the Joyent use case. And that made me feel better about it. And the experience, I'd say, has been pretty positive.

Dave Pacheco:

I don't think we've had too many major issues with it. The cup the couple that come to mind, we did Ben and I ran into some horrible problem around partial indexes being corrupted, which turned out to be a known issue that had been already fixed in the in the latest version. We were able to upgrade to it and, like, that that was great. I mean, it was good to know that they had already found it. I mean, it was scary.

Dave Pacheco:

And I did this demo at demo day where I just, like, did a bunch of selects and updates in such a way that it produced obviously wrong data. So that was definitely scary, but, you know, they take it seriously. They'd had a cockroach technical advisory for that. I think that was the point where I started subscribing to those and making sure that we were aware of all those kinds of problems.

Bryan Cantrill:

Do you have that issue? Do you we'll we'll dig that up.

Dave Pacheco:

I can I can definitely dig it up? The other one is Yeah. Client side retries were a bit of a were pressing. We had to go figure out. And this is definitely this is a little different than Postgres.

Dave Pacheco:

I think so. I'd have to actually think about that. So CockroachDB is generally Postgres compatible, but it is distributed under the hood. And everything you write goes through Raft and, you know, winds up on a couple of nodes. And you can have failure modes that are at least much less common than on Postgres where you are trying to make some change and CockroachDB is or sorry.

Dave Pacheco:

You're trying to make some change that ends up spanning more nodes than a simple change would. And in that process, some of the stuff changed out from under you. And the thing you tried to do is no longer valid. Does that sound like rambling or did that make any sense?

Bryan Cantrill:

Yeah, it makes sense.

Dave Pacheco:

It's a, it's a little bit like optimistic concurrency control. It's not exactly that, but what's happening under the hood is it's basically making a bunch of changes conditional on the underlying state not having changed on the other node. And if the underlying state does change, then it basically aborts the transaction with an error saying, sorry, you just need to do this again. And if you run it again, it should work. Which is like a little cheesy, but it's a pretty pragmatic trade off, I think.

Dave Pacheco:

Because the alternative is to retry it by itself, and then you have, you know, silent latency bubbles. But it actually can retry it by itself when they are simple transactions. Like if you just if you're able to issue the whole, you know, begin, select, insert, update, whatever, commit in one go, then it can retry it by itself, and you don't have to think about this at all. But we, for better and worse, use a bunch of this pattern that I call interactive transactions, where in Rust, we will write, like, begin the begin transaction thing. And then we will go issue more SQL.

Dave Pacheco:

But, like, our client is involved in that. It's not like we're issuing all that stuff to the database at once. We're having a conversation with the database with the transaction open, and then committing it. And that it that opens you up to a much longer window where this can happen, and it also means that the database can't retry it by itself because it can't replay that conversation. It doesn't know what you're gonna say.

Dave Pacheco:

You know what I mean? It doesn't know what SQL to replay, because it was dependent on whatever results you got back from earlier queries in the same transaction. So that was a thing that we had to go build some mechanism for. And, I mean, I don't know if I'd say pain point in terms of our usage of it more than anything. Because it is actually pretty well documented.

Dave Pacheco:

That that this is a thing, and that there's there's these two ways of dealing with it, and you might have to invest some client side work if your, language doesn't already have it. And then we had to do that.
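
To make the retry mechanism concrete, here is a small, self-contained sketch of the pattern Dave is describing; it is not Oxide's actual code. The idea is to run the whole interactive transaction body as a closure that is safe to re-run from the top, and to retry when the database reports a retryable serialization error (CockroachDB signals these with SQLSTATE 40001 and a "restart transaction" message). The error type here is a stand-in for whatever your database driver exposes.

```rust
use std::{thread, time::Duration};

/// Stand-in for a driver error; a real driver would expose the SQLSTATE.
#[derive(Debug)]
struct DbError {
    sqlstate: &'static str,
}

impl DbError {
    /// CockroachDB asks the client to retry with SQLSTATE 40001.
    fn is_retryable(&self) -> bool {
        self.sqlstate == "40001"
    }
}

/// Run an interactive transaction body, retrying on retryable errors.
/// The body must be safe to re-run from scratch (BEGIN through COMMIT),
/// because the database cannot replay the conversation on our behalf.
fn run_retryable_txn<T>(
    max_attempts: u32,
    mut body: impl FnMut() -> Result<T, DbError>,
) -> Result<T, DbError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match body() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < max_attempts => {
                // Back off briefly, then re-run the whole body: our earlier
                // statements were invalidated by a conflicting write.
                thread::sleep(Duration::from_millis(10 * attempt as u64));
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    // Simulate a body that hits one retryable error, then succeeds.
    let mut calls = 0;
    let result = run_retryable_txn(5, || {
        calls += 1;
        if calls == 1 {
            Err(DbError { sqlstate: "40001" })
        } else {
            Ok("committed")
        }
    });
    println!("{result:?} after {calls} attempts");
}
```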

Bryan Cantrill:

And then how I mean, so, the autonomous operation has so far been okay. We haven't had,

Dave Pacheco:

Yeah. I'm trying to think it. Like, I don't remember So I would say we probably don't have enough monitoring to know that it's not crashing transiently and, like, coming back really quickly and working. You know what I mean? But, like, we I don't think we have, like, dozens of core files from production systems.

Dave Pacheco:

We certainly like, I cannot ever remember getting to a system where Cockroach was down, and we couldn't bring it back up, or or even where it was down.

Adam Leventhal:

Yeah.

Dave Pacheco:

Like, I think it's just basically stayed up. And as I say that out loud, it makes me worry that the things we're trying to do are just in its wheelhouse. We haven't hit it in the places where it's not ready for it. But that's helpful enough. Right?

Bryan Cantrill:

And this Yeah. We don't trust anecdotal evidence including our own examples.

Dave Pacheco:

Absolutely. Absolutely. But that was never true with Postgres at Joyent. Even in development, all the time we'd run into cases where it had just, like, decided replication was broken and you had to, like, figure out what was wrong and then manually fix it. So, like, there were tons of cases where it would just, like, fall down and not come up without your intervention, and that just hasn't been the case here.

Dave Pacheco:

So that's been pretty good.

Adam Leventhal:

And, Dave, the two places where we have questioned data integrity turned out to be not related to Cockroach at all. I think one was when you came on the show two years ago to talk about, sort of, the YMM registers, and, like, basically, a gap in our operating system, Helios. And then the other one being a misunderstanding of cancellation with Rust async. And both of those sort of kind of looked like data corruption, but it was not Cockroach's fault.

Dave Pacheco:

Yeah. That that second one was very wild. And turn I mean, so that one was we we walked up to a system that was broken. I think it was our preproduction system, our dog food system. And it its log had indicated that it had done a bunch of work that was not in the database.

Dave Pacheco:

And, like, it's simple enough code that, like, there's like, it's an insert only thing. Nothing ever deletes anything from it. It's like, how can this not be there? And that turned out to be a client side problem where we were our connection pool basically had a bug where we would check out the connection, we'd begin a transaction, then we'd encounter an error that would cause Rust async cancellation to happen, and we weren't aborting the transaction. And so anything else that tried to go do something with that connection was operating in the fantasy land in which this transaction was going to complete.
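
As an illustration of the hazard (and not the fix that actually went into Oxide's connection pool), one common defense is a guard that cleans up on drop, so an abandoned transaction cannot silently follow the connection back into the pool. A simplified, synchronous sketch:

```rust
// If the code path that would COMMIT or ROLLBACK never runs, e.g. because
// the async future driving it was cancelled, the connection can go back to
// the pool with a transaction still open. A Drop guard can at least flag
// that condition so the pool rolls back (or discards the connection)
// before reuse. This is a sketch, not Oxide's actual pool.

struct Connection {
    // True if the last transaction on this connection was never resolved.
    needs_rollback: bool,
}

struct TxnGuard<'a> {
    conn: &'a mut Connection,
    committed: bool,
}

impl<'a> TxnGuard<'a> {
    fn begin(conn: &'a mut Connection) -> Self {
        // A real implementation would issue BEGIN here.
        TxnGuard { conn, committed: false }
    }

    fn commit(mut self) {
        // A real implementation would issue COMMIT here.
        self.committed = true;
    }
}

impl Drop for TxnGuard<'_> {
    fn drop(&mut self) {
        if !self.committed {
            // We never reached commit(): make sure the open transaction
            // cannot linger invisibly on a pooled connection.
            self.conn.needs_rollback = true;
        }
    }
}

fn main() {
    // Case 1: the transaction is committed normally; nothing to clean up.
    let mut ok_conn = Connection { needs_rollback: false };
    TxnGuard::begin(&mut ok_conn).commit();
    assert!(!ok_conn.needs_rollback);

    // Case 2: the work is abandoned partway through, as a cancelled future
    // would do; commit() is never called and the guard flags the connection.
    let mut conn = Connection { needs_rollback: false };
    {
        let _txn = TxnGuard::begin(&mut conn);
    }
    assert!(conn.needs_rollback);
    println!("abandoned transaction detected; pool will roll back");
}
```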

Adam Leventhal:

Like an alternate universe.

Dave Pacheco:

It's absolutely. And which is actually kind

Bryan Cantrill:

of amazing. It's a different timeline.

Dave Pacheco:

It would run for hours like that. And, like, that's that's kind of a problem. I mean, I don't know that Postgres would have done worse with that, but, like, I mean, it's a problem for the client. But it's also a problem for the database because you end up having all this data that is sitting there not referenced by anything, but it still has to hang on to it. And if you'd signed up doing it This this this the hell of

Bryan Cantrill:

a transaction you guys have going on. But alright. Yeah. But another one, come on in. Coming.

Bryan Cantrill:

I'm using,

Dave Pacheco:

like, a terabyte of storage, like, hundreds of gigabytes of memory for this thing. Like, are you sure you wanna do something with it?

Adam Leventhal:

I'm gonna close this transaction at some point. Alright. Whatever.

Bryan Cantrill:

Alright. No. I guess not. Alright. Give me some more.

Bryan Cantrill:

Fine. I'm here. Yeah.

Dave Pacheco:

That was a big eye opener for us about cancellation. But fortunately, even though it looked for all the world at first like, oh my god, the database has completely eaten everything.

Bryan Cantrill:

Yeah. I mean, I bet you kinda lost your breath a little bit when you first realized, like, it's not in the database. You're just like,

Dave Pacheco:

oh, no.

Bryan Cantrill:

It's happening again. It's like data corruption week was last week. This can't be happening right now.

Dave Pacheco:

Yeah. Yeah. I was

Bryan Cantrill:

Yeah. And Adam, I'd totally forgotten about the transaction being left open, because certainly the debugging odyssey that Dave, you described two years ago is, I think, because there was also, you know, some vibes online of, like, oh, like, self supporting on a database, like, you bet that should be overnight. Well, you should go listen to this. And, I mean, we knew, and actually this is kind of an important detail about the decision around Cockroach, and Dave, I thought it was very interesting because you were very prescient about this, in RFD 110.

Bryan Cantrill:

We knew we were self supporting at the outset. We knew we were running it, first of all, on an illumos derivative, Helios, and, I mean, we were having to do a lot of work to get it to work there. We were using the open source builds. Dave, some of the things that you have found that have been busted have kinda been, like, little things with the open source build. Is that correct?

Dave Pacheco:

Yeah. So, the context here is that for as long as we have used CockroachDB, i.e., not just for the last week, it has had parts that were proprietary, or more proprietary, I guess, and parts that were more open. And we had decided early on we wanted to stick to the open source stuff, and they had a build target for that, which was amazing.

Dave Pacheco:

Yeah.

Bryan Cantrill:

I was

Dave Pacheco:

like, that's perfect.

Bryan Cantrill:

That's very helpful.

Dave Pacheco:

Yeah. We can go build just the OSS thing and not and may and know that we're not accidentally using some of these features that, you know, then we'd be in violation of their license. So but occasionally, they would break it. I don't think they were testing it that heavily. I'm not sure if they just didn't have a CI for it or something.

Dave Pacheco:

But, yeah, I found a couple of issues like that. But they were pretty good about fixing them.

Bryan Cantrill:

And I think that, you know, part of it, like, when people wonder, like, why would you adopt BUSL-licensed software? Well, part of it is because, like, they did have things like this that made it easy for us to not accidentally trip over into their CCL, the poorly named CCL, the Cockroach Community License, which is basically proprietary stuff. And their BUSL clause was pretty crisply defined. So there were a bunch of reasons why it's like, no, this is not OSI-approved open, but it is actually sufficiently open for us, for what we need to go do. And then, again, we knew we were born self supporting on this.

Bryan Cantrill:

We just know that help is not on the way, which is one of the many Oxide mottos. For many of the things that we do, we are very much on our own, and this is no different in that regard.

Dave Pacheco:

Yeah. I mean, I should say we filed a bunch of issues with CockroachDB, and working with their engineering has always been a pretty positive experience, I would say. And also, we all know from our experience that our use case, our platform is different enough that, what did you say earlier? Nobody cares about your workload like you do. Like, we're not going to have the experience of, like, we can just write to them with this issue, and they're just gonna be able to fix it for us.

Dave Pacheco:

Right. It's not been our experience with other software, in general, but also on our platform and in its use case. So, like, yeah, we were gonna be self supporting to a large degree, but hopefully with the help of the engineering team, and that's been pretty true.

Bryan Cantrill:

Right. Right. And even when we had, I mean, we had this issue, you know, where we had to go very deep on the YMM registers not being restored properly. And, you know, we did reach out to them. And, you know, they were, like, reasonably, they were as earnest as you could kind of expect them to be.

Bryan Cantrill:

I mean, they were kind of reasonably saying, like, this, I don't know, might be an OS issue, but it felt like they weren't blowing us off. It felt like they were taking it seriously, which is great.

Dave Pacheco:

That was wild because that particular issue could cause all kinds of bedlam, like, absolutely all kinds of can't happen bugs would happen. And so, before we figured this out, I wound up filing like 4 or 5 different bugs that were like, from their perspective, somewhere between you got pretty unlucky in some race condition and this really can't happen. And, as you said, they replied to them all pretty earnestly. They were like, okay, well, you are seeing this, so here's what I would try. Here's what I would go validate.

Dave Pacheco:

Go validate that that that the Go test suite passes on your system or that, you know, this memory test works or whatever. And it was helpful. It wasn't just, like, go fetch rocks. It was, like, actually helpful stuff.

Bryan Cantrill:

And, a question in the chat about how we're feeling about CRDB being written in Go. Go is obviously not our first choice for things. A lot of it is written in C++, which is important. It's not merely Go. But, you know, I think it's, on the one hand, you know, not great. On the other hand, it's something that we need to have working on the platform.

Bryan Cantrill:

So it's, like, I think it's kinda helpful to have a a a current use case for it. I don't know. Dave, what do you feel?

Dave Pacheco:

Yeah. I mean, definitely didn't feel great. I don't like it. But that's not I'm not gonna not use something because of that. I mean, in the course of debugging that horrible signal handler YMM thing, there were a lot of cases where I spent a lot of time with the Go memory allocator, and there's an awful lot of stuff that was runtime invariants that could not be verified at compile time.

Dave Pacheco:

And I was thinking the whole time, like, Man, if this were Rust, this just couldn't happen. But I need to rule all these things out because I know this thing is broken and I don't know which of these things is broken or why or anything. It was definitely like a whole lot of convention, I guess is what I'd say. It was like working by a lot of convention. Like a lot of c code bases do.

Dave Pacheco:

Right? It's like Right. Everyone has to use this function correctly. There are a 100 callers. I'm gonna go check them all.

Dave Pacheco:

You know?

Bryan Cantrill:

So it is all Sorry. Go ahead. It's also worth mentioning that one of our challenges with Cockroach is that it has been robust enough that we're kind of far behind, and Iliana's got a great RFD. Iliana, no, I actually I did not, I will make that one public.

Bryan Cantrill:

I didn't wanna make that one, but Iliana was out on vacation, and it felt unsporting to make someone's RFD public when they were out. I don't know. It just felt like, I don't know. Talk about social contracts. Oh, yeah.

Bryan Cantrill:

And I just felt like it would be violating one if I made 469 public. But there's nothing incriminating in 469. I just wanted to kinda get your blessing. So, Iliana, just give me your blessing. So I will make that public later today here.

Bryan Cantrill:

But one of the things that we had is like, okay, wow, we're really kind of far behind. And we needed to find a way to kinda catch up. Because I think one of the things we were concerned about, and it actually highlighted another constraint that I kinda mentioned at the top, is that, like, we actually need to support this stuff. Support kinda pretty old versions of this stuff. And, you know, we're on, what, 22.1.9.

Bryan Cantrill:

Right, Dave? And Yeah.

Dave Pacheco:

I

Bryan Cantrill:

think isn't that, like, out of their support window? Even if we were, I mean, again, we were born self supporting, but I think that is That's right. I think that is too old, which actually in this case has, I think, actually been helpful to us, because I think that it is the supported releases that are being relicensed. If it were still supported, it would also be subject to this relicensing. I would say, because I didn't mention this at the top, I don't like the fact that the patch releases are being relicensed, and I think that that is the one bit of this. There are two bits where there is a social contract between Cockroach Labs and Oxide, or Cockroach Labs and users of CockroachDB, or the open source vendor of CockroachDB and the general public.

Bryan Cantrill:

There are two kinda social contracts that I think are being stretched a bit. One, I don't like the fact that the patch releases are being relicensed, because with the patch releases, you're taking a release that was first released, you know, a year and a half ago, and now, like, there are potential regressions in there, and the fix for the regression is now gonna be proprietary.

Adam Leventhal:

And More like a security update? Like, that kinda sucks.

Bryan Cantrill:

Security update. I know. That just feel it it just it sucks. And it it I do think, like, I don't think that that's okay. I don't think Cockroach Labs should get a free pass on that.

Bryan Cantrill:

I think that does tear at a bit of the social contract. Like, if you didn't wanna do it, you shouldn't have open sourced it. And so I think that that is

Adam Leventhal:

I think that's what they're saying too. Like, we shouldn't have open sourced it.

Dave Pacheco:

Right. We should

Bryan Cantrill:

I think this this guy gets it.

Adam Leventhal:

That's This

Bryan Cantrill:

guy gets it. That's exactly what their blog entry says. And, practically, that was a line in the blog entry: we should never have open sourced it.

Adam Leventhal:

The prevailing pieces. Yeah.

Bryan Cantrill:

I know. You did open source it. That's the problem. You did open source it. So I think that that is not great.

Bryan Cantrill:

I also think and I I sorry. This is the other one that I did not say at the top, but I also think is not great about this. All of the blog entries and the discussion of the move to the BUSL in, I believe, 2019 have all been deleted. And I think that that's lame.

Adam Leventhal:

Oh, I think That's gross. Yeah.

Bryan Cantrill:

Because I think that, like, look, you had a bunch of rhetoric in 2019 about how Amazon was and it was a called out as Amazon over and over and over again that Amazon was, you know, taking your product, taking your project and turning it into a service and that you had to punish Amazon with this clause. But, like, that's okay. That that's not where we are now. So clearly, like, this is not about Amazon anymore. And I just think, like, just to own all that And just say, like, yeah, that was then and this is now, and, like, this is, like, how our thinking has changed and expanded, and I I wish they would be a little more more transparent about that.

Bryan Cantrill:

I think that's a bit lame. But again, to be clear, they definitely don't owe us anything. It's like, also, when we made the decision to pick Cockroach, Cockroach could go out of business. Right? And that did happen to, like, RethinkDB. You know, RIP RethinkDB.

Bryan Cantrill:

Adam, do you remember me during my RethinkDB phase? I know Dave was Dave was like, oh my god. You were so

Adam Leventhal:

No. I was gonna I wasn't gonna pick it up because it was so embarrassing. But no. I remember how excited you were.

Bryan Cantrill:

We I

Adam Leventhal:

think we're I think it was general excitement.

Bryan Cantrill:

I liked I really liked RethinkDB. And RethinkDB had, like, a model that is, like, not an unreasonable model: it's AGPL, and if you don't want the AGPL, then, you know, contact us. And, like, well, I'm, you know, using this for this, you know, cord up database for Thoth. But RethinkDB went out of business, and went out of business extremely abruptly, and left everybody. It didn't matter who you were.

Bryan Cantrill:

Like, everybody, whether you're a commercial customer or an open source user, you're like, okay. Like, now we're, what now? And, yeah, Adam's linking to the CNCF piece, and I, you know, really am grateful to Dan Kohn, who has since passed away, but Dan, who was the executive director of the CNCF, also was like, oh my god. It's like, folks that are using this are now really stuck. And Dan went and found the folks that had it, the investors, because, I mean, RethinkDB really went into the side of the mountain.

Adam Leventhal:

It went quickly. Right? Like, very quickly. Quickly.

Bryan Cantrill:

Yeah. It I mean, it went so quickly. Yeah. Free start up advice. If, like, the VCs aren't paying attention, that's not a good thing.

Bryan Cantrill:

It's a bad thing. And the, the the VCs had kinda mentally zeroed it out, and I just don't think the executive management over there realized it. And by the time they realized it, like, the the the altimeter was was, the terrain was coming up, the warnings were were blaring, and that thing just, like, went out of business within, you know, like, very abruptly. And, they found one of the, the investors who had had received all the IP Dan did, and he's, like, actually, I kinda know these guys, let's just call them up. And they had no idea what they had, and Dan bought it for did I put the dollar figure in there?

Adam Leventhal:

Sure. He did. What was it? Remind us.

Bryan Cantrill:

It's a good idea. Don't read the blog entry. I'm sure it's in there. Just tell us what it is now. No.

Bryan Cantrill:

It's in there

Dave Pacheco:

for sure.

Bryan Cantrill:

No. I was going so I think it was, like, they wanted 1,500,000, and Dan's, like, how about 5,000? And they were, like, how about 10,000? Dan's, like, so so, you know, I, and again, you know, Dan has unfortunately since passed away. And, you know, Dan and I didn't see eye to eye in everything, but I was always incredibly grateful for what Dan did for the Rethink DB community.

Bryan Cantrill:

And I just like just getting it out. I mean, even if it was lived only as kind of an artifact that people could now just, like, freely borrow from. Right? If there's any if there's value in there, you can and it's, you know, it's it's it's there. It's it exists.

Bryan Cantrill:

People can use it. It's not something that is current necessarily, but, so that could happen too. Like, if you're using open source, like, you just can't assume that, like, and a commercial entity does not have an obligation to you to exist.

Dave Pacheco:

Mhmm.

Bryan Cantrill:

And they can fly into the side of the mountain, and you can't really have, like, you know, what are you gonna have? Like, a blog entry explaining how they shouldn't have flown into the side of the mountain? It's like, yeah, they know that. And so you always have to know that the self support is gonna be on you. That's the responsibility you have to take.

Adam Leventhal:

And that I mean, that is one of the benefits of of open source.

Bryan Cantrill:

Yes. Extremely important is that that the and I think that, you know and again, you know, Adam, you and and Dave and I have have been through a couple iterations of this, and have seen software survive the corporate vessel that it's in. And so the the corporate vessel separating from the software is not as shocking to us as it might be to other people. And it's also not as foreclosing as it may be to other people. The idea of, like, oh my god, like, you'll never self support this.

Bryan Cantrill:

It's like, yeah, go listen to our episodes on bringup. We're not worried. Mhmm. You know, not to minimize how hard it is to support a database certainly, but I just feel we can do it. And, you know, I think we said as much in 508 that we intend to allow other people to run Cockroach as we're running it.

Bryan Cantrill:

That does not necessarily mean it's a community fork. I don't know. It'll be interesting to see. I'm not sure if a community fork is gonna spring up on this one, I gotta tell you. I'm not sure.

Adam Leventhal:

Yeah. Unclear. I mean, I've not seen a bunch of noise about it. I wonder what the community use of it is like.

Bryan Cantrill:

Yeah. So David, I guess maybe to bring us up to the present, I mean, what was your kind of reaction to the the news? Did you see this coming? What was what was your thought on Thursday morning? How did you learn about it actually?

Dave Pacheco:

I I checked my email, like, early in the day. Like, I might have just been in bed just, like, opened the phone, opened the email, and I had an email, I think from the CEO. I mean, it was a blast email to everyone who was on any of their mailing lists being, like, great news for everyone who's using CockroachDB. And I was like, oh, no. Oh, no.

Bryan Cantrill:

Oh, no. No.

Dave Pacheco:

I mean, it's really not great news. News, everyone.

Adam Leventhal:

Good news, everyone. Oh, totally.

Dave Pacheco:

Yeah. And definitely disappointing. I mean, I understand, you know, they're making an argument that, like, this is better for a lot of people who want some of the enterprise features but could never have gotten an enterprise license, and now that's maybe sort of free, I guess. But you still have to get a license key and, like, prove that you are entitled to a free one. I don't know.

Dave Pacheco:

I don't I mean, it seemed from the Hacker News thread that some folks anyway are pretty happy about it. I don't know what most people's experience with it is. I was definitely not happy to learn of this. And so

Bryan Cantrill:

I'm embarrassed that I obviously, I did not get the email because, you know, you're on their list and I'm not. And, but I I am embarrassed to to say that I learned about this on Twitter from Adam Jacob. So, Adam, you were you I'm like I look at Adam's I'm like, oh, no. And then I'm like, I wonder if anyone's talking about this internally. Of course, every by the time, like, okay.

Bryan Cantrill:

I should've got a I should've gone to water cooler first. Everyone has been, the bad keeter, I think, had had seen this very early in the morning, and it had ice on the East Coast. So, people were all avidly discussing it. I'm, like, I I guess, I think my my first movement in the morning should be probably to our own internal chat and not to Twitter. But, Adam, there you go.

Bryan Cantrill:

You're a news agency. So and and then in terms of our options, just to bring it back to kind of 508, Adam, Dave, I mean, I think we what what were your thoughts in terms of our options there?

Dave Pacheco:

Well, gosh. I mean, I mean, from first principles, we could obviously, like, rip cockroach out. That's one option.

Adam Leventhal:

Oof.

Dave Pacheco:

An incredible amount of work. And it is basically working for us. So, you know, that suggests option two, which is to just jump off the train at the point we're at. A sort of related option is to upgrade to the latest free-ish thing that we can and then jump off. But you raised the point earlier, which is, like, there's this real risk that in updating to the latest, we encounter a new problem.

Dave Pacheco:

I mean, it may not even be a regression. Right? It might just be some change in behavior that happens to be broken for our workload and, like, not be able to get the fix, which is pretty scary. So, I, you know, we kind of concluded on it. And, you know, you could probably speak more to the option of, like, we'll just pay them and license it, but you kind of talked at the top about why we don't really want to do that.

Bryan Cantrill:

Yeah. Which doesn't foreclose all commercial relationships, by the way. There may be, you know, I think in my kind of ideal world, a commercial relationship that allows us to get, like, some of those patch releases open source, you know, or I don't know. I think we'll see. But I would love to have a way of getting us to something like that, because we have seen that the upgrades have been good news for Cockroach, not bad news, which broadly is not the case for all software. And then Iliana had said in the chat, I think very aptly, like, it's gonna be hard to self support a database, but it's not the hardest thing we've done.

Bryan Cantrill:

And someone's like, alright. Well, what metrics do you use for that? Which is a very valid question. I would say it's not the hardest thing we've done because it doesn't feel like existential risk, and we've got an extraordinary amount of software expertise at Oxide. We've had some really scary things that did feel like, well, if we don't find this, if the NIC doesn't come out of reset, like, we don't have a company, or if the CPU doesn't come out of reset, we don't have a company, or if we can't get that lowest level of platform enablement software working, we don't have a company.

Bryan Cantrill:

And so we've had some things that have been really, really scary. But we've also, I mean, god, I would love to get your perspective on it, Dave and Adam, but I feel, and, you know, Alan, our colleague, has pointed this out a couple of times, that every time we knock one of those things down as a team, I think it improves. It gives us confidence that we can knock down the next thing. And, you know, I just feel like, man, if we pull together, like, we can do it. We can figure it out.

Bryan Cantrill:

Yeah. Yeah. It it would just not say, like, it could be really, really hard and, but if it's really hard, I mean, that's podcast content. Right?

Adam Leventhal:

Now you're talking.

Bryan Cantrill:

I got it. I gotta tell you. You know, as someone like you know, Adam's looking at the numbers, he'll tell you stuff kills.

Adam Leventhal:

Yeah. And, like, I totally agree with you, Bryan, and agree with Alan's sentiment there. And, like, this is not to diminish the complexity of these things, but we're pretty good at software. We're pretty good at debugging, and Cockroach has been great. It's been Yeah.

Adam Leventhal:

Very solid for us. So I

Dave Pacheco:

mean, that's the other thing. It's like we've been running this stuff in production for years now. Like, it's not it's I mean, I guess tomorrow we could have some horrible, impossible to debug problem, but, like, it

Bryan Cantrill:

It obviously will. I mean, so We're chasing it pretty hard. We're we're sitting here, like, on our knees begging for it. And the gods are, like, this guy again. And how many times do we have to punish this guy before he stops doing this?

Bryan Cantrill:

Like, oh, order

Dave Pacheco:

it up. If you're gonna make this judgment about, like, how much work is it gonna be to support this thing based on the last couple of years of self supporting it, Like, I mean, you would look to that. Right? And it's like, well, we actually haven't spent that much. The most effort we spent by far was what turned out to be the OS bug.

Bryan Cantrill:

Yeah. Absolutely.

Dave Pacheco:

Well Yeah. By by, like, an order of magnitude or 2.

Adam Leventhal:

Not only have we not invested too much, we haven't even sort of taken the time to upgrade to a more modern version, in part because we haven't had to. And that turns out to be pretty lucky, where it's gonna roll over to be Apache 2 in 9 months as opposed to, you know, multiple years if we had been a little more diligent about upgrading.

Bryan Cantrill:

Absolutely. We've got, so, I mean, with every passing day, this is getting closer and closer to being, and again, we're not worried about the BUSL. They've constructed that clause well. We are not violating it. It would not be.

Bryan Cantrill:

So we're abiding by it. It's all fine. But, boy, it being Apache 2.0 would be great. All Yeah. Totally.

Bryan Cantrill:

Yes. Did you, like, see Steve O'Grady's tweet on that? That was pretty funny. It was

Dave Pacheco:

just like

Adam Leventhal:

What did he say?

Bryan Cantrill:

He said, well, Oxide would be the first company that would be deploying something that basically had timed out into Apache. And, you know, once again, Oxide goes its own way. Like, you bet, maybe Oxide goes its own way.

Adam Leventhal:

You know, the the BUSL, I had previously thought that that flip over to open source was kind of baloney, was kinda window dressing. Yeah. Yeah. Like like, kind kinda thinking, well, who wants 3 year old software anyway?

Bryan Cantrill:

But turns out people are running two-years-and-three-months-old software, actually. That's right. And counting down the months to three. Yeah. Just totally.

Bryan Cantrill:

I feel the same way. I felt like and, actually, you know what? It was funny. It's, like, I actually thought it was 5 years. Mhmm.

Bryan Cantrill:

And it was only when I met, I think, in our internal chat was being like, oh, wait a minute. We're I was doing the math. I'm like, wait a minute. It's only 3 oh, wow. Wow.

Bryan Cantrill:

3 years feels like a really long time and is in some regards, but, like, is not. So, we, so I I think that, you know, we our our path is pretty clear here. And, Dave, I felt like that was as a result, like, this just doesn't feel that anxiety producing relative to the other crises of Right.

Dave Pacheco:

I think that's the other thing. It's like, you could reasonably ask, like, is this the 10 year plan? In 10 years, can we run Cockroach 22.1.9 on the Go version that's there? And, like, I'm sure we will run into problems.

Dave Pacheco:

Like, you know, the TLS that is in there is no longer allowed by any of the clients that we wanna use or something like that. But, like, these are not, oh my god, we need to rip everything out and rewrite, you know, how we use the control plane data store problems. You know what I mean? These are like, okay, we'll we'll deal with those problems when they come up.

Bryan Cantrill:

That's right. We got some work to do. And but we always we always have we always have some work to do.

Adam Leventhal:

When a customer has a hundred Oxide racks, you know, spanning multiple geos, are we, you know, revisiting the decision? Like, conceivably, like, when that happens. Sure.

Bryan Cantrill:

Yeah. I mean and, Andrew, I'm not sure if Andrew is work lurking, but our colleague, Andrew Stone, is just, like, itching to write the RFD on what the, he's like, because we will I mean, like, the next step for us, unless I mean, if we hit something that invalidated Cockroach with a new use case or what have you, I mean, we'll probably, like, go back to the rubric a little bit, Dave, and I think we would kinda reevaluate everything at that point, and Yeah. Who knows what that world would look like. And will we be writing our own database before or after we're doing our own NIC? Only time will tell.

Bryan Cantrill:

These are two things that we believe we will do in the fullness of time, but not today. Well, Dave, this was great, and it was great to talk about the kinda the whole rubric. I really would encourage people to check out not just the rubric, but all the evaluation you did. I really do think it's a model, Dave, especially because you didn't, like, think of this as all being public, but I think our disposition is to get as much of this stuff out there as possible.

Dave Pacheco:

Yeah. If you really go digging, you can find my day to day notes and all the stupid missteps I made along the way. I rediscovered them there. But it also was very fun going through the history again.

Bryan Cantrill:

It was fun going through the history again. Yeah. It was funny just because I also feel that, like, damn, this analysis is really good. And then, I you were like, god, was it because I don't remember the time you're saying, am I spending, like, too much time on the analysis here? It's like, no.

Bryan Cantrill:

Definitely not. This was an extremely good use of time, and gave us a lot of confidence in the decision. And, you know, I think it's been terrific to get these RFDs out there. And thank you for both 53 and 110, two RFDs that I definitely, like, appreciated at the time.

Bryan Cantrill:

But there's always, like, you know, the RFD that you read, and you're like, oh, yeah, this seems great. And then all of a sudden something changes where it's like, everything's load bearing. In some cases, like, for me, that's like, okay, I'm actually implementing this bit now. Or, like, now this has become a hot issue the way it is with Cockroach. So now I'm, like, reading every word of this RFD.

Bryan Cantrill:

I would love to tell you that I read 110 as closely then as I did after Thursday, Dave. But, the and just remind

Dave Pacheco:

me, Kat, this is this

Bryan Cantrill:

is really good. What's that?

Dave Pacheco:

I said, I wouldn't blame you for not reading it that closely the first time around.

Bryan Cantrill:

Well, just like it all it'd be just because, like, your your your metric is totally different. Right? In terms and and it was just great to go back and and reread all that. It was also great, and this is a a segue, a teaser for Adam, we are doing something nearly unprecedented for oxide. I'm not gonna say totally unprecedented.

Dave Pacheco:

Yeah. I mean, it's only

Adam Leventhal:

maybe happened once or twice before.

Bryan Cantrill:

It's happened once or twice. We are scheduling an episode in advance. Only 1 week in advance, but still, I mean, it's pretty amazing.

Adam Leventhal:

Whole week. Yeah.

Bryan Cantrill:

Whole week. An entire week. Entire week. Of course. This is going to be the week that, like, Oracle acquires Broadcom, and and and they both merged the Linux Foundation or something.

Bryan Cantrill:

And there's something that, you know, something absolute, like, absolute spectacularity is basically guaranteed this week. The, like, Eric Schmidt's gonna shoot his mouth off again. He's gonna go out and be more excited diatribe. He's gonna be like, oh, wait a minute. Oxide and friends can't talk about it on Monday?

Bryan Cantrill:

Okay. I wanna get back to that return to the office thing I was getting on.

Adam Leventhal:

That's right.

Dave Pacheco:

It's a

Adam Leventhal:

good time to bury all your news, folks.

Bryan Cantrill:

Bury all your news. Get it out there now. But we are gonna talk about RFDs, and we're gonna talk about why we developed RFDs and how we've used them. But very importantly, we're gonna talk about the actual mechanics that we've developed, and our colleagues Ben Leonard and Gusus Mayer are gonna join us. I'm really excited about that.

Bryan Cantrill:

Because something that was really important here is our ability to make individual RFDs public, and the way it relies on the RFD site, which has been great. And so we can go through and share this stuff with you, and that's been really, really helpful. And it's been our disposition to do that for more and more RFDs. Although we do wait for people to get back from vacation, so, Iliana, thank you for your but I'm really looking forward to that one. And Adam, because Ben is in the UK, and you are gonna be in Europe, this is gonna be a European friendly episode on RFDs. So we're gonna do it at 9 AM Pacific on Monday.

Bryan Cantrill:

It's gonna be, which I understand to be 5 PM in London and 6 PM in in Europe. So

Adam Leventhal:

Allegedly.

Bryan Cantrill:

Yes. Allegedly. So I apologize to all of Europe for having to catch us on time delay, or, in tranec, waking up at 3 in the morning or whatever it is in Armenia, but this is gonna be a great one. I'm really looking forward to that because that RFD site has been really, really important. As someone's saying in the chat, it's gorgeous.

Bryan Cantrill:

It really is gorgeous. It looks great on mobile, and it's been there's a lot of technical meat behind it, and then that's been extremely important for the way that we do engineering. I think we you know, Adam, as you and I are kicking around titles, it's a we are calling it the backbone of Oxide.

Dave Pacheco:

Yeah.

Bryan Cantrill:

It really is.

Adam Leventhal:

It really is.

Bryan Cantrill:

Alright. Dave, thank you very much for joining us. You should join us next week too, on RFDs, man. I'm just, you know, I know, but really, really appreciate you joining us in the evening hours here. Yeah.

Dave Pacheco:

Thanks for having me. It's been great.

Bryan Cantrill:

Really great stuff. So, on the one hand, it was a bit of a downer to get the news, but on the other hand I think it was a real opportunity. You know, when one door closes, another door opens, and it was great for us to be able to get all this stuff public, out there, describe what we're doing, and really, you know, own our own fate. It's always nice to know where you're going. Alright, Adam, when we talk next, you will be on a distant shore, and we will be talking RFDs.

Adam Leventhal:

Looking forward to it.

Bryan Cantrill:

Alright. Thanks, everybody. Talk to you next time.