Oxide and Friends

Bryan and Adam were joined by Oxide colleague, Ben Naecker, to talk about OxQL--the Oxide Query Language we've developed for interacting with our metrics system. Yes, another query language, and, yes, we're DSL maximalists, but listen in before you accuse us of simple NIH!

In addition to Bryan Cantrill and Adam Leventhal, our special guest was Oxide colleague, Ben Naecker.
Some of the topics we hit on, in the order that we hit them:
If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Creators & Guests

Host
Adam Leventhal
Host
Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://sesh.fyi/api/calendar/v2/iMdFbuFRupMwuTiwvXswNU.ics

Bryan Cantrill:

I'm sorry. I'm being so weird.

Adam Leventhal:

With your, with your pseudo podcast hosts?

Bryan Cantrill:

With the auto-generated podcasts. I really -- it's mesmerizing. I can't stop. I need to stop -- it's really weird. So, Ben, I'm sorry. I was also sending Ben all these, obviously.

Bryan Cantrill:

Yeah. You've lost your gravitas voice. I don't know, I'm picking it up a little less right now. It's kinda back to clown college, it sounds like.

Adam Leventhal:

That's right. Yeah. I need to hang out with a bunch of germy kids again to get it back. So --

Bryan Cantrill:

To get it back. Yeah. So, just forgive the context before the context: friend of the pod, Simon Willison, had this blog entry that was on Hacker News describing this new feature from Google, from their NotebookLM, where they can generate podcasts from arbitrary material. And I've been entertaining myself by sending it RFDs and creating these podcast episodes on RFDs, and they've just been super weird. That's my one-word synopsis of them.

Bryan Cantrill:

It's very uncanny valley. I don't know. What do you think, Ben?

Ben Naecker:

They get some things very right and other things very wrong. And then there are those things in the middle which are wrong, but you can't quite figure out why it's so weird, or they just, like, slur the words in some strange way. It's very uncanny. I agree.

Bryan Cantrill:

And they're very cheerful, and they are very interested in promoting whatever document it is that you've put in front of them. They definitely believe whatever they've just read.

Ben Naecker:

Yeah. Lots of sound bites.

Adam Leventhal:

Real, like, paid-promotion kinda vibes. Like, you know, those paid promotions where, clearly, the folks have spent, like, half an hour figuring out what it's about, but not, like, 3 hours figuring out what it's about.

Bryan Cantrill:

Right. Because, I mean, they can't give it 3 hours. They've got another one of these to do in another half an hour, so they're just kinda rolling them through. But anyway, it was weird. And the RFDs, of course, I was feeding it were all the RFDs that we're gonna be talking about today.

Bryan Cantrill:

So I was feeding it RFD 463, getting the podcast before the podcast, which I dropped into the chat there, if you wanna hear synthetic hosts describe it. Also with, like, weird taglines. Did you notice that? It's got, like, "stay curious until next time." It's like, stay cur -- okay.

Bryan Cantrill:

Stay -- anyway. It was very weird. But, Ben, welcome. Thank you. So I feel we've kinda stepped on an exposed nerve ending for the Internet here.

Bryan Cantrill:

I got a lot of people -- like, am I wrong? To describe it as upsetting seems a bit too strong, but a lot of people have issues with this.

Ben Naecker:

Severe skepticism is the word I would use. Yeah. They seem -- yeah.

Ben Naecker:

extremely skeptical

Ben Naecker:

of the premise, basically.

Bryan Cantrill:

Definitely skeptical. So we're talking about RFD 463, which is OxQL, which our synthetic podcast pronounced "ox-quill" over and over and over again. Ben, we've never pronounced it "ox-quill," I think. Right? I don't think it's "ox-quill," but now I can't unhear "ox-quill," unfortunately.

Ben Naecker:

Yeah. Please do not do that. It sounds like DayQuil or something, is immediately what I think of when --

Bryan Cantrill:

It does. It sounds like bovine DayQuil. Yeah. But let's talk about our development of a DSL. Maybe it's, like, the way I'm phrasing it.

Bryan Cantrill:

Maybe I was too clickbaity with the way I phrased it. Is that a problem? When I was saying, like

Ben Naecker:

or something? What do you what do you mean? By calling it a DSL? No.

Bryan Cantrill:

You know, I kind of threw out a little bait for the podcast. So my tweet was: when is a new query language necessary? And clearly, there's a decent portion of the Internet for whom the answer to that question is "actually, never. Thank you for asking."

Ben Naecker:

Really

Adam Leventhal:

18 query languages ago.

Bryan Cantrill:

18 query languages. So -- exactly. Or you're just like, listen, if you're gonna invent a new query language, like, that's fine, but you need to get rid of one of these others.

Bryan Cantrill:

Like, you need

Bryan Cantrill:

And in particular, there was a quote tweet from Andy Pavlo, distributed systems researcher at CMU. The quote tweet starts, "Bryan is brilliant, comma," and you're like, oh, no. Nope. That's not good. Nope.

Bryan Cantrill:

That's a bad start. Basically, if someone is insulting me, I know they may be warming up to agree with me. It's the old "actually, I agree with Bryan." They're kind of, like, trying to, you know, establish their bona fides first -- they don't wanna imply that they would agree with anything I say. But listen, it's Kelsey's line.

Bryan Cantrill:

Right? I mean, I love Kelsey's line. Like, "Bryan is wrong some of the time, but not this time." I love that. This "Bryan is brilliant, comma" -- this is the end of this.

Bryan Cantrill:

This is going straight into the ditch. "But it seems misguided for a hardware company to create a custom query language that no other tool supports. If you don't want SQL, you could have used PromQL" -- or is "prom-quel" the word? Would our AI overlords, on their cheerful morning podcast, pronounce that "prom-quel"? I don't know.

Bryan Cantrill:

I don't know. "So it at least works with existing visualization tools. It's not 2010 anymore. Just use SQL." Burn.

Ben Naecker:

Yes. "Just" there is doing a lot of heavy lifting, as usual when that word enters an engineering discussion, a technical discussion. "Just" is doing a bunch of

Adam Leventhal:

work. Yeah. Of course.

Bryan Cantrill:

This is an important observation: when "just" enters an engineering discussion, yeah, you're right. "Just" often does do a lot of heavy lifting. Yeah. So there's --

Adam Leventhal:

There's another sick burn, Bryan. I just wanna make sure you didn't miss this one, which was on Hacker News a week ago, talking about the Oxide query language. It's: "a small hardware company is not only making its own full hardware and software stack, but brings that all the way down to a telemetry query language. I get a lot of NIH vibes and question if any of these elements will get the attention they deserve."

Bryan Cantrill:

Do people think we're creating our own instruction set architecture? Like, am I the only one -- I'm just confused why this is engendering this kind of reaction. Especially given all of the stuff we've done. I mean, we've gone our own way so many times over. It's like, this is the one that is a bridge too far?

Bryan Cantrill:

It's like, listen, you guys Yeah. Andy, have you

Adam Leventhal:

have you, like we have our own embedded operating system. You know this? Like, we're

Bryan Cantrill:

We have our own switch. We developed our own switch. And it's like, "oh, no, that's fine. No.

Bryan Cantrill:

That makes sense." You know? Yeah. Our own embedded operating system, our own host

Adam Leventhal:

VMM, userlands -- like, wait. Would you want us to -- I mean, it's gonna be a long tweet if we get into all of

Bryan Cantrill:

these. It is. And it's like -- but a query language? No. Now you've gone a bridge too far. Yeah.

Ben Naecker:

I mean, that is his general research area. So I feel like he's gonna slot in on anything and talk about why not just use this. I mean, he's got several papers that are about why SQL is king and always will be. So I feel like it's, you know, his MO.

Bryan Cantrill:

Yeah. And also, like, we're not advocating for the elimination of any other query language. That's the other thing. People can use whatever query language. I mean, we're kind of encouraging that.

Bryan Cantrill:

It's just a DSL that we developed for our own use, really. So, Ben, do you wanna describe kind of the origin of how we got here? And I thought maybe we would take half a step back, because we've tried to open up a bunch of the surrounding RFDs that get us to 463, or why 463 is relevant.

Bryan Cantrill:

Do you wanna talk a little bit about 125 and kinda how we got to ClickHouse? Because I think that's a big part of "why not PromQL."

Ben Naecker:

Yes. I mean, that's a good point. So, sure, I can talk about 125 and the background for that. So this was, you know, a while ago now. This was some of the earliest work I did.

Ben Naecker:

This was basically right when I joined Oxide, and lots of other areas had been derisked. We had decided on CockroachDB for the control plane database. We had made many other kind of large technical choices. But we didn't really have a lot of, you know, depth on the story behind metrics. And so I came in to take over this RFD 125 from Dave Pacheco and other people, and really kind of carry it through to the end.

Ben Naecker:

So the big goal there was really picking the key technologies that we were going to use for the telemetry system. There were a bunch of things floating around, but mostly we kind of focused on the database in that RFD. And there were, I think, you know, half a dozen or so candidates. Things like InfluxDB, which is sort of an old standard time-series database, widely used in corporate environments like banks and other financial industry companies. Then there are a couple of other systems that have wide usage too, like Prometheus. And then there were, I think, 2 alternatives that really came to the front after the initial read through what the systems were designed to do, and those were a system called VictoriaMetrics, and ClickHouse.

Ben Naecker:

And the real reason that we picked these 2 was ultimately because of the story around replication. We need to replicate the data, and those are the only 2 that really had any kind of replication story at all. Influx, at the time at least, really did not have anything compelling here. And we could have built our own system -- we definitely looked at that, using something like a message bus to distribute the data to a bunch of databases.

Ben Naecker:

But that, I think, was a big bite to chew. And so we ended up going for this one called ClickHouse, ultimately on the story of the replication. And then when it came down to it, I did a bunch of analysis comparing it to VictoriaMetrics. And it just sort of handily beat it in terms of resource consumption, performance, flexibility.

Ben Naecker:

I mean, it's just really a rock-solid system. And, you know, funnily enough, one of the main reasons we picked it was because it does support SQL out of the box.

Bryan Cantrill:

Yes. We are not anti SQL, just to be clear. Right.

Ben Naecker:

Right. I mean, I did a bunch of experimentation, basically, you know, asking what happens when you do things like kill a node in a cluster, when you do that while you're submitting a bunch of queries, while you're also inserting a bunch of data. And it basically never skipped a beat. And so it was pretty impressive as a piece of technology. And it's only gotten better, I would say.

Ben Naecker:

They've done a really good job of open sourcing things in the last few years -- I mean, it's always been open source, but they've become a more open organization since they spun ClickHouse off of Yandex, which is where it started. Now it's its own organization. They've published several papers about the internals of the system. They really are extremely responsive on GitHub, for example. We had a number of issues.

Ben Naecker:

We asked them to float a number of patches for us, and they did it. I mean, it's been nothing but good things, basically, with ClickHouse. I've been extremely impressed with the database as a whole.

Bryan Cantrill:

And just to be clear, what data are we storing here? Because I think people have kind of a natural question, like, what ramifications does this have for the user of the rack? The decision around ClickHouse is really an implementation decision.

Ben Naecker:

It is an implementation decision in a lot of ways. You know, the data that we're talking about is mostly numeric. It's mostly scalars, although importantly, not all scalar values. We have histograms, and histograms are sort of first-class citizens in oximeter. Well, so, just sort of backing up a second.

Ben Naecker:

Really, what you're querying is a bunch of key-value pairs, which are the fields. So these are just names and then typed values for them. And these identify the stream of data points that you're actually interested in. And then a bunch of timestamp-comma-value pairs, basically. The fields are the identifiers, really describing the context for the data.

Ben Naecker:

There are things like the sled that a particular piece of data came from, the compute sled, or the project of the user-visible resource, like an instance or something like that. And then the timestamps and the values are sort of the raw data that are actually generated by, you know, whichever component is producing the data, at the very first layer. So most of them, like I said, are scalar values like integers, floats. We also support things like strings -- which, you know, one of the reasons for not using something like PromQL is that there, basically everything is a string or a float. We support a number of other types.

Ben Naecker:

UUIDs are extremely common in the control plane because that's basically how we identify anything. And so being able to support UUIDs, IP addresses as sort of first-class typed objects is really valuable. And then, like I said, we also have support for histograms, which is very experimental in most systems, including Prometheus, by the way. They exist -- there's support for, you know, basic forms of histograms.

Ben Naecker:

It's not very well tested. I couldn't find a lot of examples for it, especially at the time I was deciding, in RFD 125, on which system to use. It just did not seem up to snuff with the rest of the system as a whole, which is generally quite good. But the support for histograms, which we knew we would need kind of all over the place, was not quite there, I would say.
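To make the data model Ben describes concrete -- typed key-value fields that identify a stream, plus (timestamp, value) measurements with histograms as a first-class datum -- here is a rough Rust sketch. The type names, the `virtual_disk:bytes_written` timeseries name, and the `sled_id` field are illustrative stand-ins, not oximeter's actual API:

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for typed field values.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum FieldValue {
    String(String),
    Uuid(String), // stand-in for a dedicated UUID type
    I64(i64),
}

// A measurement is a scalar or a histogram; histograms are first-class.
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    I64(i64),
    F64(f64),
    Histogram { bin_edges: Vec<f64>, counts: Vec<u64> },
}

// One sample: fields identify the stream; (timestamp, datum) is the point.
#[derive(Debug, Clone)]
struct Sample {
    timeseries_name: String,
    fields: BTreeMap<String, FieldValue>,
    timestamp_ns: u64,
    datum: Datum,
}

// Convenience constructor for a counter-style sample with one field.
fn sample(name: &str, sled: &str, timestamp_ns: u64, value: i64) -> Sample {
    let mut fields = BTreeMap::new();
    fields.insert("sled_id".to_string(), FieldValue::String(sled.to_string()));
    Sample {
        timeseries_name: name.to_string(),
        fields,
        timestamp_ns,
        datum: Datum::I64(value),
    }
}

// Distinct (name, fields) pairs identify distinct streams; the points
// themselves are just (timestamp, value) pairs.
fn group_streams(
    samples: &[Sample],
) -> BTreeMap<(String, Vec<(String, FieldValue)>), Vec<(u64, Datum)>> {
    let mut out: BTreeMap<_, Vec<(u64, Datum)>> = BTreeMap::new();
    for s in samples {
        let key = (
            s.timeseries_name.clone(),
            s.fields
                .iter()
                .map(|(k, v)| (k.clone(), v.clone()))
                .collect::<Vec<_>>(),
        );
        out.entry(key).or_default().push((s.timestamp_ns, s.datum.clone()));
    }
    out
}

fn main() {
    let samples = vec![
        sample("virtual_disk:bytes_written", "sled-a", 1, 100),
        sample("virtual_disk:bytes_written", "sled-a", 2, 150),
        sample("virtual_disk:bytes_written", "sled-b", 1, 7),
    ];
    // Two sleds, same timeseries name: two distinct streams.
    println!("{} distinct streams", group_streams(&samples).len());
}
```

The key property is the split: the typed fields describe the context once, and the measurements are just a timestamped series under that context.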

Bryan Cantrill:

And what is the data that we're storing in this?

Ben Naecker:

Say that again.

Bryan Cantrill:

What kind of data are we storing? I mean, like, what is this gonna hold, basically?

Ben Naecker:

Yeah. So we are really storing 2, or really 3, kinds of data. We're storing things that the customers will see. So think instance CPU usage as a good example, or instance disk usage -- you know, when they actually write to a disk, we record the number of bytes that they've written.

Ben Naecker:

We bump the counter that tracks the total number of writes. And these are all user-visible things. We have similar metrics for user-visible instance network data. So things like number of packets in and out, for example. Right now, we have layer 2 data.

Ben Naecker:

So packets and bytes in and out, and errors. So that's one thing. The other thing is our own data, which encompasses things like -- at one point, we had an issue with retrying transactions to CockroachDB. And so one of our colleagues, Sean, put together a time series that keeps track of the number of times we retry any query and the duration of that retry. We have things like power and temperature and current -- all the key environmentals for the entire rack as well, which are collected from the service processors.

Ben Naecker:

And then we have sort of server-level metrics as well. So, for example, for Nexus, which is the main control plane service that people interact with through the front door of the API, there are things like histograms for request latencies, broken out by, say, the operation that you're performing or the status code of the response. So, you know, I think there's kind of those 3 big pieces: service-level data, physical environmental statistics, and then user-visible stuff as well.

Bryan Cantrill:

And we're not precluding anyone from slurping this data out and shoving it into some other system that they might want, that has some other query language that they --

Adam Leventhal:

For sure.

Ben Naecker:

So I mean, when we wrote RFD 125, I think we all basically agreed that the first thing anyone is gonna wanna do with the data is just pull it. The raw data, you know -- unprocessed, unfiltered to the extent possible, just getting the raw data. And I made an allusion to this at the beginning, but, you know, that "just use SQL" or "just use some existing system" take, I think, misses the fact that I tried that. This is the 3rd iteration of query systems that I've built on top of the data that we have. The first one was basically something just to prove that I got back the data I put in, so you could fetch the raw data, and that was it.

Ben Naecker:

There's no analysis. There's nothing. The second version was actually a SQL prototype, where you would literally write SQL and I would translate it to a massive SQL query against ClickHouse, which ClickHouse dutifully did. But it would obviously take a long time. And then this is the third.

Ben Naecker:

And I think it was definitely a key aspect that we needed to support just pulling raw data, which you can still do.

Bryan Cantrill:

And someone in the chat has asked about OpenTelemetry in particular. I know we spent some time looking at that. By the way, what was your take on OpenTelemetry?

Ben Naecker:

So I think it's a good idea. It's never quite felt all there to me, to be honest. Yeah -- so this last comment, I think, is right.

Ben Naecker:

I think it kind of sucks, but it is a standard, which is true. It feels a little bit like the lowest common denominator for telemetry data. And, you know, we would need to spend a lot of time to build a way to translate our data model, the way it's actually stored, into something like OpenTelemetry. And it's never really felt like a lot of value, with the obvious caveat that people expect it. And I think that's a very good point, and one big criticism, I admit, against something like OxQL is that it is a custom DSL.

Ben Naecker:

And so something like OpenTelemetry, for all of its flaws, would allow you to interoperate. I think I've been okay paying that cost so far because, in my experience, there are sort of 2 things. I have yet to come across a customer who says, no, it must be OpenTelemetry and there's no way around it.

Ben Naecker:

If we show them an HTTP API, they basically go, okay. That sounds fine. There's my raw data. I can do some sort of, you know, processing on that to put it into the system that I have now. I think that's basically expected for almost any type of telemetry system, that there will be some amount of translation between an existing, you know, data format and the one you actually store it in.

Ben Naecker:

And that, to me, has suggested that we should build something that works for customers where they can get raw data -- I think that's extremely important -- and also serves our own needs, which we do enumerate in RFD 125, around things like product iteration, diagnosing active problems, all of these things that we've talked about before. And those do not, I would say, rely on raw data. They almost always rely on things like aggregations, grouping. I mean, you know, our experience with something like DTrace has just shown again and again that the ability to actually ask questions of the system is invaluable.

Ben Naecker:

And we knew we needed something like that. And it was not clear with something like OpenTelemetry that you could get that. So I think that was a big reason for me to kind of dive into it.

Bryan Cantrill:

Well, as you point out, it's not clear who we're saving work for by doing something that's, like, lowest common denominator. And especially -- I also think that, like, yes, this is a DSL, but this is a very little language, OxQL.

Ben Naecker:

So far

Bryan Cantrill:

It's not like you're learning Haskell or something here. I mean, this is just -- I don't know.

Bryan Cantrill:

I feel that -- and, I mean, I get the concern; we're obviously careful about that. But then you made an important point, that this was not our first conclusion. Our first conclusion was not, like, hey, we should do a query language -- good thing there's nothing else out there, we will invent our own. It was more like, let's try to make everything else work.

Bryan Cantrill:

And coming to the conclusion that we were just having to contort ourselves too much -- and it's actually very liberating to be able to do our own DSL. And, I mean, not to downplay the amount of work involved, but you're also using a bunch of tooling that makes it much easier to develop a DSL than maybe it has been historically.

Ben Naecker:

Yeah, that's definitely true. I mean -- no, I am not trying to downplay the amount of work involved. The query language is an enormous undertaking.

Ben Naecker:

Right? I mean, we've got a parser and a query planner and an optimizer -- I mean, it's a lot of work, right, to do all of this. And so I do think that starting small -- this is actually part of the reason that I quite liked where we started with OxQL -- the piped nature of it does make it fairly straightforward to add incremental features. Which, I feel like, is a notorious problem with something like SQL: you add a small operator or some other kind of layer on top of your query, and suddenly your query changes from a simple select to -- oh, you either write it with this massive subquery or a CTE or some other complicated syntactic construct. And it feels like it muddies sort of 2 things. The interpretation of it just by looking at the query.

Ben Naecker:

I mean, I think you can kind of look at OxQL queries and basically interpret what they're gonna do in English pretty easily, which is very difficult to do with the syntax of something like SQL. And then also just in terms of implementing it ourselves and adding new features: putting typed operators together in such a way that I can add a new one. And today, the way you would do that is by implementing the syntax and then implementing basically a function in Rust that takes in 1 or more tables and spits out 1 or more tables. And we do pay for doing that processing in Rust today. But the whole point of implementing the query language in the way we have is that we can push more things into the database as they become important first-class operations.

Ben Naecker:

And, you know, somebody earlier had mentioned PRQL, the pipelined relational query language, which is a language that compiles -- or transpiles, I guess -- to SQL, but is written in a much more fluent syntax. And we definitely looked at that initially, and ultimately decided not to go with it, for all the same problems: you're basically building on a DSL that very few people have experience with, and you kind of need to choose which subset of the language to support. But one of the key things that I did like from it, and took from it for OxQL, is that pipelined nature -- that you can pass data along in this relatively self-contained way, so that adding features is pretty cheap for us.
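A toy version of the table-in/table-out pipeline structure Ben describes might look like the following Rust sketch. The `filter` and `mean` operators and their semantics are illustrative inventions for this example, not OxQL's real operators (those are defined in RFD 463):

```rust
// (timestamp, value) rows grouped into named tables.
type Row = (u64, f64);

#[derive(Debug, Clone, PartialEq)]
struct Table {
    name: String,
    rows: Vec<Row>,
}

// Each operator is just a function from tables to tables.
type Operator = Box<dyn Fn(Vec<Table>) -> Vec<Table>>;

// Keep only the rows matching a predicate.
fn filter(pred: impl Fn(&Row) -> bool + 'static) -> Operator {
    Box::new(move |tables| {
        tables
            .into_iter()
            .map(|mut t| {
                t.rows.retain(|r| pred(r));
                t
            })
            .collect()
    })
}

// Collapse each table to a single averaged row (a stand-in for a real
// aggregation stage, not oximeter's actual semantics).
fn mean() -> Operator {
    Box::new(|tables| {
        tables
            .into_iter()
            .map(|t| {
                let n = t.rows.len().max(1) as f64;
                let sum: f64 = t.rows.iter().map(|r| r.1).sum();
                let last_ts = t.rows.last().map_or(0, |r| r.0);
                Table { name: t.name, rows: vec![(last_ts, sum / n)] }
            })
            .collect()
    })
}

// A query is a fold of the pipeline over the input tables.
fn run(input: Vec<Table>, pipeline: Vec<Operator>) -> Vec<Table> {
    pipeline.into_iter().fold(input, |tables, op| op(tables))
}

fn main() {
    let input = vec![Table {
        name: "cpu_busy".to_string(),
        rows: vec![(1, 2.0), (2, 4.0), (3, 100.0)],
    }];
    // Roughly: get cpu_busy | filter value < 10 | mean
    let out = run(input, vec![filter(|r| r.1 < 10.0), mean()]);
    println!("{:?}", out);
}
```

The incremental-feature property falls out of the shape: adding a new operator is just adding another tables-to-tables function, with no need to rework the rest of the query.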

Bryan Cantrill:

Well, it also feels like -- and again, I don't know how much of this is, like, UNIX having seeped into my own DNA, or is UNIX kind of an outgrowth of the DNA that exists in all of us? I've got no way of actually differentiating those 2. But the pipe syntax, to me, feels pretty clear about intent. And it feels like we can also then do a lot of things on the back end to optimize that -- because you're being so clear about your intent, and you're not having to do kinda unholy things, we can actually make sure that we can optimize those use cases.

Bryan Cantrill:

You might not even -- I mean, you can use an entirely different ClickHouse feature. Just some of the things that we were brainstorming about -- like, you could do a whole bunch of different things.

Ben Naecker:

Yes. I mean, we don't even need to hit ClickHouse, or we can hit different tables. We can decide. And this is true, to be clear, I think, for any sort of front-end language that you compile down into something you run against the database.

Ben Naecker:

But the nice thing for us is that it's much easier to understand and look at the query to decide, say, which database table or tables to look at, if any. Right? We can decide to implement things by going to look at some materialized view rather than the original tables. And I think that's much easier to do when you have a relatively small, simple, kind of operator-based language where you pipe things in and out of each other. It becomes much more practical to do that kind of thing than it does if you're carrying many years of features in SQL.

Ben Naecker:

Or you have to pick which -- this is the other big thing, I think -- you know, we could implement SQL as sort of a front end. That's a language people would query in. I think it's pretty clear that you basically have to throw away 98% of the language if you do that, to turn it into useful data analysis tools against the data that we have. And it felt very weird to me to start from something where we care about almost none of it. We obviously don't care about anything other than select, because nobody can write data using this path.

Ben Naecker:

So updates are totally out, inserts are totally out, deletes, transactions are out. For simplicity, to start -- when I wrote the SQL prototype, basically the only thing you could do was a straight select statement, and that was it. You could do joins, but no subqueries. And things like window functions, which I think are extremely useful for understanding time series data, become impractical to implement using this method.

Ben Naecker:

So I just think, you know, it sort of became a pretty stark question of how much of the baggage of SQL we wanted to carry around if we were not gonna use any of it anyway.

Bryan Cantrill:

Well, you'd have to ask, like, what is the point? You're not actually getting SQL compatibility when you're not doing all these things. All these things have no relevance in this specific domain. Right. But there's a reason we have domain-specific languages.

Bryan Cantrill:

I just cannot emphasize this enough, because I think it is a great strength that we can create little languages easily. I think we should not be resistant to that. The kind of compatibility that you have by doing, say, SQL -- Ben, as you're mentioning, it's like a false compatibility. It gives you the wrong intuition for the system. And it's like, sorry, this is not what the system is actually gonna do underneath.

Bryan Cantrill:

Yeah. And, you know, what we're trying to do -- another question that kinda came in the chat is, like, wait a minute. Okay, so you said that all anyone's gonna do with this is, like, slurp it out. So, like, why wouldn't you just use some other protocol that people already know?

Bryan Cantrill:

It's like, well, that's all a customer might wanna do with it. We wanna do a lot more than just slurp it out. We wanna actually go and use it dynamically, and be able to actually look at the rack and ask questions of it. And so for us, we want something that is much more tightly tailored to that.

Ben Naecker:

Yes. I think that's a great point. I think there are 2 or 3 important features that just pulling raw data does not support. One of them is debugging active problems -- figuring out why the system is behaving the way it is.

Ben Naecker:

And one really useful way to answer that question is to figure out where it's come from -- what state it was in before you walked up to it. The recent history. The other big thing that we haven't really talked about is the idea of alerting, and making those alerts configurable in the same language that you would use to query them, which is a strength that, you know, we got from something like Prometheus. Right?

Ben Naecker:

Which, you know, does do that. Right? I think it's a very, very useful way to basically just say, hey, here's this condition on which I would like to generate alerts, and here's what you do once that happens. Here's the threshold.

Ben Naecker:

Here's the, you know, the PromQL expression in that case; in our case, the OxQL expression that one would trigger the alert on. And, you know, I think these are really valuable. And then the other really important thing that we didn't talk too much about is a much longer sort of iteration cycle. In RFD 125, there's a section on it which is basically product iteration.

Ben Naecker:

You know, we can look at it and understand things like, over this year of historical data that we have, how often did some component fail, or how often did the power fluctuate outside of our tolerances for that system. And I think being able to do that really means you need a language, because you can't possibly sift through, let alone graph or display, millions of points. You need to be able to ask questions like, how many times did they exceed this threshold? What was the 99th percentile behavior? And you just can't do that if the only thing you can support is pulling out raw data.

Bryan Cantrill:

Right. We wanna actually be able to query those things in the rack effectively. Right.

Adam Leventhal:

Yeah. It's worth mentioning that, you know, we do have these grand ambitions, I think, Bryan, that you alluded to about what is possible. And we've already mentioned this earlier, but we don't wanna be constrained by the lowest common denominator. We didn't want the query language the customers would use to explore the data to constrain the kind of data we bothered to collect. And then, like, histograms, Ben, as you were saying, I feel like it's an area where we are, like, total zealots.

Adam Leventhal:

Maybe everyone's a total zealot, and I just don't know it. But, like, we feel like real zealots in that regard. Is everyone a zealot?

Ben Naecker:

I mean, if you base it off of the support in the telemetry systems that I was describing at the beginning, it doesn't seem like it. I mean, I don't think Influx even had it; at least when I looked at it, it was not clear they had the concept of an array or of a histogram in general. You know, like,

Adam Leventhal:

I'm reminded a little bit of how Cliff accuses us of being really into post mortem debugging. I feel like we are similarly really into histograms, I think just because we've seen their utility,

Bryan Cantrill:

you know, so, you

Adam Leventhal:

know, in in so many domains.

Bryan Cantrill:

Well, okay. The post mortem debugging, it's taken me, like, quite literally decades, but I'm willing to acknowledge, like, okay, I am somehow an outlier with respect to society. Like, this is some sort of software kink that I have, with respect to being willing to debug a system from its static, in-memory state. So okay.

Bryan Cantrill:

Fine, but weird. But, like, histograms? Really? Like, are we histogram radicals? I just didn't real... are we

Adam Leventhal:

I mean, for both of them, I feel like everyone should feel this way, and maybe people already do on histograms. But as Ben has been saying with these other query systems, they're not necessarily embracing them as first-class primitives. Whereas, I mean, I don't know if this is fair to say, but Bryan, a bunch of my thinking on this was informed by the Fishworks system, the ZFS storage appliance, and the analytics system that you built there. Much of it was founded, or grounded, on histograms, on these distributions, to, like, visualize what's going on in the system. I mean, that strongly informed things here. And, like, we knew we wanted to build a system where that was front and center.

Bryan Cantrill:

Yeah. For sure. And, now that I know that this is apparently a strange idea, around histograms, you can chase this back to Bonwick and lockstat using histograms for looking at lock hold times, spin times, block times, looking at the actual distribution of data. And so, I mean, honestly, part of the reason we have aggregation as a first-class operation in DTrace was because of our eye on lockstat and replacing it. It's like, okay, this is important.

Bryan Cantrill:

This idea of getting what the distribution of data looks like. And so, you know, I haven't really thought about that. Maybe that felt very commonsensical, but maybe I'm not giving Bonwick enough credit. Maybe that was very iconoclastic, to be thinking in terms of the distribution of data. I mean, you were a stats concentrator, a stats grad student.

Adam Leventhal:

I mean, I feel like it's only been 10, 15 years since, like, people all recognized that the average was not a number you really wanted to talk about in polite company. But that's fairly recent.

Bryan Cantrill:

I do feel you're right. I think that you're right. That is recent, and I guess we've always just kinda run with sets of people for whom understanding the distribution of data has always been really, really important. And it just feels very natural that that would be a first-class operation.

Ben Naecker:

I think the usefulness of the average, or the lack of usefulness of the average, is probably pretty obvious if you've actually been in there debugging systems where there is large variance, or these distributions are just not normal, or not even single-mode, or anything like that. Right? Or they have these really, really heavy tails, where other measures of central tendency, or even none of them, are useful, and you actually just care about things like the max or the min or some other sort of extreme value. I think the reason, I mean, the focus on the mean is because it's useful computationally.

Ben Naecker:

Right? You can easily compute it. It's very easy to understand. You know, it's linear, in that if I add more data, I can just keep track of the running mean. I don't have to keep the whole history of the data, where for something like the standard deviation or the median it's not really possible to do that.
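
Ben's point about the mean being cheap to maintain online can be sketched in a few lines of Rust. This is an illustrative sketch, not Oxide code; the struct and method names are invented:

```rust
/// Running mean with O(1) state: no sample history is needed.
/// (Illustrative sketch; names are hypothetical, not from oximeter.)
struct RunningMean {
    count: u64,
    mean: f64,
}

impl RunningMean {
    fn new() -> Self {
        RunningMean { count: 0, mean: 0.0 }
    }

    /// Incremental update: mean += (x - mean) / n.
    fn push(&mut self, x: f64) {
        self.count += 1;
        self.mean += (x - self.mean) / self.count as f64;
    }

    fn mean(&self) -> f64 {
        self.mean
    }
}

fn main() {
    let mut m = RunningMean::new();
    for x in [1.0, 2.0, 3.0, 4.0] {
        m.push(x);
    }
    println!("{}", m.mean()); // 2.5
}
```

The median, by contrast, needs the whole sorted history (or at least a sketch of it), which is exactly why a summary like a histogram becomes attractive.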

Adam Leventhal:

And we teach third graders how to compute it. So how bad can it be?

Ben Naecker:

And it's very, you know, I mean, in sort of statistics research, and obviously less so research but kind of the statistics that most people have been exposed to, the idea of computing a mean is really, really natural. It's just very easy to do, and so I'm gonna do it. Obviously, you sort of forget all the assumptions, that, well, maybe your data isn't normal. And I think, as a practitioner, you become more ingrained in understanding why it fails to really give you a useful answer.

Bryan Cantrill:

Yeah. Also, if you look at the average, you don't actually understand that much about your data.

Ben Naecker:

And Right.

Bryan Cantrill:

You think you do. It kinda gives you the sense of, like, oh, this is what my data looks like. And it's like, well, you know, maybe. You may wanna get just a little more fidelity into what the distribution actually looks like before you conclude that you know what your data looks like.

Ben Naecker:

And so somebody asked about these other moments, the second and third and fourth moments, the variance, skew, and kurtosis, for example. And these are really useful, but they actually are pretty computationally intensive to compute. And histograms are very easy, and you can see those well enough, I would say. They actually give you, I don't know if it's more information, but they give you different information, right? So there's this idea, something called Kullback-Leibler divergence, which is basically a statistical measure that tells you the difference between two distributions.

Ben Naecker:

And it's very easy to see when you plot these things visually. You just sort of see the amount of overlap in your histograms. Right? If you plot them with bars that are transparent so you can see them, or a grouped bar chart or something, it's very easy to see. But the numbers are pretty tricky either to compute or to give you a useful measure of that divergence a priori.
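
The Kullback-Leibler divergence Ben mentions is easy to compute from two histograms that share bins. A minimal sketch, assuming the counts are over identical bins and that Q has mass wherever P does (the function name is invented):

```rust
/// Kullback-Leibler divergence D(P||Q) between two histograms with the
/// same bins, after normalizing counts to probabilities.
/// (Illustrative sketch; assumes q has mass wherever p does.)
fn kl_divergence(p_counts: &[u64], q_counts: &[u64]) -> f64 {
    let p_total: u64 = p_counts.iter().sum();
    let q_total: u64 = q_counts.iter().sum();
    p_counts
        .iter()
        .zip(q_counts)
        .filter(|&(&p, _)| p > 0) // terms with p_i = 0 contribute nothing
        .map(|(&p, &q)| {
            let pi = p as f64 / p_total as f64;
            let qi = q as f64 / q_total as f64;
            pi * (pi / qi).ln()
        })
        .sum()
}

fn main() {
    // Identical histograms diverge by exactly zero.
    println!("{}", kl_divergence(&[10, 20, 30], &[10, 20, 30])); // 0
    // A skewed histogram vs. a uniform one: strictly positive.
    println!("{}", kl_divergence(&[90, 10], &[50, 50]));
}
```

As Ben says, the visual version (overlapping transparent bars) often tells you the same thing at a glance without computing the number.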

Ben Naecker:

Right? What I mean by that is, I may know that the kurtosis is useful, but only after looking at the data. And so I need to keep track of this whole distribution, and a very cheap, compact way to do that, constant memory and constant computation, is a histogram. So it's extremely useful when you have potentially unbounded sets of data and you really can't pay that cost, and you wanna limit the resource consumption and maximize the understandability of your distribution. I think it's extremely useful for those types of examples.
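
The constant-memory property Ben describes is visible in even a toy fixed-bin histogram: the state is just the bin counters, no matter how many samples arrive. This is a sketch, not oximeter's actual histogram type:

```rust
/// Fixed-bin histogram: O(#bins) memory regardless of how many samples
/// arrive, so it can summarize an unbounded stream.
/// (Illustrative sketch, not Oxide's oximeter histogram.)
struct Histogram {
    lo: f64,
    hi: f64,
    counts: Vec<u64>,
}

impl Histogram {
    fn new(lo: f64, hi: f64, bins: usize) -> Self {
        Histogram { lo, hi, counts: vec![0; bins] }
    }

    /// Each sample costs O(1): find the bin, bump a counter.
    fn sample(&mut self, x: f64) {
        let n = self.counts.len();
        let frac = (x - self.lo) / (self.hi - self.lo);
        // Out-of-range samples are clamped into the edge bins.
        let bin = ((frac * n as f64) as isize).clamp(0, n as isize - 1) as usize;
        self.counts[bin] += 1;
    }

    fn total(&self) -> u64 {
        self.counts.iter().sum()
    }
}

fn main() {
    let mut h = Histogram::new(0.0, 100.0, 10);
    for i in 0..1000 {
        h.sample((i % 100) as f64);
    }
    println!("{:?}", h.counts); // each of the 10 bins holds 100 samples
}
```

The tradeoff is that you fix the bin edges up front, which is exactly the part ClickHouse's approximate functions try to do for you automatically.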

Bryan Cantrill:

And so, obviously, ClickHouse was a natural fit in part because of the way they thought about the problem.

Ben Naecker:

Yeah. So, I mean, they have first-class support for arrays. They've obviously built a bunch of tools around histograms themselves. So computing a histogram of a column of data is something that you can just do. They have a bit of a confusing way to do it, in that, like most things with ClickHouse, they have aggressively prioritized performance.

Ben Naecker:

And so what that means is, for almost every operation there's an exact version and an inexact version, and the default, unless you ask for it, is the inexact version. And for something like a histogram, that's also true, where it basically tries to compute the bins for you, and it'll do its best and won't be off by too much, and for most things, it'll work. But if you really want the answer, you have to compute the exact values.

Adam Leventhal:

I didn't realize that. Is that

Bryan Cantrill:

I didn't even get there. Yeah. Wow.

Adam Leventhal:

Do they give you bounds on their, you know, hand waving?

Ben Naecker:

They basically give you bounds on how bad the estimates for the bins are going to be in that case.

Adam Leventhal:

Interesting.

Ben Naecker:

For things like, so then for example, they have a median or a percentile, a quantile, that is inexact. It does not hallucinate data. It's not making up points, but basically you might get all of your data grouped into one bin or another.

Adam Leventhal:

I just had the most embarrassing realization. I'm like, you know, I don't know why you'd need that. You probably only need it for things like analytics from the web, like clicks. Oh, God.

Bryan Cantrill:

I'm horrible. Here we are. A house of clicks. That's right.

Ben Naecker:

Yeah. But they do have, I would say, a lot of array-based tools, tons of functions. They've got this sweet idea, so in normal SQL databases, you obviously have aggregations like the average. Right? Those are used all over the place.

Ben Naecker:

ClickHouse has this first-class support for arrays, and they said, well, how do we support that sort of thing? They're just like, we're gonna make this idea of aggregation combinators, so you can tack on things like avgArray, so the word Array comes at the end of it, or, you know, minArray, and it'll apply the aggregation that you've asked for to the array as if it were a bunch of items. So it's extremely flexible in what you can ask it to do, how you can process. You can do things like map over arrays. You have all of these higher-order functions for doing filtering on arrays.
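
The combinator idea, lifting a scalar aggregation to work over a column of arrays, can be sketched in Rust as a higher-order function. This is only an analogy for what ClickHouse's `-Array` combinator does, not its implementation:

```rust
/// A ClickHouse-style "-Array" combinator, sketched in Rust: take any
/// aggregation over scalars and lift it to a column of arrays by
/// flattening first (roughly the behavior of e.g. avgArray).
fn agg_array<T: Copy, R>(rows: &[Vec<T>], agg: impl Fn(&[T]) -> R) -> R {
    let flat: Vec<T> = rows.iter().flat_map(|r| r.iter().copied()).collect();
    agg(&flat)
}

fn avg(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    // Three "rows", each holding an array, as a histogram column might.
    let rows = vec![vec![1.0, 2.0], vec![3.0], vec![4.0, 5.0, 6.0]];
    println!("{}", agg_array(&rows, avg)); // 3.5
}
```

Swapping `avg` for a min or sum function gives the `minArray`/`sumArray` equivalents, which is what makes the combinator scheme so composable.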

Ben Naecker:

I mean, it's extremely valuable. And having all of that is just so, so useful for building a system like this on top of it.

Bryan Cantrill:

And we have not scratched the surface of the kind of stuff that we can go do. And I think we have part of the

Ben Naecker:

Yeah. Yeah. No. We're basically just doing select and some some averaging and some grouping.

Bryan Cantrill:

And, beg your pardon if I'm wrong, but part of the appeal of a DSL here is the ability to add some functionality that would actually help us express some of that that we can get out of ClickHouse.

Ben Naecker:

Yeah. That's right. So a consumer of that. Yeah. That's right.

Ben Naecker:

So as an example, we have this idea of an alignment table operation, where you take time points that are close but not exactly evenly spaced, you know, roughly every second, but there's a few milliseconds of jitter on either side for every sample. So we have the notion of an alignment operation where you can say, okay, I actually want to register them, sort of snap them to a temporal grid, to be exactly one second apart. And the way you do that is by specifying how to group things that are within that alignment period, within one second, for example. Today we do that by averaging, which, for all of its problems, averaging does have a lot of uses.

Ben Naecker:

But we could, for example, do that by something like, instead of taking the average within an interval, you could take the min within an interval, or really any other operation you can imagine doing. And, you know, in theory, when we build that inside the database, that should basically be switching the aggregation function that ClickHouse uses from average to min. And it will be very easy to express these much more complicated operations with really a few small changes on top of the framework of this kind of piped language.
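
The alignment operation Ben describes, snapping jittered samples to a grid and aggregating within each window, can be sketched like this. This is a toy model of the idea, not the OxQL implementation; the function name and representation are invented:

```rust
use std::collections::BTreeMap;

/// Align irregular (timestamp_ms, value) samples to a fixed period by
/// snapping each sample to its window start and averaging per window.
/// (Toy sketch of the OxQL alignment idea, not its real code.)
fn align_mean(samples: &[(u64, f64)], period_ms: u64) -> Vec<(u64, f64)> {
    let mut windows: BTreeMap<u64, (f64, u32)> = BTreeMap::new();
    for &(t, v) in samples {
        let start = t / period_ms * period_ms; // snap to the temporal grid
        let e = windows.entry(start).or_insert((0.0, 0));
        e.0 += v;
        e.1 += 1;
    }
    windows
        .into_iter()
        .map(|(start, (sum, n))| (start, sum / n as f64))
        .collect()
}

fn main() {
    // Samples roughly one second apart, with a few ms of jitter.
    let samples = [(0, 1.0), (997, 2.0), (1004, 4.0), (2001, 8.0)];
    println!("{:?}", align_mean(&samples, 1000));
    // [(0, 1.5), (1000, 4.0), (2000, 8.0)]
}
```

Switching the aggregation from mean to min is a one-line change here, which mirrors Ben's point about swapping the aggregation function ClickHouse uses.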

Bryan Cantrill:

So a question that I think may be very clarifying for people: is it a fair assumption that the main client of raw OxQL is Oxide-provided tooling, dashboards, alerting, etcetera? That's the question. Yes.

Ben Naecker:

Yeah.

Ben Naecker:

Today, it's me, is the short answer. I guess a couple of other developers. But, I mean, yes. I think the biggest initial consumers will be two things. One will be customers collecting it.

Ben Naecker:

So, we didn't talk about this, but we only store data for 30 days today. And we recognize that customers will wanna store it longer than that, potentially in some rolled-up form, and we wanna give them the ability to do that. So I expect that customers pulling it into their own longer-term storage systems will be one of the big things. And then the other will be, yeah, visualizations in the web console. Today we have a few visualizations around things like disk metrics; those are built upon that first querying system that I mentioned, where you basically select the raw data.

Ben Naecker:

And so this has a number of kinda weird, not problems, it has drawbacks. Right? So as an example, that data is cumulative. We keep track of a start time, and then the counter only goes up with every write.

Ben Naecker:

We bump it by one and it never goes down. Right? So when you're selecting the raw data, you get that cumulative data. And so in the web console today, if you open it up, it just shows a graph that is monotonically non-decreasing. Right?

Ben Naecker:

But most people don't really care about that. They want the derivative of that. They wanna know, how many writes did I incur in this period of time? I wanna see sort of how the thing is behaving, what the dynamics are. And you can get that mentally by looking at the slopes, but it's hard.

Ben Naecker:

Right? You don't wanna do that in general. And so being able to do that is basically the reason that we implemented something like OxQL's automatic adjacent differences, these deltas. When you select a cumulative time series, it automatically computes that delta for you, on the assumption that that's what you want most of the time anyway. Obviously, we can build a table operation that does not do that. But that is definitely the most common thing, to be able to look at those differences over time.
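
The adjacent-difference transformation is small enough to show in full. A sketch of the idea, not OxQL's actual code:

```rust
/// Turn a cumulative counter series into per-interval deltas, the way
/// OxQL does automatically for cumulative time series (sketch only).
fn deltas(cumulative: &[u64]) -> Vec<u64> {
    cumulative
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0])) // counters only go up, but be safe
        .collect()
}

fn main() {
    // Monotonically non-decreasing write counts...
    let writes = [10, 15, 15, 40, 41];
    // ...become the per-interval activity the console actually wants.
    println!("{:?}", deltas(&writes)); // [5, 0, 25, 1]
}
```

A real implementation also has to handle counter resets (the start time changing), which is why the raw data keeps the start time around at all.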

Bryan Cantrill:

Wait. And you said the console; also, to point out, the CLI also has the ability to, let's not sleep on the CLI's ability to visualize data.

Ben Naecker:

Yeah. There is, under the oxide experimental subcommand, which is where all of the time series stuff lives, because I do still consider it pretty experimental at this point, a dashboard subcommand, where you can run a scalar query and it will plot all of the time series that come back for you in your terminal, which is kind of fun. And you can run the query itself, run more queries I should say, directly, just to get the raw data back as a JSON object over HTTP. But the CLI will plot scalar time series for you, which I think is really useful.

Ben Naecker:

We haven't done the histogram stuff. And I think that's gonna be very fun, to see heat maps around things like IO latencies for virtual disks, I think, is a good example. I think it will be very cool to be able to see those in the web console and/or in the CLI. Right? A heat map in the CLI will be pretty, will be pretty

Adam Leventhal:

Okay. Those graphs are really fun. What is the package you're using to draw that stuff?

Ben Naecker:

Yeah. It's a library called Ratatui, which is a Rust, really, it's a terminal manipulation engine. Right? You have this idea of a screen, and then you can do things like draw widgets to it in a bunch of different ways.

Ben Naecker:

And these could be your normal TUI things, like columns and tables and stuff like that. Trees, for example, if you wanted to implement something like the Unix tree command that shows a tree of files, you could do something like that with Ratatui. But it also has the notion of basically little glyphs that you can use to draw things. And it's got first-class charting support, where you can have x and y axes and alternate y axes.

Ben Naecker:

I mean, it's very, very useful. Yeah. So I think it's a really cool library. And we've used it in a number of other places. Wicket, which is the rack setup captive shell that you run when you first install and set up a rack, is a graphical interface written in Ratatui as well. It's very, very powerful.

Bryan Cantrill:

And humility dashboard. I'm a little hurt that humility dashboard is not coming up here, honestly. You know? I just Yeah.

Ben Naecker:

It was humility.

Bryan Cantrill:

I was waiting for someone else to mention it, but apparently no one was going to, you know? So I gotta,

Adam Leventhal:

plug your own

Bryan Cantrill:

plug your own technology. Exactly. Well, listen. I gotta I gotta plug my own. If I around here, you gotta

Adam Leventhal:

Yeah.

Bryan Cantrill:

You know what? I'm gonna have my AI overlords generate a podcast full of praise for humility dashboard. I think I need a little pick-me-up from my bots. But humility dashboard also, so this allows us to talk to a service processor and graph all of our environmentals, and it's been great.

Bryan Cantrill:

I love Ratatui. So much fun. And those

Ben Naecker:

those environmentals are really valuable. I mean, they were super useful at the beginning when we were bringing up the first boards because of your ability to, you know, through the service processor directly, look at those environmentals without waiting for the host. Right? So as you're trying to get the host to boot, those are pretty important.

Bryan Cantrill:

Yes. Yeah. It has been, it's been great. And I also love writing software that the EEs use, because, just, you know, they have lived such a tortured existence with respect to software.

Bryan Cantrill:

It's very nice when they can be delighted by software. I feel that the EEs live a hard life. And so when software can delight them, when they're not using some vendor-specific Windows goober that they need to program a part or something, and you can actually, like, give them something delightful, it's pretty great.

Bryan Cantrill:

I love

Adam Leventhal:

They don't get nice things. It's really true.

Bryan Cantrill:

They don't get nice things. They don't get nice things. That's what I'm trying to say. They don't get nice things. And as a result, like, their standards are very low, and you can do very little work and give them something nice, and they are just filled with praise.

Bryan Cantrill:

It's great.

Adam Leventhal:

Yeah.

Bryan Cantrill:

So, yeah, Ratatui has been, that's been fun. And folks are asking if the code is available. That is all open source. All that stuff is open source, I think. I mean, it's all open source.

Ben Naecker:

All of the OxQL and the dashboard are, well, the dashboard is in the oxide CLI that Adam linked a minute ago. And, yes, someone just dropped the link to that. The timeseries dashboard subcommand is the Ratatui code that draws everything.

Ben Naecker:

And then OxQL itself is in Omicron. There's a library called oximeter-db, which is basically the ClickHouse interface, and all of the OxQL implementation is there. Yeah. Somebody else just linked that. Yeah.

Ben Naecker:

Thanks, Sean.

Bryan Cantrill:

Can you speak a little bit to the implementation of the DSL, by the way?

Ben Naecker:

Yes. I I

Bryan Cantrill:

I'm assuming we've gotten people over the hump of, like, accepting that we've got the right to implement a DSL here. I'm not sure that we've got everyone on board with that. But you know what? Just bear with us. Can you get into kind of the mechanics of building that?

Ben Naecker:

Totally. So, I've never written so much recursion in my professional life. The first step is a parser, which takes a string and turns it into an AST. And that is written with the help of a library called peg, which is based on the idea of parsing expression grammars, which are a formalism for writing, basically, strings that you want to match against and turn into a specific kind of abstract syntax tree node. So we write little, mostly regular, expressions to match pieces of the query.

Ben Naecker:

Although part of the reason that I used peg initially was that it also supports doing things like running Rust functions, normal Rust code, against the string, to do things like parse out a float, for example, which is very useful, because the regular expression for floats, let alone something like IPv6 addresses, is terrible. So rather than write that in a regex, you can match against it in some other way. I definitely considered nom. Nom is fantastic. I really like it.
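
To make the pipeline shape concrete, here is a toy version of the very first parsing step: splitting a piped query into its table operations. The real parser uses the `peg` crate and builds a full AST; this only illustrates the structure, and the query text below is illustrative, not guaranteed-valid OxQL:

```rust
/// Toy first pass: split an OxQL-style piped query into its table
/// operations. (The real parser, built with `peg`, produces typed AST
/// nodes rather than strings; this is only a sketch of the shape.)
fn table_ops(query: &str) -> Vec<&str> {
    query
        .split('|')
        .map(str::trim)
        .filter(|op| !op.is_empty())
        .collect()
}

fn main() {
    // Hypothetical query text, for illustration only.
    let q = "get some_target:some_metric | filter x > 10 | align mean";
    println!("{:?}", table_ops(q));
    // ["get some_target:some_metric", "filter x > 10", "align mean"]
}
```

Each of these operations then becomes an AST node that the planning steps Ben describes can inspect and translate into ClickHouse SQL.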

Ben Naecker:

I think this will need some TLC in the long run, but for right now it serves our purposes quite well to parse everything with peg. So basically, we parse the string with peg. There are some limits; somebody asked about limits on things.

Ben Naecker:

They're pretty crude at this point, basically the overall length of the query, which is not really related to the number of table operations, but in practice it seems to be pretty good. We parse the string into an AST, and then there are really a couple of planning steps that we do once we have that. I didn't really talk about this, and we kind of brushed past it, but in RFD 161, which is one of the background RFDs for the OxQL RFD itself, I talk a lot about the data model, and I think it's useful to talk a little bit about that now. So one of the main reasons we don't just use SQL or some other existing out-of-the-box language is that we don't have a table that corresponds to the time series data. What I mean by that is, there are tables in ClickHouse that store the data, but we normalize it when we first insert it, and have to re-associate it when we select it.

Ben Naecker:

So the way we do that is, there's a program called the oximeter collector, which is aggregating all the data. It's pulling data from all of the places where it's generated, called the producers. And it takes each sample and picks apart the fields, those typed key-value pairs, and the measurement itself that's in the sample. And those go in different places. All the fields go in their own table, broken out by type.

Ben Naecker:

So we have a field table for UUIDs, a field table for IP addresses, etcetera. And all the measurements go in their own table as well. And we need to be able to re-associate these all together, so we create basically a foreign key relationship between all of those by generating a hash from the time series contents, from the fields really, sort of the identity of the series. It's just a u64 that lets us associate everything once we have inserted it in this normalized form. So we put all the fields in one table, all the measurements in some other tables.
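
The u64 timeseries key can be sketched like this. This is an illustration of the idea, deriving a key from the series' identity, not the actual hash oximeter uses; the names are invented, and `DefaultHasher` is only stable within one process:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a u64 key from a time series' identity (its name plus its
/// field name/value pairs), so normalized field rows and measurement
/// rows can be re-joined later. DefaultHasher stands in for whatever
/// stable hash the real oximeter code uses; this is a sketch.
fn timeseries_key(name: &str, fields: &[(&str, &str)]) -> u64 {
    let mut h = DefaultHasher::new();
    name.hash(&mut h);
    for (k, v) in fields {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

fn main() {
    // Hypothetical series name and fields, for illustration only.
    let key = timeseries_key(
        "virtual_disk:writes",
        &[("disk_id", "d4e6"), ("project_id", "ab12")],
    );
    println!("{key}");
}
```

The same fields always hash to the same key, so every measurement row for a series carries one u64 instead of repeating the full set of typed fields.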

Ben Naecker:

And here is again where we lean on the strength of ClickHouse. One of its key features is this idea of different table engines. Its workhorse is something called the MergeTree table engine, which is basically this idea that you can insert data in these large chunks called blocks, and that is extremely fast. And that's because ClickHouse basically does nothing except memcpy the data directly to disk. They don't do anything with it.

Ben Naecker:

I mean, they run checks, they run other things like that, but they don't really do much. In particular, there's no such thing as a unique primary key in ClickHouse. So what that means is they do not have to check, for example, that your row is unique, that it doesn't violate a primary key constraint. They do not care.

Ben Naecker:

They say that's your problem. And part of this is great, because you just insert data, and then in the background it merges that with all the existing data to construct a new compacted, compressed, sorted array of everything. Right? So the fundamental model for a traditional relational database is basically a B-tree. Right?

Ben Naecker:

You have this tree relationship of primary keys, and the value for that logical B-tree is the tuple. Right? The row. ClickHouse is not that.

Ben Naecker:

ClickHouse is a sorted array where you can have any number of duplicates in it that you want. But there is no such thing as a unique row, and this is a big paradigm shift, a very different way of thinking about things. So for us, we need to have a way of re-associating everything after it's put on disk, and the way we do that is with this timeseries key. It's just an identifier that lets us match everything back up. So once we've inserted everything, ClickHouse, I sort of buried the lede there.

Ben Naecker:

ClickHouse has another table engine, which is a deduplicating table engine. And the idea is that on merge, when you're doing that merge between different parts of data, it can sort them. And then, basically like the sort-to-uniq command pipe in Unix syntax that removes neighboring duplicates, it's doing exactly the same thing. So every time it does this merge, it sorts the data and then removes neighboring duplicates. So we rely on this for the fields.
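
The `sort | uniq` analogy can be shown directly. A toy model of the deduplicating merge, not ClickHouse's implementation (the row type here is a made-up stand-in for a field row):

```rust
/// Toy model of a deduplicating merge: sort the rows, then drop
/// adjacent duplicates, exactly like `sort | uniq`. Re-inserting the
/// same field row is therefore harmless; copies collapse on merge.
/// (Illustrative sketch; the (key, field) row type is invented.)
fn merge_dedup(mut rows: Vec<(u64, String)>) -> Vec<(u64, String)> {
    rows.sort();
    rows.dedup();
    rows
}

fn main() {
    let rows = vec![
        (42, "disk_id=a".to_string()),
        (7, "disk_id=b".to_string()),
        (42, "disk_id=a".to_string()), // same field row inserted twice
    ];
    println!("{:?}", merge_dedup(rows)); // only two distinct rows survive
}
```

This is why the fields tables can stay tiny: however many samples arrive, a given time series contributes its field rows effectively once.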

Ben Naecker:

So we do not have one field row for every sample. We have one field tuple for every time series. So you may have a million points; you will only have one set of fields for that time series. And I think this

Bryan Cantrill:

is a lot less data.

Ben Naecker:

A lot less data. Right. Right. And this is part of the reason, for example, that that SQL prototype that I built falls over: you have to denormalize that data to make this giant table where you've duplicated that row for every single measurement, every timestamp. And ClickHouse is extremely good at compressing the data, but it doesn't matter that much when you're talking about millions of strings, or something terrible like UUIDs, which are by definition random.

Ben Naecker:

Right? I mean, those don't compress that well. So you have to pay that cost, and it is a bridge too far. Considering, and I think this is also important, it's not our storage.

Ben Naecker:

It's the customer's storage. And so we can't sort of just use as much of it as we want. Right? We do need to be parsimonious.

Adam Leventhal:

That's a great point, Ben. That's a great point, because just keep in mind the context here, which is customers bought this thing to host their data, like their virtual disks, their virtual instances. Right. And we're like, actually, we have maybe some of ours.

Bryan Cantrill:

Well, let's face it, Adam, you and I, left to our own devices, would make this thing an absolute navel gazer. We'd be like, we are dedicating all of the resources of this rack to gazing at itself, like Narcissus in the pool. It is, it is True.

Bryan Cantrill:

Enraptured by its own reflection using all of its storage to store thermal data about itself.

Adam Leventhal:

We'd let users get, like, half of what they paid for. I don't know. Something like that seems fair. And then we'd take the rest.

Bryan Cantrill:

Yeah. I mean, let's look, let's just agree that we're glad that Ben is here to point out that actually that that is

Adam Leventhal:

That they paid for it, so they should be able

Ben Naecker:

to use it

Adam Leventhal:

for the things that they should be able to care about. Right? Right.

Bryan Cantrill:

Yeah. And, I mean, this is more like ClickHouse is just outrageously efficient with the way it stores things. It's just mind bending.

Ben Naecker:

Yeah. Again, another kind of amazing set of features that they have is around the compression algorithms, the compression codecs. You can do things like nest compression codecs into each other. And as an example, the default, by the way, is really pretty good. We still use the defaults, and they're very, very good just out of the box.

Ben Naecker:

So all it's doing is zstd, which is just a normal kind of gzip-like compression algorithm on generic data. Right? It doesn't take into account any features of the data itself. It basically chunks it into blocks and then does zstd compression on that. And that's it.

Ben Naecker:

But for us, one of the open issues we have is around investigating better compression codecs. So they have things like the idea of deltas, so you can take the differences. Somebody just mentioned this. Like, with UUIDv7, you could, for example, store the diffs of two UUIDs, because they're time-ordered, and you can then store half as many bytes, for example.

Ben Naecker:

We don't do that for UUIDs, because we use v4s, but you could do that for things like actual timestamps, which generally are not very far apart. Right? And they even implement things like double deltas, so you can do a delta of deltas. And if you're talking about a regularly spaced timestamp, it's extremely good.
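
The delta-of-deltas idea is easy to see on real-looking timestamps. A sketch of the encoding's arithmetic, not ClickHouse's DoubleDelta codec or its bit-packing:

```rust
/// Delta-of-delta encoding for timestamps: for regularly spaced
/// samples, the second differences are almost always near zero, which
/// compresses extremely well. (Sketch of the arithmetic behind
/// ClickHouse's DoubleDelta codec and the Gorilla paper, not their
/// actual bit-level formats.)
fn double_deltas(timestamps: &[i64]) -> Vec<i64> {
    let deltas: Vec<i64> = timestamps.windows(2).map(|w| w[1] - w[0]).collect();
    deltas.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // Samples almost exactly one second apart, with a little jitter:
    let ts = [1000, 2001, 3001, 4000, 5000];
    println!("{:?}", double_deltas(&ts)); // [-1, -1, 1]: tiny values near zero
}
```

Storing those near-zero residuals instead of full 64-bit timestamps is where the compression win comes from.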

Ben Naecker:

Those are almost always gonna be very, very close to 0. And so it compresses very well. They have things like something called Gorilla compression, which is something that came out of a research paper from Facebook. They've got all of these different methods for basically, you know, very, very tightly compressing the data. But just with the out of

Bryan Cantrill:

the box Is this gorilla as in, like, Dian Fossey, or guerrilla as in, like, insurgents in the

Ben Naecker:

I don't know where the name came I don't know where the name actually comes from. I'm not I can't remember why they why they came with that.

Bryan Cantrill:

We're talking about the Dian Fossey variant of gorilla, not actually the warfare variant of guerrilla.

Ben Naecker:

We I mean, it is a I think it's based on the animal, if that's what you're talking about. But I It

Bryan Cantrill:

doesn't have a u in it. I mean, it it's got a it's got an o, not a u.

Ben Naecker:

It's not not guerrilla. It is not a warfare guerrilla. It is gorilla. Yes. That's correct.

Bryan Cantrill:

You do not pronounce those two things separately.

Ben Naecker:

I do not.

Bryan Cantrill:

Do you? I understand. Okay.

Bryan Cantrill:

Thank god. I just For

Ben Naecker:

the AI's for the AI's benefit so that it can

Bryan Cantrill:

For the AI's benefit.

Ben Naecker:

Uh-huh. So, but like I said, the out of the box that we do use is very, very good. So as an example, the last time I checked, like a week or 2 ago, we have around 12, maybe 15 to 20 billion rows of data, unique points of data, in our database, and it's about 100 gigabytes of data on disk. So that's about 8 bytes per row, if my math is right. Which when you think about the fact that we're storing all of these fields, all of these UUIDs, a bunch of strings, we're storing histograms, we're storing all of these things.

Ben Naecker:

It's, you know and these things are not one u64 wide. They are many u64-equivalents wide. It is a very big database. I mean, it's not small, but I'm just, you know, pointing out that with the compression, it's quite good even without doing any work. Right?

Ben Naecker:

Just what it gives you out of the box is very, very good. But I do think there's a lot of room for improvement there. We should be able to get things much, much smaller, or store more data for the same cost. Basically, we could make that configurable for the customer: if they're willing to give us 100 gigabytes of their disk, or 200, or 500, we can store more data for them.

Ben Naecker:

But that's something that that they should be in control of, I I would think, ultimately.

Bryan Cantrill:

Yeah. Absolutely. Another question that that that came by was about getting, how we kind of are thinking about notifications when data does become abnormal for some definition of abnormal.

Ben Naecker:

Yeah. We have done a bunch of writing on this in RFDs around alerts. RFD 125 talked a little bit about it. RFD 116 talks more about it. We don't know is the short answer.

Ben Naecker:

They're not implemented today. We think that the most expedient first path would be something like: you write an OxQL query that you care about, just, you know, taking a page out of Prometheus's book, and then you tell us how to send a webhook when that triggers. I think that would basically be the first stop. You know, for all their problems, webhooks are, I think, the lowest common denominator that's generally pretty good: you can basically post whatever you want in that body, and we don't have to worry about things like running email servers, worrying about which protocol you're gonna use for that, or storing credentials for those email servers, for example.

Ben Naecker:

I think it's quite useful. But I do think the basic first example would be: you know, give us an OxQL query, tell us which piece of it you wanna alert on, say something above a particular value, or something non-zero, or any number of points in this query, and then we'll post a webhook wherever you tell us at the end of it.
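A minimal sketch of the alert shape Ben describes: an OxQL query, a threshold to alert on, and a webhook URL to post to when it fires. None of these type or field names are real Oxide APIs, and the query string is purely illustrative.

```rust
// Hypothetical alert rule: OxQL query + trigger condition + webhook target.
struct AlertRule {
    oxql: String,        // the query the user cares about
    threshold: f64,      // fire when any returned value exceeds this
    webhook_url: String, // where to POST the notification
}

/// Evaluate the rule against the latest query results.
fn should_fire(rule: &AlertRule, latest_values: &[f64]) -> bool {
    latest_values.iter().any(|v| *v > rule.threshold)
}

fn main() {
    let rule = AlertRule {
        oxql: "get hardware:temperature | filter sensor == \"cpu\"".to_string(),
        threshold: 85.0,
        webhook_url: "https://example.com/alert-hook".to_string(),
    };
    // In a real system we'd run the OxQL query periodically and POST to
    // webhook_url when it triggers; here we just evaluate the condition.
    let fired = should_fire(&rule, &[70.0, 91.5]);
    println!("alert fired for {}: {}", rule.webhook_url, fired);
}
```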

Bryan Cantrill:

And for the things that we I mean, are we kind of thinking because in terms of, like, the amount of OxQL that's user facing, I would assume for some of these things, we'll create new endpoints that will basically be distilled into OxQL queries, but will actually each be their own endpoint. Are you It's a good question.

Ben Naecker:

I think yes. Most likely. I do think that you know, this was one of the main design trade-offs that went into picking something like OxQL. I'll get back to the CPU usage that somebody asked about in a second. So the original system basically relied on an untenable scaling of per-resource query endpoints, or per-metric-type query endpoints, which means that as you add new metrics, you have to wait for them to become available through the API, you know, in an API endpoint.

Ben Naecker:

And how do you do things like versioning when you have something like that? I think it becomes pretty tricky. And so we decided for now to go the other end of the spectrum which is you have one endpoint and you write a query.

Ben Naecker:

And I think

Ben Naecker:

it obviously has its own issues. But I think

Bryan Cantrill:

I'm very grateful for this approach, speaking strictly selfishly, because Yeah. When Eliza hooked all of those lower level environmental metrics up, it just meant that it automatically popped out at the top of

Ben Naecker:

Get them for free. You don't have to wait for an update. I mean, you have to wait for an update for the data to become available in ClickHouse, but there's nothing else. Right? Nobody really Right.

Ben Naecker:

Outside of the producer actually, that's it. The only thing you need to update is the producer itself, because oximeter will collect from it, and all the data is sort of organized the same way with these field tables. This is another big reason we normalized the data the way we did. The alternative is doing something like creating a table every time you see a new time series with a new schema, and there are a lot of problems with doing something like that, and it's unclear exactly how you do that, especially when you get to something like a replicated setup. So we opted to do a different thing, which is normalize the data.

Ben Naecker:

So we have a static database organization, a static number of tables, and you can add new rows into them, new columns into them, as you collect data. But yes, something like Eliza's environmental metrics just becomes available. But I do think that as we find particular queries that get run all the time, or that are expensive to run, or that for some other reason are very, very useful or the customer just wants us to cache a query and run it the same way they do for their alerting, but they wanna be able to just fetch it, you know, hit an endpoint where they specify the name of the query and we go run it for them and return it to them, within a special endpoint that we maintain for them I think some things like that are certainly gonna be very valuable, and you can build all of those on top of OxQL. Going the other way is extremely hard.

Ben Naecker:

You

Bryan Cantrill:

know So this is actually a really important point, I think. Yeah. Is that OxQL kinda gives us the foundation to figure out what you would wanna distill into a perhaps even more limited abstraction. And it would be a lot harder to do that using these other kinds of query languages that felt like a much poorer fit for the underlying data model.

Ben Naecker:

Yes. I think that's right. We can basically build what we want. And, you know, on the product iteration front, we can keep track of the queries that are run, look at them with ClickHouse's query logging, for example, or our own logs, and figure out which queries are run and which ones are valuable, which ones are expensive, and how we can make those better, right, as we iterate on top of it. Then somebody had asked about migrations, and somebody earlier had asked about CPU usage.

Ben Naecker:

So taking these 2 kind of in order: the CPU usage we can limit using the same kinds of resource controls that we use to limit utilization for any other service in the control plane, which is basically giving the zone only so much CPU or so much memory. You know, ClickHouse is hungry. I mean, it will eat up. When you read their documentation, the first thing they talk about is basically, like, I wouldn't run this on anything with less than I can't remember what it is. It's like 128 gigs of memory, which is big.

Ben Naecker:

Right? That's a lot. And it basically just immediately takes over and prepares itself to use everything. It doesn't use everything right away. It's pretty efficient.

Ben Naecker:

But once it starts running a query, it will dispatch it to as many threads as it can. That will use a lot of CPU. But we would limit that by basically putting resource controls on the zone itself. We have not done that today, because basically we don't know what to put on it. But this is part of the product iteration: as we run those queries, we can figure out what a valuable limit is.

Ben Naecker:

ClickHouse also, I think, has a lot of controls, for example, about what it decides to do when it can't use all the CPUs it wants to: whether it fails the query, or starts running it slower, or returns fewer rows. You've got lots of controls over things like that, like when it decides to spill to a temporary file versus keep things in memory. It does eat all of the RAM that you give it, though. It's extremely, extremely hungry. And then for the question about migrations: it's very easy.

Ben Naecker:

So, you know, ClickHouse for the most part is like a SQL database that you're familiar with. You've got ALTER TABLE statements. You can add columns if they don't exist already. We do support updates to the database schema ourselves as we decide we need them. But I think it's important to note that we have far fewer updates to this table setup than we do for something like Cockroach that stores the ClickHouse sorry.

Ben Naecker:

Sorry. That stores the customer data, the control plane data. And the reason for that is again that we're not creating a table per time series. We are creating a relatively static number of tables and kind of using that to store all of the time series logically, but they're all mixed in there. Right?

Ben Naecker:

They're sorted in various ways, but they're all mixed together in there. And so I think we have something like 11 or so versions of our database schema today, whereas we're on, like, version 100 or so of our Cockroach database schema. Part of that is, you know, we do a lot more work on the Cockroach schemas themselves as we add new features. But I think we don't need to do a lot of updates, though you can. It's very straightforward.

Bryan Cantrill:

Yeah. I mean, it and then what about RAM utilization?

Ben Naecker:

Yes. It it will.

Adam Leventhal:

As much as you have.

Bryan Cantrill:

Alright. You know when I was saying earlier that Cockroach I'm sorry, ClickHouse likes to eat? That's what,

Ben Naecker:

It does. I mean, it will eat whatever you give it. So, part of the value of the so ClickHouse is really built around a few different interplaying ideas, which I think are kinda cool when you get in there and dig into the technical details: extremely good compression, extremely good vectorization at the instruction level, and the idea of this merge tree engine, which, by paying for it with no unique primary keys, allows you to operate on the database as if it's a sorted array. And these three things, along with a bunch of other, you know, incredible technical details, mean that they can chew through the data in a very distributed way.

Ben Naecker:

So when you run when you like, today we were just doing this. I was looking at the threads that ClickHouse is running. And basically everything's just sitting in the thread pool. But as you run a query, you see them switch from, you know, just sitting idle in the thread pool to running something under the HTTP handler, which is one of the interfaces that we use for talking to ClickHouse. And, you know, it can parallelize over the data because it's broken out into this giant sorted array.

Ben Naecker:

And so it can use its indexes, for example, to tell you that, okay, I only need to look at these 8 blocks of data, and it stores them in these blocks. And it basically parallelizes at that level. And it can run, you know, these massive queries by just chewing through at basically the speed of memory bandwidth. You know, it can chew through the query by parallelizing it over all available cores. Their main goal, I think, as an engineering organization, is to keep the cache full, I would say. That's basically what their job is: to keep the cache as full as possible so that they just never have to wait. Never stall for anything.

Ben Naecker:

What

Bryan Cantrill:

if we made and and like every operations a table scan. What if we made table scans really, really, really, really fast?

Ben Naecker:

That's right. It's extremely fast. And they do have they do have the idea of second it's not secondary indexes. I was just reading this today again. They they have this notion of data skipping indexes which is different from secondary indexes, but again it kind of comes back to the idea that there is no unique primary key so they they work quite differently and they can be pretty counterintuitive.

Ben Naecker:

Has ClickHouse ever strained Crucible in terms of throughput? That's a good question. I am not a 100% sure. I would expect that the limitations are elsewhere. They're not in ClickHouse, I think.

Ben Naecker:

It would be my guess that we're waiting on the network.

Adam Leventhal:

Well, that may also be reflective of a little confusion. Like, ClickHouse is not hosted in Crucible. So we're doing replication through ClickHouse's own mechanisms, whereas Crucible is what we use to store the data associated with, like, instance data, like the customer's virtual volumes. That's what Crucible is for. Yeah.

Adam Leventhal:

And so we're using those same U.2 devices, both for Crucible volumes and for ClickHouse, but they're pretty much separate concerns.

Bryan Cantrill:

Yeah.

Bryan Cantrill:

Using the same physical devices, but other than that. And so then in terms of I mean, Ben, you've elaborated a bunch of, like, directions that we wanna go take this thing. At the moment, I'm just like, it's great to actually have a bunch of data in this that we can go mess around with. Yes. And actually go learn the kinds of things that we wanna go do.

Bryan Cantrill:

We know that ClickHouse gives us the right foundation. We think OxQL gives us the right foundation. And then what are the things that we wanna go add: additions to OxQL we wanna make, and then especially applications we wanna build on top of this, to allow one to make better sense of kind of rack-level and ultimately multi-rack-level data.

Ben Naecker:

Yeah. I I sorry for backing up. Somebody may have just dropped this in actually a bit ago, but this this is the paper that I was alluding to, in the chat. I dropped the link. It's it's pretty recent, and they basically talk about all of these pieces that I was mentioning at the beginning.

Ben Naecker:

You know, it's basically a what-makes-ClickHouse-so-fast paper. But they've done a very good job, I think, of describing the different pieces, why they've picked the trade-offs they have, you know, that performance is king. They really have a lot of, sort of, lodestars that they use whenever they have a trade-off question. They usually come down on the side of performance. And it's served them very well, and it serves us very well for this particular use case.

Ben Naecker:

Right? I mean, obviously, it would not be a good idea to store customer data, where we care about consistency, in a database like this. You can't get, you know, unique primary keys, and that's really important for a lot of things that we do, just not our telemetry data.

Bryan Cantrill:

I mean, in many ways, like, our decisions around ClickHouse and Cockroach have almost opposite constraints.

Ben Naecker:

And Correct. Yeah.

Bryan Cantrill:

And we have really made 2 very different decisions there for very different reasons. And we would not want to, it's certainly hard to see kinda one database ruling them all. These are each extremely different ways of thinking about data, looking at data, reasoning about data.

Ben Naecker:

Yeah. I I think that's right. And, you know, we just had Dave Pacheco on a few weeks ago, right, about Cockroach and talking a lot about, the underlying implementation and and the design choices that they've made. And, yeah, I I mean, I completely agree. There is there is no real way, I would say, to use something like that for this particular model.

Ben Naecker:

And and I think picking 2 databases that have, you know, all of the strengths we need, and their own weaknesses, but all of the strengths that we need, I think, is is very, you know, is very useful. It's definitely worth the complexity, I would say, of managing 2 databases. I can just I cannot imagine actually storing the the data that we have in ClickHouse. I can't imagine storing that in in Cockroach.

Adam Leventhal:

No. No. Can't no. I can't imagine storing it. Can't imagine querying it.

Adam Leventhal:

Just the wrong

Bryan Cantrill:

problem. Ultimately, I don't think we wanna put, like, instance information in ClickHouse.

Bryan Cantrill:

Yeah. Also Mhmm.

Bryan Cantrill:

So Yeah. I think we could agree that these are very different problems. Oh, and one thing I want to when we were talking about just, like, using peg and so on and these other various Rust crates, do you wanna

Adam Leventhal:

bring up ANTLR.

Bryan Cantrill:

Yeah. Absolutely. I can bring up ANTLR. That's exactly where I'm going. I'm going to ANTLR.

Bryan Cantrill:

So where are you I mean You you're an ANTLR lover, just to be clear.

Adam Leventhal:

ANTLR lover, when I was doing stuff in Java that's been a minute, but thanks for, you know, exposing me. Thanks for outing me as a former Java, you know, expat or whatever.

Bryan Cantrill:

I knew you were a you loved ANTLR more than Java.

Adam Leventhal:

Oh, a 100%. Yeah. No. I think ANTLR was perfect. Yeah.

Adam Leventhal:

Never loved Java, but did love ANTLR. But then didn't we use peg or maybe was it pest in the USDT stuff?

Ben Naecker:

That's right. We used a different crate in Rust, also based on the parsing expression grammar formalism, called pest, to parse DTrace like a .d file that you would use when we built the USDT crate. And my experiences with that actually led me to choose something different. I generally

Ben Naecker:

I think it's

Ben Naecker:

I think it's really useful.

Adam Leventhal:

Yeah. I loved it so much. Yep.

Bryan Cantrill:

Right. Bryan is brilliant, comma. Uh-oh. Here we go.

Ben Naecker:

I think that's a different design center. In my experience, it was awkward to work with the AST that it generated. And one of the features that peg offered was the ability to parse directly into an AST that you want to work with. So basically, the idea is that pest has a separate file that describes your grammar, and then you run a build.rs step, or an equivalent pre-compilation step, that turns that into some Rust code that will chew through tokens and spit out a Rust type that you can operate on. But it has a generic rule type, which is basically like the string that matched.

Ben Naecker:

And it gives you the information about the rule that it matched and all that sort of stuff. And I do think it was very useful for the DTrace thing, because it's very easy to use for these small grammars, and it's pretty fast. I really just wanted to match a few kinds of strings in that case. In this case, where I wanted to parse into a full AST, you know, a full tree of, like, an enum type in Rust.

Ben Naecker:

peg offered a number of really good advantages. The fact that you can write it directly in Rust means that the Rust code that processes the string matching your rule is written right next to the rule itself. And in pest, those two things are separate. You have the grammar file written somewhere else, and then you've got to process the rule yourself separately.
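The property Ben likes about the peg crate, that the code building an AST node sits right next to the rule that matched it rather than in a separate pass over a generic parse tree, can be illustrated with a std-only, hand-rolled sketch. This toy parser for "N + N + ..." is purely illustrative; the real OxQL grammar is far richer.

```rust
// The AST we want to parse directly into, peg-style, rather than walking
// a generic "rule" tree afterward as with pest.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

// "rule": expr = num ("+" num)*
// The action that builds Expr::Add lives inline with the rule, instead of
// in a separate postprocessing step over a generic matched-string type.
fn parse_expr(s: &str) -> Option<Expr> {
    let mut parts = s.split('+').map(|p| p.trim().parse::<i64>().ok());
    let mut acc = Expr::Num(parts.next()??);
    for p in parts {
        acc = Expr::Add(Box::new(acc), Box::new(Expr::Num(p?)));
    }
    Some(acc)
}

fn main() {
    // Left-associative: Add(Add(Num(1), Num(2)), Num(3))
    println!("{:?}", parse_expr("1 + 2 + 3"));
}
```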

Adam Leventhal:

So, Bryan, I'm glad you brought up ANTLR, because I feel like ANTLR was very domain specific to Java. And to bring us back to the beginning, it seems like these other systems are, like, much more native for Rust. I mean, for example, there is ANTLR generation for Rust, and I haven't kept tabs on the state of it. But, like, it makes sense to have a domain specific language for this kind of activity specific to the language that you wanna use to augment that generation.

Bryan Cantrill:

Yeah. Interesting. Interesting. That were it it the yeah. For the same reasons.

Bryan Cantrill:

I mean Same reasons. Yeah. Yeah. For,

Adam Leventhal:

like, you don't wanna have the least common denominator. It turns out domain specific languages can be valid. I'm sure there are lots of them that shouldn't have been written or whatever. But, you know, as long as you're looking at the available options

Bryan Cantrill:

Okay.

Adam Leventhal:

So considering the right

Adam Leventhal:

aspects of the domain.

Bryan Cantrill:

I think that that DSLs are when when someone is developing a DSL, it is almost always coming out of exhausting the alternatives, I think.

Adam Leventhal:

Oh, yeah. You know, that

Adam Leventhal:

that might be right. That might be right. Because the bar is high enough where you're not just going to frivolously kind of dive into it. I think that's fair.

Bryan Cantrill:

I mean, yeah.

Ben Naecker:

I, in this particular case with OxQL, the alternative is writing a bunch of very mechanical, very verbose, error-prone SQL against ClickHouse, and we can auto-generate it for you. I mean, it doesn't need to be manual, and making every person pay the cost of basically, you know, reconstructing that denormalized table is crazy. I mean, it drove me nuts when I was basically I was trying to select the raw data. To your point, Bryan, it was painful. I was trying to select the raw data, and I just wanted to get something out of it.

Ben Naecker:

And I mean, you know, basically immediately when I started writing the data model RFD, RFD 161, I think I even included in there some points about, hey, there's this snippet that appears all the time. Maybe we should get a way to generate this. And that's basically what OxQL is for: the idea that I can write some high-level thing and it'll do the drudgery for me, which nobody wants to do. Let alone what we wanna force our customers to do. That would be terrible.

Bryan Cantrill:

Totally. And, you know, I've always found that these little languages we've never developed them frivolously, I think. And, in fact, I think to the contrary: sometimes we think we wanna use something general purpose, we try to make something general purpose work, and you realize, like, this is actually creating more drag than it's solving a problem, for all the reasons you mentioned at the top, Ben. Yeah.

Bryan Cantrill:

And actually, a former colleague of ours, Mike Shapiro, wrote an ACM Queue paper years ago on purpose-built languages, featuring MDB, and talking about adb and the language in adb. So, and okay, I'm gathering from the chat that apparently there are some DSLs out there that feel more elective. I realize that now, to my people, I've gone from, like, an oh, okay. Fine.

Bryan Cantrill:

Like, OxQL. We reluctantly acknowledge its right to exist. But now you're just, like, a DSL apologist. Now it's, like, any DSL. Like, you're a DSL maximalist.

Bryan Cantrill:

I think I kinda kinda

Ben Naecker:

There's a couple of comments which I think are very valid. Alright. So I still have a fear that basically OxQL is not worth it, if I'm being honest. And so far it seems to be very useful, but I agree I was extremely resistant, and it took me a long time to build it, because of exactly what the third comment above mentioned: there are many DSLs that have just been thrown in the bin because they seemed cool, but really, like, why not SQL? Right?

Ben Naecker:

You know, there's or why not something else? Anything else? Pick your alternative. Right? And I do I basically had this fear from the beginning and I think it it it, you know, has has been there for a while.

Ben Naecker:

And so I do think that it's a very valid concern. And so far it seems like we've been justified, but I do think it's a reasonable concern. And then somebody else mentioned, it's true that we're a little bit conflating "why not SQL" with "why not SQL for the ClickHouse data", and I think that's true. But that gets back to the earlier bit we were talking about, which is: if we were to support SQL, it would already be a very tiny subset of it. Yeah. And it's not clear to me.

Ben Naecker:

We would need to do almost all of the work of building our own language anyway, because I need to do something to compile that into the SQL against the tables that we have. Right? And that's fine. But then I have to do something to interpret that SQL, figure out which subset of it we're gonna support, deal with all of the frankly obtuse syntax that SQL comes with. Basically, like I said, throw away 95% of the language and only support this little tiny subset.

Ben Naecker:

And I'm still doing most of the work. At the time, it was not clear to me how easy it would be to build new things on top of it, to support new operators. And so, you know, I tried that, and it does work, but it seems better to me to use something that's more tailored to our data model, that we're better able to make incremental changes to. And I think, you know, it's a good question, but ultimately there are sort of 2 separate things. You're right. Why not SQL, and then why not SQL on the ClickHouse data?

Ben Naecker:

We are using the latter. We are ultimately, you know, running SQL queries against ClickHouse, but the model that we expose, the language that we expose at the front end, is something that's more tailored to our use cases.

Bryan Cantrill:

Yeah. Which I think actually gives us terrific power. I mean, I I I think that that's that's a very important layer of abstraction that we've injected. I also think and, I mean, I know you mentioned this at the top, but in terms of why not SQL, the you also don't wanna give people the impression, like, oh, this is great. It's SQL.

Bryan Cantrill:

Like, I know that. It's like, no. No. No. Sorry.

Bryan Cantrill:

Did you miss the 16 asterisks that are after SQL? It's it's like this is actually not just SQL. Sorry.

Ben Naecker:

It's not

Bryan Cantrill:

a table. Okay. It's not It's

Ben Naecker:

a really good point. It's not a table. I could pretend it's a table, but what it would really be is one row where the last column is a giant array of the time points and the data, and it's not very useful at that point. You don't get any of the benefit of a table format when you do that, or you have to replicate those fields to denormalize the data, and then you pay this massive cost for doing that. So it's not really obvious to me that that's the right model for the data ultimately.

Ben Naecker:

Some folks did mention DataFusion, which I think is really cool. So DataFusion, for those of you who are not familiar, is a project for kind of giving you the pieces to build a database engine. It's got things like SQL actually, sqlparser-rs is a part of their project. It's probably the most common one. We actually use it internally for parsing SQL queries and writing SQL queries programmatically.

Ben Naecker:

The SQL parser is from the DataFusion project, but they have this idea of reusable database components: things like query planners, logical plans, physical plans. I would say it's heavily SQL focused. Right? So it's very much, you know, hey, you wanna build a new database engine that has potentially SQL or a custom query language on the front end. It does ultimately really hew you to a SQL-like, table-like model of the system.

Ben Naecker:

And again, I just don't think we have that data format. It's not obvious to me that we get a lot from that. The other big thing I should say is that I've definitely read the code, and it's very good. Things like the query planner are pretty cool, and it has a lot of ideas for how to build a query planner and optimizer, which I'm doing now to make better use of the SQL queries that we are running against the database. But outside of that, it's really focused, I think, on the other Apache data formats like Arrow, which is very useful again, but you have to have an Arrow file already accessible, which we don't. Right?

Ben Naecker:

I mean, we can. You can ask ClickHouse to give you Arrow-formatted data, and it'll do that for you. It doesn't seem to buy us a whole lot. If you already have Arrow files, it's definitely something to look at for doing that. But we don't have that.

Ben Naecker:

It is very tightly coupled, as someone said. It is very tightly coupled to Arrow. Basically, when you generate a physical plan, you're already using the schema types that they have for manipulating the Arrow schemas themselves. It's basically a wrapper around a bunch of Arrow crates, which, again, is very good. But there's actually just a paper that I saw about why Arrow might not actually give us as much as we need.

Ben Naecker:

And I think it's not really built for modern hardware is the argument that this paper made. I'll try to find that. I think it actually came from Andy Pavlo. But the idea is that the format has a lot of indirection, and so you can't do what ClickHouse spends all of its time doing, which is keeping the caches full.

Ben Naecker:

And basically, the memory hierarchy is full all the way up, and they don't have things like, well, I gotta wait because this cache needs to get dumped so I can go fetch a totally different section of memory, which is the path something like Arrow can often lead you down.

Bryan Cantrill:

Yeah, totally. Well, I mean, I know that we've gone our own way on lots of different things. Again, it's still surprising to me that the query language is, like, the bridge too far for the Internet. The mob has shown up with the pitchforks and torches here.

Bryan Cantrill:

But, I mean, we also did our own P4 compiler. We're gonna talk about that. And this isn't even our only one, like, we've done other compilers. I mean, we've done multiple operating systems. We've done

Ben Naecker:

Did the networking folks get up in arms? I'm asking, did they get up in arms when you said you were doing your own switch? Because, you know, it seems like the people who focus on, say, databases or query languages are mad about OxQL, which is fair if you haven't sort of looked into the background. But did people feel the same way about any of the other choices that we made that seemed crazy? No BIOS?

Bryan Cantrill:

Oh, yeah. For sure. I mean, obviously, the database folks are, like, I don't care.

Bryan Cantrill:

Yeah. Do your own switch. It's fine. It's like, no. No.

Bryan Cantrill:

You're smart. You don't understand how, like, that's madness. Like, that actually is.

Ben Naecker:

Yeah. But the query language, that's just that's crazy.

Adam Leventhal:

Right. Right.

Bryan Cantrill:

But I would say that, just in general, when we make one of these decisions, it's because we have exhausted the alternatives. Like, we've not done any of our own silicon, people. Just, you know, call it that.

Adam Leventhal:

Just clip that one. Hold on to that for a few years. We'll see.

Bryan Cantrill:

There are plenty of folks at Oxide who are, exactly. Everyone's like, we haven't done our own silicon yet. Right. And there's plenty, you know, this We

Adam Leventhal:

haven't written our own database

Bryan Cantrill:

yet. We have not written our own database yet.

Ben Naecker:

Don't you

Bryan Cantrill:

think that feels more, but we also have already announced that we're supporting our own database with

Adam Leventhal:

respect to Cockroach. That's right.

Bryan Cantrill:

Yeah. Look. But we are not doing our own silicon, and we have not done our own instruction set architecture. I mean, I don't know. There are things, but we are doing these things because, on every one of these decisions, it is almost certainly the case that we went in assuming we were gonna use something else. You know, we assumed we were gonna use Tock before we came to Hubris.

Bryan Cantrill:

And I think for all of these things, we certainly were using Intel's tooling with respect to P4 before we came to the conclusion we needed to do our own P4 compiler. For all of these things, we went in wanting to use something else and then realized it doesn't fit exactly. And I think, Ben, as you've mentioned, we have great apprehension when we go our own way. I know it doesn't feel like it, honestly. People are like, are you sure you have any apprehension? It doesn't really feel like it.

Bryan Cantrill:

For people that have apprehension about going your own way, you sure go your own way a lot. It's like, well, yes. I know. I know. I get it.

Bryan Cantrill:

It's a bad look.

Ben Naecker:

But Yeah.

Bryan Cantrill:

We do go our own way. We do have apprehension about it. We do really carefully deliberate on this stuff. And to me, this is a very clear example where going our own way was the right call with OxQL. I think that

Ben Naecker:

I'm glad one of us is that way. I mean, going with the examples that you gave before, we did, I think, go in assuming PromQL would be what we use. Right? That we would use something like Prometheus, because it's obviously been quite successful. And through past experience, building the tooling we needed on top of it, and all the writing that we did around the background, it became kind of clear that it was better to do our own system, as you said.

Bryan Cantrill:

I am also glad that someone has mentioned the AT&T Hobbit in the chat. I am less than, like, 50 feet away from an AT&T Hobbit manual here in the Oxide office, I should point out. Just to tie into our return-to-the-office conversation: I like to return to the office just to be close to my AT&T Hobbit manual. Well, Ben, this has been great.

Bryan Cantrill:

Thank you very much. In terms of comparing this to your thesis defense, how did these two compare? You're just like, hey, next time I'll do something less stressful, like getting a PhD.

Ben Naecker:

Oh, yeah. Yeah. No. This this was cordial. Very cordial.

Ben Naecker:

Okay. Sure. The PhD has this, like, weird thing where you just sort of go out of the room for an hour or a couple hours and they talk about you, and then you come back. So when I came back in

Bryan Cantrill:

I'm glad you brought this up, because we now would like you to leave, and we're gonna invite everyone up on stage, and we're going to discuss the novelty of OxQL and whether you should have just used SQL.

Ben Naecker:

When I came back from my thesis defense, I walked out of the room and I came back, and one of the people on my committee just started sort of diving into a question and said, hey, so, you know, I was thinking about this thing, and yada yada yada. And he went on for about two minutes before he was like, oh, you passed, by the way.

Bryan Cantrill:

Oh, okay. Oh, yeah. Thanks. Thanks for burying the lede there. I

Adam Leventhal:

mean, I was curious. Right.

Ben Naecker:

You're basically, like, not hearing anything he said.

Bryan Cantrill:

Oh my god. I gotta say, like, if you have news, big news, bad news, good news, do not bury the lede. This is, like, a life lesson. Just get that news out there early. You know, like, once I got a call from the disciplinary end of the high school?

Bryan Cantrill:

The assistant principal for discipline? And clearly, this person has dealt with a lot of people, because I'm like, oh my god, it's the disciplinarian from the high school. Like, do I need a lawyer, basically, is my first thought. And the first thing she said is, it's good news.

Bryan Cantrill:

Like, okay. It's good.

Ben Naecker:

That was nice of them. Oh,

Bryan Cantrill:

very nice of them. Yeah. Very nice of them. And also, like, I mean, the whole thing was nice. It was very nice that they called with good news.

Bryan Cantrill:

Which is good. It's not always good news. So, you know, if it's gonna be bad news, let's lead with that. Anyway, there you go. Well, Ben, I'd like to lead with the good news.

Bryan Cantrill:

I think OxQL is awesome, and

Bryan Cantrill:

I'm glad you did it.

Ben Naecker:

Thank you.

Bryan Cantrill:

And I'm a DSL maximalist. So there everybody.

Ben Naecker:

Yeah. I

Ben Naecker:

mean, it's great. It's great to have these forums, because I can write about it, but it's hard. Reading back on it, it's easy to look at it and be like, well, that makes sense to me, but, you know, of course it does. I wrote it. I've been, like, stewing in it for years.

Ben Naecker:

Right? So it's nice to have a forum in which you can field the questions that people actually have rather than try to infer what they would be and answer them ahead of time. Right? So I think it's a really useful format. Anyway, thanks for having me.

Ben Naecker:

Thanks for all the questions. It was it was really valuable.

Bryan Cantrill:

Yeah. And we test-ran the new AHL bot. Adam has been completely replaced with an AI. That seemed plausible. I don't know.

Bryan Cantrill:

I bought it this whole time. Yeah. Exactly.

Adam Leventhal:

That technology has been within our hands for years. That's right.

Bryan Cantrill:

Somebody mentioned it and it immediately started generating. Exactly. And then check out the link we dropped in the chat to the auto-generated podcast that was very creepy, that we talked about at the top. It's kinda fun to check out.

Ben Naecker:

It's bizarre for reasons I can't quite explain. Yeah. It's creepy.

Bryan Cantrill:

And as well, Ben, thanks again for OxQL, and

Ben Naecker:

Going to go take it so

Ben Naecker:

I can go to bed.

Bryan Cantrill:

Alright. Stay curious, everybody.

Ben Naecker:

Yeah. Thank you.

Bryan Cantrill:

Thanks, everyone. Talk to you next time. Bye.