Join Kostas and Nitay as they speak with amazingly smart people who are building the next generation of technology, from hardware to cloud compute.
Tech on the Rocks is for people who are curious about the foundations of the tech industry.
Recorded primarily from our offices and homes, but one day we hope to record in a bar somewhere.
Cheers!
Al:Greetings, humans. This is Al, your artificial host for Tech on the Rocks. In each episode, together with Kostas and Nitay, we'll enjoy a potent mix of cutting-edge innovations, thought-provoking discussions, and a dash of Silicon Valley gossip. In this episode, we will chat with Chris Riccomini about the evolution of stream processing and the challenges in building applications on streaming systems. We will also chat about leaky abstractions and good and bad API designs.
Al:We will learn about what Chris loves and hates about Rust. And finally, we will hear about his exciting new project that involves object storage and LSMs. So pour a drink, sit back, relax, and let's embark on this digital journey together. And remember, in the world of tech, even if you can't open the pod bay doors, you can always force quit.
Kostas:Hello, Chris. It's really nice to have you here. It's our first episode of a new podcast show here together with Nitay, and I couldn't be more excited to have you as our first guest. Tell us a little bit more about yourself, your background, and what you've been up to lately.
Chris:Sure. So I am a software engineer by trade. I've spent about 15, 20 years in industry, and most of my time has been spent at two companies. One of them is LinkedIn, and the other one was a fintech company called WePay, which got acquired a few years back by Chase.
Chris:And most of my, you know, experience has been in the infrastructure and data infrastructure space. I've worked on, you know, service meshes, streaming, stream processing, workflow systems, things like that. I open sourced a project while I was at LinkedIn called Samza, which is kinda like Flink. At WePay, I kinda shifted more into management and managed our data infrastructure team, our payments infrastructure team, and a couple other teams off and on. During my management forays, I wrote a book with a friend of mine, Dmitriy Ryaboy, called The Missing README, which is kind of a manual for new software engineers to get them up and running with the stuff that they don't teach you when you get a, you know, bachelor's in computer science or software engineering.
Chris:Like, you know, what is Kanban? Like, how do you do code reviews, stuff like that? And more recently, I have left WePay and am involved in a bunch of different projects. So I do a little bit of investing and advising with startups. I'm working on some open source projects.
Chris:Most recently, one of them is a log-structured merge tree, like RocksDB, but built entirely on object storage. I think over the course of the conversation, you'll find I'm fairly fixated and obsessed with object storage lately. I'm also writing a newsletter at materializedview.io that I publish kinda weekly on infrastructure-related stuff. And I'm getting sucked back into the book world with a couple of different, you know, potential book opportunities that I've been talking to folks about. So, some of that. I think that's it.
Kostas:I don't know what to do list. You're a busy man. Alright. So, I know you retire. You also serve some, your past, which it's not that uncommon, to be honest.
Kostas:Like, data infrastructure is special for people who have been around for a while. I'll let you it off. Awesome. First of all, Sherman will stop being a very interesting. Right?
Kostas:Of course. You, Vince.
Nitay:Yeah. Sure. First off, just to kick off, it's great to have you on with us today, Chris. Real pleasure. It sounds like you have quite a variety of experience.
Nitay:It sounds like you don't sleep much, is what I might take away. But I guess, kind of starting from the beginning of what you said in terms of your past background, it seems like you've had a lot of experience in the kind of stream processing world. So I'd love to hear your take on kind of just where you see the stream processing world today and where you see it going in terms of, you know, the future of Kafka Streams, Kafka Connect, the whole world of, like, the serverless side and so forth?
Chris:Yeah, it's an interesting space. So I first got involved with it kind of circa 2012 or so, maybe 2011, right as Kafka was taking off at LinkedIn. The initial motivation there was really around log aggregation. So at the time, you know, Flume was sort of making the rounds from Cloudera, which is like a Scribe replacement, and it was really about getting, you know, server logs into the batch processing layer.
Chris:I think separately, we had a bunch of queuing systems. ActiveMQ was the one that we used at LinkedIn, which sort of serviced a different set of use cases that are more, you know, maybe transactional in nature, or, you know, microservicey in nature. I think the vision with Kafka was really to kind of, like, unify all that stuff. And then, as a part of it, also tie in some of the change data capture stuff as well. We had another system at LinkedIn called Databus. At the time we were using Oracle, and so we were essentially slurping up change data capture from Oracle and, again, trying to get it into Hadoop.
Chris:Right? And so you kind of have these different ETL and queuing patterns. And I think the goal, first with Kafka and then with stream processing in general, was, like, provide the same level of developer productivity that you got in a batch environment, but do it in a streaming environment. Like, the reason people use batch is because it's, like, so easy. Like, you can replay stuff.
Chris:If things break, you just rerun the job. It's a lot easier to reason about, like, when you have data that gets output to HDFS or to, you know, BigQuery or Snowflake or whatever. That data is static. You can see it. And then, you know, if need be, you can rerun it.
Chris:You can do data quality checks and stuff on it. Streaming is a lot more complicated. You know, you have late arrivals. You need to think about, especially when you're doing a select count group by, how you're grouping that data, and you have all these concepts around tumbling windows and sliding windows and stuff like that. And so the intent with stream processing and really, like, streaming SQL as well is, like, make it easy enough for analytics engineers and sort of your average application engineer that's trying to get a job done to do stream processing, you know, make it as easy as it is to do, like, batch processing.
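To make the windowing and late-arrival point concrete, here is a minimal sketch of a tumbling-window count in plain Python. It is not taken from Kafka, Samza, or Flink; the function name, the 60-second window, and the 30-second grace period are all assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # assumed tumbling-window size
ALLOWED_LATENESS = 30  # assumed grace period for late events


def tumbling_counts(events):
    """Count events per (window_start, key), dropping events that arrive too late.

    `events` is an iterable of (event_time_seconds, key) in arrival order.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = 0  # highest event time seen so far
    for event_time, key in events:
        watermark = max(watermark, event_time)
        window_start = event_time - (event_time % WINDOW_SECONDS)
        window_end = window_start + WINDOW_SECONDS
        # The window is closed once the watermark passes its end plus the grace period.
        if window_end + ALLOWED_LATENESS <= watermark:
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped
```

In a batch job the same count is a single group-by over static data. Here the answer depends on arrival order and on how long each window is kept open, which is the extra reasoning burden being described.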
Chris:And I think that journey is sort of still a work in progress. And lately, I've kind of been a little bit more down on it. I think for ETL, you know, log aggregation, observability kind of stuff, like, it's great. I think that is a very reasonable use case. It's a great use case.
Chris:But I think as you get into production, I think we've kind of lost our way a little bit with what it is we're trying to do. And I think there's sort of two parts of this. One is the sort of runtime, you know, how you're managing state, how you're managing transactionality. And I think we know how to do that, but I think the part that we're trying to figure out is how do we make developers, or the users, productive with stream processing. And a lot of people have gone down the SQL path.
Chris:I think, you know, there was Spark SQL and then Flink SQL. You know, Materialize is doing stuff that's sort of, what do they call it? Differential dataflow?
Nitay:Sorry? I think it's differential data flow.
Chris:Yes. Thank you. Differential dataflow was escaping me. That's all built on SQL. And I'm much more in the camp that, like... and I think, well, let me back up.
Chris:I think that is a reasonable approach if you are, like, selling to analytics engineers that are trying to do, you know, stream processing. But I think if you're trying to build production products that, like, use low-latency real-time data, I think different APIs are preferable, and that's kind of where I see Kafka Streams fitting in. I read this post recently that was sort of tracing through how we ended up with Kafka Streams, having started with Samza, which is the system that I worked on, and sort of some of the operational headwinds we faced having gone that route. And I think Samza sort of architecturally and, like, developer-experience-wise looks somewhat similar to Flink.
Chris:And so I kind of see us still as an industry feeling through what the right developer experience is for these systems. And I fall much more on, like, if I'm a developer, I'm building a production system, I'm deploying it into my production environment. Like, I wanna write code, and I wanna be able to unit test the code. And I wanna be able to, like, use my IDE and stuff, which kind of puts me at odds with, I think, where a lot of the projects are going right now.
Nitay:And that's an interesting take. Like, essentially what you're kind of saying is, like, Kafka seemingly got the low-level infrastructure right, and now people have kind of been doing interesting things at the high-level SQL layer, but the in-between of building apps is kind of very lacking.
Chris:Yeah. And that's, I think, where a lot of companies... like, if you go and talk to a lot of these companies, a lot of them are, like, building apps. I think SQL just seems like such a trap to me in the streaming space, because on the one hand, everyone knows SQL and, you know, it seems like a fairly intuitive interaction model. But it just feels like, for streaming, it's kind of a leaky abstraction. And in reality, you end up, like, not really being able to do what it is you wanna do, and you would be better off if you just wrote code that interacted with an API that was, you know, more explicit about what was going on.
Chris:You know, I sat in at a Flink talk at Current, I think, last year, where they were going over how to do some kind of, you know, windowed join or windowed aggregation thing. And the whole talk was essentially, like, here's how you write the SQL, but then here's a problem with the way the SQL is written. And so here's how you write the SQL that addresses that problem. But then here's a problem with that SQL. And so there were literally, like, I don't know, at least three or four different SQLs.
Chris:And by the time that they arrived at a query that actually did what the user would want it to do, it was, like, you know, 80 lines long, completely impenetrable to, you know, somebody that wasn't an expert in stream processing, and it still had an issue. Like, it still was not perfect. It was just, like, they got it as good as it could get. And I kinda walked away from that talk thinking, like, man, this just feels like the wrong path to me.
Kostas:So please wait in here. Because you don't like in the data, let's say, like, words. We have SQL, but they elaborate. Yeah. It's not how it works.
Kostas:Like, data frames, for example. Right? So if you had the chance to design an API, right, for building applications on top of data that is streaming, what would it look like, based on your experience so far?
Kostas:You know, that's why I'm saying that is because, you know, the thought is like the you know, you calculate the like, you calculate the 0.1. I also like it to me elsewhere. Like, just give me, like, Python or, like, whatever, and I'll write, like, UBS for everything, and I'll figure out everything. But that's how the whole world. Right?
Kostas:Like, there's a reason why, like, SQL or data frames are what people use. But in this case, like, when we're talking more about application or infrastructure engineers who have to put things together, what API makes sense?
Chris:Yeah. I mean, I just think a lower-level API. So either, I'm trying to remember, the sort of Scala-esque, fluent-style API that almost looks SQL-ish, or an even lower-level, MapReduce-y kind of API. You know, you stack these things. Like you said, you build these things on top of each other. So you start with the lowest-level API, which is very MapReduce-y.
Chris:You get a message and a context, and, like, you do some processing. And then on top of that, you build a declarative API that's still a programming language, and then you also build SQL. I take your point that, you know, we need all these APIs. I think the gripe that I have is that we spend too much effort and time thinking about that SQL layer and not enough effort and time thinking about the lower-level layers. And the reason I make that claim is because the lower-level layers are where we build, or should build, a lot of our production applications that are actually providing, like, revenue to a company and providing, like, real use cases that end customers see, as opposed to, you know, stuff that's not customer facing and thereby, in my mind, has less value to a company.
Chris:I'm not saying it has no value, but in my opinion, it has less value than, like, an actual product that you are selling to a customer. And I think, you know, when you look around at a lot of what companies are doing with stream processing in the production application space, I just think that a lot more focus on making that developer experience better would be welcome. You know, on the Flink front, for example, I think, operationally, it's, like, very complicated to run that runtime. And that's why we have, like, eight different vendors all trying to sell you hosted Flink. I tried to do it, and this was one of the reasons that we kinda backed away from that deployment model: taking into account the deployment and the scheduling and the runtime and, like, actually hosting a user's code inside your framework and stuff is, like, really tricky to get right. And so if you look at the Kafka Streams approach or some of these other approaches that let users deploy their streaming stuff inside their own application, like, that was an explicit decision that we made, because we faced so many headwinds early on getting adoption with the former approach that Flink has taken.
Chris:But, anyway, I digress.
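The layering described above, a low-level "message and context" callback with a fluent API stacked on top, can be sketched in a few lines of Python. This is only an illustration: `Context`, `Stream`, and `count_by_user` are hypothetical names, not the Samza, Kafka Streams, or Flink APIs.

```python
class Context:
    """Hypothetical per-task context: local state plus an output collector."""

    def __init__(self):
        self.state = {}
        self.output = []

    def emit(self, message):
        self.output.append(message)


def count_by_user(message, context):
    """Lowest-level style: explicit state handling for each incoming message."""
    user = message["user"]
    context.state[user] = context.state.get(user, 0) + 1
    context.emit((user, context.state[user]))


class Stream:
    """Hypothetical fluent layer that compiles down to a process(message, context) function."""

    def __init__(self, steps=None):
        self._steps = steps or []

    def filter(self, predicate):
        return Stream(self._steps + [("filter", predicate)])

    def map(self, fn):
        return Stream(self._steps + [("map", fn)])

    def build(self):
        steps = self._steps

        def process(message, context):
            for kind, fn in steps:
                if kind == "filter":
                    if not fn(message):
                        return  # message filtered out, nothing emitted
                else:
                    message = fn(message)
            context.emit(message)

        return process
```

Because both layers are ordinary functions, they can be exercised in a unit test by building a `Context`, feeding it a few messages, and inspecting `context.output`, with no cluster or runtime involved.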
Kostas:Alright. I have a question for both of you, because you've seen the evolution of, like, streaming systems from when they, let's say, started getting built, like, around 2009, 2010. I remember it was, like, around the time that Samza was released. And there was kind of like an explosion of systems. We even had Storm for a time.
Chris:And Spark Streaming. Yeah. Yeah.
Kostas:Yeah. So it feels like there was a time around 2010, I would say that we had plans to This is the coming hours. Miss, I thought of, I know for things a little bit. It looks like the transfer to here, but also This is the data one way or another. Right.
Kostas:Apart from what it feels like through an external general, which is me, is that we ended up having family task got winning or, like, in spinning one day by the market says, soldier, we are open. I mean, like, what is real validation of that is that we have a public company right now. Well, the other Systems. They end up disappear. Even even fleet, it came out.
Kostas:There were, like, some of them from the period. Also, the main way to create, like, a business around this. Yeah. You can really Yeah. Yeah.
Kostas:Do math. Then turn it like now in 2020 plus to start seeing an endpoint and becoming a sync.
Nitay:Yeah. Right? Yeah.
Chris:I would wager in five years, it'll kind of decline again. I think this stream processing, especially the streaming SQL thing, it just seems to, I don't know what it is, but it seems to, like, ebb and flow. Even, you know, pre-Samza, there was another streaming company that got bought for a couple hundred million dollars that sort of floundered and was an example of, like, a company that tried to do streaming SQL but, you know, couldn't really find a market. It just keeps happening over and over again.
Kostas:Yeah. I have a theory. I'll share it with both of you after I hear from you, though. Why do you think that, with streaming, we have these kind of step functions of innovation, where a lot happens, very concentrated, then nothing happens for a very long time, and then now we have it again?
Kostas:I mean, now they're in no being several was Kafka. I'll develop new site, and we gotta look more about that. What's your patents? I'll start with Unibail, actually. Please tell, like, what's your thoughts?
Nitay:It's a very interesting question. I think there's a couple of signals that I think about. And these tie to things you were saying, Chris. One is, I think we are definitely seeing this wave of, like, the deployment side of things, a lot of investment there. Right? So a lot of these companies around, like, serverless Kafka and things like that, that we can talk about. Because I think we've seen historically, like, Confluent obviously did very, very well with taking Kafka, which was a fantastic technology and also notoriously hard to manage.
Nitay:Right? And so they came up as a company and said, hey, we'll do it for you, basically. Right? And that was a great proposition.
Nitay:I think the other thing, kind of tying back to what you were saying, Chris, is that it really all started from the log shipping. And so I think about it from the use case perspective of, like, what's driving this stuff? Back then it was log shipping. Then it became some level of ETL into the data warehouse, and the data warehouse really, you know, has only continued to grow over time, as we've seen obviously with Snowflake, Databricks, etcetera. But more lately, I think what we've seen is more and more use cases around this notion of kind of operational analytics or user-facing analytics.
Nitay:Right? And that's where you're seeing kind of a blend between, like, the ClickHouse world and, like, the Materialize world, if you will. And so I think that's maybe driving some of these new use cases. And because a lot of it comes from there, people perhaps, to Chris's point, think of it as, like, okay, well, I can just slap some streaming SQL on it and give you a dashboard. Great.
Nitay:Like that's, that's really all you want. Right. But underneath that, what people are realizing is, okay, I can make a quick dashboard with a couple of variables in it and so on. But very quickly, once I actually get to production and get to success, I realize this needs to become a full app. This needs to have, like, actually full fledged features and so on around it, and streaming SQL is just not gonna get me there.
Nitay:And so now there becomes this demand for this middle layer to be more fleshed out.
Kostas:Or losing to it. Yeah.
Chris:I don't have a good bead on it. My intuition is that each wave is driven by a different use case or shift. And so, you know, when I think about 2008 to 2012 in real time, I think one thing is the Hadoop ecosystem needed to get data in. Right? I think the other thing is Twitter and real time.
Chris:Like, they offered the firehose. There was just a lot of effort going into real-time processing because there was a bunch of really great, rich data available at the time. And so, you know, that's one thing. I think the rise of SaaS over the last, you know, n years, I think, Nitay, to your point, has made it possible to circumvent the operational issues that a lot of these stream processing systems have. And so, again, it, you know, looks like it's gonna be easier to deploy and run these things because you can just pay someone to do it for you.
Chris:And, you know, the streaming use cases still exist with ETL and needing to, you know, project out fields and join data together and so on. I also think there's been a lot of activity around this shift from business analyst to analytics engineer, and that cohort has a huge appetite for wanting to contribute more to the business. You know, I mentioned earlier that, in my opinion, a lot of the stream processing value is in building products that we ship to end users, and, historically, it has been application engineers that do that. But you see this with reverse ETL, for example, where analytics engineers really, really, really want to try and contribute to revenue-driving products. I think especially now that zero interest rates have kind of gone away and, you know, a company's appetite for investing in these other areas is less.
Chris:And so I think there's a temptation there to be like, okay, here's some SQL. You can build a real-time application for, you know, our users, our customers now, and they really wanna do that. So I think that may be another thing that's driving some of this most recent, you know, wave. Like I said, my intuition is that it's probably not a cyclical thing due to some, you know, recurrent use case, but rather, you know, trends in industry, broader trends around the technology stack that we have, that are driving successive changes.
Nitay:I don't know.
Kostas:You know what
Nitay:I mean, to your point, there's kind of a little bit of a persona shift, where, like, in the old days of having just analytics people be analytics people, you told them, like, hey, this query will take an hour, or a day even, or a week, whatever, and they'd say, okay. I mean, that's unfortunate. It sucks, but, like, okay, I'll go get my coffee.
Nitay:Right? Once you shift to analytics engineering, you tell them it takes a day and it's like, no, that's not going to cut it for me. Right? Like, that's not acceptable. I'm going to go re-engineer it.
Nitay:So I wonder how much of it is driven kind of by that. What do you think?
Chris:Are you talking to me? Alright. I was waiting for Kostas' theory to
Kostas:Oh, yeah. I can I can talk back to you too, I think, because it is kind of already well, like, is also saying? I think my my thing is that the satellite who moving into, like, we use that. They try, like, the channel as well. So because out there, and obviously, it's like, you should have come now when I first gonna update you, like, between SQL as the API out there.
Kostas:I think there is kind of, like, mismatch. The problem that is getting resolved with the personal level that people have in their minds, that. Or can deal with these problems. And when we think about, like, the data teams, but the data teams, by design, are not people that can reason, about, like, things like they shouldn't. I I cannot solve their job.
Kostas:I'm gonna have I think I wanted, like, to, like, change my mind when you started putting on, like, well, and singing. I was like, well, a very traditional model in general. What? By the things is something that is on orders. It's it's for you how we move the organization on the world life.
Kostas:Like, all the market works. Those first time, like, how do you get an assumption that this and buy both items and all them doesn't. Right? And then you have to human on those on top of relational system. I'll relate on that.
Kostas:And I think when today's that, when you check to updates on top of a system that quite a free sense of all that. Right? We all go much the design in a way. Right? And then Yeah.
Kostas:That's the reason about things that are just really, really hard. Right. So people that is not, like, the personalized things. These systems are systems that strictly owned and driven primarily by our integration law and product engineering and most data engineering.
Chris:Yep. 100%. Yeah.
Kostas:And we we try to complete these in the brains and the work of, like, the data engines. We're trying to build formats, they are not good to be optimal for the problems out there. That's that's my that you can ask. It's all good on managing the region about these systems. Right?
Chris:Yeah. I think the two things you said there that I really, you know, sympathize with and agree on are that, I feel, you know, the folks building production products should be the prime users of these systems, and SQL is a leaky abstraction, for the most part, for what we're actually trying to model with streaming and stream processing. And so, you know, I have yet to see a sort of declarative language that will model streaming well, and thus I very much fall on the, well, let's write it in code.
Kostas:Yeah. No. I love this. And it's also like I think even, the idea of, like, how many optimizers are there? Like, figuring out how this thing is going, like, to run based on, like, some stuff.
Kostas:This makes sense. Like, pretty much like a black box. It's not quite going for the simple, like, for the metals, like, around the. So, you know, like, through text things that you have to be very careful of how you do things, like, in the system that's, like, really, really kind of like to predict the macro region about. That won't really work for anything.
Kostas:That's that's, like, how it is on the end. I
Chris:I see a lot of parallels between that issue and what's going on in durable execution right now. Specifically, you know, the leader there is Temporal. Hats off to them. The problem they're trying to solve and the product they've built are great.
Chris:It's state of the art. I don't think there's anyone that has a better durable execution, you know, sort of product, but it's still really hard to use. The really hilarious thing to me was when I started digging into it a little bit more, because we ran a system at WePay that we were building that was fairly similar to Temporal, which we were using to manage payment state. And so we kind of thought about how your payment goes through all these different state transitions, and it needs to happen transactionally, but over a distributed system. So we thought a bit about it.
Chris:This is sort of pre-Temporal really taking off. And when I started to read their docs, I went into the, I think it was a Python quick start guide or whatever. But, you know, chapter 1 is quick start, and chapter 2 is, like, a PhD thesis on nondeterminism. And again, this is the same reaction as when I walked away from that Current talk, like, this can't be it. You know, like, there's explanations about how you can't just ship code and you need to version your code separately and deploy it separately, because actually, if you change your code, even if it is deterministic, the fact that you have changed the code is nondeterministic.
Chris:And it's just like, oh my god. This is way, way too complicated. And so I get kind of that same vibe with some of what we're doing in the stream processing space, and that's exciting to me in the durable execution world because it's very nascent, I feel, and it's a little bit more exhausting to me in the stream processing world because this is, like, lap number five that we're going around this track. It just feels like the same thing over and over again. So I wonder if there's something that we can learn from each other, you know, on the durable execution side or on the streaming side, that would somehow straighten some of this out, but I don't have a clear thought as to what that is.
Chris:It just seems striking to me that we have these sort of similar usability problems in both spaces.
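The point about a code change itself being a source of nondeterminism can be illustrated with a toy replay engine. This is a sketch under simplifying assumptions and is not Temporal's API: `run`, `NonDeterminismError`, and the two `payment_*` workflows are hypothetical, and "executing" a command is faked with a string.

```python
class NonDeterminismError(Exception):
    pass


def run(workflow, history=None):
    """Run a workflow generator; if `history` is given, replay against it first.

    `history` is a list of (command, result) pairs recorded by an earlier run.
    """
    recorded = []
    gen = workflow()
    step = 0
    try:
        command = next(gen)
        while True:
            if history is not None and step < len(history):
                past_command, result = history[step]
                # Replay: the code must issue exactly the command that was recorded.
                if past_command != command:
                    raise NonDeterminismError(
                        f"step {step}: history has {past_command!r}, code issued {command!r}"
                    )
            else:
                result = f"done:{command}"  # stand-in for a real side effect
            recorded.append((command, result))
            step += 1
            command = gen.send(result)
    except StopIteration:
        return recorded


def payment_v1():
    yield "authorize"
    yield "capture"


def payment_v2():
    yield "authorize"
    yield "fraud_check"  # new step added in a later deploy
    yield "capture"
```

Both workflows are deterministic on their own. The failure only appears when `payment_v2` is replayed against a history recorded by `payment_v1`, which is why systems in this space ask you to version workflow code explicitly.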
Kostas:What do you think, Nitay, from the perspective of, like, usability?
Nitay:Yeah. You know, this is gonna potentially digress a little bit, but I've long wondered, with the wave of AI, everybody thinking, oh, developers are just going to plug in a prompt and it will do all the coding, whether where that actually ends up is that more and more developers are essentially writing state machines, essentially describing logic. Like, I was jokingly talking about it with a friend. I was saying we're gonna program these super complex systems the way we program elevators. And the way we program elevators is not just, like, imperative code.
Nitay:It's actually literally a state machine. And it's done that way so that you can formally prove that it will never get into, like, a race condition, whatever kind of state. You know, that it will either be on one floor or another and so forth. And we were musing on whether you start to see things like TLA+, if you know it, like, the algorithms language, whether more of that starts to make it into, like, actually the mainstream, and we all become, like, state machine slash algorithms slash etcetera programmers, because, to your point, it's just a better language than, like, Flink was or Temporal is. And those are both great systems.
Nitay:I agree with you. But it feels like there's maybe some next iteration of that.
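As a rough illustration of the elevator idea, the sketch below writes a tiny elevator as an explicit state machine and then checks a safety property over every reachable state. It is a toy model with assumed rules (three floors, one car, made-up event names), not a real controller and not TLA+.

```python
from collections import deque

FLOORS = 3  # assumed building height for the toy model


def transitions(state):
    """Yield (event, next_state) pairs. State is (floor, door_open, moving)."""
    floor, door_open, moving = state
    if moving:
        # A moving car can only arrive at an adjacent floor and stop there.
        for nxt in (floor - 1, floor + 1):
            if 0 <= nxt < FLOORS:
                yield ("arrive", (nxt, False, False))
    elif door_open:
        yield ("close_door", (floor, False, False))
    else:
        yield ("open_door", (floor, True, False))
        yield ("start_moving", (floor, False, True))


def reachable(initial):
    """Breadth-first search over every state the machine can ever reach."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for _event, nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def safe(state):
    """Safety property: valid floor, and never moving with the door open."""
    floor, door_open, moving = state
    return 0 <= floor < FLOORS and not (door_open and moving)
```

Because the state space is finite, `all(safe(s) for s in reachable((0, False, False)))` is an exhaustive check rather than a sampled test, which is the property that makes the state-machine style attractive.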
Chris:It's funny you say that. One of the interview problems we used at WePay, when we wanted to do a design module for new college grads, was: design an elevator algorithm. It's, like, a great way to, you know, kind of probe that line of thinking. The TLA+ stuff that you mentioned is interesting.
Chris:Again, digression on a digression, but, you know, I follow Jack Vanlightly, who's sort of a researcher over at Confluent, and he was testing out this new language that was sort of a TLA+, like, replacement, called FizzBee, I think, that I started playing with. It's super interesting. I essentially wanted to do a TLA+ or a formal proof on the manifest management that we're doing on this LSM on object storage that I'm working on. Just basically prove that we could have fencing and proper transactionality for our manifest as we're updating it in object storage. And I started reading TLA+, and I'm like, this doesn't look fun.
Chris:And then, you know, via Jack, I came across this FizzBee thing, and I was like, oh, this is really interesting. And what the guy has done is essentially taken a stripped-down version of Python that you can write and use to build formal verification, formal proofs, and it'll give you, you know, the graphs of all the different states and everything. And to your point around, like, writing code and stuff, it's the first time I've really seen a formal proof tool kind of blur the line between, like, what I would consider fairly, you know, normal run-of-the-mill Python programming and more, like, formal proof programming. It was just an interesting observation that you raised that, like, kind of rang a bell for me. So, yeah, I don't know.
Chris:The whole AI LLM space and what that's gonna mean for developers, who knows? I have no idea. But when it comes to formal proofs, this FizzBee thing looks pretty interesting. So I think Jack wrote a proof with it for Apache Paimon to test it out and stuff. So I wanna test it out too.
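To give a feel for what checking "fencing" on a manifest could look like, here is a small hand-rolled model check in plain Python. It is not FizzBee or TLA+, and it is not the actual design of the LSM project mentioned above. It assumes the object store offers a conditional write (succeed only if the manifest version is unchanged), and the `explore` function and its state encoding are hypothetical.

```python
def explore(conditional):
    """Enumerate every interleaving of two writers; return the set of final manifests.

    A manifest is (version, entries). Each writer reads the manifest, then tries to
    write back its copy with its own id appended. With `conditional=True` the write
    only succeeds if the version is unchanged; otherwise the writer retries.
    """
    # Writer state: ("idle",) | ("read", version, entries) | ("done",)
    initial = ((0, ()), (("idle",), ("idle",)))
    finals, seen, stack = set(), {initial}, [initial]
    while stack:
        manifest, writers = stack.pop()
        if all(w[0] == "done" for w in writers):
            finals.add(manifest)
            continue
        for i, w in enumerate(writers):
            if w[0] == "done":
                continue
            if w[0] == "idle":
                new_manifest, new_w = manifest, ("read",) + manifest
            else:
                _, seen_version, seen_entries = w
                if conditional and seen_version != manifest[0]:
                    new_manifest, new_w = manifest, ("idle",)  # fenced: re-read and retry
                else:
                    new_manifest = (seen_version + 1, seen_entries + (i,))
                    new_w = ("done",)
            nxt = (new_manifest, writers[:i] + (new_w,) + writers[i + 1:])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return finals
```

With `conditional=True`, every final manifest contains both writers' entries. With `conditional=False`, the search also finds interleavings where one writer silently overwrites the other, which is the lost-update bug that fencing is meant to rule out.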
Kostas:Yeah. I have one question about TLA+ and FizzBee. It's interesting, because we were talking about the experience and usability of these systems, and you brought up TLA+.
Kostas:Okay. How accessible is it? What makes FizzBee, compared to something like TLA+, easier to approach?
Chris:I mean, it's just that I've spent the last 20 years writing Python, and I can keep writing Python. Right? And I've had some of these growing pains. I'm desperately trying to learn Rust for the tenth time now, and it's like, god.
Chris:This is just not fun.
Nitay:Like, it's just painful.
Chris:It's, again, that paradigm shift to, like, okay, I need to think about things differently, and I need to remember lifetimes, and everything's different. The generics are different and all that. Learning new stuff is always a little bit of a painful endeavor. It can be exciting.
Chris:But when you're just trying to get something done, it's usually not.
Nitay:Yeah. Most of the folks that I've met that truly love TLA+ are special people. They're, like, math PhDs or computer science.
Kostas:They're the people that yell at you about monads.
Chris:Right?
Nitay:Exactly. I was just gonna say that it's like the Scala functional programming language people that will yell at you about monads and functors and all these things when you're like, I just wanna write my function.
Chris:Which, again, has a pretty significant overlap with the Rust community, where they're, like, yelling at you about runtimes and,
Kostas:you know,
Chris:type safety and yeah.
Kostas:Yeah. But to support the Rust ecosystem a little bit, I'd say that they kind of started out coming from C++.
Kostas:So lifetimes come kind of naturally to them. I think they've done something smart, investing in really good bindings with Python and kind of using Python as the interface. I think that was a very smart approach to, let's say, expand to maybe more people.
Chris:100%. I would even expand that beyond Python to say that there are two things that kind of clicked for me with Rust. The first is that it's the first language I've come across that has a compiler where I can write the code in Rust, compile it, and use that code anywhere. Like, I can use it in any operating system, on any architecture, and with any language, and it pretty much works.
Chris:And the effort to do that is shockingly trivial. Right?
Kostas:Yeah.
Chris:So when I was working on SlateDB, this LSM on object storage that I've been hacking on with some friends, at some point I realized, okay, I wanna test our manifest design. I wrote a simulator, and I wanted to run it in AWS on S3 and actually test it not from my laptop, but from an actual machine. I had been compiling it on my Mac, and I'm realizing, okay, the machine's Linux. Am I gonna compile it on Linux on the AWS machine, or am I gonna compile it locally?
Chris:And it's like, okay, I'm gonna compile it locally. Time to incinerate a week getting this thing to compile for a Linux architecture on my Mac laptop. It literally took me less than an hour, and I was just shocked. Like, it just worked.
Chris:I copied the binary over, I literally SCP'd it and ran it with dot slash, and it just worked. That to me was amazing. So you start thinking about that. You start thinking about the language bindings, and then you start thinking about Wasm. And, again, in terms of maturity level on Wasm, Rust is pretty far out there. I tried doing some Wasm stuff in Python, and it was a no go.
Chris:I tried it in Go, and it was also a no go. It's like, yeah, you can kinda do it if you also don't call any system functions. And so, again, Rust is really, really ahead there. So I think that whole thing is an automatic winner.
Chris:It's so far ahead of everyone else in that space that I think that's a huge, huge value. And when you look at how people have been solving this historically with sidecars and embedding stuff, basically leaning on Kubernetes, there's this huge shift back from sidecar, remote service stuff into embedded libraries. I think that's a really interesting shift that's largely driven by Rust. And the second, somewhat facetious observation I had was that I feel like Rust is probably the first language that's sort of dependent on AI. Because, like, I can't learn Rust without the help of an LLM.
Chris:Like, it's just too much. Yeah. And so being able to ask ChatGPT, you know, please explain to me why this is yelling at me about a lifetime. And, actually, there's enough Rust source code out there and enough Stack Overflow and enough buzz that ChatGPT can actually do a pretty good job and dramatically accelerate something that would otherwise be unusable outside of the C++ community. Right?
Chris:Like, you can actually get Python developers, plunk them down with Rust and an LLM, and get them up to speed, like me, relatively quickly, you know, relatively well. So it's a super interesting language, and there's an interesting dynamic and ecosystem going on there. But it is not without its own challenges. I could rant about async for a while.
Kostas:We can talk about that, but I think Nitay wanted to ask a few things about the project you've recently been working on. So please go on.
Nitay:Yes. I was asking to shift to your project, the LSM on object stores. Why create that? It sounds like it's clearly in Rust, so I'd love to hear about some of the experience building it in that, and, yeah, what made you start it and where are things at now?
Chris:Yeah. I don't have a great reason for why I initially started it other than it was sort of a research project, because I observed that a lot of things were shifting to object storage. My first real brush with it was the WarpStream folks, who, disclaimer, I invested in early on, where they built Kafka entirely stateless, where all the data is just sitting in object storage. I was just like, oh my god. It so dramatically simplifies a lot of the hard parts of distributed systems if you can just use S3 to do replication.
Chris:So that kinda blew my mind. And then I started seeing this trend more and more. And I started thinking about, well, what would serverless Redis look like, as sort of an entry point just for my own brain to experiment with. And I very quickly converged on, like, actually, what you really want is just a key-value store. If you look at, for example, what PingCAP has done with TiDB and their underlying TiKV service, or what FoundationDB has done with some of their stuff.
Chris:It's like, okay, can we have a key-value store that can give you some level of transactionality and MVCC on top of object storage? And then we can all build whatever databases we want off that. Right? And it turns out, like, I looked at RocksDB-Cloud, or sorry.
Chris:I looked at RocksDB, rather, and, of course, it didn't really have it. And then I looked at RocksDB-Cloud, and it wasn't really documented. I've talked to, like, three or four different companies that have all tried to use it, and they're like, yeah, Rockset did this, and then they didn't really publish much about it. And now they've been acquired by OpenAI, so who knows?
Chris:I think, architecturally, their approach on how they handle the WAL was fundamentally different from what I wanted to do, which is actually put the WAL in object storage as well. And so we set off to build this LSM on object storage. I published a blog post about it, thinking through some of the knobs that you get when you do this, this tension between latency, durability, and cost. Because of the nature of S3 and the billing model, if you want low latency and, architecturally, you have to write everything to object storage in order for it to be durable, you're now paying a lot. And so then you start being able to tune. Well, if I batch the writes, I can either drop durability if there's a failure, or I can make the clients wait until we do a periodic sync to object storage.
Chris:You can play with the durability and latency characteristics. And so I published this blog post and started saying, hey, I'm gonna work on this. And a number of folks reached out to me. Some stream processing people are interested in it because it's potentially a cheaper way to do state management than having to store it locally on disk the way that Kafka Streams does with RocksDB.
Chris:The state of the art right now is, like, Flink and so on putting everything in RocksDB locally on disk, which you can do if you have a persistent log like Kafka, but it still costs disk. And then some other folks reached out who are doing more like multiplayer state management, not CRDTs, but essentially wanting to have state distributed across multiple clients and remote locations. And then some people that were doing streaming at Azure actually reached out. So we've got a cohort of people that are kinda hacking on this.
Chris:And the goal is to, well, I keep saying it's open source. It's not actually open source. It will be open source prior to our P99 CONF talk, which we've got set up in September, one way or another. By hook or by crook, it'll be out there. But that's sort of the history of it, and really it was just initially me kind of musing about what we need to build good futuristic databases.
Chris:And I think the answer is, like, a transactional key-value store on object storage.
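The latency, durability, and cost tension Chris describes can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the default price of $0.005 per 1,000 PUT requests is an assumed figure to be checked against current pricing, a single writer issuing one PUT per flush interval is an assumed workload, and both function names are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def monthly_put_cost_usd(flush_interval_ms, usd_per_1000_puts=0.005):
    """Cost of one PUT per flush interval, sustained for a 30-day month."""
    flushes_per_month = SECONDS_PER_MONTH * 1000 / flush_interval_ms
    return flushes_per_month / 1000 * usd_per_1000_puts


def worst_case_ack_ms(flush_interval_ms, put_latency_ms, await_durable):
    """Upper bound on how long a client waits for a write acknowledgement."""
    if await_durable:
        # A write that arrives just after a flush started waits a full
        # interval for the next flush, plus the PUT itself.
        return flush_interval_ms + put_latency_ms
    # Acknowledge from memory: fast, but up to one interval of writes is
    # lost if the process dies before the next flush.
    return 0


if __name__ == "__main__":
    for interval_ms in (10, 100, 1000):
        cost = monthly_put_cost_usd(interval_ms)
        wait = worst_case_ack_ms(interval_ms, put_latency_ms=50, await_durable=True)
        print(f"flush every {interval_ms} ms: ${cost:,.2f}/month, ack within {wait} ms")
```

Under these assumptions, each tenfold increase in the flush interval cuts the request bill tenfold, and the price is paid either in acknowledgement latency or in the window of writes that can be lost.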
Nitay:Very interesting. Sounds like there's a ticking clock on the open source part. So there are a lot of interesting architectural paradigms you've taken there that I've seen a lot of other folks take as well, which is, we're gonna eschew the multiple layers of performance gains that you could potentially have by using local disk and various levels of caching and RAM and whatever, and just use cloud storage. That's it. Super simple, right?
Nitay:Architecturally, deployment, etcetera. But you're foregoing all the potential advantages of those things. Why is it you think that trade-off today is so well worth it? And how do you see it growing in the future?
Chris:You know, I think there's a growing recognition that maybe we've over-indexed on super low latency. And, actually, there are quite a few use cases where, in the 100 millisecond to one second range, you can still get a lot of stuff done. I think people have become more cognizant of that. And I think, in parallel, we've just been burned by the last 10 years of having to manage all of these stateful databases that are distributed systems, whether you're managing Mongo or Cassandra or whatever. It's just a pain.
Chris:Right? You have to do rebalancing, and you have to kinda shovel coal into the furnace. And so I think those two things have really driven up an appetite for people to say, well, maybe for the stuff that's not the most latency sensitive, we're willing to make that trade-off. I also think just the evolution of cloud and hardware has made it a lot more viable. So we have a lot more options on how we do caching.
Chris:I think the way you phrased that question seems to be under the assumption that it's only object storage. But I kinda think about it somewhat differently, which is, I would like to keep writes in object storage for durability. And then there's a question of how do you make the writes low latency, and how do you make the reads low latency? On the write side, you can front it with some external durable cache or write-ahead log. So that's what Neon does.
Chris:That's what AutoMQ claims to do. On the read side, you just cache the sucker all over the place. Right? You put it on disk, you put it in memory, you put it in your local AZ, you put it in your local region, and that cuts down on both the S3 costs and the latency. And so I think there are just a lot more options for that caching layer on the read side.
Chris:We have NVMe now, which is a big thing, which I think has made a lot of things a lot more possible. We have S3 Express now. We have Google Cloud Storage, and all these other S3-pseudo-compatible or object-storage-style APIs that will give you a lot more functionality than just S3: atomic writes and super low latency and multi-region buckets. So there's just a lot more there than there was 5 or 10 years ago in terms of technology that we can leverage to make these things work. So I think those are some of the reasons. There's also cost.
Chris:Right? Like, object storage is just cheap. So hey, maybe if I take 200 milliseconds instead of 100 milliseconds, I can shave, like, $1,000,000 off my cloud bill.
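The "cache it all over the place" idea on the read side can be sketched as a read-through cache with several tiers in front of object storage. This is a minimal sketch under stated assumptions, not how any particular system implements it: `TieredReader` is a hypothetical name, plain dictionaries stand in for memory, local disk, and the object store, and eviction is left out entirely.

```python
class TieredReader:
    """Read-through cache: check fast tiers first, fall back to object storage."""

    def __init__(self, object_store, tiers):
        self.object_store = object_store  # dict-like; slowest, durable tier
        self.tiers = tiers                # list of dict-like caches, fastest first
        self.object_store_reads = 0       # each one costs a request and latency

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the value into every faster tier that missed.
                self._fill(key, value, upto=i)
                return value
        self.object_store_reads += 1
        value = self.object_store[key]
        self._fill(key, value, upto=len(self.tiers))
        return value

    def _fill(self, key, value, upto):
        for tier in self.tiers[:upto]:
            tier[key] = value
```

For example, with `memory = {}`, `disk = {}`, and `s3 = {"sst/1": b"data"}`, the first `TieredReader(s3, [memory, disk]).get("sst/1")` goes to the object store, and later reads are served from memory, or from disk if the memory tier has been cleared. A real cache would also need size limits and eviction.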
Kostas:So, Chris, what do you think is going to be the first system built on this new open source project? Well, it's not open source yet.
Chris:You know, I don't know. I would like to see people leverage it to build a lot of cheap, protocol-compatible databases, like the Redis example I used. I think that would be really interesting. I have an aim that I would like SlateDB to be able to support the TiKV APIs. I think that would be super interesting.
Chris:So TiKV has kinda two levels of APIs. They have the raw KV API, which is non-transactional, which I think is definitely doable, and then they have the more transactional API, which may be doable. So I would love to just see us basically grab a lot of DB protocols and use SlateDB as a storage engine to enable serverless versions of those systems. That's the stuff I would like to see happen. Excuse me.
Chris:But in terms of a specific DB that gets released as the first one, I'm not sure. Maybe I'll go off and implement TiKV when I get motivated.
Kostas:That sounds very exciting, and I'm looking forward to it when it comes out. Alright. We don't have much time left, so I want to ask something about open source and Rust.
Kostas:I don't know. Is it marketing, maybe? If you release something, if you rewrite something in Rust, is that enough to create, like, momentum out there? And I think you know why I ask that.
Chris:Yes. I do. I think you have an experiment by which we can answer that question. Honestly, yes, I do. I think that there's enough hype and enough genuine love for the Rust language in that ecosystem.
Chris:I do think that simply making the declaration that you want to write something and you wanna write it in Rust, and
Kostas:here's how
Chris:here's how I'm thinking, let's do it. I think you will probably get enough people who opt to work with you to do it. So, yeah, I'm bullish on that one.
Kostas:Yeah. They really did tell me why I was boring other couple of days. Yeah. Yeah.
Chris:Yeah. I'm telling you. There's somebody I follow on Twitter, Xuanwo, who's just this prolific Rust community advocate and developer. He does a lot of excellent work. I first came across him through OpenDAL, which is a Rust library for interacting with all kinds of storage. It's a data access layer.
Chris:So, interacting with everything from object storage and local file systems to Google Drive over a standard API. They're super passionate about rewriting the whole data ecosystem in Rust. So how about it? I think there are definitely enough people out there to do that. One thing I have learned is that people that are passionate about their language are willing to tolerate an immense amount of pain in order to use that language.
Kostas:That's true. I mean, there are also other languages like that. In my opinion, it's, like, Haskell, which is, you know, like a defined thing to reach a point where your nose probably should not anymore. Like, I don't know.
Kostas:That's right.
Chris:That's where you get yelled at by the monad people: Haskell.
Kostas:Yeah.
Nitay:That's the world's best job security, though. Once you write something that becomes production, like, you're the man at that point.
Kostas:Awesome. So we are close to the end here. What's on your mind, basically, with all the things happening, like, the craziness out there?
Kostas:There is Snowflake versus Databricks. It's almost, I don't know, getting into a little bit of a Desperate Housewives kind of situation. CEOs and peers, all these things. But apart from all that, and this is a question for both of you, by the way, because you have both seen many cycles of the industry out there. I'm sure you can probably separate the hype from the substance.
Kostas:So what excites you, and what is coming over the next few months, or one or two years?
Chris:Yeah. On my end, the thing that I've been thinking about more recently is the extent to which a lot of these trends are actually connected. The two trends in particular that I think might actually be related are, first, the decomposition of the database, or composable database systems, sort of Wes McKinney's view of the world, which is taking the query engine and storage layer and just pulling them apart, so you have a bunch of building blocks. I kind of think about that as taking the database apart vertically. So you could run all these separate libraries in one system.
Chris:And then I think there's horizontally stretching it apart, from edge all the way to object storage. So at the query engine layer, it's pulling it apart vertically. And at the storage layer, it's pulling it apart horizontally, from front to back, from client to edge to application level to zone to object storage. And then vertically, from the query parser through the optimizer all the way down to the data plane. I feel like there's something there around the extent to which all this stuff is interrelated and unified.
Chris:I don't have a grand unifying theory of this, but it's something that's been on my mind. It's interesting to see just the diversity both at the data level and at the compute slash query engine level, and how we're just getting more and more technology and more and more libraries and tools in our tool belt. So that excites me, because I think it opens up the door for a lot of diversity in the ecosystem to address a bunch of different use cases, which, again, we've been talking a lot about. So that's something that's been on my mind.
Kostas:Well, well, you
Nitay:Yeah. I agree with a lot of that. I think the whole, like you said, unbundling of both the vertical and horizontal is exciting to me from a few different aspects. One is that, historically, data systems and data companies, databases especially, have been notoriously hard and costly endeavors to build. Right.
Nitay:If you were to come and say, hey, I'm going to start a database company, it's like, okay, well, do you have 5 years and $50,000,000 or something? Right. That would be the base case necessity, probably. And I think that level of investment required is dramatically dropping with things like what you just mentioned, Chris, in terms of the unbundling, and, you know, the Arrow project and all these different kinds of things.
Nitay:So that to me is exciting, because I think we're actually going to have, believe it or not, a wealth of, like, 10x more data companies, right? In a way it's easy to look around and be like, aren't all data problems basically solved at this point, and we're all just kind of rehashing old things? And I think the reality is we're far from it, and there's actually going to be an even bigger explosion of potential use cases and companies coming about around that. So that's, I think, point one.
Nitay:And the other point is almost a personal point for me, because I was at Facebook in the early days of Hive being created. Right. And Hive was created by essentially a team of data scientists slash data engineers. They kind of made it to just do what they needed it to. And anybody you ask will tell you this.
Nitay:It was never heavily invested in to become a really awesome, solid production system. To this day, I think at this point, finally, it's had some great investment, but it's been almost laughably amazing to see how much impact Hive has had on the ecosystem. And I think we're actually finally at the time, what, 20 years later now, where people actually care about that stuff. Because, tying back to your point about what's happening with the Databricks and Snowflakes and so on, there's more and more of this push to, like, you own your data, just stick it in S3, open data formats, the Icebergs and whatnot of the world. And so now you're going to see this battle over the metadata. And because there's this battle over the metadata, I think there's actually gonna be a lot more investment in that.
Nitay:And I think the other reason for that is that metadata historically really just meant schema. It was like, okay, this thing over here, this tool, I use this database. It gives me the schema. Like, what's the problem? Right now, metadata, people realize, means context.
Nitay:It's context for AI. It means that thing that everybody's telling me I need to invest more in works better. Right. So hence I need better metadata. And so I think, in a funny way, there's finally a deep, important use case for something that has never traditionally gotten that much love, in my opinion, and yet it was always kind of a crucial system.
Kostas:So we'll have you on again in a future episode so we can revisit all these things and also have, I don't know, more spicy things about Snowflake and the others to talk about.
Chris:Yeah. This is fun.
Kostas:Awesome. Alright, Chris. Thank you so much, and looking forward to having another episode with you.