Tractable

Robbie Walzer, VP of Engineering at SingleStore, talks through SingleStore's journey and its evolution as a distributed SQL database. We discuss the technical architecture and optimizations of SingleStore, formerly MemSQL, and how it handles both transactional and analytical workloads simultaneously. Robbie sheds light on the challenges associated with characterizing performance attributes of databases, the growing relevance of semantic search, and the importance of testing for ensuring database reliability and correctness. He also shares views on the future trajectory of the database industry. 

What is Tractable?

Tractable is a podcast for engineering leaders to talk about the hardest technical problems their orgs are tackling — whether that's scaling products to deal with increased demand, racing towards releases, or pivoting the technical stack to better cater to a new landscape of challenges. Each Tractable episode is an in-depth exploration of how the core technology underlying the world's fastest-growing companies is built and iterated on.

Tractable is hosted by Kshitij Grover, co-founder and CTO at Orb. Orb is the modern pricing platform that solves your billing needs, from seats to consumption and everything in between.

Kshitij Grover: Hello, everyone. Welcome to the Tractable podcast. I'm Kshitij, your host, co-founder, and CTO here at Orb. I'm really excited today to have Robbie Walzer here with me, who's the head of engineering at SingleStore. SingleStore is a distributed SQL database built for both transactional and analytics workloads. We have a lot to talk about, so I'm really excited to dig in. Robbie, welcome to the show.

Robbie: Thanks. Yeah. Great to be here.

Kshitij Grover: Awesome. Let's just dive right in. Tell me a little bit about your background and how you got to where you are today.

Robbie: Yeah. So I've been at SingleStore for the better part of a decade at this point.

It was actually my first job out of school; I joined when the team was about 20 people. And yeah, I've been here ever since, as we've grown the company and the product has evolved.

Kshitij Grover: I know that it's been a long time since you joined, and I'm sure you've worked across the whole stack since then.

And I'm sure this is actually the first question you get when you present SingleStore's thesis, and it's certainly the question I had: how is it possible to do analytics and OLTP workloads in one solution? So maybe with your lens, having been at SingleStore and seen that architecture get built up, give me an overview of what that architecture looks like.

Robbie: So one piece of important context is that SingleStore wasn't always SingleStore -- it was originally founded as MemSQL. The idea at the time, as you can tell from the name, was to be an in-memory database. Memory prices had been getting exponentially cheaper for the previous 40 or 50 years, and to really take advantage of that, you needed a new database architecture from the ground up. The bottlenecks had moved away from disk, and you couldn't take a legacy database and just put your data and your indexes in memory, because the query execution engine wasn't designed around the data being resident in memory.

At the core of that for us was an in-memory skip list, which allowed for really high-throughput updates and point lookups while still having pretty good scan performance.
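
To make the data structure concrete, here is a minimal skip list sketch in Python -- an illustration of the general technique, not SingleStore's implementation. Inserts and point lookups are expected O(log n), and a walk along the bottom level gives ordered range scans, which is the mix of properties being described.

```python
import random

class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        # forward[i] is the next node at level i
        self.forward = [None] * (level + 1)

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node to the next level

    def __init__(self):
        self.head = Node(None, None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        level = 0
        while random.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def insert(self, key, value):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        # Walk down from the top level, remembering the rightmost
        # node we passed at each level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt and nxt.key == key:
            nxt.value = value  # key exists: update in place
            return
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, value, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key):
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node.value if node and node.key == key else None

    def scan(self, lo, hi):
        # A range scan is one seek followed by a walk at level 0,
        # which is why skip lists still scan reasonably well.
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < lo:
                node = node.forward[i]
        node = node.forward[0]
        while node and node.key <= hi:
            yield node.key, node.value
            node = node.forward[0]

sl = SkipList()
for k, v in [(3, "c"), (1, "a"), (2, "b")]:
    sl.insert(k, v)
print(sl.get(2))            # b
print(list(sl.scan(1, 3)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Every node lives in a sorted linked list at level 0; taller nodes act as express lanes that let lookups skip ahead, which is where the logarithmic behavior comes from.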

Long story short, the in-memory database market never really took off. It was a great initial product, but then a couple of things happened: memory prices stopped going down around the early 2010s, and data volumes kept getting bigger. So it was just extremely limiting to constrain yourself that way, and over time we evolved the product to be a lot more flexible.

So at the core, we still have the in-memory row store. The next thing we added was a disk-based column store, and initially that looked very similar to the column store you see in analytics-only data warehouses. If you're not familiar, the two main storage technologies used inside of databases are row stores and column stores.

In a row store, all of the data for all the columns of a given row is stored sequentially, so it's really great for seeks. In a column store, all of the data for a column is stored sequentially, so it's really good for scans but really bad for seeks. Over time we kept getting pushed by our customers, who wanted the best of both worlds.
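
As a toy illustration of that layout difference (not any particular engine's format), here is the same three-column table stored both ways:

```python
rows = [
    (1, "alice", 120),
    (2, "bob",   300),
    (3, "carol",  50),
]

# Row store: each row's values are contiguous, so reading one row
# (a seek) touches a single tuple.
row_store = {r[0]: r for r in rows}  # keyed by primary key

def seek(pk):
    return row_store[pk]  # one lookup fetches the whole row

# Column store: each column's values are contiguous, so an aggregate
# over one column (a scan) touches only that column's array -- which
# also compresses well, because the values are homogeneous.
col_store = {
    "id":     [1, 2, 3],
    "name":   ["alice", "bob", "carol"],
    "amount": [120, 300, 50],
}

def total_amount():
    return sum(col_store["amount"])  # never touches id or name

print(seek(2))          # (2, 'bob', 300)
print(total_amount())   # 470
```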

They didn't want the expense of keeping everything in the in-memory row store just to serve OLTP. And they loved the analytical performance of the column store and its low cost of ownership, because it's highly compressed and stored on disk. What we ended up getting to was a technology that we initially called "single store" -- and then renamed the company after -- which was to take the column store and first make it seekable.

So rather than having to do a big batch scan if you wanted to read one row, you now just had to read the data that you needed. And so we went through all the various encodings and redesigned them.

Then the next thing we did -- so typically, if you want high throughput with a lot of updates, you need fine-grained locking, and a lot of column stores are going to lock at the block level or even the table level. So the next thing was to enable some of the optimizations we had for row-level locking on the row store on top of the column store.

So that's seekability and updatability. And then the next thing is a further enhancement of seekability, which is adding secondary indexes to the column store. Those three things combined were what enabled us to actually change our default table type to column store, because we were so confident that the vast majority of workloads could use it for both transactions and analytics.

And we actually ultimately renamed the whole company after that project, so we had to rename the project itself to universal storage. That's what you see when you look at our docs today: it's called universal storage, not SingleStore. Yeah, that's the secret sauce.
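
A rough sketch of the seekability idea: columnstore data lives in compressed segments, and a secondary index maps a value back to the segment and offset holding it, so a point read decodes one row instead of scanning every segment. The layout below is purely illustrative, not SingleStore's actual on-disk format.

```python
# Two column segments, each holding a slice of the table.
segments = [
    {"user_id": [17, 42, 99], "amount": [5, 12, 7]},
    {"user_id": [3, 42, 64],  "amount": [9, 1, 30]},
]

# Build an inverted index once; a real engine would maintain it
# incrementally as segments are written.
index = {}
for seg_no, seg in enumerate(segments):
    for off, uid in enumerate(seg["user_id"]):
        index.setdefault(uid, []).append((seg_no, off))

def lookup(uid):
    # Seek straight to the matching (segment, offset) pairs.
    return [segments[s]["amount"][o] for s, o in index.get(uid, [])]

print(lookup(42))  # [12, 1] -- no full scan required
```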

Kshitij Grover: Okay, got it. To reflect that back at you: you were able to add a lot of properties that you would get in a row storage format to this underlying column storage engine, right? That's interesting, and it sounds like you're able to get some of those query characteristics. Is it also true that you get the same sort of transactional guarantees and properties you might expect?

Robbie: So the nice thing is that our metadata is actually still in a row store table -- an in-memory row store table. And obviously, I say in-memory, but it goes without saying that it's still durable to disk through write-ahead logging.

But what that means is you get all the same transactional guarantees around ACID and isolation; that's where the ANSI isolation levels are implemented. As a relational database, that kind of goes without saying.

That's table stakes.
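
The durability mechanism mentioned here is standard write-ahead logging: persist a log record of the change before applying it in memory, and replay the log on restart. A minimal sketch of the idea (illustrative, not SingleStore's log format):

```python
import json, os

LOG_PATH = "wal.log"
table = {}  # the in-memory store

def put(key, value):
    # 1. Append the change to the log and force it to disk...
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps({"op": "put", "k": key, "v": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. ...and only then apply it in memory. A committed write
    # survives a crash because the log record hit disk first.
    table[key] = value

def recover():
    # On restart, replay the log to rebuild the in-memory state.
    table.clear()
    if os.path.exists(LOG_PATH):
        with open(LOG_PATH) as log:
            for line in log:
                rec = json.loads(line)
                if rec["op"] == "put":
                    table[rec["k"]] = rec["v"]

put("account:1", 100)
recover()
print(table)  # {'account:1': 100}
```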

Kshitij Grover: Okay. That makes a lot of sense. One of the things you said there was that your customers really wanted the flexibility and, as you were saying, the low cost of ownership of a traditional analytics data store, so they can compress all of this columnar data, but they also wanted the guarantees and seek performance that you get out of an OLTP store.

But in some sense, I would imagine that these audiences are traditionally pretty different, right? The person looking for analytics workloads is probably the data science team at an organization, whereas eng and product, for their production data stores, are looking for something like Postgres.

How do you think about that? And are those buyers distinct? Or have you seen those kind of converge over time?

Robbie: I would say yes. Traditionally in the database market, you had your application, which was maybe a CRUD application -- we'll call it OLTP -- and all of your analytics happened in your data warehouse, which was owned by a separate team.

You had an ETL process that moved data from your online application database to your data warehouse. Not only because they were separate systems, but also because you wanted to have a different schema for analytics querying versus operational querying.

Kshitij Grover: Exactly.

Robbie: To a certain extent, for a lot of things, that is still true.

But what we see in a lot of our customers is that the application itself needs more analytical capabilities, whether that's a real-time trading application or, if you're making a billing application, doing aggregations to figure out how much usage somebody accrued within some time period, right?

And that's where Postgres is a great database, but once you have a certain data volume and you're doing analytics where you don't want the delay of going over to a more analytics-focused system, that's where the power of having both transactional and analytical capabilities comes into play.

Kshitij Grover: Interesting.

So it sounds like what you're saying is that production applications, or products, are moving to a world where they need that analytics capability with sub-second query performance. They can't use something like Redshift or Snowflake, where not only might a single query take two or three seconds just to spin up, but some of these data stores also have pretty large maintenance windows, so you just can't rely on them to back your production application.

Robbie: Yeah, and another thing is that once you introduce multiple systems for the same application, that's operational overhead. You have to train your engineers. Everybody's implementing ANSI SQL, but everyone has a slightly different flavor of it.

So you might not have fully portable queries between these systems; they're going to have slightly different behaviors and feature sets. In our own case, we don't run the operational workloads for our control plane on the same deployment as our internal data warehouse.

And because we want them to be isolated, they are different schemas, like I was saying before. But it's obviously very convenient that you can run a query against one or the other and it's all SingleStore.

Kshitij Grover: It sounds like the thought is not that it's literally one database, right?

Because you're saying you might want workload isolation. You might literally want different performance guarantees out of multiple deployments of this product across your organization, but you don't want to deal with different query syntax across different systems, even as a starting point.

Robbie: Yeah, that's just one example. But then when you go to operationalize a datastore, you've got to monitor it, you've got to deal with upgrades and things like that, and that's going to be different on all these different platforms.

So if you have a single system it's just lower overhead.

Kshitij Grover: Yeah. And I want to push you a little bit on trade offs, right? It sounds really good, right? I can run very different sorts of queries on the same datastore, but what am I losing going down this path? Is it that I'm trading off performance?

Is it that I'm trading off some more sophisticated queries that SingleStore has to compromise on to support these requests?

Robbie: So we have a design tenet -- and this is part of what's allowed us to survive past being an in-memory database -- which is that we didn't want to make design decisions, even ones we were really confident were 100 percent the right thing for everybody, that precluded us from being able to support other types of workloads.

For example, compared to, let's say, a dedicated cloud data warehouse, we have separation of storage and compute for a lot of our object files, but metadata is still to a certain extent tied to compute. That allows us to run transactional workloads and have really low-latency writes, while still having pretty good elasticity of storage and pretty good elasticity of compute.

But on the other hand, it's not going to be as elastic as a system like BigQuery that is 100% designed around storage being completely separated and isolated from compute. There is still a benefit to specialized systems in some cases. For another example, we have time series capabilities, and I think our time series functions are probably good enough for a large portion of people -- most people aren't going super deep into time series. But there are going to be workloads where you need every possible time series query feature and the most optimized storage format specifically for time series. And then it makes sense.

Go use that. But I think our observation is most people don't have super specialized applications like that.

Kshitij Grover: Yeah, it's a more general purpose workload for the large majority of people. And I know, for example, talking about in-memory data stores, the folks at DuckDB have this thesis that your data is not as big as you think, right?

Like usually the queries that you're executing are operating on a very small subset of your data. And it sounds like there's a kind of similar thesis here.

Robbie: Yeah, it's a corollary. In their case, they're saying everyone was obsessing about big data, but you don't really have big data. And the corollary is: everyone's obsessing about specialized databases for X, Y, and Z, but your workload is actually not that specialized.

Kshitij Grover: So actually speaking of specialized workloads, I know that the new hot thing right now is AI and semantic search, right?

And so that's another case where there's a crop of specialized data stores popping up. So maybe before we even dig into how SingleStore thinks about that landscape of tools, can you give me a summary of what semantic search is and why it's hard? Why would you consider picking up a new database just for this problem?

Robbie: Yeah, great question. So first, before semantic search, there's traditional search: we just did keyword matching, which works pretty well. In semantic search, the idea is that we take some object -- whether it's a picture or a sentence -- and use an AI model to transform it into vector space.

Now if we have two of these -- we call them embeddings -- we can see how similar they are based on their distance, whether that's a dot product or Euclidean distance. That's basically asking: how close are these two things in meaning? At a very basic level, you can just compute that with a brute-force dot product or Euclidean distance function, but that's too slow if you're scaling to billions of objects.

Kshitij Grover: And higher dimensions.

Robbie: Exactly. Yeah.
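
A minimal sketch of what that brute-force search looks like, assuming NumPy and unit-normalized embeddings (so dot product equals cosine similarity). It works fine at this scale; the O(n * d) cost per query is exactly what approximate indexes exist to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these came from an embedding model: 100k objects,
# 768 dimensions each, normalized to unit length.
corpus = rng.normal(size=(100_000, 768)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search(query, k=5):
    query = query / np.linalg.norm(query)
    # One dot product against every embedding in the corpus.
    scores = corpus @ query
    # Pull out the k best matches, highest score first.
    top = np.argpartition(scores, -k)[-k:]
    return top[np.argsort(scores[top])[::-1]]

print(search(rng.normal(size=768).astype(np.float32)))
```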

Kshitij Grover: Okay. Yeah. So that makes sense. It sounds like there's a real algorithmic bottleneck, or I guess a performance problem, that people are trying to solve. So what's SingleStore's take on this? Is it that if you're doing something at a very large scale you might need a very specialized data store, but for most AI applications you can use a general purpose store? Is it that same flavor of thesis?

Robbie: A little bit different. I would say the core of this is that you need an approximate index -- you need to do approximate nearest neighbor search.

That's what a lot of these systems provide: you have open source libraries like Faiss, and newer full databases like Pinecone and Chroma. Our position is that an ANN index is essentially just another index type, and we're really good at building indexes and adding new storage types.

So that's one thing we're working on: adding that as an index type to our columnstore table, which is the default table type. Where I think that's going to differ -- and our strength relative to the specialized systems -- is if you want to combine, say, your vector search with the data where it actually originates, and do that in a single query.

That's a lot easier when you're doing your vector search in the same system as your application database. Another thing I think is interesting about semantic search compared to classical search: if you look at a system like Elasticsearch, or Lucene, which Elasticsearch is based on, there's a lot of complexity in both. There's configuration around stemming, around boosting, to get the scoring algorithm properties that you want, and a lot of complexity goes into configuring all of that. For us to build out all that functionality would be massive.

Whereas the interesting thing with semantic search is that a lot of that complexity is now out of the database. Whether you're a specialized vector database or a more general purpose system, it's moved into the model. So long term, I think there's less of a need for a specialized system than there was for classical search.

Yeah.
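
The single-query point looks roughly like the sketch below. The SQL is an illustrative shape, not SingleStore's exact vector API: the DOT_PRODUCT function and the schema are assumptions made for the example.

```python
# Hypothetical query shape: filter on ordinary relational columns and
# rank by vector similarity in one statement, against one system.
query = """
    SELECT p.id, p.title, DOT_PRODUCT(p.embedding, %s) AS score
    FROM products p
    WHERE p.in_stock = TRUE
      AND p.price < 50
    ORDER BY score DESC
    LIMIT 10;
"""
# With a separate vector database, you would instead run the ANN
# search there, ship the matching ids back, and join them against the
# application database yourself: two systems, two round trips.
```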

Kshitij Grover: So it sounds like you're saying the secret sauce has moved into generating the embeddings themselves, right? And then you have a fairly similar problem that you can solve across databases without needing all of that configurability. One way I think about it is: presumably you might reach for a new data store if you expect the workload to be very different, or even your tolerance for failure modes to be very different.

Do you think that case exists? Do you think the ecosystem, or even the architecture surrounding the database, would be different for something like a vector store?

Robbie: Yeah, a potentially different architecture is something we can work around, but it adds some complexity.

For example, it's really important to have efficient index build times. If you train your index on a GPU, it's going to be way faster -- and we're obviously fully CPU-based, because for most database workloads GPUs don't make sense. So there are definitely some benefits to specializing there. Another thing: if all you support is semantic search, you can have a really simple API for semantic search.

You might be able to offer an easier developer experience relative to a system that supports a lot more functionality but is SQL. Whether you like SQL or not... obviously I'm biased, I like SQL, but not everyone's a SQL person.

Kshitij Grover: And do you think that in this domain SingleStore has an advantage over something like vector search in Postgres, or one of these Postgres extensions being developed that still leverages SQL but bakes it into the Postgres query engine? Is it the storage layer where you all would have the advantage? Is it how you construct the index yourself, or the search algorithm?

Robbie: So I don't think it's the indexing. Compared to Postgres, the vector indexing algorithms -- maybe they haven't quite yet, but over time they will just be commoditized.

You already have open source libraries like Faiss that are just well known. And to a certain extent, this happens to a lot of the underlying storage technology in databases.

Everybody knows how to build a column store to a certain extent, right? Where we differentiate from a system like Postgres -- which, by the way, I think is a great database -- is in our ability to scale. There will be a point where, if you're running your workload on Postgres, it just won't be able to keep up anymore.

And you hit that a lot sooner, obviously, if you're doing something like analytics. But we are a scale-out system with customers running on clusters with thousands of cores, and that's just not something you can do with Postgres.

Kshitij Grover: Actually, that general point is something I wanted to get into. As you were just saying, you've been at MemSQL and then SingleStore for almost a decade now.

And what I want to get your insight into is how you unfold the complexity of building a datastore over time. At the beginning, to what extent did you hand-optimize workloads for your largest customers versus trying to be as general as possible on day one? What was that trade-off story then?

And then how has that changed now?

Robbie: Yeah, that's a great question. Especially in the earlier days, around when I started: for the couple of biggest customers we got that year, we were trying to push out of being just a special-purpose database -- we didn't want to just be an in-memory row store.

And so as a result, we were trying to get customers that were beyond our capabilities at the time. To do that, we had to really understand specifically what their workloads were doing, and first sniff-test it: do we think this is something other people are going to do? But yeah, there are optimizations we built into our query optimizer specifically for the six distinct queries that customer really cared about. On the one hand, you think, "oh, you did all this work to get one customer, and maybe it doesn't generalize." But I think it was useful for us: obviously we got that customer and they needed it, but you also learned a lot about what mattered.

It was both fun and useful: when we later went back and generalized a lot of those optimizations to make them useful to many more workloads, we knew the work we were doing was actually worth it, because we had already validated it with one customer.

Kshitij Grover: I was just going to ask: did that change a lot going from an in-memory data store, where you were optimizing workloads that depended on that architecture, to thinking about scale-out and SingleStore as a practice?

Robbie: Yes and no. Obviously there are a bunch of different query optimizations that go into a database like ours. From a distributed query processing perspective, that was all orthogonal. But in terms of the query execution strategies and optimizations that make sense on an in-memory row store versus a disk-based column store: for example, nested loop join is really fast on an in-memory row store. You basically never do a hash join, because the skip list is so fast.

Whereas with a column store, you want the fastest hash join possible, and your optimizer needs to know how to use it. So there's a very different set of query execution and query optimization techniques that matter.
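
A toy version of that trade-off in Python: the same join written both ways. Neither is how a real engine executes joins, but it shows why a storage engine with cheap seeks favors nested loop joins while a scan-oriented column store favors building and probing a hash table.

```python
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 2, 30), (102, 1, 15), (103, 2, 99)]  # (id, user_id, total)

def nested_loop_join(outer, inner_lookup):
    # For each outer row, seek into the inner side. Great when the
    # inner side has a fast point-lookup index (like a skip list).
    return [(o, inner_lookup(o[1])) for o in outer if inner_lookup(o[1])]

def hash_join(outer, inner):
    # Build a hash table in one scan of the inner side, then probe it
    # while scanning the outer side. Great when the inner side is a
    # column store that scans fast but seeks slowly.
    build = {u[0]: u for u in inner}
    return [(o, build[o[1]]) for o in outer if o[1] in build]

users_by_id = {u[0]: u for u in users}
print(nested_loop_join(orders, users_by_id.get))
print(hash_join(orders, users))  # same result, different cost model
```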

Kshitij Grover: Okay, yeah, that makes sense to me.

And so now, fast-forwarding to today, I imagine you're presumably not hand-optimizing for specific clients -- or maybe on the enterprise side that still makes sense. How do you think about that?

Robbie: Yeah, it happens a lot more rarely. Query optimization and database workload optimization are interesting because you can have 99 percent of the workload really well optimized, but a couple of slow queries blow the whole thing up. You can have every query sub-second, and then one query where you pick a bad plan takes a couple of minutes, and it just doesn't matter that you're so fast on everything else.

So at a certain point in query optimization development, it feels like you're building a lot of special cases. But it turns out that over a large enough number of customers, they aren't special cases. Even if something shows up in only 1 percent of customer queries, that's still enough to make the optimization worth it, because the alternative is just way too slow.

Kshitij Grover: And optimization versus configurability is an interesting question to me. I imagine some of the optimizations you want to make conflict with each other across different workloads, right? Sometimes you have to prioritize one thing over the other. How much does that make its way into the overall architecture, versus being something you can configure within SingleStore?

Like, how do you think about: okay, this is a database-level trade-off we have to make, versus this is something we can let our end user control and say, "Hey, I want to opt in to this sort of workload, because as the person in control, I know that's where my queries are going to lean"?

Robbie: Yeah, so for us, for better or worse, we are extremely configurable. We have the row store and the column store, and each of those has several different index types: shard keys, sort keys, full-text indexes, hash indexes. So it gives people a lot of tools, a lot of options to make the system work for them -- that's the "for better." The "for worse" is that it's a lot of complexity. The long-term vision -- and this is a research area that I think is really interesting for applying AI to improve databases -- is to make a lot of that more automatic.

You say: okay, I want an auto table. You give us some suggestions and tell us your constraints, and we optimize it for you based on the queries we see. Obviously we don't do any of that today, but it's something we're starting to think about.
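
To give a flavor of the knobs being described, here is illustrative DDL for one table combining several of them. The syntax is modeled loosely on SingleStore's documented options, but treat the exact spelling as an assumption rather than a reference.

```python
ddl = """
CREATE TABLE events (
    user_id BIGINT,
    ts      DATETIME,
    body    TEXT,
    SHARD KEY (user_id),       -- how rows spread across nodes
    SORT KEY (ts),             -- physical sort order within segments
    KEY (user_id) USING HASH,  -- fast point lookups
    FULLTEXT (body)            -- keyword search
);
"""
# Each clause is a physical-design decision the user makes today;
# the "auto table" idea is to infer these from the observed queries.
```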

Kshitij Grover: And actually, I've always wondered that. Why don't databases generate or at least suggest indexes and constraints for you?

What's your intuition on, maybe even outside the context of SingleStore, why hasn't that been an obvious improvement to existing databases?

Robbie: It is a huge research area, and I think it's starting to pick up steam. It's just so hard to get right.

Yeah, and the other thing that's really interesting is that people are generally very risk-averse about their database workloads. If I apply some optimization that a tool suggested, I'm much more concerned about losing 2x performance than I am excited about gaining 2x performance.

My users are going to notice if we're 2x slower; if we're 2x faster, they're going to be happy, but they'd be way more angry than they would be happy. So that loss aversion is definitely something we deal with.

Kshitij Grover: Yeah. Shaving 200 milliseconds off a query time may not be appreciated as much as someone gets angry if you add a couple of seconds because you shipped some optimization, right? Actually, that's a topic I imagine you have to deal with a lot, because even internally we think of our database as a tool and not necessarily a service. The difference I see there is that I think of the database or the datastore as a core part of my architecture, not really something outside of it. And my sense is this has some pretty interesting implications for the reliability contract between the core service and the database or datastore that you choose to use.

So how do people think about downtime or even performance as it relates to SingleStore? It feels like just having an SLA is not enough. Or I imagine that conversations get tricky even if you do have an SLA. So yeah, tell me a little bit about that conversation.

Robbie: Yeah, obviously, having that SLA isn't enough... even if you're meeting your SLA, if you have an outage or you had a performance problem, people are very unhappy.

I think the way we see it is... obviously we still have a self-managed product, but most of our new customers are on our SaaS product. And the reason for that is you're getting a lot more capabilities when you're running a database as a service versus running it as just a piece of technology within your stack. You have a whole team of SREs in the background that specialize in that database, and that also comes with upgrades.

There's somebody looking at whether an upgrade caused a regression, and you don't really have to worry about that or specialize in understanding it. It also comes with managing the storage for you.

So I think you're right, but at the same time, you can just get so much more out of a service -- and this isn't true for all database services -- but you at least have the capability of offering a lot more and removing some of that complexity. For a lot of people who aren't database specialists, that's a reasonable trade-off.

We do still offer our self-managed product, and there are some customers that really love databases; they consider themselves database people and they want to run their own database. More power to them.

Kshitij Grover: Yeah. And when we think about something like performance, obviously it's a function of the customer workload. That being said, if my database isn't keeping up and I'm using a managed service like SingleStore's, I imagine that as a customer I'm still going to come to you and say: "Hey, why isn't this working?"

So is that something that you encounter a lot?

How do you navigate that?

Robbie: From the customer's perspective, it's always guilty until proven innocent. So what we've done is build a lot of profiling and observability tooling to make things very visible: how much cache you're using, how much CPU you're using, what your most expensive queries are, and within those, what the most expensive operators are, and things like that.

That should make it pretty clear: okay, you're saturating CPU, that's why your query latency is going up, you need to scale. Or this one query is taking up 50 or 70 percent of the resources on your cluster and you should go investigate it. Sometimes that query being slow is the result of a missing index, in which case we're exonerated, and sometimes the query optimizer makes the wrong choice, in which case we go and make a fix. But it's definitely guilty until proven innocent, so we want to make it really easy for the customer to understand where their bottlenecks are and what the big adjustments would be.

Kshitij Grover: Yeah, I guess it's about giving them the reasoning tools to say: what are my next steps? Do I actually have to go to SingleStore and say, "Hey, this query plan doesn't make sense," or do I have to throw more money at the problem? So we've talked a little bit about reliability. Of course, maybe the thing that comes even above reliability is correctness.

That's super important in your business. How do you approach testing something like SingleStore, especially over the course of a decade in which it's evolved so much, with all the surface area that comes with the configurability you were describing?

Robbie: Yeah, for a database, reliability and correctness are our bread and butter. If we can't do that, we have no reason to exist.

So we've always been pretty fanatical about testing. We have an internal service that we call Psyduck, running on about a thousand to a couple thousand servers, actually in a colo in Emeryville. It's a workload that would be pretty cost-prohibitive for us to run in the cloud, because it's not elastic: it pretty much runs 24/7. During the day, engineers can submit any test run -- any of our 200,000 or so tests -- and get results back in an hour or so.

Then overnight it runs, I think, 150,000 to 200,000 tests every night. That's really powerful, and the tests run the gamut, from things as simple as: create a table, insert some data, run some queries, and check the results against MySQL or Postgres or saved results. At the other end of the spectrum, it's something like running a full workload simulation for an hour that creates a bunch of nodes and randomly kills them, creating failovers or specific failure scenarios. So it allows for a lot of flexibility and scalability in testing.

That's definitely been really valuable for us over the years.
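
The simple end of that spectrum is essentially differential testing: run the same statements on the system under test and on a reference engine, then compare results. A minimal sketch of the idea, using Python's sqlite3 as a stand-in for both engines:

```python
import sqlite3

def run_all(conn, statements, query):
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    # Sort so row order doesn't cause false mismatches.
    return sorted(cur.execute(query).fetchall())

setup = [
    "CREATE TABLE t (a INT, b INT)",
    "INSERT INTO t VALUES (1, 2), (3, 4), (5, 6)",
]
query = "SELECT a, SUM(b) FROM t GROUP BY a"

system_under_test = sqlite3.connect(":memory:")
reference = sqlite3.connect(":memory:")

assert run_all(system_under_test, setup, query) == \
       run_all(reference, setup, query)
print("results match")
```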

Kshitij Grover: I have a couple of questions. Maybe I'll start with: why not run that in the cloud? Like you said, it's in Emeryville. I imagine that's not a very traditional choice.

Robbie: When we costed it out... we originally ran it in the cloud, and the cloud is great for elasticity, but this is a workload that's pretty much running at full saturation all the time. It's not elastic, so we really weren't getting a cost benefit there. And it has an internal SLA to our engineers, but it's not a production service.

So there's a little more tolerance there, though obviously our tolerance for outages has decreased over time. It used to just run in a closet, and as we've grown, it's now in a colo. As a result, we gave up some flexibility and took a little more responsibility on ourselves, but over the years it's probably saved millions or tens of millions of dollars.

Kshitij Grover: Do you have interesting strategies for how to generate these tests? Is it like... okay, I'm introducing nested fields or a new type of index, and I go and write a bunch of tests as the engineer working on the feature, or is there a way to generate a more comprehensive test suite that you all have thought of over the years?

Robbie: A couple of different strategies, depending on the feature area. For some things it can be very simple sanity tests. For query testing, we have what we call random query generators, RQGs. Depending on the RQG, you can think of it as basically a reverse grammar for generating SQL queries: you have a corpus of specific query shapes or permutations we're trying to generate, and it just goes through and randomly puts them together to create different queries. Those find tons of bugs in query execution and query optimization.
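
A toy random query generator, to show the "reverse grammar" shape: pick productions at random and emit SQL text. Real RQGs are far richer; the grammar below is invented purely for illustration.

```python
import random

COLUMNS = ["a", "b", "c"]
AGGS = ["SUM", "MIN", "MAX", "COUNT"]
OPS = ["=", "<", ">", "<>"]

def rand_predicate():
    # One production of the grammar: column OP constant.
    return f"{random.choice(COLUMNS)} {random.choice(OPS)} {random.randint(0, 100)}"

def rand_query():
    # Compose productions into a full SELECT statement.
    select = f"{random.choice(AGGS)}({random.choice(COLUMNS)})"
    where = " AND ".join(rand_predicate() for _ in range(random.randint(1, 3)))
    group = random.choice(["", f" GROUP BY {random.choice(COLUMNS)}"])
    return f"SELECT {select} FROM t WHERE {where}{group}"

for _ in range(3):
    print(rand_query())
# Each generated query would then be executed and its result checked
# against a reference engine, as in the differential test above.
```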

In storage and clustering, it's more the type of stress testing I was talking about before, where you're creating a simulation and killing nodes, creating failovers, trying to create scenarios where you might lose data.

Kshitij Grover: Yeah, that's interesting.

I think the reason I'm asking is because to some extent we're thinking about a similar sort of problem. As you can imagine, even in something like billing, you can be in thousands or hundreds of thousands of permutations of state, where a subscription might have been canceled in a very specific way and you have a series of plan changes.

And so it's tricky to figure out how to test that surface area comprehensively. I imagine you all have put a lot of thought into the syntax on the testing front: how can you get engineers to do the datastore setup, write these queries, and create the underlying boilerplate for these tests in a way where it's not expensive to add to the testing service?

Robbie: Yeah, the RQGs are really nice in that you have the framework, and every so often we need to build a new one because we're doing something substantially different. But we've built up enough of them over the years that that rarely happens anymore.

You just add a new mode that tests your specific query feature. So I add a new feature and it's actually pretty simple: I can now test the full orthogonality of this specific operator against basically every other query execution feature that's in this RQG.

That's what's really nice about those, and it's something you could never do by manually writing tests.

Kshitij Grover: Yeah. So we've talked about reliability and correctness. I think that was probably the right place to start for a datastore, but I'm curious, through your lens: what are the hardest technical problems that you and the team are working on now?

Maybe this is something that's coming up in SingleStore or something that's like an ongoing technical challenge.

Robbie: Yeah. One thing we've been working on for the last couple of years: we started as a self-managed in-memory database and have been moving towards more of a general purpose cloud database.

That means taking a lot of the services you currently think of as core services within the database and moving them outside of the database. The first thing we did was build separation of storage and compute, so that object files are now resident on S3, but the metadata was still entirely resident on the cluster.

The next stage is moving the metadata for those object files outside of the cluster as well. That enables things like data sharing: because the metadata is separate, you can share data across clusters. You can do things like branch a database, create a new cluster on that branch, and they're now completely independent.

Other services we'll likely start to move out include index construction and index maintenance. That feels like a pretty core database operation, but when it's built as a service, you can throw as much compute at the problem as you need to keep your indexes up to date without disrupting the running workload at all, which gives way better and way more predictable performance than we could provide otherwise. Another one we're kicking off is database ingest as a service: if you want to do a big load into the database, it will just use as much compute as it needs to do that load really fast, and you don't have to think about provisioning it. Those are the things that, at least on the database side, I'm most excited about and think are the most technically interesting on the horizon.
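
A loose sketch of why branching gets cheap once metadata is separated from immutable object files: a branch is just a copy of the manifest, so it costs O(metadata) rather than O(data). Everything below is illustrative, not SingleStore's actual design.

```python
import copy

# Immutable column segments in object storage; they are never
# rewritten, only added.
object_store = {
    "seg_001": b"...immutable column segment...",
    "seg_002": b"...immutable column segment...",
}

# The manifest (metadata) maps each table to its segments.
main = {"events": ["seg_001", "seg_002"]}

def branch(manifest):
    # Copy only the metadata: both branches reference the same
    # segments until one of them writes.
    return copy.deepcopy(manifest)

dev = branch(main)
object_store["seg_003"] = b"...new segment written on dev..."
dev["events"].append("seg_003")

print(main["events"])  # ['seg_001', 'seg_002'] -- unaffected
print(dev["events"])   # ['seg_001', 'seg_002', 'seg_003']
```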

Kshitij Grover: Yeah. I was going to say that even if you have the perfect data store, those are the operations around how you productionize it and use it on a day-to-day basis. With branching: how do I create replicas without having to copy a bunch of data? With something like ingest: great, this datastore works, but now I have the problem of migrating to it, and potentially backfilling it with a lot of data.

So would you say that's the right frame to think about it? Because I also heard a little bit there about performance predictability, maybe about durability, maybe even about just scaling the database.

Do you think there's a single focus across these features, or is it a mix of all of those things?

Robbie: If I had to give it a single focus, it's that we're enabling experiences and use cases that weren't possible in an on-prem or self-managed database.

And so the goal is to have a better overall developer experience. And the theme that ties those together is we're doing things that just weren't possible before.

Kshitij Grover: Yeah, that makes sense. So I think the last question I wanted to ask you: there are a bunch of datastores around today, especially when it comes to things like real-time processing and, as we were talking about before, real-time analytics workloads.

And I'm sure there are design trade-offs in all of these datastores. Thinking at the macro level, how do you think about the trajectory of this industry? Are these specialized datastores going to survive the decade? Do you think there's going to be some consolidation? Do you think it really is about getting to some sort of completeness milestone?

Like, where does this go 10 years from now? I know that's a big question.

Robbie: Yeah. I won't make predictions about who will survive, other than that I think we'll survive. But the interesting thing is that there have been a couple of waves, or explosions, in the number of databases.

You saw this in the early 2010s with the NoSQL explosion, and now the lone survivor is really MongoDB. There are a couple of other small ones, but MongoDB is by far the biggest -- and MongoDB actually isn't even specialized anymore; it has analytics and SQL and all that. So the market can definitely support a couple of specialized systems. But the key thing, like I was saying before, is that a lot of the underlying technology, if it hasn't already been commoditized, will be commoditized.

So the ones that survive and thrive will be the ones that can provide the best developer experience and put the pieces together to deliver that. Different systems are going to go after that in different ways and pick different sets of trade-offs: some optimize for the easiest way to get started, where you can "pip install" your vector store and go. For us, on the other hand, it's the fact that you can do all these things together. It'll be interesting to see which of those matters more. To a certain extent, I think both, but we'll see.

Kshitij Grover: Great. Yeah, that's a good answer. And to me that also centers SingleStore in a new frame, where it's not about performance or workload unification in some abstract way. It's that the developer experience of having to run these workloads in separate datastores and separate environments is really not good, and so you consolidate that. It's really about how you let the developer at a company not have to do as much work and just have a better experience working with the data. Thanks so much, Robbie -- it was really great talking to you, and I appreciate the conversation and your insights.

Robbie: Yeah. Thanks. Really enjoyed this.