Tech on the Rocks

In this episode, we chat with Ben, founder of Espresso AI, about his journey from building Excel Python integrations to optimizing data warehouse compute costs. 

We explore his experience at companies like Uber and Google, where he worked on everything from distributed systems to ML and storage infrastructure. 

We learn about the evolution of his latest venture, which started as a C++ compiler optimization project and transformed into a system for optimizing Snowflake workloads using ML. 

Ben shares insights about applying LLMs to SQL optimization, the challenges of verified code transformation, and the importance of formal verification in ML systems. Finally, we discuss his practical approach to choosing ML models and the critical lesson he learned about talking to users before building products.

Chapters

00:00 Ben's Journey: From Startups to Big Tech
13:00 The Importance of Timing in Entrepreneurship
19:22 Consulting Insights: Learning from Clients
23:32 Transitioning to Big Tech: Experiences at Uber and Google
30:58 The Future of AI: End-to-End Systems and Data Utilization
35:53 Transitioning Between Domains: From ML to Distributed Systems
44:24 Espresso's Mission: Optimizing SQL with ML
51:26 The Future of Code Optimization and AI

What is Tech on the Rocks?

Join Kostas and Nitay as they speak with amazingly smart people who are building the next generation of technology, from hardware to cloud compute.

Tech on the Rocks is for people who are curious about the foundations of the tech industry.

Recorded primarily from our offices and homes, but one day we hope to record in a bar somewhere.

Cheers!

Nitay (00:02.188)
Ben, it's a pleasure having you with us today. Thank you for joining us. Why don't we start with you giving us kind of a background and your past work experience?

Ben (00:05.624)
Yeah.

Ben (00:11.268)
Yeah, happy to be here. Thanks for having me. Starting back from the beginning, out of undergrad, I started a startup called Data Nitro. We were integrating Python with Excel. So this came out of doing a little bit of work in finance and having to use Visual Basic. It was a kind of good enough idea, like...

painfully good enough where we went through Y Combinator, we raised a bit of money outside of that as well. And we had a bit of traction and some really excited users. A bunch of people learned to code using our product, but it wasn't really a great business. So worked on that for, I think, a little too long based on how enthusiastic people were. After a couple of years maneuvered into

freelance data science consulting, which was really fun. Learned a lot about selling and like understanding how to make sure you're doing something actually valuable, not just how to write a bunch of code. That's like good, clean code. Eventually ended up realizing I had like built myself a full-time job doing sales that I didn't really want. So I went to Uber for a year.

worked on distributed systems there, then moved over to Google, worked on Search for about three years doing machine learning, natural language processing, which was really fun, and then switched to systems stuff in Google Storage Infrastructure org. So doing just high performance code and making sure the storage that underlies like GCP, but also like all of Google's products was fast and efficient.

And then about a year and half ago, I left that there with the idea of combining the ML parts of my background alongside all of the cool stuff happening in ML and AI today and the systems performance part and started a new company, which is what I'm working on now called Espresso AI. We use machine learning to optimize compute.

Ben (02:32.036)
We're currently working on Snowflake and helping customers save tons of money on their Snowflake bills.

Nitay (02:38.958)
That's fantastic, that sounds like quite a journey. We'll get to the Espresso stuff in a minute, certainly. Why don't we start with a little bit in the beginning? You mentioned something interesting there with Data Nitro. I think a lot of people don't realize how prevalent Visual Basic actually is. So tell us a little about that. What do people use Visual Basic in Excel for, and what were you trying to replace it with?

Ben (03:01.59)
Yeah, I mean, we started out, this was like during an internship. So I went to school at MIT and I did a kind of quant fund internship where everyone was doing modeling in Excel, but the fund was largely MIT alum, so everyone knew how to code. And so everyone was extremely unhappy with Visual Basic. But ultimately, you know, we were pulling data in from Bloomberg, we were

building stuff around it. that's kind of, instead of having a proper API and a proper programming language, you know, this was the UI, this was what the traders understood. And so that's kind of what we had to work in. And this was still a little bit past the financial crisis. So there was a lot of upheaval going on in finance orgs at the time thinking about

you know, how can we do better? And for example, like one of the things that I learned was that Bear Stearns ran a lot of its books in Visual Basic. And, you know, there are a lot of stories about what went wrong there, but like, I imagine that didn't contribute to things going really well. I remember also hearing at the time that Boeing ran some like

flight simulation calculations using Visual Basic in Excel. And, you know, it's not a good sign, right? Like, I'm not excited to get on the airplane that was built in a spreadsheet with VBA. And I think what we saw was that you had people who were effectively doing software engineering, real power users of Excel, trying to

Nitay (04:35.406)
You

Ben (04:55.618)
get better and extend what they were capable of doing. And that's how they ended up, you you start recording some macros for maybe listeners who aren't aware. When you're in a spreadsheet, you can press record and you can record a macro, which it'll just, you know, play back whatever series of actions you took at that step. That gets generated in a language called Visual Basic, which is a, it's not the best language you've worked with.

you know, it's missing quite a lot of the stuff that you might want, but then you can go in and you can start editing those macros. And that's how a lot of people get into it. And in fact, like a lot of people get into programming that way. So yeah, the idea was like, this is terrible. You know, how are people using this in this day and age? This should obviously just be Python. And that's what we tried to build. you know, about 10, 11 years later,

There was a Python integration into Excel released by Microsoft. And a lot of people reached out to congratulate me for my, you know, mistimed vision. So there's a better alternative now, but VBA is still super, super prevalent. And it's just people building complex models, like consultants doing things and all sorts of, the long tail of use cases is crazy. Like people running like,

you know, like literal factories out of like spreadsheets that they've written with, you know, no testing, like no validation. Like you look at a large enough spreadsheet and it's just sort of like the only thing I know about this is that it's wrong. There's no way this is correct. It's insane. I can't tell you where it's wrong, but this is not, a hundred percent does not do what you think it does.

Kostas (06:41.867)
Ben, quick question. So I think what you're describing right now is a very good example of how important timing is, right? Like many times you have the right idea, but the time is just not right. And I think the industry is full of these stories. I think many people can tell you that, hey, I started doing this 10 years ago, and if I did it today, I would probably be the next big thing.

Having gone through that, do you think there is some kind of playbook or some way that you can tell if the time is right or not?

Ben (07:22.754)
Yeah. So when I did YC out of undergrad, I got the same advice every single time, every time I talked to any of the partners, which was, you know, you have to go talk to users. And because I was straight out of undergrad, I was like, I'm not going to listen to anyone about anything. like, don't, I don't need anyone's advice. And so that's, that's a big part of what we got wrong.

Like we had a small number of interested people, but we didn't go out and do market research, right? You just kind of like, literally you need to go out and ask and you need to make sure there are enough people interested in what you're building that you can get to an MVP in a reasonable amount of time. And this is, you know, this is more doable with some things than others, right? I think like the iPhone is a great example of

like, man, it's hard to mock that up and you just kind of have to take a bet. But with a lot of software, you can go out and do that. The current company is an example of like, it's something that is working really well now in a cost cutting climate. A few years ago, people wouldn't have cared, right? With zero percent interest rates, like costs were just not a concern.

And even today, like we talked to some big companies and if they're growing super profitably, like it's just not a priority. So yeah, I think that like, if you're, you know, if you're sitting down and you're building something that is obviously going to be a slam dunk, right? You're sitting down and you think I'm going to write a database and it's going to be 10 times faster and cheaper than Cassandra. That's probably going to work.

Right. If you're sitting down and you're building AGI, like I think Sam Altman had a great quote early on where someone was like, well, what's your business plan? And he was like, well, I don't know. Like, we'll ask the AGI. It'll like sort itself out. But for 99 % of products, like you really ought to be having people very, very excited about what you're building.

Ben (09:36.654)
I actually have an example of that from the tail end of Data Nitro. So we always had this sort of lukewarm interest in the main product. And I thought that that was real user interest. And towards the very end of working on that, we built like a completely different thing. We built an add-in that would just make spreadsheets run faster. And the only thing this did was it took

VLOOKUPs, for people who are into Excel minutiae, and just cached and sorted the table that you're looking at so they would run in n log n time instead of quadratic time. And management consultants were blown away, because quadratic algorithms are really bad. So you would have these people, you know, talking about how they spent their day clicking and dragging down this, you know, formula, like

100 rows at a time, then going in and getting a cup of coffee and coming back. And that would be eight hours of work. And then they'd install our add-in and it would just work. And that's something where, you know, we launched a prototype. We never followed up. We were just like out of steam. But I got people reaching out literally for years.

And I would always be like, how did you, like, the company is clearly defunct. We don't have a website. The email, like, I don't know where you found this email. How did you hear about it? And, you know, it'd be like, well, you know, someone had a binary floating around and they emailed it to me, and it was in a LinkedIn internal newsletter like a year and a half ago, and I was just thinking to myself, man, it would be great if this worked. Like, I know it's out of business, but like, do you have it? Can you send me an updated version? And I was like, no, I don't.

Like we're not doing that anymore. I have a different job now. But I think that's the example of the sort of enthusiasm. Or with our current company, the first thing we did before we had a product was we asked people for data so we could start training models and understanding what was going on. people would straight up read our instructions, never speak to us, email us like CSVs.

Ben (11:50.294)
of all of their Snowflake logs. And that was awesome to see. I think if you get something like that, that's a great sign that you're onto something. Really extraordinary interest and enthusiasm. And I think an analogy is like, it's a little hard, it's kind of like, most people will tell you your idea is good. Most people are not straightforward or mean. But if your idea is good, if your idea is timely,

it's like the difference between a good first date and a bad first date, right? It's hard to explain to someone who hasn't had both, but it's so qualitatively different that if you're wondering, you probably haven't had it. You probably don't have it.

Nitay (12:38.702)
Yeah, that's such good advice. I find often that, as the saying goes, it's easy to get praise, it's hard to get the truth. And this is why often it's much easier to get somebody to say, you know, I love this, yeah, I would totally use it. And then when you ask, okay, so how much would you pay for it? And they're like, well, hold on. And then you start to get the real answer. And that's such a fascinating story. I mean, you guys basically built like indexes for VLOOKUPs in Excel, essentially. That's amazing.

Ben (12:55.48)
Yeah.

Ben (13:06.18)
Yeah, it was fun. The sad thing is there was so much work on our end. It would have been like, not having seen the code base, I probably could have gone to Microsoft and made that change in like an hour. And it would have been massively, massively impactful for so many power users. And maybe they've fixed it by now, I don't know. It's been a while.

Nitay (13:30.35)
Yeah, I I imagine you guys had to do it in some like hacky built on top reverse engineer way because I highly doubt there was some clean API there that's like, here's how you plug in a different storage mechanism.

Ben (13:40.74)
We made a fake function. I think we used the Greek letter O in VLOOKUP. So we spoofed the VLOOKUP function and then had a button that would automatically change all of your VLOOKUPs to our fake VLOOKUP using the wrong alphabet, and it would just run in the background that way.

This was quite a while ago in terms of engineering experience, so I'm sure we, you know, had tons of memory leaks, and you probably would have had to restart your spreadsheet after a while. But like, yeah, it was not straightforward.
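
For readers who want to see the shape of that speedup, here is a minimal Python sketch, not Data Nitro's actual add-in (which hooked into Excel itself): instead of a linear scan per lookup, you sort the table once and binary-search each key, so n lookups against an m-row table drop from roughly O(n*m) to O((n+m) log m).

```python
import bisect

def naive_vlookup(keys, table):
    """Linear scan per key, like a plain VLOOKUP over an unsorted range: O(n*m)."""
    results = []
    for key in keys:
        match = next((row for row in table if row[0] == key), None)
        results.append(match[1] if match else None)
    return results

def cached_vlookup(keys, table):
    """Sort the table once, then binary-search each key: O((n + m) log m)."""
    sorted_table = sorted(table, key=lambda row: row[0])
    first_col = [row[0] for row in sorted_table]
    results = []
    for key in keys:
        i = bisect.bisect_left(first_col, key)
        hit = i < len(first_col) and first_col[i] == key
        results.append(sorted_table[i][1] if hit else None)
    return results

if __name__ == "__main__":
    table = [("sku-%06d" % i, i * 1.5) for i in range(100_000)]
    keys = ["sku-%06d" % i for i in range(0, 100_000, 7)]
    # Both give the same answers; only the asymptotics differ.
    assert naive_vlookup(keys[:50], table) == cached_vlookup(keys[:50], table)
```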

Nitay (14:19.374)
That's amazing. And then it sounds like post that, you actually heeded the YC advice, because you mentioned you went into consulting, and so at that point I imagine you were working very closely with clients. Tell us a bit about that.

Ben (14:31.894)
Yeah, so for anyone who is like between startups, for anyone technical, I think doing some amount of software freelancing is a really, really good experience. I think I learned a huge amount about sales in the first six months and just a ton about like, you know, what part of what you're doing as an engineer is actually

delivering real value to someone. And it's the sort of thing where like, you know, you can always hear about it for sure, but like, customers don't care about your five unit tests, right? Like the customers care that the software works, right? They don't care what language you're using or, you know, what JavaScript framework you're on, like React. Like they care about, is the software good? Does it solve a problem?

And I think just like the way that people use what you can do is also super informative. But yeah, it was lots of fun. It was more of a data science angle. I would sort of like, you know, like I had a big project with a book publisher, and we were just kind of going through

how they did certain projections and how they did inventory management and they had some heuristics and it was sort of like, you know, they were like, it so happened in that case that there were maybe 10,000 different scenarios and like you could just brute force the answer in a Python script, right? And so most of that was talking to really non-technical people, understanding like what are they doing and then figuring out like what you can build to make things.

better for them. And I think that's the core of like what you do when you start a tech startup also. It's less, you know, as a consultant, there's a bit more of like, I just need to find one person with one big enough problem to like help me pay rent for the next few months and less about like, you know, is this scalable? Are there 10,000 people that would be interested in this? But yeah, there's a lot of interesting one off projects.

Nitay (16:53.55)
To that point, how much, and I don't know if infrastructure is the right word, but how many patterns or repetitive things did you build up from client to client, or was each one really truly its own bespoke thing that you started from zero?

Ben (17:08.164)
For me, it was really bespoke. A lot of the initial conversations came out of Data Nitro, where we ultimately found it easier to really quickly... My co-founder and I were significantly better than the average VBA developer. And so we found it much easier to just build stuff on top of our platform and sell that. And there was more...

revenue doing that than selling the underlying product. So it was a bit of an offshoot of that. Eventually, through like referrals and word of mouth, it just got to be a bunch of pretty isolated standalone projects. I never like had a formal approach to finding or prospecting clients. I got enough

random intros that it worked. And my mental model of this was like, if you talk to a technical friend at a party and you tell them that you're doing consulting, you know, with some probability, technical folks get asked like, hey, do you know anyone doing consulting? And they will just route it to the last person they spoke to that said that they do. And so that's kind of like how I got

most of the intros, it was really completely like, hey, like we talked for 15 minutes at this thing. And then like, you know, my like friends, like boss is like actually looking for someone to come in and like do some machine learning. And then a lot of those conversations would start with like, what do you think machine learning is? And you kind of go from there.

Kostas (18:51.371)
So what made you leave consulting behind and do the next thing?

Ben (18:59.35)
Yeah, I had kind of figured out how to get better at consulting. And to make it run as a business, the kind of direction was like, I want to go up the gradient of like people who know less and less about technology, but have more and more money. And so you want to start solving like higher

leverage problems. And then the next obvious step past that is probably like, well, like none of the tech is very hard. So then you hire engineers and then I would just be like running a business where, you know, I'm arbitraging, like engineering hourly rates and trying to reel in like bigger and bigger customers. And I just like didn't like that part of the process at all. And I also felt like, you know, so I was like,

You know, in my mid twenties, like no CS degree, like self-taught engineer, and consistently like had not worked on a team other than with my co-founder, who was also fresh out of undergrad. And, you know, consistently I was like the most experienced, most senior engineer in the room. And I was just sort of like, this is not like, I'm not going to get better at what I want to do. And I don't particularly like doing sales, so I should go do something else. and.

Yeah, just like not having to worry about where's the next deal coming from, you know, that that sort of thing was really appealing. And just honestly, like doing more tech.

Kostas (20:42.131)
Yeah, so what was next? What happened after consulting?

Ben (20:46.54)
Yeah, so I sort of decided that I didn't want to go to a startup and I wanted a little bit more stability. So the plan there was I should probably just go to like a big tech company. And while I'm in big tech, I should work on the sorts of things that you can only work on at big tech, because I probably won't have the opportunity to do that again.

So the things that I was interested in at the time, and still am today, were distributed systems, which is more fun to do at scale, and machine learning and AI, which now, obviously, every startup is an AI startup and probably every deli is an AI deli, given the extent to which this has penetrated. But

at the time, it really seemed like you needed a lot of compute, a lot of data, and all of the exciting research was happening at bigger places. So I talked to a bunch of different companies and ended up joining Uber just based on like where I thought the most interesting team and most interesting work was. And I worked at Uber for a year on observability and monitoring, helping build out their like distributed systems stuff. That was really fun.

The team that I worked on spun out later to start a company called Chronosphere. So they're a pretty, like, quite successful company trying to build a better, cheaper Datadog. I don't know if that's still their pitch, but that was kind of the initial pitch. And you know, I basically ended up leaving after a year because there was a lot of

stuff happening at Uber at the time that I did not like: lots of people leaving and the whole, you know, thing with like tons of senior leadership changing. And it just sort of seemed like, I don't want to sit through this for no particular reason. But yeah, that was fun.

Kostas (22:55.741)
Awesome. And what happened after that? I'm trying to get us, you know, to the, to today, but I think like each step at the end of the day gives us like the, the journey that took you to where you are today. So what happened after Uber?

Ben (22:59.64)
Yeah.

Ben (23:14.316)
Yeah, absolutely. So after Uber, I went to Google. I spent about three or so years working on natural language processing and search, more specifically in Assistant. And that was interesting. It was kind of really fun. It was a time where a lot of search was transitioning from legacy stuff to actual neural net based approaches.

So this was, you know, like, I was there when BERT came out, which was the really groundbreaking early transformer model and something that eventually ended up, you know, being extremely widely used. Lots of interesting progress in the field at the time. And yeah, like towards the end of my time there, I also did a bit of research with DeepMind.

So I did one project on using reinforcement learning to train neural networks more efficiently across multiple GPUs and was a little bit involved in an early project to train neural models on code. Both of those actually worked really well and are kind of

Kostas (24:12.587)
Mm-hmm.

Kostas (24:25.385)
Mm-hmm.

Ben (24:38.484)
reasonably directly informative of, you know, Espresso. But yeah, after a few years there, like the team was sort of changing and I ended up just moving. Like, search is, you know, reasonably slow moving, I would say. So, you know, despite spending a really long time at Google, I definitely

Kostas (25:00.02)
Yeah.

Ben (25:06.062)
felt like a stranger in a strange land and was always sort of like, okay, when am I going to go do the startup? So I ended up moving to a systems team, which was a very, very different set of challenges. When I left, a lot of the people on the search team were like, why are you going to systems? They've got locks and threads. How are you going to deal with that?

Kostas (25:33.48)
You

Ben (25:34.916)
And then on the system side of things, I tried to convince people to use ML and they were like, we don't understand what you're talking about and we want you to just go do your job and stop talking to us about this stuff.

Kostas (25:41.535)
Mmm.

Nitay (25:47.276)
There's some interesting points you made there, because, you know, people famously say these days you're either building ML for systems or you're building systems for ML, and people think that if you flip between them it's a similar kind of work, but it turns out it's actually a wildly different kind of work. So I'm sure kind of the first half of the experience you had at Google is very different from the second half. And maybe for the listeners here, give them a sense of, like, when you're working at Google on one of these neural models for search, like you said, or even the code thing,

Ben (26:00.235)
It's...

Nitay (26:13.528)
How does that even come about? Like, are you the only team and they're like, okay, you guys, we want you to figure this out? Are there 12 teams trying 12 different mechanisms? Like, how does it even work?

Ben (26:23.806)
You know, I think that, commonly, I mean, this is like a Google specific thing. I don't know that this is like how I would advise people to do it, but you know, usually at any given time, there are like two-plus official projects trying to do the same thing, and maybe a couple other teams working on it unofficially. It's very common to have the one team without a lot of

resources that is also tasked with maintaining the legacy system, and then the new exciting team with no legacy burden and way more headcount for some reason. You know, and then of course, like depending on which team you're on, it's either great or terrible. But for the team doing the legacy maintenance, it's always like, I don't understand why you two people couldn't maintain the old thing and also do as much work as these 10 other people over here did when they had no real, you know, customer requirements.

So that, you know, the internal politics, I think, would not recommend to anyone. But generally, there was a lot of interest in the time in doing things better. And it was just very clear that, you know, these models could do things that are very hard to do with classical approaches, right? So

Speech recognition is one example, which is not what I worked on, but like there has been in industry generally, there's just this drive towards end-to-end models where there's this old, very traditional way of thinking about it of you get a recording and you process it into phonemes and you process that into words and you start replacing

bits and pieces of this with neural nets. And eventually you're like, okay, well, we actually have a ton of data and a ton of compute. Let's just train it from audio straight to text output. Right. And similar stuff like that, where just rules-based approaches are like, it turns out if you throw like thousands of engineer years at rule-based approaches, they can get you really far. I think that the scale

Ben (28:51.832)
that I saw that work at was super impressive and mind blowing in a lot of ways, but ultimately, sooner or later, the networks just catch up and they get better. So it's always like, well, you know, like we have these queries that are parsing well and that we understand, but it's been super hard to sort out this other set of things, and we just think a neural approach is going to be better. So

You typically start from like, okay, well, let's deploy it and only use it in this limited set of things where we know it's better. And then you kind of work your way up to the chain and replace the original stuff. I think there's a lot of like error tolerance there. I think that running a very large service such as Google search, you have to be...

slower and more cautious. And, you know, if you're running something like perplexity, you can just like, kind of put stuff out there and it's okay if you have some misses because you're also going to have some huge wins. So you don't have to go incrementally. If you're at a smaller company, you can just sort of like really go for it and try to nail the landing right away, which like is more fun for sure. And, you know, I think leads to a lot of progress.

Nitay (30:14.839)
There's two really interesting things you brought up there. One, and I saw this myself, the thing you said about kind of the operating model of having two projects at once actually to me doesn't sound that bad, because I've seen this at other big orgs. In fact, I remember a story from Facebook at one point. I think we had at least five different

key-value stores that did basically the same thing, or at least were used for essentially the same use case. And I remember having discussions about this with some of the kind of senior leadership at the time. And one of them, I remember, made a really good point that stuck in my mind, which is he basically showed me that there have been these business studies, and it's a well-known thing, that once an org gets to a certain size, the most efficient way to operate is actually to do things multiple times,

because it takes so much more effort to try to corral everybody to do it all in one way and consolidate on one. And that one way might not end up being the most optimal. But it's actually best to do this kind of almost free-market kind of thing, where it's like, we're just going to give four or five teams the same agreement, essentially: go build whatever you need to do. Or if you're building product X and you're building product Y and each of you needs a key-value store and you think you know the one that you need, build your own. Go for it.

And then sort of organically, if you will, the one that catches on to more more use cases eventually naturally takes over. Now, that's like the perfect view of it. In reality, what you end up having, as we've all seen, is one system takes 80 % of use cases and the other four take 20%. You can never get rid of them. And so there's a lot of other stories I'm sure people will tell. But it's a very interesting thing because you get to this like,

whatever like n-squared thing where the collaboration between n teams just gets to be too inefficient, and it actually becomes the smarter operating model, which is interesting to think about in big tech. The other interesting thing you pointed out there that I wanted to ask you about was, you talked about how, you know, it seems like in AI we've been making this steady progress, if you will, or maybe kind of...

Nitay (32:18.347)
step change, between: we used to have deep experts, like NLP experts, writing rules, and we moved from that to kind of feature engineering, if you will, type of thing, and now we're moving to neural nets that are essentially inferring the underlying features. Whether we know it or not, there's a lot of latent signal that the model itself is bringing out,

and having it be end-to-end. Do you think essentially all AI processes or all AI kind of workloads eventually move to the model taking the full end-to-end? And at that point, as you do that, how do you, your point, still maintain any level of tuning and introspection, understanding of what's going on? Any thoughts on that?

Ben (33:01.794)
Yeah, I think that these are all kind of very concrete areas of research. I think that definitely moving towards end-to-end systems where, in particular systems where you have a lot of data versus a lot of like intelligence and thought put into the system itself is definitely...

going to be the case; it is very much the trend and will continue to be so. There's a really great essay called The Bitter Lesson by Richard Sutton, who is just one of the foremost AI folks. And what The Bitter Lesson says is that over the time scale of like less than 10 years, or roughly the length of one PhD,

The way to make the most progress on a particular ML problem is to sit and think really, really hard about the field that you're in and build a really good expert system that can leverage custom models and custom data and really kind of get great performance that way. And over a slightly longer timeframe,

the best approach is almost invariably throwing more data and more compute at it. And so you have a lot of people who've spent an enormous amount of time, in this case, literally their PhDs and their dissertations, building stuff that's really outdated just a few years later, because you can just come in and brute force it. There's a variant of this conversation that I have a lot with our ML engineers, which is,

GPU time is expensive, compute is really expensive. And so the natural inclination is to come on and say, well, I want to train this model or that model. And I think I can do it faster and more efficiently with these series of experiments and projects. as you both know really well, engineering time is actually incredibly scarce in comparison to GPUs. So there, it's always like the conversation that I have is don't worry about

Ben (35:25.796)
doing this in a clever way. Like, try a really simple approach where, since we're lucky enough to have a ton of data, you throw a ton of data at it, and that's your baseline. Maybe do one small sanity-check model somewhere else, but your baseline should be: just take Llama, fine-tune it, run it for a couple days or whatever, and see how it does

instead of, you know, hand massaging something for two weeks. I think things are going to trend more and more in that direction, right? I think like what end to end means needs to be like a little bit clarified. So end to end is just where do you have this large volume of data? And for speech, it's really you have a bunch of audio and a bunch of like, like captioned audio. There's just a ton of it.

Right, so compare the amount of captioned audio you have to how many annotations do you have of like audio to phonemes to words, right? There's much less of that. And so that's why end to end wins out. Making sure models are doing the right thing is, you know, there's a lot of like, it's just like ML observability. Like you want to be checking.

that that's true. You want to be checking that new models are actually better. And there's a lot of work you can do there, and also like understanding what your models are doing there. It's very different from, you know, running a debugger in a code base and stepping through line by line what's going on. But there's a lot of exciting research on how to do this. One impractical, extremely expensive technique is called

leave-one-out training, where you literally train a version of the model with one data point left out. And you do that for each data point and you kind of see, okay, how did this one data point influence what the results are? And should I be using this, should I cut it? If you have something you're trying to fix or something you don't like, or just something you're trying to explain,

Ben (37:49.956)
you can kind of start figuring out what to cut out. Now that's like incredibly expensive. It's not like a super practical technique, but, people are working on practical variations. and also since I started the company, I've gone from like reading ML papers to like reading ML abstracts to like reading titles to now, like, I see people post stuff in Slack sometimes. So like, I'm not, I don't know what happened over the past year in terms of research.

But that is like a technique people are looking into. People are also looking into things like, you know, we have all the neural network weights. like, what, you know, like, what is the model actually doing? Can we use that to explain things in some way? So yeah, people are trying to sort it out.
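
As a toy illustration of that leave-one-out idea, the sketch below uses a hypothetical linear model on synthetic data, nothing like the scale Ben is describing: retrain once per held-out training point and measure how much each point moves a prediction you care about.

```python
import numpy as np

def fit(X, y):
    # Ordinary least squares; a toy stand-in for "the model".
    return np.linalg.lstsq(X, y, rcond=None)[0]

def loo_influence(X, y, x_query):
    """Influence of each training point on the prediction at x_query,
    measured by retraining with that point left out."""
    full_pred = x_query @ fit(X, y)
    influences = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        loo_pred = x_query @ fit(X[mask], y[mask])
        influences.append(full_pred - loo_pred)
    return np.array(influences)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    scores = loo_influence(X, y, x_query=np.array([1.0, 1.0, 1.0]))
    print("most influential training points:", np.argsort(-np.abs(scores))[:5])
```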

Kostas (38:36.329)
Yeah, Ben, one last question about your journey actually, before we get into Espresso and what you are doing today. So you mentioned when you went to Uber, one of the reasons you decided to do that was because you wanted to do distributed stuff. That's what's fun, right? You went there, did observability and got into database systems also, because I guess you were working on, I think the name of the database there was M3. Yeah.

That then, after that, got open-sourced, and then Chronosphere also started. But then you went to Google, and at Google you got into a different domain, right? Like, we spent some time now talking about ML and AI. How did you decide to make the transition, if it is a transition for you, right? Maybe in your way of thinking, you actually are not transitioning, but...

Tell us a little bit more about that. How do you decide to make this change?

Ben (39:39.628)
Yeah. So prior to Uber, I had been doing data science and ML and most of the roles that I had applied to were some variant of ML role. And I had this one distributed systems role, which is a different thing I was interested in. And I felt pretty unqualified for. And I'm a fan of accepting job offers that you feel unqualified for. I think that's like a good way to learn stuff.

So I just did that because it seemed interesting and I felt like, you know, I did this whole job search, clearly I can get an ML offer anytime I want and I can always switch back, but it's going to be more work to get another distributed systems offer. and also just, you know, kind of the like vibes of like the different teams that I was talking to and location and all that stuff. and like going back and doing ML after that was.

sort of the same thing. was really like, hey, like, this is a different thing I'm interested in. You know, I feel a little bit rusty on it at this point. It's been a year since I've thought about it. I have some idea of what distributed systems is like. You know, I want to try doing ML at scale. And it was honestly pretty similar logic to go from there to storage at Google, which again, I had done an internal

team search, were a lot of different ML teams I could join. But there was one systems team that had a really cool project that I felt like I would just learn a ton. So I've always kind of like, if something, if I'm curious about something and it's exciting, like I have a tendency to go for that. You know, I think in hindsight, like,

It could have been like good to cycle through things a little bit faster and specialize sooner. but honestly, like it was fun and I was learning consistently. think like if you're as an engineer, like if you're learning stuff, you know, day to day, and obviously there's stuff like comp and other considerations, but you know, if, those all check out, like being on a team where like the people are really good and like exciting stuff is happening, like I think, and you're

Ben (41:58.68)
learning, I think that's a good spot. So yeah, that was a lot of how my thinking went at the time. And the flip side is, if I start getting bored, that's when I start looking around for what to do next.

Nitay (42:13.198)
So speaking of that, what made you then go from Google to decide I'm going to do another startup? Tell us a bit about the kind of founding of Espresso and then coming to today.

Ben (42:21.856)
Yeah. So the team that I was on at Google when I joined was extremely exciting. It was a bunch of very experienced engineers, like just looking at the literature and implementing like super high performance stuff. And I was learning a ton. And then the project got really successful and my role turned into, you know, a lot more meetings and a lot of the senior folks started leaving to the next project.

And we went from, let's build this, you know, super high performance thing, to, well, okay, now it's really working, so we need to, feature by feature, replicate the old system so we can replace it and deprecate it. And that's, you know, a five-year journey. And then, you know, you launch the thing into production and now people want things like tests and documentation,

rollout schedules, right? And it sort of felt like, with the really interesting high performance stuff, one, like I said, I was excited and I was learning, and two, I felt like, okay, this is really something where I'm contributing. And then towards the end of it, it was like, you could probably replace me with like three junior engineers and it would go faster, because there's just so much stuff to do. And at the same time, like,

So that was one part of it: I didn't want to, you know, I was starting to spend a lot of time sitting in meetings and just writing not exciting stuff. And it turns out leaving a company to go be a CEO is not a good way to sit in fewer meetings. But I had not thought that through. The second part of it was I had been thinking about, like, I want to go do another startup.

And so that's something I had been considering on and off the whole time I was at Uber and the whole time I was at Google. And this is also around when ChatGPT came out. So, you know, all of this sort of came together for me with, look, like I've been saying, I want to do another startup.

Ben (44:41.924)
I've been saying, I want to get back into ML. This is the most exciting thing that has ever happened, maybe. Right. And if I'm not going to go do this now, I should just acknowledge that I'm going to be at Google for the rest of my life or, you know, like hop around. If I don't go do this at this moment, there's never going to be a better climate for me to just go take a huge swing at, like,

something I really, really want to do. This is also pretty different from how I envisioned it. Like I sort of had planned to just like go bootstrap a thing and not have to, you know, worry about investors the second time around. But like, this is less of a, I bootstrap a company and more of a, this is such an exciting, like,

just massive change in what the technology can do that this seems like, and people are, you know, at the time were very amenable to handing out resources for AI companies. It just seemed like, okay, I should take a big bet. Like, I think this could really be super, super impactful if it goes well. So all of that kind of put together, I was chatting, like,

That was kind of the high level. So I was sort of starting to think about, you know, one of the things I was considering was working at OpenAI. And I was in New Jersey at the time and emailed Sam Altman, who I very loosely know from YC, but he's like super nice and really responsive over email.

And I was just sort of like, hey, do you guys hire remote? And he was like, yes, we totally do. And he introduced me to some people. And after a series of conversations, it turned out that they theoretically were hiring remote, but not really at that time. And I was just talking to someone and the guy was kind of like, well, you know, it seems like that didn't really work out, but what else were you thinking of doing? And I was like, well, I always thought I would go do another startup. And he was like, great, you should just talk to our venture arm. And, like, what would you work on?

Ben (47:06.932)
and the, you know, for the first time in a while, I took that question and like sat down and really thought about like, okay, what would I build? and for me, the most exciting thing then, and still now is like, I think applying ML to code optimization and compute optimization is just super interesting. And like,

really under explored. Like I think the same way that we have much better autocomplete and much better tooling in editors, I think that there's equally room to make better optimizers and better compilers. And as we have been figuring out over the last year and a half, better schedulers.

Kostas (47:58.901)
So tell us a bit more now about Espresso, what Espresso is about and how it relates to all these things that you told us about optimization and using ML for compute and compilers and all these things.

Ben (48:17.858)
Yeah, so the original idea for the company was just build a better C++ compiler. you know, one of the general conversations with investors around this were great, but there was always the like, hey, has anyone ever made any money on a compiler? You know, so the answer is basically no.

We will be the first. So really the idea there was some of the stuff that I was doing in terms of C++ low level optimization and the stuff that the team I was on was doing was very like re-architect the system and rewrite it. like that I think is beyond the capability of something like a transformer as the tech is today. But a lot of it is like,

Go benchmark, like, this function runs a trillion times a second across the fleet, go benchmark it and see if you can make it run better on the CPU. Like, that looks a lot like a reinforcement learning loop. And I had had this experience years earlier at DeepMind of using reinforcement learning to make things run faster, and it worked, which is cool. So on the one hand, you had these systems where a lot of the intuition

for me for this was just from playing with Copilot and similar tools. And it just really feels like these things understand code in a way that previous systems don't. So on the one hand, you had that. And on the other hand, you had an approach for taking code and making it run faster through a feedback loop. So yeah, the original plan was to do C++.
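
The benchmark-driven loop Ben is gesturing at can be sketched very roughly as follows. The propose_rewrite and run_benchmark helpers here are placeholders; a real system would wrap a learned model, a proper reward signal, and much stronger correctness checks around the same skeleton.

```python
import random
import time

def run_benchmark(fn, arg, repeats=5):
    # Toy stand-in for fleet-level profiling: best-of-N wall-clock time.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

def propose_rewrite(candidates):
    # Stand-in for a learned policy; here we just sample a candidate rewrite.
    return random.choice(candidates)

def optimize(reference, candidates, test_input, iterations=20):
    expected = reference(test_input)
    best_fn, best_time = reference, run_benchmark(reference, test_input)
    for _ in range(iterations):
        candidate = propose_rewrite(candidates)
        if candidate(test_input) != expected:   # correctness gate before accepting
            continue
        t = run_benchmark(candidate, test_input)
        if t < best_time:                       # the "reward": faster and still equivalent
            best_fn, best_time = candidate, t
    return best_fn, best_time

if __name__ == "__main__":
    allowed = set(range(500))
    slow = lambda xs: [x for x in xs if x in list(allowed)]   # linear membership scan
    fast = lambda xs: [x for x in xs if x in allowed]         # constant-time set lookup
    wrong = lambda xs: xs[:10]                                # fast but incorrect
    _, t = optimize(slow, [slow, fast, wrong], list(range(5_000)))
    print(f"best verified candidate runs in {t:.4f}s")
```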

We really quickly were pointed at Snowflake by, Nitay, your co-founder. Shout out to Tasso for extremely good advice. And for a bunch of technical reasons, SQL is easier and more approachable to optimize. So that was the second plan for the company: use ML to optimize SQL instead of optimizing C++.

Ben (50:36.556)
And so the great thing there is the deployment strategy is really straightforward: you just proxy people's queries and rewrite them on the fly. And the verification strategy is really easy. So if you're optimizing code with an ML model, you want to make sure it's correct, and a compiler that hallucinates is not a massively useful product. So there's

different approaches to this, but for SQL in particular, there's more work on formal verification because you can represent it with relational algebra. So it's just a little bit more approachable than the general problem for C++. Although some fraction of people who talk to us about working here always go, like, don't you have to solve the halting problem? And the answer is, like,

No, it is possible to prove that some code is equivalent to other code, just not all the time. So technically it was more approachable, although still really interesting. And so that's what we started working on, and we raised a bit of funding early on from Matt Turck at FirstMark. That was enough to kind of get us off the ground.

At the same time, initially I thought we would need a million dollars for training. But between when I started sketching out the idea and when we started working, there was a massive amount of progress in fine tuning and open source LLMs. So that essentially went from, think we need an enormous pile of cash to, actually, I think you can do this on AWS credits. And eventually we ended up doing a lot of the early work on just like...

a server we bought. Like, GPUs were limited, so Amazon had a one-4090-per-person policy. There were initially four co-founders, and we each ordered one and then shoved it into a server. So that's still sitting in our office. So yeah, that was plan A, the C++ plan; we pivoted off that really fast.

Ben (52:59.62)
Plan B, like optimized SQL, got some promising results. And about a month in, I went to Snowflake Summit, the big annual Snowflake conference. And that went really, really well. Just, you know, I like didn't have a booth, like bought a ticket, showed up, had a pile of business cards.

I was showing people our hacked-together demo, literally in a terminal, and just got a list of like 40, 50 interested companies that were like, no, we will send you data, we want to talk to you about this, we would buy this if it works. Timing here is, like I mentioned, super important, right? Like, we're just out of the ZIRP world and

you know, people have been spending a lot of money and for the first time in a while, people are looking at budgets and saying, hey, why is your million dollar snowflake bill growing 20 % year over year? So just a lot of people experiencing kind of the pressure and urgency.

As we started building this and started getting data from early customers, we found a lot of headroom in scheduling. So the way that Snowflake is configured is you have a bunch of workloads and you route them statically to clusters.

And so for all intents and purposes, you just have a bunch of logical assignments of compute. And this is highly, highly suboptimal, right? Like just because you're running a bunch of dashboards doesn't mean all those dashboards should execute on the same size cluster. And just because you're running, you know, a DBT job, if you have some dashboard workloads come in, they don't need to start up a separate cluster. They can go onto the DBT cluster probably.

Ben (55:07.7)
And the analogy is, Google has Borg, right? And externally, this is Kubernetes. It's not that there's a Gmail data center somewhere and all the Gmail workloads run there, and then you construct a YouTube data center in a physically separate place, right? You mix all the workloads together to get as much utilization out of the compute you're paying for as you can.

So yeah, early on we just started doing, actually, like, people started asking us, if you speed up our queries, that's great, but we're not paying for queries, we're paying for uptime. So how much money are you actually gonna save us? And we didn't know the answer. So we sat down and we figured it out. And in the process of figuring it out, we also saw that, on average, like 40 to 60%

of compute on Snowflake is usually idle. So on average, half of your bill is going towards servers that are sitting there and doing nothing. And as amazing as ML is, it's a lot easier to bring down idle time by 80 % than to optimize 80 % of your workloads. So we started working on solving that problem with just better auto-scaling, better scheduling. And that's

kind of where the product is today.
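
A rough way to see that idle-time effect in your own query history is to merge the busy intervals of the queries that ran on a warehouse and compare them to the uptime you paid for. The sketch below assumes you have already pulled (start, end) timestamps per query; Snowflake's query history exposes this kind of data, though the exact columns and billing details vary.

```python
from datetime import datetime, timedelta

def busy_seconds(intervals):
    """Total time covered by possibly-overlapping (start, end) query intervals."""
    total = timedelta()
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total.total_seconds()

def idle_fraction(query_intervals, uptime_seconds):
    """Share of paid-for warehouse uptime during which no query was running."""
    return 1.0 - busy_seconds(query_intervals) / uptime_seconds

if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    queries = [
        (day + timedelta(minutes=m), day + timedelta(minutes=m + 3))
        for m in range(0, 8 * 60, 15)   # a 3-minute query every 15 minutes for 8 hours
    ]
    print(f"idle: {idle_fraction(queries, uptime_seconds=8 * 3600):.0%}")  # 80%
```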

Kostas (56:34.795)
Okay, that's super interesting.

I have many questions here actually. We probably need at least another two episodes just to go through that part. But let's start first with the optimizer for SQL. And you mentioned that you wanted to use reinforcement learning to optimize SQL. And the optimizer part of

a database is always an important part of it, and one of, I'll say, the parts that are very esoteric, in a way, to databases. It's very common to hear that there are just very few people that know how to build optimizers. They are also kind of an expert system at the end of the day: it's primarily a bunch of rules, and you need to be very deep into the system itself to understand why they work at the end of the day. Meanwhile,

they do something kind of crazy, to be honest, if you think about it as a developer, which is: they rewrite your code, right? I think most developers, when they write code, don't think that there's something here that's going to rewrite it, and feel comfortable with it. So what's the difference between using ML to optimize a SQL query and using a Cascades-architecture

query optimizer, or any other type of cost-based optimizer, all the standard, let's say, literature that's out there about optimizing SQL code?

Ben (58:21.368)
Yeah, so I think that like this is a pretty common question we get from engineers and it's entirely analogous to a compiler. And I think that that's something that people are a little bit more comfortable with. And on the one hand, obviously compilers are super, super sophisticated. They've had an enormous amount of engineering effort poured into them and they work really, really well. On the other hand,

Like no one's intuition is, man, like the compiler is so great, I can write the code however I want. And there's going to be no headroom here. And there's no point in thinking about performance, right? Like there's lots of things that engineers can do to write more or less efficient code. And the compiler will catch some of them, but not all of them. And this is the same, like the headroom in SQL optimizers for

using ML is very similar. And the idea is that you can get it to make changes that look like what a human DBA would do. Right. And so there's different sorts of kind of directions here. But one of the early things that came out of that demo that we had was a bunch of proposed optimizations to Postgres, some Postgres SQL. And we had the chance to talk to a

optimization expert at CockroachDB, Spencer, the CEO there is one of our investors as well. And it was a really interesting conversation. It sort of answered this question in kind of the way that I was expecting. And the question for us was like, well, why doesn't your optimizer do this already? And the answer was sort of like, you know, here, like the heuristics kind of imply that this would be faster than that. And in this case, it's not true.

but that's how our heuristics are structured. And here, we could build this compiler pass, but we don't think it's a big enough use case. So we just don't have that written in. And here, I'm not even sure that this is something that would be easy to express in a compiler pass in our system today. And so I think that, one, there's headroom because

Ben (01:00:48.824)
There are some changes the compiler can't make, right, for language reasons, but that you as an engineer looking at a piece of code can say, yes, you can take this out of the loop and that's okay, even if the language semantics don't quite allow it. Or, you know, this algorithm is quadratic, but functionally you can rewrite it to be linear and it's the same thing, and the compiler just maybe won't do that for you.

There's some stuff where the optimizer can just be wrong in edge cases. And there's some stuff that's just like super complicated and basically the compiler can't handle. ORM code is a really good example of that in SQL. Like ORM generated code is just harder to reason about, not just for humans looking at it, but like for compiler designers, because people don't write that way and they don't think about it that way. So this is kind of where

we see a bunch of headroom of, yeah, like this is maybe something a compiler should have done or the SQL optimizer should have done, but for whatever reason, like it did not catch it in this case.

Yeah.
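
To make that concrete, here is a hypothetical example, with a toy schema and SQLite standing in for a real warehouse, of the kind of rewrite a human DBA, or a model, might apply to ORM-style SQL: replacing a per-row correlated subquery with a single grouped join. Whether a given engine's optimizer already does this on its own depends entirely on the engine.

```python
import sqlite3

# ORM-style query: a correlated subquery evaluated per order row.
orm_sql = """
SELECT o.id, o.amount
FROM orders AS o
WHERE o.amount > (SELECT AVG(o2.amount)
                  FROM orders AS o2
                  WHERE o2.customer_id = o.customer_id)
"""

# Hand-rewritten form: compute the per-customer average once and join against it.
rewritten_sql = """
SELECT o.id, o.amount
FROM orders AS o
JOIN (SELECT customer_id, AVG(amount) AS avg_amount
      FROM orders
      GROUP BY customer_id) AS s
  ON s.customer_id = o.customer_id
WHERE o.amount > s.avg_amount
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 7, float(i % 13)) for i in range(200)],
)
# Same rows come back either way on this data; only the shape of the work changes.
assert sorted(conn.execute(orm_sql)) == sorted(conn.execute(rewritten_sql))
```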

Kostas (01:02:02.111)
Now that makes sense. And how do you make them work together? Because at the end of the day, you will have a developer there who used to work with one black box, which is the optimizer of Snowflake, for example, right? And now adds another layer there, which is, let's say the Espresso optimizer on top of it. So from a developer experience, trying like to, because

Okay, in theory, everything always works perfectly, everything will run faster, blah, blah, whatever. But then at some point, something goes wrong and you need to go and figure out what went wrong and fix it, right? So how do you deal with that? Because one of the things that we kind of have with ML, and like even more with LLMs, for example, is that

They are like these huge black boxes that you throw something, most of the time something good comes out, but sometimes also something bad and then you're like, okay, I mean, what to do?

Ben (01:03:04.6)
Yeah. So I think there are a bunch of generic answers that can be given here and we will do stuff like MLOps. honestly, one of the great things about code optimization is that I think it's amenable to formal verification. And so that's what we're trying to do. When you're optimizing code, you know what you want it to do. It should do whatever it did before.

and there's, you know, some room for interpretation there, right? I think something like transpiling Python to C++ is a good example where, man, are there a bunch of edge cases, and it's hard to define what you care about and what you don't. But for SQL it's, you know, this query should always give the same answer. Two queries are equivalent if they will always give the same answer on any set of data, period.

And that's what we try to do. We try to make sure that that's the bar we hit and that it is really compiler-level accuracy. Now, there's the adage, I don't know how to say it, that compilers don't have bugs. And then if you get low enough into the systems world, it turns out compilers do have bugs, and more often around performance than correctness, right?

And so then the question for us is, well, is our formal verifier never going to have a bug in it? And that's the plan. Maybe it will at some point. But at least it's a software engineering challenge, the same way that any other complex systems code is a software engineering challenge.
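
As a rough sketch of that bar, here is a differential-testing harness on random data. This is strictly weaker than the formal verification Ben is describing (it tests, it doesn't prove), and the queries and schema are invented, but the shape of the check is the same: two queries are only interchangeable if they agree on every dataset.

```python
import random
import sqlite3

# Two candidate-equivalent queries over an invented single-column table.
ORIGINAL = "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id"
REWRITTEN = "SELECT customer_id, COUNT(1) FROM orders GROUP BY customer_id"

def random_orders(n):
    # Random rows: each order belongs to one of five customers.
    return [(random.randint(1, 5),) for _ in range(n)]

def results(query, rows):
    # Run the query against a fresh in-memory database populated with `rows`.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", rows)
    return sorted(conn.execute(query).fetchall())

for trial in range(100):
    rows = random_orders(random.randint(0, 50))
    assert results(ORIGINAL, rows) == results(REWRITTEN, rows), "queries diverged"

print("no counterexample found (testing, not a proof)")
```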

Kostas (01:04:54.743)
That makes sense. Okay, one last question from me and then I'll give the mic back to Nitay, after you promise that you will be back, because we have many more questions here I'd like to ask. So you mentioned ML, and there's also AI and GPUs and all that stuff. So what are you using to create these models? Are we talking about

Ben (01:05:05.314)
I'd be happy to.

Kostas (01:05:24.299)
Let's say more traditional ML approaches? I don't know, like let's say you're using logistic regression, and there's nothing bad about that, right? It might be doing the job and it might be the right tool to use at the end of the day. Probably not as sexy as an LLM today, but it doesn't matter. Or do you end up using LLMs and fine-tuning them? Like, what's the underlying technology that...

allows you to do this rewriting at scale?

Ben (01:05:57.75)
Yeah, so we do different things in different parts of the system. For query rewriting specifically, it is LLMs. I think the leverage that you get there is you have a model that at a very high level understands SQL. And so that's a great baseline to start with.

You can use smaller models, you can use things like code transformers, and we have different experiments going on into, you know, when is one better than the other. But fundamentally, it is something that has been pre-trained on a huge amount of SQL so that it understands kind of what's going on here. There are other models in other parts of the system, especially when you get more into the scheduling part, where

Smaller models might do just fine. And it is good to have those kind of sanity checks of, hey, are we getting a good amount of signal out of this really large, expensive model? Or can we kind of do some feature selection and get something just as accurate that is smaller and easier to train and more understandable and all of that?

Especially for optimization, yeah, it is all LLMs.
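
Putting the pieces together, the overall shape Ben describes, a model proposing a rewrite and a verifier deciding whether it ships, might look roughly like this sketch. The function names and placeholder bodies are mine, not Espresso's code; in practice the proposal would come from a SQL-pretrained LLM and the check from a real equivalence verifier.

```python
def propose_rewrite(sql: str) -> str:
    """Stub for the LLM call that suggests a cheaper candidate query."""
    return sql  # placeholder: a real system would return a rewritten query

def is_equivalent(original: str, candidate: str) -> bool:
    """Stub for the verification step; must hold for all inputs, not just samples."""
    return candidate == original  # placeholder logic

def optimize(sql: str) -> str:
    candidate = propose_rewrite(sql)
    # Only accept the rewrite if verification succeeds; otherwise fall back
    # to the original query, so correctness never depends on the model.
    return candidate if is_equivalent(sql, candidate) else sql
```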

Nitay (01:07:30.83)
So perhaps one question, kind of looking to the future. As you mentioned, just even in the time since you started Espresso, things have moved very, very quickly, right? Like you called out, you know,

you thought you'd have to raise millions, and it turns out actually you don't need as much because so much of the training work has been done for you, so to speak. Looking forward, how do you see the AI specifically around this kind of code optimization work developing, and how are you guys as a company deciding where you focus your time versus where you're kind of leveraging and just waiting for the market to go 10x on something? How do you think about it?

Ben (01:08:05.484)
Yeah, so in the particular space of, like, verified code optimization, I don't think there are a ton of other people working on it right now. Well, hopefully your podcast is not so popular that we spawn, you know, a dozen competitors here. So that part, I think we're leaning into very hard. But, you know, certainly we're not

trying to do things like train our own foundational model, right? There's headroom there for sure, but it's massively expensive, and Facebook is just going to release, you know, Llama 4, and it's going to be better and larger than anything we could do. Same thing with, you know, techniques, right? So there's a lot of stuff that we could be doing better in terms of

different architectures and that sort of thing. Like, not that we could be doing it better, but it obviously can be done better. There's tons of progress there, but we're not an R&D lab, right? So for that sort of thing, we're trying to keep an eye on the literature as well as just leaderboards and see, you know, what's going well. But mostly we're trying to take existing models and get them to work well in our system for our customers.

And actually, one thing that we did early on was a bunch of benchmarking across lots of different models. And this was kind of difficult for a few reasons. One is, you know, every model that comes out claims to be better than every other model, coincidentally on their own benchmark. And two is that the differences are super minute. And we ended up going with Llama pretty early on,

not actually because of the benchmarks, but because it was just better QA'd. And so you can import two libraries from Hugging Face and, you know, they'll work together and they work with Llama. Whereas with many of the competing models, they would instantly crash and you would spend half a day debugging, like, why doesn't this LoRA adapter thing work alongside this other thing that we're using? So, you know, this is another one of those questions, like,

Ben (01:10:31.812)
I guess the question of which model are you using, and then how much fundamental work do you pay attention to? The most important thing for us, the huge lift, is not necessarily using the best model, but getting something in front of customers that uses any model at all. Because I think the real win is going from not having this kind of system that's leveraging, you know, a lot of real machine learning research to having one that does, not about squeezing out, like, another 2%.
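
For the "two libraries that just work together" point, a minimal sketch of loading a Llama checkpoint with Hugging Face transformers and attaching a LoRA adapter via peft might look like this. The model name and hyperparameters are illustrative, not Espresso's setup, and the Llama weights are gated on the Hub, so you would need access to download them.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint, gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Attach a small LoRA adapter so fine-tuning touches only a fraction of the weights.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports how few parameters LoRA actually trains
```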

Nitay (01:11:04.426)
That's great advice, and sage advice. Especially, I like the part about choosing Llama because it actually works. I've heard a lot of companies essentially take a similar route: the best model is the one you can engineer a product around, not necessarily the one with the highest XYZ benchmark. Especially to your point, ML benchmarks seem to be the latest wave of taking a page out of DB benchmarks, which famously did this, where every database vendor comes out

Ben (01:11:19.267)
Yeah!

Nitay (01:11:33.422)
And magically, they're the best on this benchmark that they produce. And so I think we have kind of a repeat wave of that. Cool. Well, we're probably wrapping up on time here. Any last advice or anything else on your mind you want to share?

Ben (01:11:52.388)
I'll just echo what I should have listened to from Paul Graham, like go talk to users. If you are listening to this and I assume you have a good number of founders or people interested in being founders, go talk to the people you think are gonna use your product and see what they have to say.

Nitay (01:12:17.486)
Great advice and a great place to wrap it. And I love the full life story here, because it all kind of almost comes back to hacking Visual Basic and VLOOKUPs, from the deep performance work all the way to modern models. So it was great to hear the full journey. Thank you for spending time with us today, and we'll definitely have to bring you back and talk to you more.

Ben (01:12:38.158)
Yeah, thanks for having me. Looking forward to it. This was a lot of fun.