Braintrust by Cortex

Cortex co-founder and CTO Ganesh Datta sits down with Rob Zuber, CTO at CircleCI. Rob shares how the industry's move away from dedicated QA has cost teams more than they realize, and explains how AI is changing what good software quality actually looks like.

Rob and Ganesh discuss why velocity without direction is the wrong thing to optimize for, how LLMs can help teams think like a great QA engineer again, and what it takes to run exploration and core systems teams at fundamentally different speeds. They also get into why traditional delivery metrics fall short for innovation work, and what to measure instead when teams are discovering rather than just executing.

What is Braintrust by Cortex?

Candid conversations with the builders shaping the future of engineering.

Braintrust dives into the operational realities of running high-performing engineering organizations, from production readiness and migrations to AI adoption and operational excellence.

Hosted by Ganesh Datta, CTO & Co-founder of Cortex

Rob Zuber (00:00):
Predictability in product discovery, it's not just a fool's errand like it's probably harmful to pursue. Everybody wants their teams to adopt AI and suddenly be faster. But there's this thing in the process of adopting new tools, which is learning the tools. Otherwise you get stuck in the, we'll do 2% more of this thing that we've always done. And in the world that we're in right now, I just don't think that's how software is evolving.
Ganesh Datta (00:30):
You're listening to Braintrust by Cortex, where we explore how engineering leaders blend AI, platforms, and culture to build high performing software teams. I'm your host, Ganesh Datta, CTO and co-founder of Cortex, an engineering operations platform designed to help organizations continuously improve their operational maturity and reduce developer friction. In each episode, we go deep with CTOs, VPs of engineering, and technical leaders who've been in the trenches, navigating the tension between speed and quality, building reliability at scale, and figuring out how to lead through major platform shifts. Whether you're running a team of 10 or a thousand, this is your space to learn from people who've made the hard calls and lived to talk about it. Welcome to the podcast. This is The Braintrust by Cortex. I'm Ganesh, co-founder and CTO of Cortex. We help engineering organizations better understand their software from service cataloging to scorecarding to self-serve and so much more.
(01:35):
Really excited to have you on the podcast today, Rob, talk all things quality, AI, and maybe the lack thereof and what we can do about it. Great to have you on.
Rob Zuber (01:43):
Yeah, excited to be here, but just let's set some expectations early. We'll talk about all those things. That doesn't mean we have answers to any of those things, but certainly they're top of mind, so excited to be here and looking forward to chatting.
Ganesh Datta (01:56):
Maybe we'll solve it today in the next 30 minutes, but-
Rob Zuber (01:58):
One can only hope. Yeah, let's go for it.
Ganesh Datta (02:01):
If you want to give listeners a quick introduction of yourself, your background.
Rob Zuber (02:04):
Yeah, for sure. So I'm Rob Zuber, CTO at CircleCI. And so we're very much in the thick of software delivery right now, which is changing at an alarming pace. I mean, an exciting pace. If you're in it, if you've been in this industry for a while, it's always fun when we have these inflection points, but it also raises a lot of questions, as you said, about what stays and what goes, how do we do things differently? What's sort of non-negotiable or immutable in what we're doing? So anyway, that's what I do. I've been at CircleCI for over a decade, which is probably unique among many engineering leaders. Well, obviously not at CircleCI, but in general, to have the chance to ... In engineering leadership, you make decisions on really long timescales and sort of to see those play out. And so maybe I have more regrets than other people, I'm not sure.
(02:53):
But I've been in software my whole working career, mostly in startups. So just always very entrepreneurial, always trying to figure things out and find the next thing. And so I'm super excited about this time because I think so much is changing in very, very cool ways.
Ganesh Datta (03:08):
Yeah. And I think it's interesting, not only have you seen the effect of your own decisions in a long time scale at CircleCI, I think having seen multiple transformations over the last decade in the critical paths of software delivery, I think you have a unique perspective on what the rest of the industry's doing as well. So I'm really excited to dive into that.
Rob Zuber (03:24):
Yeah, absolutely. I mean, for anyone listening, at least for the beginning of CircleCI, every company was building a rails monolith and deploying it to Heroku. Put yourself in that mindset and then calculate all the things that have changed since then. And so yeah, it's like doing the same thing, but always differently for our customers to make sure they're successful in however they're trying to deliver has been the rollercoaster or the evolution or however you want to think about it of what we do.
Ganesh Datta (03:51):
Absolutely. One of the things that obviously is part of the SDLC that we're all aware of is the introduction of coding assistants and AI agents in our software delivery and the entire life cycle. You had a post recently about maybe a topic that we're not thinking about at all, which is what about the humans in that loop, especially when it comes to quality? Because everything we're talking about is how can AI write more code, write more tasks, do more of the software delivery life cycle. And you made a point that, hey, actually maybe we need a human in the loop at a part of the cycle that we don't really talk about, which is QA. In fact, I would go as far as to say a lot of the industry has shied away from QA as a practice entirely for a lot of reasons, but it's interesting that you bring that up, especially now in the world of AI.
Rob Zuber (04:36):
Yeah. And to be clear, I mean, as I'll keep saying, these are all sort of thoughts that are still forming and changing, but it's not entirely about, to me, putting a human in that particular part of the feedback loop or whatever, but rather exploring what has made humans great at that. And to your point, we really have pushed QA certainly in the fastest moving organizations out of the SDLC in the sense of manual testers like that started with my ... My first job really was when I got into software and my first startup, I was a QA. I don't know if I was a QA engineer or manager, doesn't matter. There weren't enough people for there to be more than one, but it was a foot in the door for me in software and startups and whatever. And it was, "Hey, we're thinking about putting this into production." I wrote out a script and said, "Okay, I'm going to click this button and then click this button." Very, very ... This is late '90s.
(05:30):
And that was the state of the art then in terms of QA. And we started building automated tools and testing scripts and stuff like that back then. Obviously, running engineering for a CI/CD company, we're very forward in terms of how we do that. Everything has shifted to automated testing. We're very good at managing guardrails and gates and stuff in production environment, like gated delivery, basically slowly ratcheting up who has access, feature flags, all these other kinds of tools. But I think the thing that has happened over that transition is out of necessity, we've put testing in the hands of developers and developers are really good at identifying what they intended to do and then writing a test that proves that they did what they intended to do, whether or not what they intended to do encapsulated all of the possible edge conditions, whether it encapsulates actually what the requirements or the goals were from the PM, et cetera.
(06:28):
And I think the partner, if you go back far enough, the kind of developer quality engineer, whatever you want to call it, partnership was a great mental check, looking from a very different perspective, a very different sort of frame of reference to say, "What if I did this?" And I sort of commented in that post and it's totally true that the best quality engineers that I've ever worked with could take down a system in minutes. They're like, "Let me try five different things that no one ever thinks of." But it never made it back into the developer's mind because they just think differently. They're in a different frame of reference. I've even seen the same person act really effectively in both of those roles, but it's hard to do at the same time because you're locked into one point of view versus another point of view, right?
(07:14):
And so what I'm particularly interested in is taking that perspective that we've honestly lost over the years and wondering, can we rebuild that in an AI assisted manner? Can I ask an LLM, "Hey, I just built this thing. What are all the things that someone could do that are going to make it not work properly?" And the classics are just like, this is an email field, right? We're better at this now in 2025, but here's an email field and someone just puts in a string of numbers. Now we have JavaScript validation, but back in the '90s, it would just crash the whole system because no one had thought of it. Or what if I put a big negative number in here, just basic, basic stuff. And then we built tools like generative testing. But if you think about what an LLM could do, it can generate far beyond give me a number in this range.
(08:02):
It's like, well, that's an interesting range, but what about outside of that range? What happens if I put text in this field? What happens if I put internationalized characters in this field? How is your thing going to perform? What happens if I put a single quote in here with drop table? Someone should figure all that stuff out, but we keep forgetting over and over and then that knowledge, I mean, that's what LLMs are. They're like all of the knowledge that we've created so far encapsulated in a way that can be thrown back at you. So I think that exploration and sort of using the LLM to bring back this human element and human creativity that we've lost from testing is a really interesting thing to explore. And I think there's a lot of potential there.
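As a rough illustration of the kind of probing Rob describes here, the following is a minimal Python sketch, not anything from CircleCI or Cortex. The validator `is_valid_email`, the deliberately weak `naive_validator`, the `probe` helper, and the input list are all hypothetical names chosen for this example; the inputs are simply the cases mentioned in the conversation (a string of numbers, a big negative number, internationalized characters, a single quote with drop table) plus a few obvious neighbors.

```python
import re

# Hypothetical validator standing in for "the thing I just built".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value):
    """Return True only for a string shaped like one email address."""
    return (
        isinstance(value, str)
        and len(value) <= 254
        and EMAIL_RE.fullmatch(value) is not None
    )


def naive_validator(value):
    """Deliberately weak check, included only to show what probing catches."""
    return "@" in value


# Inputs a curious tester (or a prompted LLM) might try. None is a valid email.
ADVERSARIAL_INPUTS = [
    "1234567890",                  # a string of numbers
    "-99999999999999999999",       # a big negative number
    "",                            # empty
    "   ",                         # whitespace only
    "'; DROP TABLE users; --",     # single quote with drop table
    "ünïcödé",                     # internationalized characters, no domain
    "a" * 10000 + "@example.com",  # absurdly long local part
    "user@@example.com",           # doubled separator
]


def probe(validator, inputs):
    """Run every input through the validator and collect surprises.

    A finding is either a crash or an input that was wrongly accepted.
    """
    findings = []
    for raw in inputs:
        try:
            accepted = validator(raw)
        except Exception as exc:  # a crash is itself a finding
            findings.append((raw, "crashed: " + type(exc).__name__))
            continue
        if accepted:
            findings.append((raw, "accepted"))
    return findings


if __name__ == "__main__":
    for name, fn in [("strict", is_valid_email), ("naive", naive_validator)]:
        print(name, len(probe(fn, ADVERSARIAL_INPUTS)), "finding(s)")
```

The list of inputs is the interesting part, and it is exactly what the conversation suggests asking an LLM to extend; the harness around it stays the same.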
Ganesh Datta (08:43):
I'd love to expand on that. I think one of the misconceptions about LLMs, maybe, is that they have out-of-the-box creativity. But I think what LLMs are really good at, it's almost like a junior engineer where if you give it a way to think and a bounded problem, then they're very good at operating within that space. And so maybe to think about it from first principles, it's like, okay, well, if the missing thing in testing today is that human creativity, then maybe we can articulate exactly what it was that made QA so effective by this kind of thing. What is the mindset that led to that? And so I guess if you had to explain, how does a QA person come to the conclusion that, "Hey, I should put a random string of garbage in here and see what happens." What is the human thought process that is happening behind the scenes that leads them to that outcome?
Rob Zuber (09:27):
Well, to be clear, when I say the best QA engineers, I'm not referring to myself, but there's a reason I didn't last in that role very long. Luckily, I figured out how to program quite quickly, but I really think it's curiosity and creativity, which are not the default attributes that you would assign to an LLM. But I think what's happened is in so many parts of our lives and software, we strive for reproducibility, predictability. And so the best debuggers, the best incident management teams, I mean, obviously in terms of what you do, getting information available to people, everybody that's really, really good at that, I hope this is contentious or I don't, is not going off a runbook.
(10:22):
And to go back to QA, what happened in testing is it became like click this button, then enter this value into the field and then click this button and then enter this value in the field because then we can outsource the mechanics to the lowest paid possible people and then ultimately that became machines. But when you smell an odd behavior as a human, you're like, "Ooh, that's interesting. And I know a little more about how the system works. And if I think about this, then maybe, just maybe if I put this one weird character in here, it's going to break over here." I made the joke about single quotes and dropping tables, but that's what makes threat actors unfortunately very, very good at their jobs because they know enough and then they smell something along the way that's like that, there's an opportunity here. And I'm going to dig into that in a way that is not ... I can't believe I'm going to talk about threat actors this way, but let's roll with it for a second.
(11:21):
They're outcome oriented, right? They are not executing a script. They have a bag of tools and they know how to execute with that bag of tools to try to get an outcome. And when it starts to feel like it's not going to be worth it, then they go do something else. But what we often do in QA or so many categories, again, is try to make it super repeatable. And the idea of like, I smelled something interesting, something looks off, I'm going to go pursue that, is not rewarded as much as I got through all the test script in the time allocated and we're going to ship on time. And so we shipped on time with this massive bug and the first person shows up and types something into the field and your system crashes like, "Did you really ship on time?" That's a great question.
(12:01):
So the curiosity and creativity, again, is not what I would attribute directly to an LLM out of the gate, but part of the ability to do that again is the bag of tools. And an LLM has a massive bag of tools because it's read every line of code that's ever been public or some subset, but some representative subset. And it's probably read all the public incident reports and it's consumed so much information that with a little bit of guidance in the right direction, you should be able to say, again, I built this thing, what's going to go wrong? But that's not how we frame the question. We're like, "Build me a thing that does this." And because the LLM gets rewarded, I mean, if we could think about it, it doesn't get dog treats, but somehow it's still very excited about behavioral rewards because it gets rewarded for producing something, then that's what it does.
(12:54):
And its goal is to make you happy. And so if you give it different context and different goals, it will go pursue those. So then the question is, as humans, are we framing that well enough? And if we just frame it as the automator of the tests that we did in the past, we'll still be bound by our imagination. But again, the most ... Talk about debugging incidents. Anyone that is really good at that just has a very good picture of the whole system and probably enough experience in software in general to know a system behaving like this, exhibiting these behaviors, even if it's net new to us, that's probably a sign of X, right? There's a bottleneck on threads or there's a deadlock in the database or something like things that people intuitively see, not because they were born with that information, but because they've amassed that information over their careers and who has more information about all of those conditions than an LLM.
(13:50):
So I think the curiosity it doesn't have, but that can be pointed pretty quickly by someone guiding it to say, "Oh, interesting. What about this? Oh, interesting. What about this?" And have access to just that historical wealth of knowledge that no one could keep in their head.
Ganesh Datta (14:08):
I want to keep backing into how we can frame this to an LLM to make that work. And you mentioned two interesting things there. A, the fact that historically some part of QA was things that could be very easily automated. It was stuff like fill in different values and things like that. But then threat actors is a great example, like you said, are very outcome focused. And if I squinted at that a little bit, it seems kind of like blurring the lines of maybe another quote unquote lost art of UAT or user acceptance testing, which is a bit more outcome oriented. It's like, "Hey, the user's trying to solve this problem. Are we able to do that in a meaningful way?" Does that fit into the traditional definition of QA? And is that part of maybe what's missing is the ability to define the bounds of, here's what we're trying to solve and then within that guardrail, go and try to figure out the things that may go wrong.
Rob Zuber (14:57):
Yeah. I mean, traditional definition of QA, I don't know that I'm the official author, but absolutely the one thing that we separate a lot when we talk about product is the difference between is it possible and is it easy or is it good? Yes, we made it possible for our customer to do a thing, but if they have to jump through 20 or 30 hoops to get there, they're going to give up. If we can clearly identify that that's a problem for them and then guide them down the path of doing the thing or do it for them, that's much more interesting. And so I think what you're getting at there, which I agree with is, again, with the, I won't call it automation, but the repeatability, the desire for repeatability. And we script out, go through these steps and when you get to the end, if you get this outcome, then it's good versus try to do it.
(15:53):
You call it UAT, I would say all kinds of sort of research rooms. I'm actually thinking of usertesting.com who do this. Just ask someone to do something and then watch them do it, which probably isn't, speaking of the traditional definition, part of the definition of QA, but it's critical to building good products. We think it's possible to do this thing and it really lines up with this model of the assumptions that we had when we built it. And then we put it in the hands of a user and you're like, it's like watching a horror movie. You're like, "What? No, no, don't go in there." You know what I mean? How could you possibly be going down this path? It's so obvious to me what you should be doing right now. And I'm watching the ... And I think it's really good for everyone.
(16:41):
I think it's really good for engineers to see that because once you see it, you're like, "Oh, we could totally fix that. Let's just do this instead." But again, you have so many assumptions baked into your view. You've been thinking about the problem forever, you're super close to it. And so you build things in a way, I'm putting this on engineers, designers, PMs, whoever, all of them, building it in a way that makes sense to you and is founded on a bunch of assumptions that maybe you haven't truly tested. And I think testing for quality falls in that same bucket. And we've raised the threat actor a couple times. That's just a different kind of testing. I'm testing for vulnerabilities in your system plus vulnerabilities in the structure of your organization. There's obviously social engineering and other pieces that go into that, but all of that is looking for gaps in the assumptions that were made.
(17:39):
And so they all, if you step back one level of abstraction, they're all very similar. We make assumptions about correct function. We make assumptions about ease of use for the user. We make assumptions about security or reliability, like pick a thing and having a different perspective, whether that's another human or whether that's something that could do that certainly with the wealth of knowledge and a faster pace with some guidance from a human, you're trying to get to the same outcome, which is pointing out those assumptions, breaking them, and then making your product more resilient, easier to use, more secure, all those things.
Ganesh Datta (18:13):
I wonder if that's a useful framing for, if you're trying to do this with an LLM, is if you think about the LLM as a very intelligent thing with a lot of knowledge, but very little context, that's kind of what you want to do with the user testing thing is like, you're a person, you're clearly very smart. I can give you a problem and you can figure out how to go solve it, but you may do it in a way that I haven't intended you to solve it. And so can you do the same thing with an LLM? And maybe this is kind of how we're backing into spec first development or the things that we're trying today, which is let me give you the problem, but then if we do the same thing, so rather than giving you the problem and asking you to develop it, if we say, "Here's the problem we're trying to solve, go try to use the product to do this or come up with the ways this can break," it's going to be operating within those guardrails of just a problem definition rather than the code itself. And maybe it's just like BDD all over again or something, but kind of a similar path.
Rob Zuber (19:11):
I do think the spec piece, I don't know where that ends up. My math brain wants the specs to be slightly more formalized, like something that you could repeatably implement, but we'll see. But what I was thinking about as you were talking about that is, okay, here's a task, go try to execute it, kind of taking the UAT point of view. And could you actually score almost the quality of an experience? How hard was that for you LLM to figure it out? Because it will figure it out because it can scan all the tags and the page and look at the HTML structure and whatever if it's a UI and people are using that to build Playwright tests and stuff. But an opinion to say, "Oh, I've done this on a bunch of other sites and by comparison, this is really, really hard." This doesn't meet this standard, this doesn't whatever, the layout is weird, this button jumps around.
(20:07):
All the things that are sort of minor annoyances for users or more major annoyances, like it doesn't work with a screen reader, stuff that not every engineer thinks about necessarily when they build. I think almost, and then some feedback, what would you change? Just honestly write the change for me at that point if we have these pieces working together, because again, you can say, "Please build me a page that does this," which is as the specs get tighter, it's going to be more specified, but sort of saying, "Build me a page that does this and then optimize the layout or whatever for these different cases." I don't know exactly what that would look like, but it's like I'm kind of torn in my own mind between how specific I want to be and how much I want to take advantage of known practices. How many times are we solving something that's been solved a million times over and we just don't look around and see how everyone else is already doing it?
Ganesh Datta (21:00):
Yeah. And maybe this is where multimodal models are going to be particularly useful because maybe you don't want them looking at the tags or maybe looking at the visuals of it trying to operate it that way. But I want to shift gears, maybe we've been talking a lot about the product experience, the testing for certain assumptions based off of a problem, but this all assumes that we're solving the right problem. And I think what's happening in the industry is we're very much focused now on speed. Are we moving fast, but we're not really talking about the direction. It's a vector. It's like you have the actual absolute value, but you're also going in a certain direction. And you talked about this a little bit in one of your posts recently, which is we don't want to just keep running really fast. How do we know we're going the right way?
(21:48):
How are you thinking about that in your own role as an engineering leader and then more broadly across the industry?
Rob Zuber (21:53):
Yeah. I mean, I think that it's been a hot topic for the whole time I've been in software engineering, which is a pretty long time at this point, but the pressure of such a rapidly changing market has really brought this to the forefront. One, because everything is like whatever business you're in, whether that's still relevant tomorrow is unknown. Literally on a 24-hour cycle, everything is changing, whatever business you're in, maybe not plumbing, but certainly if you're in tech somewhere. So it's more important to be agile, nimble, responsive, whatever. And then the other thing is we can move much faster. And the tools that we cite as sort of making us faster across the board, while we still try to dig through the data and figure out what that really means, one of the things we know them to be good at is rapidly generating V-zero at low quality, let's just call it, right?
(22:56):
We just had a whole conversation about quality, but for V-zero, that's okay. So the cost of experimentation is actually super, super low. I can build ... When I used to make a painted door, meaning click this button to use this new feature, and then you click on it and you're like, "Actually, the feature's not quite available, but sign up here for the wait list or whatever, and we'll let you know." Or, "It's not working right now. We're at capacity." Whatever, all the little tricks that we use to figure out, I mean, the classic being the landing page, right? Here's some text about a company I'm thinking of starting. If no one signs up, I'm not going to start the company because it's really expensive. I'm going to go start something else. But in the same amount of time almost now, you can put out a functioning prototype.
(23:40):
And so you can get much richer feedback about your idea in a very small amount of time, which means you could test more ideas more quickly and therefore learn more about your direction to know for a tiny, tiny fraction of what it used to cost. I mean, I grew up in the days where we built data centers, just to be clear, by comparison, I can type three sentences and have a working prototype on someone else's system that I don't even understand kind of thing instead of like, "Did we order enough ethernet cable?" The difference is so many orders of magnitude, it's completely bananas to reason about. And so that's the secret to finding direction is testing your assumptions, putting the smallest amount of work in front of people that will give you real valid feedback. And we've accelerated how we can do that, which means for the same cost, we get way more assumptions tested, and that's even more important because of the market we're in where also we might shift direction tomorrow because that assumption was valid today and then some new company launched and went to a hundred million in ARR last night.
(24:52):
It's ridiculous how fast these things are moving. And so feeling less bad about, "Well, we built that thing yesterday and we're throwing it out today. It's just not that big of a deal." Now, when we get to the point where we have more concrete understanding, there's user adoption, they're excited about this, it's working, obviously we're going to invest in growing that to the scale that we need it to operate at or whatever. But I think all this was how I think about it as a leader and what I try to instill in people, we know how to solve that problem. At CircleCI, we've been scaling this business for 12 years or no, more, 13, I can't keep track, 14 almost. And so that's not the problem we're trying to solve right now. And it's okay to test some of these ideas, to put out things, see how people react, learn as quickly as possible.
(25:43):
And then when we find something that people are really excited about, well, we know how to scale it, but we got all the pieces in place. But if you're worried about scaling every piece of it before you even know if that's what's going to work for folks, you'll never get there. You'll never get to that idea. And so it's particularly interesting in a business like ours where a good chunk of what we do operates at really high scale and changes are very thoughtful. We have all, like I was talking about earlier, all the guardrails in place to make sure that when we make a change, progressive delivery was the term I was looking for. When we make a change, the blast radius is minimized. We know exactly who's seeing it first and we watch to see that it's working the way we expected, et cetera, et cetera.
(26:27):
But if you're building a V-zero prototype, you don't need any of that stuff. This is just wasting time. You don't even have enough users on that thing to measure whether or not it's working properly. You're waiting for one person to text you and be like, "Hey, I just tried your new thing. It's pretty good." It's a totally different level. And so operating at both those cadences or however you want to think about it, velocities inside the same organization is an interesting thing to manage and to make sure that people are clear about which side of that they're on for the thing that they're working on right now and how we expect them to think about it. And when reliability and stability is absolutely critical in part of your business and just not that important in a different part, I have no problem context shifting between those all day long, but you have to be very clear with others because if you've instilled in them, if this breaks anybody, we got to respond right away.
(27:21):
And then you're saying, "No, I don't even worry about it. Just ship that thing. It's going to be fine." Holding those two things in tension, I think is hard for some folks. And so part of it is being clear, part of it is having different people working on different things and them knowing what matters to them.
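The progressive delivery Rob mentions, where a change reaches a small, known group first and the audience is ratcheted up from there, is often built on a percentage-based feature flag. The sketch below is one common way to do that in Python and is only an assumption for illustration, not a description of CircleCI's tooling. The function names `rollout_bucket` and `is_enabled`, the flag name, and the allowlist are all hypothetical.

```python
import hashlib


def rollout_bucket(user_id, flag):
    """Map a user to a stable bucket in the range 0..99 for a given flag.

    Hashing the flag name together with the user id keeps the assignment
    deterministic, and gives each flag its own independent split of users.
    """
    digest = hashlib.sha256((flag + ":" + user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id, flag, percent, allowlist=frozenset()):
    """Decide whether a user sees the change at the current rollout percent.

    Users on the allowlist (for example, employees) always see it, which is
    how you know exactly who is exposed first.
    """
    if user_id in allowlist:
        return True
    return rollout_bucket(user_id, flag) < percent


if __name__ == "__main__":
    staff = frozenset({"staff-1"})  # hypothetical internal users
    for percent in (0, 10, 50, 100):
        exposed = sum(
            is_enabled("user-" + str(i), "new-checkout", percent, staff)
            for i in range(1000)
        )
        print(percent, "% setting ->", exposed, "of 1000 users exposed")
```

Because a user's bucket never changes, raising the percentage only ever adds users, so anyone who saw the change at 10% still sees it at 50%. That property is what keeps the blast radius predictable while the rollout widens. For a V-zero prototype with a handful of users, as the conversation points out, none of this machinery is needed.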
Ganesh Datta (27:36):
Have you had to reshape the organization to either reduce the internal blast radius of the folks experimenting on the things that definitely can't go down or to enable people to actually go and experiment in a culture where reliability was the number one thing that you cared about?
Rob Zuber (27:54):
Yes. Trying to think of useful kind of pieces of that. Absolutely. Both some technical things, like how can you take advantage of things that we have at our disposal? In the world of software delivery, we have a massive system for making available arbitrary compute, executing arbitrary work in that arbitrary compute. This is stuff that people are trying to build right now and we have it on hand to use. But it was built to do a specific thing and now we're doing different things with it. So making that available in a way that keeps that part robust. To your point, you can't come in and try some cool new thing and like, "Oops, I just broke everyone else's builds." Luckily we have those sort of firewalls in place or whatever, but without having to go back to the drawing board and build all this stuff a second time, which is kind of our unique differentiator.
(28:48):
The thing that we can bring to the table is our understanding of our customers and what software they build and how they validate it and creating the runtime and execution environment, which should do that. So a little bit of it is technical. So I think a big part of helping people run quickly is giving them the safety. You can touch all the things that you have access to and you're not going to break anything. You over here, take it a little slower. I mean, also the processes and the tooling and everything around those more robust and sort of core pieces of the system have been built up over years, so they are safer to touch. We've put that tooling in place because we've been doing it for a really long time. And then on the cultural side, that really is true. I'm so used to having all these tools in place, therefore I don't want to ship this piece of code without all those tools in place.
(29:42):
What's this strategy and that strategy and whatever. It's like, look, you're going to break three people if you do that and they all work here, so just go for it. And so getting people comfortable with that, yeah, absolutely. Again, a lot of it is helping people see that, "Hey, we put these partitions in place, we're safe." And then again, helping with intent, because the intent was very clear and consistent for a long time. We are doing this, we're scaling the system. Then there is a little bit of, what's the best way to describe it? Just people's makeup and tendencies. Some people love the work of building highly scaled and scalable systems. That's what they're excited about. That's what gives them joy. Churning through prototypes and throwing them in the bin does not give them joy. We have both roles inside the company. It's taken us a little bit of time and people, they're excited to try the other thing and then they're like, "Mm-mm.
(30:39):
Not for me. I'm just not interested. Or I love spinning through rapid prototypes and I don't want to worry about scale. That just sounds hard. I don't want to do that." We have that split. Then it's finding the right people and putting them into the right spots, which is so much of leadership, whether you're in engineering or anything else.
Ganesh Datta (30:57):
Yeah. I mean, it is very much two different skillsets. As a founder, we see that in our early employees versus later ones, which tends to be a very simple bifurcation of people who like to hack and try random things and put it out there and see what happens versus people who are like, "Give me a thing that's working pretty well and I'm going to figure out how to make it do that thing even better." And so we see that bifurcation a ton. I think a lot of engineering leaders are probably in a similar boat as you where they're trying to, especially a lot of the folks that we talk to in the "traditional enterprise," they know that they don't want to get disrupted. And so they're trying to invest in the ability to move fast and iterate in certain things, but they have a cash cow that they need to keep going.
(31:37):
And so maybe it's different in the enterprise, but the question that we get a lot as well is as an engineering leader, how do you measure that? Do you have a set of metrics that you're looking at? I know you've talked about this as well. Sometimes a lot of these metrics are not particularly useful or they're not capturing the impact of AI necessarily in a meaningful way, but what are you looking at on a weekly or a day-to-day basis that says, yes, we have teams that are moving fast and are able to do things, but it's not degrading our ability to deliver a high quality product for a certain part of it. Do you have a dashboard that gives you that or what do you do to give you that visibility?
Rob Zuber (32:18):
Yeah, all of the above. We do it and we don't. Yeah. I would say similar to systems, your sort of traditional metrics, if we can call them traditional, but DORA, the core four, these sorts of things, I find them useful. They don't give me the whole story. Knowing that a number has moved is not interesting. It's certainly not the end of a conversation. It's the very beginning of a conversation like, "Oh, what's going on?" Half the time that ends with so-and-so took some PTO. You're
(32:53):
like, "All right, end of story. There's nothing to discuss here. Why would we worry about that?" And every once in a while it's like, "Oh, we had to work on this thing that we totally didn't understand. We tried to make this change and we broke this other part of the system and then we spent our weekend in incidents and whatever." And it's like, cool, let's unpack that because there's all kinds of information in there. Why were we surprised by how hard it was to make this change? Why were we surprised when we broke something when we made this change? How do we fix that? But a lot of that, I would, I don't know how strong this argument is because I'm making it up on the fly, but I would think works a lot better in that growing and scaling a system sort of state because predictability in product discovery is like, it's not just a fool's errand, it's probably harmful to pursue.
(33:43):
Why didn't we write any PRs this week? Because we were out talking to customers, because we were putting something out there and then we were talking to them and we were watching them. And we were like, I don't know, we went on site with a customer, we worked with them all week, and next week we're going to put out 50 PRs because we learned so much, but this week we did nothing because we did all this other stuff that was super valuable. And so saying like, I mean, I'm hanging on the PR thing, as long as you weren't breaking the system somewhere and they're like, we got to get predictable, consistent PR throughput out of this group that's just making it up as they go. And not in a bad way, but just in a very exploratory, there's a huge amount of opportunity to innovate in this space.
(34:24):
And actually as a very concrete example, one thing that's come up a lot is the kind of, everybody wants their teams to adopt AI and suddenly be faster. But there's this thing in the process of adopting new tools, which is learning the tools and moving at half the pace that you moved at before. And when you have this, I think, especially, I don't know if anyone puts their DORA dashboards on the wall or whatever, but if everyone's like, "Oh, we need to hit 17 PRs this week and we only have 15, let's get something out." If that's the world that you're operating in, no one's going to say, "Oh, let's stop and see if we could be an order of magnitude better, but that's going to take a week or two of us kind of tinkering and figuring out the bits and being bad at this.
(35:08):
" Maybe it's probably more than a week or two. And so in our world in particular, we very much want our engineers to struggle in a sense with the tools because that informs us. It's like us doing product research at the same time that we're building the product that we're building. So in our more discovery-oriented and exploratory teams, like the folks that are really putting out new capabilities that are more oriented around how we think the SDLC is changing as opposed to the core functioning system that's been around for a while, they're also doing a ton of exploration in how they build. Could we write a Jira ticket and just tell an agent to build that thing? What would happen if we tried that? And then what does that tell us about how we can get it? What is the role of our validation engine and all these other things in that process?
(35:57):
And so I think in that world, if you're obsessed with metrics that are like delivery metrics, you're going to be deeply disappointed. The people that you're asking to do innovative work are going to struggle to do that because they're going to feel this tension of a different kind of direction. And so we look at what you look at in the early days of any company. How many users have used it? What experience did they have? What did they tell us about it? Are they adopting? Are they growing? Whatever your success metrics might be for whatever it is that you're building, right? Those are the things that we care about and we don't care about them on an hourly basis for the most part. Something takes a dip because, I don't know, some other tool came out in the market and now this thing doesn't look as interesting, but we could solve this other problem about it.
(36:43):
You have to give yourself that freedom. Otherwise, you get stuck in the, we'll do 2% more of this thing that we've always done. And in the world that we're in right now, I just don't think that's how software is evolving, like software delivery, which is particularly where we are and where you are. It's changing every day. And if you don't give yourself the freedom to pursue it. And then I guess the last thing I'll say about that, you really touched a nerve, is like then you have to be super dependent on your managers, your structure of leadership to know you need the freedom to explore, but you have to be exploring the right things. You have to be motivated and driven and disciplined about, these are the assumptions we want to test. We're going to knock down answers to those questions as quickly as possible, then we're going to figure out the next ones and we're going to knock those down.
(37:35):
And we might not know what the next set will be until we've figured out the answers to these ones, but we need this discipline and working model, which is hard to measure. But if you're an engineering manager sitting over a team of four people, it's never a huge team if it's super exploratory, you should know what everyone's doing. You should be super dialed into that. And if someone is phoning it in while everyone else is totally cranking, then you don't need a dashboard to tell you that.
Ganesh Datta (38:01):
Yeah, 100%. Yeah. And it kind of goes back to what we started this conversation about, which is focusing on the outcomes. At the end of the day, that's kind of what we're saying here is, especially in the exploratory worlds, what you care about is the outcome. What is the actual ... Maybe you want to call it P&L focused engineering, or I don't know, whatever you want to call it. Maybe it just is engineering. It's like you want to be outcome focused, you want to think about what's driving the business. It just so happens that in an established part of the business, the outcome that you care about also very much relies on stability and reliability and things like that. And that is the business outcome you're working towards, and therefore you're going to measure those things versus what you care about in an exploratory side of the business is totally different.
(38:42):
And then you're going to be measuring different things, which is business outcomes. And so I think that's probably a big part of it as well.
Rob Zuber (38:48):
Yeah, I think that's exactly right. And I guess maybe a nice thing about operating in that dual mode is if there are things about our system and our processes, and of course there are, we're just an engineering organization like everybody else, that are getting in the way. We have that kind of litmus test or whatever through the people that are trying to do things that they understand well. So the friction is evident. I think if it feels frantic and chaotic, then the things that make it hard to do the work get lost in the noise. And you're like, wow, that's just the story of being a startup or being really frenetic or whatever, and you could be going a lot faster. And so it's kind of nice to have those two views of it. So I wouldn't ignore the overhead. And if you're like a five-person company and you've got eight programming languages and six different ways of deploying things and whatever, please stop, make it the simplest possible thing and go-
Ganesh Datta (39:48):
Ruby on Rails monoliths on Heroku.
Rob Zuber (39:51):
If Heroku was ... I mean, there's replacements. There's plenty of replacements for that. But I mean, people are doing that now with the Vercels of the world or pick a thing that they tend to be a little less computation oriented, I think, a little more front end. But there's plenty of platforms like that that allow you to do ... I mean, honestly, and just deploying something super simple in a serverless world or whatever, there's so many options now, which can be a little bit debilitating. It takes too long to figure out. Pick something that you know how to do and just do it. But again, if you're four or five people starting a company, you don't need Kubernetes, you definitely don't need microservices or whatever, find the business. And I think that's true of small businesses, but successful new product teams in large organizations are 98%, whatever, some very high percentage overlap with the same operating approach.
(40:48):
If you're going to be successful in that world, you do have to separate yourself. And going all the way back to your question about organization, we did spend some time saying, how do we allow these teams to not carry the burden of our more scaled, mature processes in order to go do what they need to do? If the expectation is we have a check-in once every two weeks and you're on a formal sprint cycle and whatever, I don't know, you're just way too slow. Why would we operate like that? That doesn't make any sense. Again, some cases you need some consistency and you have enough direction that it's like, I can not connect with anyone for a couple weeks and just do the things that I know need doing, but when it's this exploratory, I mean, you're talking all the time. And so kind of modeling that a little bit differently and allowing folks to operate in a way, don't stress about these processes because this stuff is just too fluid.
Ganesh Datta (41:42):
Yeah. And it's very much like the old adage, you ship your organization. So if you want to ship velocity, you want to design your organization to enable that. The point you made about the fact that having an existing business that you need to run in a world way maybe highlights, you want to solve that because that allows your fast moving teams to move faster is like, I would summarize that as a very interesting lens around slowing down to speed up. You have a team that is trying to go slow and methodical, but actually the investments in allowing them to do that allows the other teams to speed up. And you may not see that because you haven't had a team that is trying to really push the limits of like, we've slowed down, now let's hit the gas and speed up, but actually we have all this other friction there.
(42:25):
So I think you highlighted a really interesting lens for that.
Rob Zuber (42:28):
Yeah. And I think that's right. And some of the tools, I'm trying to think of what we put in place, but we have decent frameworks for building a service. We have an entire platform kind of underlying system for building on top of, and you don't have to ... I'll keep joking about Kubernetes, whatever. You don't have to figure out how our Kubernetes infrastructure works. You can just throw stuff into it and it'll work.
(42:51):
And so yes, those small teams are using Kubernetes in our world, but that's because we have a massive infrastructure and a team that owns it and has abstracted it away and all those sorts of things, as opposed to like, I'm a tiny little company building all this stuff from scratch. So it's a little bit different, but yeah, those investments that we've made are super helpful. And I will say we're more focused on resilience. I would never advocate that that part of the organization is moving slowly or would want them to move slowly, but they move with purpose. And I guess to build on that, like, slow is smooth and smooth is fast. They have the pieces in place that allow them to just execute on a regular cadence, which is just very, very different. It's a little frenetic and well, I just keep saying exploratory on the other side.
Ganesh Datta (43:39):
Yeah. At the end of the day, it's all about leverage. Well, Rob, thanks so much for coming on the podcast. It was a great conversation. It went in directions I maybe didn't realize we would go down, but I didn't think I'd be talking about multimodal LLMs at all in this conversation, but really enjoyed the conversation. Thanks so much for coming on.
Rob Zuber (43:56):
Yeah, thanks for having me. This is awesome. And whatever direction it goes, I think most people are struggling with the same things right now, so there's always plenty to talk about.
Ganesh Datta (44:05):
Exactly. Thanks so much. Thanks so much for listening to this episode of Braintrust. If this resonated with you, do me a favor, share it with another engineering leader who's wrestling with these same challenges. And if you want to continue the conversation or learn more about how we're thinking about engineering operations platforms at Cortex, reach out to us at cortex.io. Thanks for listening, and we'll catch you on the next one.
