Braintrust by Cortex

Tyler Davis, a software engineer at Canva, joins Ganesh Datta to explore how internal tools can be a source of joy and leverage for engineering teams. Tyler shares Canva's multi-year journey building their internal developer portal on Backstage, the moment they realized scorecards could transform operational excellence, and the eye-opening decision to switch to Cortex despite significant investment in their homegrown solution.

Tyler and Ganesh discuss their thinking around build vs. buy, how to measure the value of internal tools beyond traditional ROI, the psychology of sunk costs, and more.

What is Braintrust by Cortex?

Candid conversations with the builders shaping the future of engineering.

Braintrust dives into the operational realities of running high-performing engineering organizations, from production readiness and migrations to AI adoption and operational excellence.

Hosted by Ganesh Datta, CTO & Co-founder of Cortex

Tyler Davis:
For something like an IDP, I don't think a lot of organizations are going to have that many bespoke needs to where they will really benefit from building something internally, because they're going to be building something different. I think that's the real eye-opening piece for me. That was, we could have been spending all that time doing things that we value more highly. It was focusing on the Canva specific concerns that we really wanted to wrangle. So it's just about what your competencies are as a company, what you want to be focusing on. And unless you're a company named Cortex, you probably don't want to be spending your time building an IDP.

Ganesh Datta:
You are listening to Braintrust by Cortex, where we explore how engineering leaders blend AI, platforms, and culture to build high-performing software teams. I'm your host, Ganesh Datta, CTO and co-founder of Cortex, an internal developer portal designed to help engineering teams ship reliable software faster with AI. In each episode, we go deep with CTOs, VPs of engineering and technical leaders who've been in the trenches, navigating the tension between speed and quality, building reliability at scale, and figuring out how to lead through major platform shifts. Whether you're running a team of 10 or a thousand, this is your space to learn from people who've made the hard calls and lived to talk about it. Hey, welcome to the podcast. I'm Ganesh. I'm one of the co-founders and CTO of Cortex.

Tyler Davis:
Thanks, Ganesh. Great to be here. My name is Tyler. I'm a software engineer at Canva. We're really excited to start working with IDPs.

Ganesh Datta:
Well, thanks for being on the podcast today. I'm really excited to talk about a few different things. One, I want to start with the idea of internal tools and how it can be a source of leverage, or how it can be a source of pain or hopefully more the former internally. And then from there we'll talk about the build versus buy, how you think about the value of internal tools, and then maybe a little bit about the IDP journey at the end. So maybe we'll start it off with internal tools. Let's start with an interesting question. What is the best internal tool you've worked with in your entire career?

Tyler Davis:
Oh man. Best. I mean, so many dimensions to that, right? I mean, I'm tempted to just immediately say source control, because that's the most fundamental thing in software development. You really need it. If we were still shipping around zip files of versions of our software, all hope would be lost. But I think the best tools, it's really hard to identify that kind of thing. Different people get different kinds of value out of all different kinds of internal tools. But when I think about that, I often think about the same kind of metrics that I would judge a product that I get, something you buy off the shelf. It's the level of joy that I get out of using it.
So of course there's fundamental things like your source control, but at previous companies, I've used things like package managers, which were these things that you would only touch every so often, but just the interaction with it was such a delightful experience that it always gave you a positive experience using it. So I think it often just comes down to the end user experience and kind of touches on the idea of treating your internal tool users as customers. And this idea that just because they're engineers inside your company, it doesn't mean they should have to suffer through some kind of rough experience. Sometimes you just get something working, get the tool off the ground where it needs to be, but other times it really pays to invest in these things.

Ganesh Datta:
Yeah, I think everyone has seen the journey of these internal tools where you start by building some sort of really hacky internal UI and then eventually people realize, "Oh, this is kind of useful."
And then all of a sudden there's a team of four people around it and then there's a designer, and then all of a sudden it's like an actual homegrown product, which like you said, ideally brings joy. I remember one of the things that I really enjoyed, and this maybe goes into the conversation about how you even define it, what an internal tool is, but I remember at my last job, we had invested a ton in our CI pipelines and we had these one-liner functions for Jenkins that were defined for different types of services.
So if you had a Java service, you could do a build Java function or a build Python function, and you would get everything out of the box. It would do linting, it would do code coverage, it would do the actual multi-stage deployments and all those things. And it was so nice to be able to start up a new service and just drop that in, or the feeling when you migrate an existing service to the new CI pipeline, it's like this is amazing. Seeing that one-line Jenkinsfile was so incredible, but would you classify that as internal tooling? Where do you draw the line? What does internal tooling even mean?

Tyler Davis:
I don't draw any lines, and I think that's something that I have seen people not struggle with, but I think it's easy to fall into a trap, where you work on some things really hard or you don't value certain things enough. And of course, prioritization is really important. You have to work on what's really important, and if you're a company and you're selling some sort of product, that's your moneymaker. You have to sell that thing, but there's just an immense amount of value you can get out of investing in your internal tooling as well. And it doesn't necessarily mean building it. I mean, you talk about build versus buy, but investing time in something or money in something, the point is to get some sort of result. And I think if you're working backwards from what people want to get out of it, you're generally going to have better results.
But you talk about that journey that tooling and products can take. You start with an MVP of course, and maybe it's something scrappy, put it together, but it does the job and you build up, and increase the value over time. I really don't see a distinction between internal tooling and external products except for of course, one is the one that's maybe directly funding things, but if your internal tools weren't bringing you some value indirectly, you wouldn't be doing it in the first place. If you are doing that sort of thing, then yeah, please rethink that. But it's maybe a little harder to measure the value of internal tooling, but it seems really obvious to me as a user of these tools that they of course bring a lot of value.

Ganesh Datta:
Yeah, that was exactly going to be my next question, which is how do you think about capturing or talking about the value from an internal tool? So I think even before we talk about build versus buy anything, the understanding of what value we're getting is kind of a precursor to that entire conversation. So for internal tools, how do you think about what does value mean? How do you go about assessing, like you said, as a user, it's very obvious to you, but how do you turn that into, here's how we explicitly talk about the ROI?

Tyler Davis:
Yeah, I mean, delight is a hard thing to measure, and yeah, it's really case by case. So for example, you own a developer tooling company, right? You've built that and you almost get it for free, right? Because the value that you get out of using your internal tools is directly shown by the value that your customers give to you. But I would actually push back a little bit. It's not necessarily that easy to prove the value of your external products either. Of course, you can say, "We have a lot of customers, we're getting a lot of revenue."
And that is one way to measure it, but as I said, that's not necessarily a great way to measure delight. Who knows how much more you could be giving to people in terms of the experience if it was built a certain way? I don't know of a good example for this, but if you're sort of forced into using something, because it's the only option, you don't necessarily know if it might be better built a better way.

Ganesh Datta:
Yeah, it's a really interesting point.

Tyler Davis:
Tough.

Ganesh Datta:
We faced that problem in the early days. I mean, before even it was called an IDP, and then after this market was called an IDP, how do you explain to prospects, to customers what the value is? And it's different for every organization. It's the same with internal tools. And I think one of the things that was, I don't know if this is unique to Cortex, but we thought about it as leverage, which is what are problems that it enables you to solve in a way that's better than what it was before? If you can find those kinds of things, that's a clear answer for the value of an internal tool.
In the earlier example I was giving about Jenkins, that CI cleanup, obviously there's delight and obviously that leads to better morale and retention, and all those things, and that's kind of the intangible piece. But then there's also the idea of leverage, which is like, okay, well now when we want to update our CI pipelines or we want to roll out a security fix, it gives me leverage, because I can do it in one place and have it roll out everywhere else. So that concept of leverage I think has been really interesting, and I think that's where internal tools that succeed maybe are implicitly capturing some sort of leverage that it brings to the team internally.

Tyler Davis:
Yeah, I mean, I think it's, again, just like any external product, it takes vision. You have to be able to say, "This kind of thing has value, because either I have experience, I've been in this place before, I know I'm going to want to refactor those CI pipelines down the road and this is going to save me a lot of time. And that's something I'll be able to see concretely, measurably later, but I know the value from my experience having used it."
So I think a lot of it is just sort of applying those same instincts and experience that you have for any sort of product, and applying them to internal tooling as well. As we've seen, it's very successful to be able to build companies out of these tools a lot of times and who knows? There's plenty of examples. Amazon started off as a bookstore. I don't think anyone thinks of Amazon, at least in our industry, as primarily a retailer anymore, right? It's AWS, that's what we think of. And I don't know, maybe they had the vision there from day one in '95 that it was going to be that big, but I'd guess not. So I think you have to let it lead you down potentially surprising paths, and just fill a need and keep building it.

Ganesh Datta:
Let's go even earlier in the timeline. We're talking about the value of an internal tool, assuming that it's already been built, but how do you even get people to buy into building something internally in the first place or adopting some sort of new internal tool internally in the first place?

Tyler Davis:
Yeah, I mean, internal tools I think do generally follow a different early product development life cycle. It will often start off with someone just being frustrated with something, an engineer writing a script to solve a certain problem. It's I think pretty rare to start from a place where you have nothing and then do a lot of internal product research. Generally these things turn out to be more grassroots things. It's not to say that's the only way to build these things internally, but in my experience, the most successful tools are ones that you can trace the roots all the way back to someone's Perl script 20 years ago. So part of it is I think having the kind of culture that allows that sort of artificial building and letting engineers build whatever they think they need to solve a certain problem and giving them the tools to do that. So it's a very sort of self-fulfilling thing in a way.
But yeah, it really is just about fulfilling a need, like an IDP for example. It does a lot of different things, but for me, fundamentally when I think about that, it's just about ownership and it's about having a list of things that this company owns internally, this thing, just a catalog of software. I mean, that's what it fundamentally boils down to. And if you have no other metadata, just someone to contact about who owns this, what happens when it goes wrong? And everything else is pretty much optional. But honestly, I think probably a lot of companies have IDPs in disguise, some way of finding out who owns something. It might just be like an on-call tool, something like that. It might be someone listed in GitHub or whatever your source control is. It could probably take a lot of forms. But yeah, I think just seeing that there's sort of a gap there and being able to build upon it, and probably realizing over time the amount of extra value it can bring, I would guess.

Ganesh Datta:
Yeah, that makes sense. Interestingly enough, the first feature we built at Cortex was ownership. That was the problem I was setting out to solve, was I got paged at 2:00 AM, "I don't know who owns this thing." There's a Confluence page somewhere that somebody edited. You're trying to hunt these people down. And so, the first feature was like, I want a list of services and who owns each one. And eventually, obviously things grew from there, but is that how the IDP initiative started at Canva? Shifting gears to that.

Tyler Davis:
I think that is it roughly pretty much, and I would be willing to guess that that's pretty much how it starts at a lot of companies, is just this idea that we have no idea what we own. You can go trawl through all the code and come up with a list of services, and probably a lot of people did that. I actually joined Canva shortly after we had started our IDP project, so I can't speak firsthand to how it started, but I do remember one of my very first projects was building a dependency graph of all the internal services at Canva. And there's actually a lot of different ways. I mean, dependency can have a lot of different meanings, but I took the approach of trawling through Java import statements, because this was just the backend Java services that we had at that time and ended up building this GraphQL monstrosity.
And honestly, I was surprised that I even got to the point where I had a visual graph, and it wasn't so much that the graph revealed a lot of complexity that we didn't know we had. We kind of understood the state of things at an intuitive level, but being able to see that data and also sort of realizing that maybe there's some value in being able to do this without a hacky script, and having a catalog that kind of build these kind of dependencies and show that kind of thing visually, it's emergent behavior, I think, right? And once you have a certain capability like this, you start to see a lot of possibilities that you can build on top of it. So yeah, I honestly don't know what the original impetus was, but certainly I know a lot of the things that we got out of it early on, it was all around the catalog.
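
The import-trawling approach Tyler describes can be sketched roughly as follows. This is a minimal illustration, not Canva's actual script: the `build_dependency_graph` function, the `package_to_service` mapping, and the example package names are all hypothetical, and it assumes each service can be identified by a Java package prefix.

```python
import re
from collections import defaultdict

# Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.method;"
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def build_dependency_graph(sources, package_to_service):
    """Return {service: sorted list of services it imports from}.

    sources: {service name: [Java source text, ...]}  (hypothetical input shape)
    package_to_service: {package prefix: owning service}  (hypothetical mapping)
    """
    graph = defaultdict(set)
    for service, files in sources.items():
        for text in files:
            for imported in IMPORT_RE.findall(text):
                for prefix, owner in package_to_service.items():
                    # Count an edge only when the import belongs to another service.
                    if imported.startswith(prefix + ".") and owner != service:
                        graph[service].add(owner)
    return {service: sorted(deps) for service, deps in graph.items()}


sources = {
    "checkout": ["package com.example.checkout;\nimport com.example.billing.Invoice;\nimport java.util.List;\n"],
    "billing": ["import static com.example.auth.Tokens.verify;\nimport com.example.billing.internal.Ledger;\n"],
}
package_to_service = {
    "com.example.checkout": "checkout",
    "com.example.billing": "billing",
    "com.example.auth": "auth",
}
graph = build_dependency_graph(sources, package_to_service)
```

Imports only capture compile-time dependencies, which is one of the "different meanings" of dependency mentioned above; runtime calls between services would need a different data source.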

Ganesh Datta:
Yeah. The scope obviously has expanded quite a bit past just the catalog piece of it, but do you feel like there was early value that was established with the catalog? How did you think about, you were talking about joy, you're talking about the usefulness of this information. Did you start to see people adopt this information? Because it's kind of like a chicken or the egg thing, which is the more people care about the data, the more they keep it up to date, but at the same time if it's not up to date, they don't really care about it. And you guys probably seeded that data, like you said by building scripts and things like that, but yeah, was there initial value that people realized like, "Oh, this is solving something that we have"? Like, how did people react?

Tyler Davis:
That is one of the tricky things, honestly, because the catalog, it's sort of obvious to me. I take it for granted that the catalog has a lot of value, mostly because of the things we built on top of the catalog. So to me, the catalog is a more fundamental foundational piece. I don't really see people just crawling through the catalog for fun. I might do that, just because it's the space I work in and having data completion is satisfying for me.

Ganesh Datta:
Feels nice.

Tyler Davis:
Yeah, filling out forms is a little bit of a pleasure that I get out of doing things like that. But no, I don't necessarily think the catalog alone provides an immense amount of value. As I said, it's always sort of possible, difficult but possible to get that information in other ways. I think it's what the catalog enables that really provides the value, and that's what we saw firsthand actually. For a number of years, we were focusing on the catalog, and I think that was time well spent, but at some point we looked back and saw our metrics, and it was exactly what I was saying. People weren't just looking at the catalog and yeah, it makes sense. Why would they? They don't have a reason to do that. Of course, people did use it, but it wasn't like every person at the company is looking at the catalog every day.

Ganesh Datta:
Yeah, looking at your own services. Oh, it looks nice.

Tyler Davis:
Yeah. There was usage, but it wasn't actually in that form. What the usage was that we saw was people calling the API. We developed a dozen or more downstream users of that data. And when I think back to what we would've done without having the catalog, and to be honest, even with the catalog, I was tempted sometimes to build sort of an intermediate layer, an ownership service. This is fundamentally what people wanted, was a way to look up an entity and find out who owned it or the reverse, look up a person and find out what they owned. That's where I think sort of the initial value was, but it wasn't people doing this. It was services that were building on top of it and other internal tools.
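
The two lookups described here, entity to owner and person to owned entities, can be sketched as a small in-memory index. This is an illustrative assumption about what such an ownership service might look like, not the API of Canva's catalog, Backstage, or Cortex; the `OwnershipIndex` class and its method names are hypothetical.

```python
from collections import defaultdict


class OwnershipIndex:
    """Hypothetical two-way ownership lookup over catalog data."""

    def __init__(self):
        self._owner_of = {}                  # entity -> owner
        self._owned_by = defaultdict(set)    # owner -> set of entities

    def assign(self, entity, owner):
        # Drop the reverse entry for any previous owner so both maps stay in sync.
        previous = self._owner_of.get(entity)
        if previous is not None:
            self._owned_by[previous].discard(entity)
        self._owner_of[entity] = owner
        self._owned_by[owner].add(entity)

    def owner_of(self, entity):
        """Forward lookup: who owns this entity? Returns None if unknown."""
        return self._owner_of.get(entity)

    def owned_by(self, owner):
        """Reverse lookup: what does this person or team own?"""
        return sorted(self._owned_by.get(owner, ()))


index = OwnershipIndex()
index.assign("checkout-service", "team-payments")
index.assign("invoice-service", "team-payments")
index.assign("invoice-service", "team-billing")  # ownership transferred
```

Keeping both directions in one place is what lets downstream tools, such as paging or alert routing, ask either question with a single call.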

Ganesh Datta:
Yeah, it's one of the realizations that we came to very early on in our journey as well, because when we just had a catalog, which is the first thing we built, nobody really cared. People were like, "Oh, this is important, but it's not valuable enough for us to switch off from our IDPs in disguise."
Which was the thing back then. But then scorecards was the inflection point for us. It was an application of that data. It was like, hey, here's a thing that's very concrete that you can now do with this stuff. And then I think that's what opens people's eyes like, "Oh, I see why I would invest in the catalog."
And I think it sounds like people had that same realization of like, "Oh, it's actually really valuable to have this stuff in a single place, because it enables other things down the road." I guess maybe shifting gears, you guys went through a pretty long journey in the IDP exploration. You built a lot on Backstage. Obviously you're using Cortex today. What was that journey like? Why start on Backstage? Why switch? Is it related to the stuff you built on top of the catalog? How do you guys think about that?

Tyler Davis:
Yeah, absolutely. As I said, I wasn't there initially. And when I joined, it was still just a proof of concept, to be honest. Backstage was something that I sort of launched internally, but it was like I kind of took it for granted personally. Someone had already made that decision, and I don't think we did ever at that point any good market research to see what options were available. But to be honest, maybe looking through the Internet Archive, it was actually pretty hard to find things in this space. And I think probably we would've started out with Backstage all along, just because it fit what we wanted to do at the time. I don't know if we would've been ready to commit to an off-the-shelf product at that time, even if we'd been able to find one.
So yeah, I do think Backstage served us in the space that we were looking for quite well for a number of years. Yeah, exactly what you were just saying. It wasn't until we realized that we wanted to just get a lot more direct value out of it and couple the catalog with our desire to improve operational excellence concerns. We had another team working on operational excellence completely separate from us, and we kind of realized over time we were kind of doing roughly similar things. We're focused on the metadata for services, and they're focused on is that service performing to a certain standard or is it meeting certain checklist items, things like that. And it's like a natural overlap right there, which is exactly what scorecards provide.
So yeah, I don't know who first brought it up, but it was just like, "We should just team up and work on this together." So yeah, I think it was kind of serendipitous almost. We had this realization that we weren't actually seeing the full potential of the catalog. We had the catalog and we were building small bits and bobs on it. But yeah, I think scorecards for us was a huge eye-opener and it was just a realization that we didn't have to constrain ourselves to being kind of a foundational layer. We could actually be something that was potentially a front page for developers, somewhere they could actually go and get some value out of, not just trawling through it like a phone book.
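
The overlap described here, catalog metadata on one side and checklist items on the other, is the core of a scorecard. A minimal sketch under stated assumptions: the `Rule` type, the `evaluate` function, and the metadata fields (`owner`, `on_call_rotation`, `runbook_url`) are hypothetical and are not Cortex's or Backstage's actual scorecard model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One checklist item: a name plus a predicate over service metadata."""
    name: str
    check: Callable[[dict], bool]


def evaluate(service, rules):
    """Score a service as the fraction of rules it passes and list the failures."""
    failed = [rule.name for rule in rules if not rule.check(service)]
    passed = len(rules) - len(failed)
    score = passed / len(rules) if rules else 0.0
    return {"service": service["name"], "score": score, "failed": failed}


# Hypothetical operational-excellence rules over hypothetical catalog fields.
rules = [
    Rule("has owner", lambda s: bool(s.get("owner"))),
    Rule("has on-call rotation", lambda s: bool(s.get("on_call_rotation"))),
    Rule("has runbook", lambda s: bool(s.get("runbook_url"))),
    Rule("has dashboard", lambda s: bool(s.get("dashboard_url"))),
]

service = {
    "name": "checkout-service",
    "owner": "team-payments",
    "on_call_rotation": "payments-primary",
    "runbook_url": "",
}
result = evaluate(service, rules)
```

Because the rules read only catalog metadata, a second scorecard such as code quality would be another list of `Rule` objects over the same service records.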

Ganesh Datta:
Yeah. The scorecarding bit is interesting, because I think a lot of our customers who have made a similar shift, in many cases, there's been a specific thing that they were trying to solve and they're like, "Oh, this particular problem looks like a class of things that Cortex can do well and it's some sort of scorecard."
Was that the initial kind of impetus there? Was it the operational excellence use case that was like, "Oh, we should probably look at are there better ways to do this?" Or was it like a generalized, "We're looking at a class of things and we should look at all these things and see if there's something that can solve them for us?"

Tyler Davis:
It definitely was primarily operational excellence concerns. That's our sort of first customer for scorecards, the way we see it, and the Reliability platform is the name of the team that's in charge of that, and they've really been our partners in rolling out Cortex internally and developing the scorecard. So it was a little bit of both though. I don't think we would've been nearly as excited about it if it had just been this one concern. I mean, operational excellence is a huge, very important thing, something I'm really passionate about, and even though I wasn't on that Reliability platform, it was something I had been trying to push for a long time, and it's rolling a really big ball up a hill. It's a really tough thing to push. So I was definitely very excited about it, but I think the thing that made it obvious that we were definitely going to go down this route was knowing how many more applications for it there were.
And yeah, we didn't just roll out one scorecard initially. We have a second scorecard as well that's focused on code quality. So we've got two dimensions there already and we've got three more scorecards in the pipeline already. So yeah, if anything, it's the kind of thing where we're more worried about building too many scorecards and overwhelming people, so trying to keep sort of a really deliberate tight rein on things. But no, I mean, I think scorecards is one of those... Maybe it's an emergent thing. I don't know. I can't say if I would've come up with it without having the catalog there already, but it just to me seems like an immediately obvious-

Ganesh Datta:
Application.

Tyler Davis:
... win. Yeah, just a great way to utilize the data.

Ganesh Datta:
Yeah. We're shifting gears into the build versus buy idea, and maybe we'll use this as a practical application of it, but I want to talk about it more broadly as well. Now I'll be the first to say Backstage is by no means a bad framework. If you have the time, the energy, the technical chops, whatever, you can build a great IDP with it, but it is building, it's not an off-the-shelf thing. You have to build your capabilities and whatnot.
So at least at Cortex, we consider Backstage to be the build in the traditional build versus buy. It's not really like people building their own. It's like you're building on Backstage or you're buying Cortex. And so, I guess you guys had built quite a bit on Backstage. It was one of the more fully fleshed out Backstage instances we had seen. And so, this long into the journey, why decide to switch? How did you think about build versus buy? Do we continue building? Do we go buy something now? How do we think about the sunken cost? How do we think about future potential? What was that conversation like? How do you even have this kind of conversation internally?

Tyler Davis:
Yeah. Well, it wasn't one conversation, it was about a hundred. Yeah, it took a long time. And to be honest, I wasn't convinced initially either. It took me a while to come around and I think a little bit of that was I kept having this thought, "Am I just sort of emotionally attached to this thing that I helped build for so long?" And I've always considered myself a person that's sort of more pragmatic, and it's not difficult for me to say, "Let's focus on the value here that we're bringing to customers, and if customers are having a better experience, I can take myself out of the equation." But it was still a challenge. I think what it comes down to is what you want to spend your time and energy on. There's probably a lot of financial equations that you can do. Is it a better use of your time to build something or spend the money on that instead?
But the thing that we struggled with is that, all things being equal, you can sort of do either one. The time you spend has a monetary cost, of course, that's what I'm trying to say, but it's not always as easy to hire a bunch of people to build something as it is to just buy an off-the-shelf product. And the key thing is that for something like an IDP, I don't think a lot of organizations are going to have that many bespoke needs to where they will really benefit from building something internally, because they're going to be building something different. I think that's the real eye-opening piece for me that was kind of, yes, we're going to be building this and maybe we're going to have a lot more control or flexibility, but in the end, the product that we end up with will be very similar to the thing that we can get off the shelf, and we could have been spending all that time doing things that we value more highly.
It was focusing on the Canva specific concerns that we really wanted to wrangle. So it's just about what your competencies are as a company, what you want to be focusing on. And unless you're a company named Cortex, you probably don't want to be spending your time building an IDP. That's probably not what your real product is. And so, of course, we spend a lot of time talking about internal tools. I don't think there's a conflict here. The tools that you want to build internally are the tools that are unique to you. So if you find something like an IDP and if you're in a similar boat to us, where the IDP that you want to build is very similar to an IDP that already exists off the shelf, what are you really getting out of that? I think that was sort of an eye-opening moment for us to kind of realize that we weren't building something special.

Ganesh Datta:
Yeah. Going back to the idea of leverage, it's like build internal tools that give your organization unique leverage that are truly custom to your organization, and you buy the things that are not. I guess the kind of last question I had for you was about the sunken costs. I think in many cases it's easier to talk about the forward looking like, oh, we want to start this initiative, or we've POC'd something for a little bit of time, and now we're going to decide whether it's a core competency or not, whether we want to invest in it. But if you've spent a significant amount of time and money and resources on it, it's almost more of a psychological game to say, "Do we just throw more at it? Do we just keep going and hope for the best? Or do we throw it away and start from scratch almost?" How did you guys think about that and how do you think about it generally when it comes to internal tools?

Tyler Davis:
Yeah, I mean, that came up a lot, but of course, there's a reason why it's considered the sunken cost fallacy: it is a psychological trick that you often fall into. But it is still a real cost. It is something that you actually paid. But I think for us, and one of the ways that I ended up thinking about this is we didn't waste our time. We built something that we used and we got value out of it while we were using it. If I take that away, we would've been worse off. And at the same time, we didn't throw away all the stuff that we built.
We actually reused quite a bit of it sort of internally, all the data that we had spent years on. That's the real value, the data. That still exists, of course. And the most important thing, of course, is all the stuff we learned along the way and the friends we made along the way as well. But no, I don't think I would've been nearly as well-equipped to set up Cortex as an IDP without all that experience building Backstage, honestly. So yeah, I mean, it's not to say that you can't start from scratch with an off-the-shelf product, but I definitely think that we had a lot of hard-earned lessons that we still keep with us.

Ganesh Datta:
Yeah. And there was probably a lot about where you want to go. You've learned so much. We've also learned where we want to go next. And then you can go back to the question we were just talking about, which is depending on where we want to go next, is the next thing something we want to keep building or is that what we want to just look for something off the shelf?

Tyler Davis:
Yeah, I mean, you asked me before about the decision to do that, and while scorecards was a big deciding factor for us, we of course thought very briefly that maybe we could just build scorecards ourselves, right? And we actually built a POC for scorecards internally, and it was a lot of work, and I think that was fundamentally what it was. This wasn't just a small feature that we were going to add on, because we added a lot of stuff into Backstage, but this was a big, big platform level thing, and it was going to take us a while, and we could maybe get to a pretty good point eventually, but then that's just another thing we have to own and maintain and build upon forever.

Ganesh Datta:
Yeah, I love that. I know we're wrapping up here, but whenever people talk about operational excellence, I'm always intrigued. Maybe in the last couple of minutes, if you want to quickly talk about what operational excellence means to you and what a good operational excellence program looks like. Maybe at a high level.

Tyler Davis:
For me, it comes down to two things. Basically, it's preventing incidents before they happen and not relying on heroes in your organization to do that basically with no incentive structure. So both having the tools you need and the ability to measure and show what can be done to easily improve your operational stance, but also building some sort of mechanism internally to actually reward people for doing that sort of work. Because I think so often that type of, I mean, anything non-feature, but especially operational excellence work gets, it's the first thing down on the backlog. And that's understandable, right? I mean, again, talking about products versus internal tools, like one's the true moneymaker and you got to prioritize as best you can, but what's the value in avoiding an incident? That's hard to put a price on.

Ganesh Datta:
Absolutely. Well, thank you so much for being on the podcast. I really love the conversation. A lot to take away from internal tools and the value of joy, and how to think about the build versus buy as well. But thank you so much for the insight.

Tyler Davis:
Yeah, thanks for having me.

Ganesh Datta:
Thanks so much for listening to this episode of Braintrust. If this resonated with you, do me a favor, share it with another engineering leader who's wrestling with these same challenges. And if you want to continue the conversation or learn more about how we're thinking about internal developer portals at Cortex, reach out to us at Cortex.io. Thanks for listening, and we'll catch you on the next one.