Braintrust by Cortex

Cortex co-founder and CTO Ganesh Datta sits down with Sneha Rao, VP of Product, and Ahmed Bebars, Principal Engineer, both from The New York Times Developer Platforms team, to discuss what it means to build and operate a developer platform at scale across a complex media organization.

Sneha and Ahmed explain why developer platforms need a product mindset alongside engineering, how renaming their team from Delivery Engineering opened a broader strategic mandate, and why SRE and reliability belong inside the platform rather than a separate function. They also share how to think about build vs. buy, when to start a platform function, and why AI is another evolution in the platform story rather than a revolution.

What is Braintrust by Cortex?

Candid conversations with the builders shaping the future of engineering.

Braintrust dives into the operational realities of running high-performing engineering organizations, from production readiness and migrations to AI adoption and operational excellence.

Hosted by Ganesh Datta, CTO & Co-founder of Cortex

Ahmed Bebars (00:00):
When I think about developer platforms, I think about all of the things that it takes to develop and deliver that final product to the customers. So our customers are actually the developers themselves. So that's how we want to think about it. What that entails is when we solve a problem from an engineering standpoint, sure, it's simple. It's just like developing a solution for something and solving it. But how do you solve this at scale, how do you deliver this in a cohesive experience? It's not about like, I love Kubernetes. I build a lot on Kubernetes. It's not about how you just give someone a Kubernetes cluster. It's like how you give them a Kubernetes cluster that works with CI/CD. But what does that mean when we have scaling problems? What does that mean from a cost perspective? What does all of that actually entail when it comes to developer productivity and making sure all of the pipelines are green?

Ganesh Datta (00:54):
You're listening to Braintrust by Cortex, where we explore how engineering leaders blend AI, platforms, and culture to build high-performing software teams. I'm your host, Ganesh Datta, CTO and co-founder of Cortex, an engineering operations platform designed to help organizations continuously improve their operational maturity and reduce developer friction. In each episode, we go deep with CTOs, VPs of engineering, and technical leaders who've been in the trenches, navigating the tension between speed and quality, building reliability at scale, and figuring out how to lead through major platform shifts. Whether you're running a team of 10 or a thousand, this is your space to learn from people who've made the hard calls and lived to talk about it. Today with me, I have Sneha and Ahmed from the New York Times. Very excited to have them on. Sneha, would you like to introduce yourself?

Sneha Rao (01:52):
Yeah, absolutely. Firstly, thank you. This is a very cool experience. Love the studio. So my name is Sneha Rao. I am the VP of product of a team called Developer Platforms at the New York Times, and it's actually a very cool department. I like the name, I like what it stands for, and it's pretty self-evident. We care deeply about our developers, their experiences, and enabling them in whatever ways make sense. So some examples of that are ensuring that we know how to serve the news and that we can break news in an effective way and that we run in efficient ways on the cloud. So we care deeply about FinOps and that we have the right backend ecosystem and infrastructure to scale in an elastic way. So yes, I'm talking about Kubernetes and yes, I'm talking about runtime environments, but it's more than that. It's what makes sense for the business.

(02:45):
And in addition to that, how do our users even come to those platforms? The iOS native experience, the Android native experience, a web experience, all of those components and all of those platforms are essentially what make developer platforms pretty awesome.

Ganesh Datta (03:01):
Thanks for coming

Ahmed Bebars (03:03):
Thanks for having me. And this is a great intro. So my name is Ahmed Bebars. I'm a principal engineer at Developer Platforms, and I work closely with Sneha. When I started at the Times, I just wanted to start as an engineer. And I joined in customer care, solving problems for some customers. And now it feels different because I'm not solving problems for customers directly, but I'm solving problems for developers who are trying to solve problems for customers or build a product for customers. So the main focus is to build tooling, provide platforms, have all of the things that we are focused on. And we know that we build best practices that are centralized and mature for people. So focus more on operation, focus more on all of that heavy lifting for things. So we can let teams innovate in building cooking or games or delivering news.

(03:51):
All of that. We're trying to make more focus into how we can do this more effectively and in ways that can scale.

Ganesh Datta (03:59):
I love that. I'm very excited for today's episode because A, it's the first time we're having two guests on this podcast, but specifically a product leader and an engineering leader. So trying to cover all of our bases here. I would love to start by talking about what developer platform means because in your introduction, you covered a couple of different things which maybe people don't realize are part of platform. You didn't just talk about developer experience. You talked about FinOps and breaking the news. And those are things that maybe you don't realize are part of a developer platform. So what exactly does it mean to build a developer platform? What is the scope of everything, if you could describe your function in the organization?

Sneha Rao (04:37):
Yeah. Do you want to go first?

Ahmed Bebars (04:39):
Sure. I can try. So when I think about developer platforms, I think about all of the things that it takes to develop and deliver that final product to the customers. So our customers are actually the developers themselves. So that's how we want to think about it. What that entails is like we have to think about, this is part of an interesting process because I was just talking to Sneha about this. When we solve a problem from an engineering standpoint, sure, it's simple. It's just like developing a solution for something and solving it. But how do you solve this at scale, how do you deliver this in a cohesive experience? It's not about like, I love Kubernetes. I build a lot on Kubernetes. It's not about how you just give someone a Kubernetes cluster. It's like how you give them a Kubernetes cluster that works with CI/CD. But what does that mean when we have scaling problems?

(05:29):
What does that mean from a cost perspective? What does all of that actually entail when it comes to developer productivity and making sure all of the pipelines are green? So they have their own mechanism of understanding what their system is doing. But all of these processes, when you think about when I'm in my own team, if I'm building all of that myself, the ways that I start a product, code is one part, but the problem even comes before that. So the way that we look at the journey is, and we have had many conversations on this, is like you start from an idea and then you go to a plan and then you go to a story like you developed already a project plan and then you come to a process where it's a code, it's like deployment, it's like the CI, it's like the catalog, it's monitoring, it's all of that kind of stuff.

(06:16):
This is what we shape. This is what we try to shape. We try to take the journey itself and see the commonality between all of our customers and deliver that experience holistically in a way. So when you come in, it's not like, hey, as a mission, you come to us and say, "I just want to deploy something or I want to create a new service." You get the full experience from where you start, with a couple of clicks when you template something, until you get monitoring and observability on it. But you also want to get the cost, you want to get all of that cycle of it. And this is how you deliver a full mature product. It's not just about like, sure, take a Kubernetes cluster and play around with it. Yeah, that's good when you deal with an infrastructure team. That's an ops kind of situation.

(07:03):
It's not like a platform. The other part of it is the patterns that we observe as well, because it's not just like, "Hey, take this and do whatever you want with it." It's like we try to develop patterns and best practices and like how actually you engage with the product. If we see something that is a platform way, we implement that by default to be the only way you can go. Sometimes you want to have that leeway and customization so that your customers can do their own configuration, not everything fits all. So that's what we try to do. That's my perspective from how it evolved over the years into the platform.

Ganesh Datta (07:40):
I love that because you don't talk about, like you said, individual capabilities, but it's the entire pattern of experience. And specifically, you talked about things like monitoring and alerting and the reliability practice, all of that part of the journey. And I think the commonality in what both of you described is there's a focus on the end user. Your customer is the developer, but your developer's customer is the New York Times customer. And so finding patterns of what your developers are trying to deliver for the end user and trying to build that into the platform is kind of your charter. Is that the right way of describing it?

Sneha Rao (08:16):
Yes. And they are the customers, but the stakeholders are the business units. So let's just expand on that. So the New York Times has multiple subsidiaries. We have Games, we have Cooking, we have shopping experiences for Wirecutter reviews, and we have The Athletic, which is for sports. Each of these business units has big ambitions, and they're trying to scale in different ways, and they're building a whole new tier.

(08:45):
And those user journeys and those user experiences are being defined. And what we built for the news in the shared platform ecosystem needs to evolve to support these new subsidiaries. So the way I like to have conversations with these business units, we call them mission leaders, is essentially asking them, what keeps you up at night? What are you dreaming of? What do you wish you could have built yesterday and how do you intend to scale that? What are your biggest concerns? And some might say it's deployment and it's like we just don't feel like we have the right velocity. Some might say it's experimentation. We don't know how to go about that. And some might say, "We're just worried we won't be able to scale to the audiences that we have the ambition for."

(09:32):
Whatever those problem statements are, there are so many patterns to support those. So everything that Ahmed said are the blueprints that those strategies are built upon, but ultimately our job as I see it is like, yes, it's about satisfying the developer experience and making sure they know how to do their jobs. Every engineer that walks through the doors of the New York Times knows how to onboard into our ecosystem, understands the systems that they need to work with, has these fine-grained blueprints of how to work with infrastructure. And it's really easy for them to get in, but it's also, what can you now do with that? Okay, you've onboarded your engineers and they know how to do their jobs and they're really enjoying it and that's great. Now what are the business outcomes that we're trying to scale to? And that actually helps us innovate our own infrastructure.

(10:22):
So we then grow into new verticals. An example of that is video. We're going really big on news and video format. So video observability is a no-brainer, but to be honest, we're catching up and we're learning as we're going. So there's so many of these capabilities that we keep expanding on as we go because we talk to those businesses. And that's really important. And I would say any team that isn't doing that, they're missing out.

Ganesh Datta (10:47):
I think one of the things that's very clear about the way you guys both describe this is there's a clear product mindset to the way you're doing things. You're talking about the user journeys and finding patterns, you're talking about stakeholders and making sure that you're aligning to that. I think that's very unique when it comes to platform organizations. We've seen a lot of platform organizations where it turns into science experiments and you're building Kubernetes clusters like, "Hey, we have this cool YAML ops thing where you can spin up clusters," and it solves nobody's problem. So I think it's very clear that you guys have built a product mindset around it. How did that start and how do you guys go from business outcomes to platform outcomes? What does that process actually look like? You're talking to a mission leader, you hear something about, we want to do video.

(11:29):
How does that come backwards to the platform or the other way around?

Sneha Rao (11:32):
Yeah, great question. I think it's a craft where perhaps we're just peeling the first layer of the onion. And when you find somebody who can empathize with a developer experience, perhaps potentially having been a developer, having been a software engineer, they can then connect with the nitty-gritties of what it would take to solve those business challenges. So that's the sort of acumen that TPMs bring, and that's the kind of talent that I hire. Going back to when we get a media problem, like, okay, we're trying to scale video to essentially every reader, every Reggie that's out there, our minds immediately go to problem solving. And guess what? I'm an engineer too. I mean, I might have a product title, but it's a core principle. So you start going to the whiteboard and sort of figuring out, all right, what is the scale that we're talking of?

(12:32):
What are the cost implications to serve those experiences? What are the drivers associated with it, cost drivers associated with it? Do we care? Do we just want to grow in this vertical, test it before we start thinking about optimization, or do we want to do it in a cost optimal way? And those constraints actually help us fine tune the architectural design that we would support. So we do care about costs, as we have to, as we scale.

(13:02):
So everything that we invest in in video, yes, of course we want the highest quality bandwidth and the 1080p resolution as this video would be, but it comes down to, is that necessary on every format that we have, long form, short form, native ads? Maybe, maybe not. We're not sure. We're figuring that out and just supporting that thinking. DevP is the partner supporting the newsroom as they sort of innovate in building out that video acumen. And guess what? It's not just the news, it's Cooking. Now you're seeing our head chefs and our cooking experts do reviews in the kitchen in video formats. And you have Wirecutter reviews in video formats and obviously sports will be covered in video. And a lot of our ads are also native video ads and they're performing really well. So this goes back to you got to talk to every business unit as they sort of invent that and have shared video player experiences.

(14:14):
We're like the central unit that can do that. And there is no other team that's set up to do what we can. And we take a lot of pride in that, to be honest. It brings us to the table. We're here to solve the big, meaty, complex problems. And a product manager is one who can scale their thinking to think beyond ... It's a pattern.

(14:35):
It's a shared pattern that goes across multiple teams, multiple full-stack, backend, and front-end engineers. It goes across different business units. What are those personas? What are those segments and how do we roll this out? And then how do you think about stability and scale and cost? And how do you think about the speed of execution because each of those have different infrastructure investments associated with it? Yeah, that's kind of like the job of a TPM in the space.

Ganesh Datta (15:11):
It sounds like there's, maybe top down is not the right word, but you're focusing on the why and the what before the how. And I think that's like, you see a lot of platform organizations start from the bottom up of what technology do we want and then how do we deliver it versus what do we want to deliver to our customers? And then how does the platform enable it? I'm curious, from the history of the platform organization, has there always been a platform mindset? Is it something that was new? How did that evolve over the years?

Sneha Rao (15:41):
I can share my perspective. And then Ahmed's been here longer, so he absolutely ... Oh, maybe we should start with the past and then ...

Ahmed Bebars (15:49):
Yeah. I think you're absolutely right. I think solving it from just a problem-solving perspective, like, "Hey, I want to scale my Kubernetes cluster." Sure, yeah, here are a hundred nodes, scale. You don't even ask that question: why do you actually want to scale? How do you want to scale? Does that apply to someone else? I think it evolves from being like, the momentum of ops, DevOps, platform engineering. I think that's where the momentum was coming from, where you have teams that are responsible for vending you something. So I want infrastructure. Sure, here you go. I need 16X. Here, take it. This is up to you. Then it evolves to like, you know what, it's not evolving. It's not like working the ways that you expect because there's a gap between like, oh, and now I need to do something about it. Then there's a miscommunication gap into it.

(16:37):
And it brings the concept of DevOps in a way where teams start to be more responsible, more cognizant of what they have done. So you start seeing people looking at infrastructure and understanding what they are building. They are building in their own systems. I've been in customer care. I built my own Kubernetes cluster one day. I came to the conclusion that if I'm doing this, I'm sure others are doing this. So how can we evolve this? And this is one of the reasons that a few years ago I joined Developer Platforms, because I want to scale this. And this is happening not just in a single enterprise. I work a lot in open source. We're all solving very similar problems, depending on the business needs and on the goals of each unique organization that you look into. So when you look at the end of it, it's becoming like, instead of me just having a couple of solutions for my problem, now how do we evolve that in a way that it could scale to all of the business units, all of the engineers that we work with?

(17:37):
And I mentioned earlier, developers are our customers. It's like the entire business unit, including developers. It's like their experience, it's their leadership experience, like whatever you're trying to do, how to do it. So when you start evolving to that space, you think about like, it's not about the problem solving, it's not about the technology. It's about the capabilities that you're delivering. It's a mix between both of them. I was just talking with Sneha earlier about this and you cannot go into engineering only or product only. It doesn't work this way. These are the things that like, it's a combination of thinking through. You can tell me a problem and I can give you a solution immediately. Would it help you? Probably in a way, but is that exactly what you need? I don't know. I didn't ask the question. That's like product, think about also how this would get shaped into your experience.

(18:23):
That's where we sort of like more of the conversation. So when you have this wider conversation about like, what exactly are you trying to solve? And then let's jump into like, after you tell us, maybe you're solving for the wrong thing or maybe there's something else that you should solve for that you're not seeing because we didn't chat about it. This is how it works. So to me, I have seen it from where it was just the vending style, just the infrastructure, to like teams are enabled to do their own stuff, to now it's becoming like a centralized effort on like, as Sneha mentioned, we're equipped with that expertise. We have the SMEs for CI, for Kubernetes. We have the mindset, we have the TPMs, the product people who think about user interviews, we have the design people who think about the user experience. And then now we can deliver a cohesive experience across all of it.

(19:17):
And it's not fractured. It's not like a Kubernetes cluster that's in a specific cloud provider, and then you have an identity system sitting in a different cloud provider and then like how to map these together to an engineer like, "I'm responsible for that Kubernetes cloud. I solved my problem. It's your problem." Then you're leaving the customer stranded, like, "I have two things. I don't know how to go there." So it was an evolution. It was something that we had to step into over time to see what's working for us and what's not working, and that's how we got there and we are there now. And then we're trying to evolve from there. That's how I have seen it over time. But what about you? How have you seen it?

Sneha Rao (19:56):
Plus plus. I think your question was like, how do you even get started with building the product platform, product acumen?

Ganesh Datta (20:05):
Is that where it started or was it like, "Hey, we started a platform team and turns out we're just building stuff and we need to start."

Sneha Rao (20:10):
Yeah. And so everything that Ahmed said is like this evolution that we've gone through. So something he kind of glossed over was we weren't called DevP, if I could take any credit for that. We were originally, the mission I was hired in was called Delivery Engineering.

Ganesh Datta (20:29):
Interesting.

Sneha Rao (20:30):
And I guess it was about three or four years old when I joined, perhaps last year, and it was very focused on backend, heavy infrastructure and building a shared ecosystem as Ahmed has described, like how do you do your CI/CD? How do you deploy, how do you build, how do you test, how do you build that elasticity, how do you build monitoring? It was very much focused on the service portfolio. How do you build services? How do you launch services? How do you scale in production? And that was great. But what that was missing is that I was hired to think about developer experiences as a product leader, and the developer experiences go beyond service portfolios and go into the whole application.

(21:17):
So that application tiering obviously includes front end engineers. It also includes designers and it also includes the full stack and integrations with the data layer and production data and how do you work with that? And that's actually when we expanded. So that happened, I would say last year and it's only been 12 months, but it feels like it's been much longer because you've accomplished so much in that time. It is an evolution. It's not a day one problem. I don't think you build a platform with a product team in mind. I think you build a product team once you have the notion, it's very much like the zero to one versus a one to 100 or one to 1000 step. You hire product people and product leaders like myself when you're at that, all right, we built a thing and it's gone as far as it could have and we want to prevent a Frankenstein situation.

(22:07):
So now how do we make sure that we're not just responding to the loudest voices, but we're taking a step back and we're thinking about the whole ecosystem and that it's really cohesive in how we think about it and that we roll out shared components that build homogeneity, that everything is more predictable because at the end of the day, the core business of the New York Times is not infrastructure, it is serving the news for our curious audiences, and the job of a games developer and the games mission is building these new innovative games. So we want to make infrastructure easy, accessible, and honestly, less of a cognitive tax or overload. And it's easier said than done. And I'm sure you've experienced that by talking to other leaders.

(23:07):
But I think that with a platform product mindset and with the right skills in the room that complement engineering, and this is where it's a very niche space. So the type of talent you find, you have to dig deep to find them, and then when you find them, you kind of hold onto them. I'm very proud of the team that I built and I continue to build. So if you're watching this, reach out.

(23:42):
Another thing that kind of popped out as you were talking, Ahmed, was it's so interesting watching engineers become more product oriented and product managers become more engineering focused, and this is the world we're living in. Our roles are getting transformed. And so maybe, and we were just talking about it, things will change and we are open to it. We're a lean enterprise, so we're not a very large team. And I think there's a lot of space for that innovation and that space for roles to morph into whatever the needs are. And I just thought I'd share that because that's kind of what we're experiencing right now.

Ganesh Datta (24:19):
Yeah. There's so many nuggets in what you both shared. I think one thing I'll call out is in the zero to one and the one to a hundred, like you described, that one to a hundred is where product really becomes important. And it echoes our own experience at Cortex. When we started the company, I was the product manager and our engineers were the product managers and it was great. And once we realized, hey, it's becoming a platform, there's so many capabilities, we're starting to add features that maybe don't make sense. We should hire somebody whose sole job it is to say no, or this is a thing that we should not be building or here's a pattern that we've already used and we should start to collect those things together. And so I think we saw a very similar thing. I think a lot of product organizations see that.

(24:59):
I think the other thing I'll call out is even in that zero to one phase though, I think it's very important to focus on still having a product mindset even if you don't have a product manager.

Sneha Rao (25:09):
Exactly.

Ganesh Datta (25:10):
I think that's where a lot of teams fail on zero to one because they get excited by the possibilities and they don't focus on delivering incremental value and proving that there's some value in what they're doing. And then sometimes you lose steam and you don't get to that one to a hundred phase. And so I think we've definitely seen a lot of that. And to what you were saying earlier, I think why the product mindset is so important, you were talking about the question behind the question. When somebody asks for something, you want to understand the why, what are they actually trying to do? There was a podcast episode I was listening to recently between Ben Horowitz and Brian Halligan, and it was talking about why salespeople and engineers don't get along. Not that it's relevant to this conversation, but I thought the takeaway was very interesting because for engineers, naturally, it's problem solving.

(25:53):
And so you tell them like, "Hey, can you do X?" And they'd be like, "Yeah, I can do X or here's how I can do it or no, I can't do it." It's a very simple answer. It's a problem, it's a solution. But a salesperson, if you say, "Can you do X," their first question's like, "Why are you asking me this question? There's something you're trying to get out of this and I'm going to ask you more questions to figure out why you're asking me that question." And that's what we need to be doing is like, "Hey, I need to scale up my Kubernetes cluster." Okay, but why? Are you running background workers that need to be scaled out? Is there something that you're trying to accomplish that maybe actually requires a totally different solution? And so I think that's where the product mindset comes in, because it's about asking, why are you trying to solve this?

(26:28):
We see this a lot with customers, and I'm sure we'll run into this later, but customers will come to us with a solution. And our job is to say, what is the problem you're trying to solve? Let us go backwards and figure out the problem and then go from there. And I think that's really important. The other thing that you mentioned, we were talking about this at the dinner last night, the Braintrust dinner, how naming is so important and naming is self-documenting because whatever you call yourselves, it becomes the truth. We were talking about how teams, when they name themselves something, if you name yourself like a payments team, if it's not related to payments, we're like, "No, no, no, that's not our team. It's somebody else." Even if there is nobody else who owns that thing, that's who you are and your identity gets built around it.

(27:11):
And so was that kind of what you saw with delivery engineering? The focus was on 100% delivery versus the holistic, "Hey, we want to enable developers to do a thing." Was that the root cause there?

Sneha Rao (27:22):
Yeah, 100%. It couldn't be more true in our case. Our world opened up once we renamed ourselves and the types of conversations we're having, the type of conversation you just had two hours ago, which we should talk about, is we wouldn't be a part of the conversation with teams that were building these capabilities that were outside of the service footprint had we not opened the doors to say, "We are here to serve and enable and to build and help you scale, help us all scale." And sure, we call ourselves Developer Platforms. And I want to actually talk about that too, because what does that mean in a world where developers' jobs morph into product managers and developers and designers coming closer to each other's spaces?

(28:02):
And I've been thinking a lot about that, so maybe there's a second podcast for that. But yes, naming matters. I wanted to just share a small anecdote. So I worked at Spotify before the New York Times, and one of the first squads that I had, we called ourselves Katamari. Now, if you're familiar with the game, it's a really old game and I think it's a Nintendo game and it's about rolling up a ball and then you build these pieces. So if you find a garbage can or you find a rubber ball or you find a car and you're rolling all of these things, and the more things you roll, the more points you get. And that was essentially what we were doing. We were rolling up user data and audience segmentation and we're like, we don't know the forms that these blobs and unstructured data come in, but what we're doing is essentially rolling it up.

(29:10):
And that visual really helped us because then that helped us evolve the thinking and the strategy. And we made a few pivots along the way, but it goes back to a name makes a big difference.

Ganesh Datta (29:21):
Yeah, absolutely. I obviously have opinions around the name, and we're starting to see that. I mean, you mentioned this too, it's not just developers anymore that's in your purview, it's like leaders. And I know SRE and other teams roll up into your organization as well. Actually, maybe let's talk about that a little bit. Developer platform, you don't necessarily think that DevOps and SRE and those kinds of responsibilities are part of a platform. Why do you guys see those things as part of your purview?

Ahmed Bebars (29:49):
So let's talk about naming first because naming is a problem across industries. This is the first thing where you could see the greatest disagreement: how people name things.

Ganesh Datta (30:00):
Sorry to cut you off. One of the founding stories of Cortex, and one of the reasons why I started the company was we, at my last job, the two naming schemes we had for microservices were Game of Thrones characters and coffee. And I got paged once for, I don't even remember what it was. It was some coffee-themed thing. I was like, "What does this service do?" And nobody knew anything. And you had this big argument of, do we do funny names because it makes people excited about their services or do we do serious names that are self-documenting? And that's why I was like, a spreadsheet is not going to cut it. So that's why we started Cortex. But yeah, I agree.

Ahmed Bebars (30:33):
Naming it. I can relate. We have many projects and things, products that we deliver, like our runtime environments called letter case. For example, we have our own naming convention of doing things differently. And this is where you bring fun, but also you want to have a catalog to understand what does that mean? Because sometimes these identities is like, as you mentioned, developer platform now represents something. It's so meaningful in a way. But then you want to have some areas where you can name things differently, but still have source of truth of what does that mean? Because it's confusing when I tell someone like, "Yeah, I build this project." Oh, sorry, what does that mean? Explain the name in the first place if it's not coming. So your question was more about the SRE and the-

Ganesh Datta (31:17):
Why do all those functions, why are they part of the purview of Developer Platform?

Ahmed Bebars (31:21):
Yes. So from my perspective, and this is where it goes to the cohesive experience that you get and try to understand from the platform, it's not like ... So earlier in the days when we were more service oriented, like, "I want a CDN, get your CDN, I want to just get this." It wasn't like how to connect all of these pieces together. So the problem that we are solving, it's not just engineering or a product or a platform. It's like the whole culture space problem. It's like all of the steps that you take in an incident when an incident happens, like how to frame that, what are the right steps to do it, how to prevent that in the first place. It's someone has that capability and understanding in their space to do that. So that's where an SRE team comes in. They understand all of the things that they have done.

(32:08):
They have the experience in SLOs, they know what it means. Sometimes you talk to an engineer and say, "Yeah, how do you solve this problem?" And then giving you an example, not in the New York Times. I was working in something and then you decide as an engineer, as simple as I would use that algorithm. And the algorithm was doing a simple combination on a string. But then you talk about this when you think about at a one-request level, which works, and a 100-request level works. What about a million-request level? This is where you want someone who has that experience, has the understanding to check that, has seen these problems before, has that availability in the scope. You have some sort of an SLO runbook understanding when an incident happens, where to look, how to look, how to connect all of these pieces together. The familiarity with that part is so crucial into solving this overall.

(33:00):
Because at the end of the day, part of the platform is not just running, it's the operation overall. It's like how to operate it. Building something is one thing, but operating it is another thing. It's just like how to focus on operating something on the long term and that doesn't degrade. Because you could build a really good thing at earlier start, but then you don't know how to scale it. You don't know how to operate it. It's not about the building process, it's the operation process. That's where you have to combine both of these practices together. So you make sure that as you build, you continue to operate efficiently in a way that this platform can scale. And that's where you combine all of these things together. You combine the infrastructure with the product, with the SRE mindset, with other functions in the organization to bring the whole experience overall, you bring the whole platform, as you can see.

(33:58):
And that's what delivers value at the end of the day. Because you don't want to build something. I don't want to build a Kubernetes cluster or I don't want to build like a CDN. I don't want to build a platform that doesn't mimic what the customer would see. They would take it and then it's not functioning. So who looks into that? Who helps them build their SLOs? This is the conversation. And we have seen this sort of engagement happen at a level where like, sure, I need to build a service and I have it in mind, but I don't know what that SLO is going to look like. I don't know what a runbook looks like. Usually people know, but they don't have the depth of analysis into what are the things that you should be aware of. As much as you can put in a platform, you still need to have these practices because every service is a bit different.

(34:45):
There's one service that takes a request that takes like five seconds, which is okay. From a service perspective, holistically that might not be okay on other services. And there's another service that takes 100 milliseconds. The runbook, the scale, all of that, these are little details and operation. We call them SRE engagements. And that's where you engage with a team to understand more in depth. So there's product, there's like engineering, there's SREs, there's like all of these things. There's the experience that the developer is seeing. This is how this returns back to ROI. And this is where I could see all of that fit in the same purview into like, what are you developing and delivering at the end of the day?

Ganesh Datta (35:25):
Yeah, I love that because I think it's fundamentally focused on the end users of the New York Times, because if you can bake in those practices as part of the platform, then it's easier for your developers to deliver those experiences to their end users. And even if you think about it from an internal experience perspective, giving somebody a Kubernetes cluster and it's like, all right, good luck. Versus here's a cluster which has reliability practices baked in and we're going to make it very easy for you to get the monitors and SLOs that you need. And all those things baked in from day one is a much better end-to-end experience because it's not just like a, "Here you go, good luck. It's your problem now," versus, "Here's the end-to-end journey" that you were talking about earlier. Has that always been the case? Back in the Delivery Engineering days, was SRE still part of it?

(36:10):
How has the organization evolved?

Sneha Rao (36:12):
It always was and it was always a practice and it still is. I think it's just evolved into thinking more full stack, thinking more of the native experience, thinking about the web experience, thinking about the data surfaces, just how do you build an app end-to-end, the whole nine yards matter because I mean, this is the journey when you're working on an SLO of a service, what you call a service, has all these dependencies on these multiple services, but they turn out to be application ecosystems that have multiple data graphs and that they have multiple client side calls and it's like a spaghetti mess that you need to unpack. And what you realize in that journey and what we realized in that journey was there are multiple reasons why SLOs are hard, but we ourselves as a delivery engineering team didn't have the full picture. We didn't have visibility.

(37:13):
We weren't in the conversation. We weren't having the right ... Those teams weren't within our remit and now that they are, we have more controls. So for instance, we believe every tier zero service should have their own SLOs at this point,

(37:29):
But what does that mean for a platform service? What does that mean for an end user service? Obviously a foundational platform service that is tier zero that can essentially pull down the New York Times should be treated with a higher level of availability goals, but they have so many dependencies across the board and they're invisible at times. So how should they work together? And we're learning, we're learning, we're figuring it out, we're getting much better, and we call this program that we're working on Always Ready. And it's a fun initiative, but it's so different from a consumer experience. So a consumer just knows what the end user experience needs to be. And sure, it knows the services it depends on, but it has to know that the platforms that support those services exist and that they are ready to go. Just like how you depend on Google Cloud, if you're a GCP shop, or you depend on AWS, if you're an Amazon shop, or whatever vendor you choose, that you have certain thresholds, the platform infrastructure should just work and operate with that in mind.

(38:40):
When you build your own infrastructure and when you build your own abstractions, it gets harder.

(38:46):
And that is the responsibility that I see. That is our responsibility to fix. And cost goes hand in hand with stability. How much are you willing to spend on this space? Sure, you could say, "I want five nines." Are you willing to cut the check for it? And if you're not, then let's have that conversation. So if you don't have these functions centralized as we have it in developer platforms, there's a different kind of fragmentation that occurs where the cloud native financial operators may be sitting on one end of the spectrum further away from where the application development and the platform infrastructure is being created and the SRE could be on another side in another domain, in another spectrum, completely disconnected from the pain and suffering of what happens. So having it centralized, again, we're centralized. The New York Times, from a software engineering perspective, we're about under a thousand engineers.

(39:55):
So the scale at which we operate and the way that we can move is far more agile than perhaps a 10,000, 100,000 workforce of engineers. And I'm sure they need to make those trade-offs. And I'm not here to pitch this is the one way, but it totally makes sense in the context of the organization we have and the context of the work we do. And would it be nice to have more SRE engineers embedded in every mission that's building over time? If we could scale, this goes back to how do you think of DevOps?

(40:34):
Do you hire DevOps engineers on every team? Actually, truly speaking, every engineer at this point is a DevOps engineer. And SRE is also, it is a role and it is highly technical. And honestly, there are few people who get it and when they get it, you keep them. I think you hone in on that, but it is also something you can teach. It's a practice that you can teach. And I think every software engineer at this point should understand

Sneha Rao (41:03):
The practice of site reliability. This is production code that you built. You should be confident and you should work towards asking and making sure that you have the tools, the right acumen to be able to support that. And if you don't, that's a concern. Don't launch to production until you know how to do it and that you know how to support it. So that's where we come in and we try to help.

Ganesh Datta (41:24):
I think it's really interesting because like you said, if you have SRE in one place and FinOps in one place and everything, they depend on each other. Like you said, every extra nine, there's some additional cost involved in every single nine or whatnot. And so having those two teams at odds goes against the idea of a holistic experience and a holistic business outcome. It's like those two things are not at odds. They're trade-offs sometimes. And so as an organization, you have to be able to balance those things and figure out holistically what do we want to deliver to our customers and what does that mean at a service by service level or product by product level? And so I think we are starting to see a pattern in the market where these functions are starting to become more cohesive rather than separate functions. And I think that's really interesting.
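
[Editor's note: the "every extra nine" trade-off mentioned here can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the conversation or from any New York Times tooling; the function names `allowed_downtime_minutes` and `budget_table` are hypothetical, and a 365-day year is assumed. It shows how the yearly downtime budget shrinks by a factor of ten with each additional nine, which is one reason each nine tends to cost more to achieve.]

```python
# Illustrative sketch only: downtime budget implied by an availability target.
# Assumes a 365-day year; all names here are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget (in minutes) for a target with `nines` nines."""
    if nines < 1:
        raise ValueError("nines must be >= 1")
    unavailability = 10 ** -nines  # 2 nines -> 0.01, 5 nines -> 0.00001
    return period_minutes * unavailability


def target_label(nines: int) -> str:
    """Format the availability target, e.g. 3 -> '99.9%'."""
    if nines == 1:
        return "90%"
    if nines == 2:
        return "99%"
    return "99." + "9" * (nines - 2) + "%"


def budget_table(max_nines: int = 5) -> list[tuple[str, float]]:
    """List (target, yearly downtime budget in minutes) for 1..max_nines nines."""
    return [(target_label(n), allowed_downtime_minutes(n)) for n in range(1, max_nines + 1)]


if __name__ == "__main__":
    for label, minutes in budget_table():
        print(f"{label:>8}  {minutes:10.2f} minutes of downtime per year")
```

[Under these assumptions, three nines leaves roughly 525.6 minutes (about 8.8 hours) of downtime per year, while five nines leaves roughly 5.26 minutes, which is the kind of gap a team has to decide whether it is willing to pay for.]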

(42:11):
And I guess maybe going back to the beginning of the conversation, we were talking about what is the charter of developer platform? And it sounds like it's not just developers, it's not just platform, it's like FinOps and SRE and all these things. It sounds a lot broader than just what people might think of as developer experience. How do you guys describe it internally to leadership or business stakeholders? What is the goal of developer platform? What is it that you bring to the business?

Sneha Rao (42:38):
Yeah. Can we brainstorm?

Ganesh Datta (42:39):
Let's do it.

Sneha Rao (42:43):
I want to take a stab at it. I want to give you your perspective. You are right, and I sort of alluded to it. I feel like developer platforms helped us in many ways in terms of the users, having a very clear user base. Every developer at the New York Times is our customer, and that is very clear and that has helped us, but we do a lot more than that and it's become-

Ganesh Datta (43:09):
A lot more.

Sneha Rao (43:11):
So what should we call ourselves? It is core. It is fundamental. It is essentially what platform engineering really is, but we are not talking about it because those are not our platforms, but there are many other platforms that operate in the ecosystem. So we have a commerce platform and we have a publishing platform. And all of those pieces of the platform come together to serve their purpose built.

(43:42):
Publishing is for publishers. Commerce is for our payflow and our subscription and our revenue generation and so many others. In our case, it's not specific. It is- Organizational. It is organizational. So there's like a technical stack here. There's an architectural pattern here. We should draw it out, but we're a horizontal and we go as wide as we need to and we're starting to go as deep as we need to as well. And the further up we go, the more we realize that we need to lean into the vendors that we work with and build what is differentiated value for the New York Times versus undifferentiated value and invest in the things that differentiate us that are unique to our core business. So I talked about video observability and making sure that we understand the way that we serve video and what levers we want to pull.

(44:40):
At this point, I would say that's differentiation. A year from now, that's probably undifferentiated. Similarly, we have to pave the way for what those patterns are, and then once we understand them, we can find partners, thought partners who can lead the way.

(45:01):
And I genuinely believe that because this is the classic build versus buy. I've learned through my experiences, and when you build everything that you need, the economies of scale go against you. There's only so much that you can do on your own, and then your team is unable to scale up to the business challenges. And then at that point, you then become like, well, you become the bottleneck. We don't want to be the bottleneck. We want to be able to move as quickly as a business needs us to. And for that, I think that's the sound way to scale, grow. And honestly, I think we're just touching the ... This is the tip of the iceberg. We're going to go pretty far.

Ganesh Datta (45:45):
Yeah. And it's interesting you mentioned the commerce platform, payments platform. The value of those things is almost easier to explain because it's like we make payments possible or we make whatever possible. And so with your organization being so broad, what is it ... If we go to the CEO of the New York Times and be like, "This is what developer platform does." How do you make that case? Hey, we need so many people on this platform team because we go so broad. What does that case look like? If you're an organization that is starting to think about building out a function like this, how do you start making that case? What is your value proposition internally?

Sneha Rao (46:20):
Yeah. Do you want to?

Ganesh Datta (46:24):
Go for it.

Sneha Rao (46:25):
Okay. I think it's a journey. I don't think you're going to get there day one. I don't think a CEO is going to start a business, as you have probably realized, as a co-founder of a business. That's not the first thing that you're thinking about. But as you scale and as a platform grows, you probably recognize that your teams aren't moving as fast as they need to. And that's something probably the CEO cares really deeply about. We have lots of really cool ideas on the table and they just keep lying around in the backlog and move into an icebox that no one ever gets to. That's the hook.

(47:04):
That's actually how you get started. It's like, what is preventing you as an enterprise? I mean, in the case of the New York Times, what is preventing us from building the readership experience, those highly engaged audiences that we need? What is preventing us from getting there? We know what they are. And then I would ask every enterprise, if they're on that journey, to think about what is your ambition? Are you on track to getting there? Do you have the right practices in mind for your five teams, your 10 teams, your hundreds of teams? Do you know how to work together? Do you know how to work in standard operating systems? And do you have patterns that are shared and templatized, scaffolded ways for you all to work? Then great, maybe you figured it out. But for most enterprises, and that's been my experience, you hit a tipping point.

(47:59):
At Spotify, that tipping point was actually 5,000 employees,

(48:04):
And it's similar at the New York Times, which is why I came in because I'm like, "I know this space. I know this problem." And I think that's kind of, in my opinion, and if I ever wrote a book about it, it would be knowing where that entry point is. And it's usually at that point where you have about 90 to a hundred engineering teams and you're trying to figure out how to ... You're noticing perhaps that you're not recovering from your production issues as well, or that your teams are just working hard, but not moving in the direction that you need them to. They're struggling. There's something getting in the way and diagnosing that. It's an engineering problem and it is a platform problem. And that's where you start championing, okay, here are the patterns, here's the things that clearly CICD should be the first thing.

(49:01):
If there was one thing you wanted to start with, I would always start with the build test deploy pipeline. I think that is essential. For me, it seems like a no-brainer. You can disagree with me. And then what are the other pieces for application delivery? Okay. Well, we've noticed that most of the production issues are that we don't know how to scale. We can't predict what those behaviors are. And maybe at that point it's a runtime

Sneha Rao (49:26):
Sort of investment. And okay, well, we don't actually have the right perimeter protections. We don't have a handle on our edge and we don't know how to operate in those. Then maybe that's the next evolution. So I could say it's compute, networking and storage as the primitives, but every enterprise has their own shape to it. But for me, it would be the impetus of what would the CEO care about? Well, I would imagine that every CEO is in the business of making sure their business is successful and that they're monetizing on their goals and it's cash positive. And how do you get there is by building more lovable experiences and features and fine-tuning that and building that virtuous cycle. And if they see any bottlenecks towards it, then I think this is the playbook for that.

Ganesh Datta (50:18):
That makes a lot of sense. And for what it's worth, I think it's similar in every other function. It's like once you start scaling up the organization and you start to see the value in doing things in a repeatable way, and that drives efficiencies in and of itself. We know that in sales, you don't want every single seller to be selling the product a slightly different way. It's like actually, if you know something works, pretty much everyone should be doing it that way. It's the way to simplify and let people focus on their job. You don't want developers focused on things that can be repeatable across the org. You want them to be focused on value-generating work. And I think if you start to realize that there's this dark matter of people are starting to do this random stuff and not working on the stuff that's generating business value, the average developer's not doing that, then it's probably time to say, "Hey, actually, if we took all that stuff that people are doing, give it to a team to focus on that specifically, then we can actually free up their time as a leverage-producing activity," as probably what you're describing. I know we're coming up on time, and so maybe I'll leave this question to you because I know we were talking about AI yesterday.

(51:22):
One of the things I think has been clear in this conversation is we're talking a lot about the broad horizontal organizational meta problems of enabling the organization. And to me, we haven't talked very specifically about developers or agents. It's kind of the just general practices of building and delivering software. That's kind of the thing we're focused on. So is it fair to say that whether it's humans or humans augmented with agents, all the stuff we talked about is still important? Has your strategy materially changed or is it just evolving based off of that? Or does it look still mostly the same as you're thinking about where AI fits in?

Ahmed Bebars (52:06):
So I would say this in a nice way. I think one of the things that I just want to say about AI in general, it's just like it's evolving, but it's another tool of the other tools that we've been looking. When you look at all of these practices that we have been trying to evolve around from runtime, from something else, the AI ecosystem, it's not on its own. You still need to have all of these practices embedded. It enabled to doing something differently from my opinion, just enable sometime speed, efficiency in some other areas, maybe accuracy in some areas. It depends on what are you trying to solve and how are you using it in the organization? The ways that my personal opinion on it, it can enable us to do a lot of things in a way before. It was still doable, but just now might be faster or now it changed the parameters that I have to give because it can do all of some of the works that needed to be done, but then you don't have to share it.

(53:06):
But also, again, it's a tool. We didn't reach ... Funny is that we were talking about AGI. We didn't reach that run it for me in the industry overall. But from a one perspective, to me, it's similar to how we evolved into the internet world. It's another evolving into how we shape that into the business, how we shape that into the experience overall. It's not replacing anything in my opinion. It's just giving more boost to how we do things differently. I was literally talking to Tony about this earlier. My example was in the 2000s, before, when you look at it, any business was on paper. When I send someone, you call on the telephone, you do something, the internet wasn't there in that ecosystem, then it's there. There's a digital transformation. I think where we are now is in the AI transformation.

(54:03):
We have been using machine learning for a while. What's transformative about it, from my experience, from an engineering standpoint, it's been more generative in a way. I used to do in customer care, like some machine learning modeling and taxonomy and all of the kind of stuff. You have to sit and spend the time on iterations and to develop your models the way that you want. What's happening today is different because these models come pre-trained with a lot of knowledge and information happening around the world and more and more it's becoming better and better in doing different things and different functions. So how I use that off the shelf into my actual work. Let's talk about runtime. How I use that into a runtime mechanism, just making sure that it's optimized to the levels that meet standard, have an agent to optimize for that. Can that happen?

(54:52):
Yes, you have seen companies around that space. In the SRE space, I could see that in a space where tell your best practices, run over your scorecards, but it's a tool that embeds with your practice in place. What makes a human element more important, and this is something that I mentioned earlier in my conversation, the problem from an engineering standpoint we're all solving for is fairly similar, but each organization has its unique problems that it's solving for because they have that business ideas, they have the goals, they have that vision of where they want to go. And that's what are you trying to solve for. So you try to find engineering solutions, product solutions, platforms that fits the goal that the enterprise or the company has. And that's where I see AI fits into that space. It's just another evolution of how technology is evolving around us.

(55:46):
And we have to take that and see where it actually makes the most sense. Is it in a migration? Is it like in SRE? Is it like in product? Is it where that? And if that capability that make more sense brings us the ROI that it makes sense to the platform, that's where we should invest more into it. But at this moment, it's more of like, let's try in many areas how we evolve around this and how we make that transformation into the product, into the business, into everything.

Ganesh Datta (56:17):
Love that. I think a couple of things you mentioned there, a lot of ideas around, it's our practices, it's our mission, it's our vision, and it doesn't matter whether it's AI or anything else, that's what you were saying. It's a very broad horizontal organization. And things like a tier zero service is a tier zero service, whether AI wrote it or a human wrote it. If it goes down, it goes down and the site goes down. SRE practices like reliability, your customers are not going to be like, oh, the New York Times are using AI to write their code. I want my news. I either get it or I don't. And it's like your customers have those expectations. And so in some ways it sounds like the work you guys have done on the platform is going to become more and more important because by codifying these things into your infrastructure, into your platform and making it easier for people to build on it, actually you guys are better positioned to take advantage of AI now.

(57:03):
If you have designers or product managers that want to go and build their own software, hey, our platform is going to do all this stuff for you and actually you don't have to think about it. Maybe you're less scared of delivering that stuff to production because a platform takes care of those things for you. And so it kind of becomes a leverage point. I know we're at time,

(57:20):
Sneha, Ahmed, thank you both for coming on the podcast. I think one of my takeaways from this is I think you guys have a very, very, one of the most mature mindsets around platform and the product management around it that I've had the opportunity to talk to. So I think people will have a lot to learn from this episode. So thank you both for joining me.

Sneha Rao (57:41):
Thank you for saying that. And it's been a pleasure.

Ahmed Bebars (57:43):
Yeah. Thank you for having us and look forward to chatting with you again on Braintrust.

Ganesh Datta (57:47):
We'll talk about naming on the next episode. Yes. Thank you both. Thanks so much for listening to this episode of Braintrust. If this resonated with you, do me a favor, share it with another engineering leader who's wrestling with these same challenges. And if you want to continue the conversation or learn more about how we're thinking about engineering operations platforms at Cortex, reach out to us at cortex.io. Thanks for listening and we'll catch you on the next one.
