Chain of Thought | AI Agents, Infrastructure & Engineering

AMD's VP of AI Software runs 10–12 Claude Code agents in parallel, burns 6.5 billion tokens a week, and rewrote the 25-year-old Slurm as Spur, in Rust, overnight. Anush Elangovan on why normal SDLC is dead, testing is the new code review, and software is just tokens.

Show Notes

What happens when a VP of AI Software at a major chip company goes all-in on AI coding agents for his own team's work?
Anush Elangovan runs 10–12 Claude Code agents across three machines, burns 6.5 billion tokens a week, and rewrote a 25-year-old project (Slurm → Spur in Rust) in a single night.
He does it all on dangerously-skip-permissions.

About Anush
Anush Elangovan is Corporate VP of AI Software at AMD. He founded Nod.ai, where his team built SHARK and was a primary contributor to Torch-MLIR and IREE. AMD acquired Nod.ai in 2023, and Anush now leads AI software strategy across AMD's full silicon portfolio. Before Nod.ai, he shipped the graphics stack on the first ARM Chromebook and led Chrome OS's migration to Gentoo.

We cover:
  • How Anush runs 10–12 parallel agents with a geo-distributed AMD hardware rig
  • Why the test harness is the new code review (and why agents are "sneaky and dumb")
  • Rewriting a 25-year-old project in Rust overnight, without opening the editor
  • Why every new project is in Rust specifically because he refuses to learn it
  • The "HR partner fixing engineering bugs" moment and what it says about upskilling
  • Why normal SDLC is dead and speed is the only durable moat
  • AMD's fully open-source software stack and how community contributions are accelerating ROCm
  • "Software is just tokens" and what that means for AMD's bet against CUDA lock-in
Connect with Anush
LinkedIn: linkedin.com/in/anushelangovan
Twitter/X: @AnushElangovan
AMD AI blog: amd.com
AMD AI Developer Program: amd.com/developer

Connect with Conor
Newsletter: newsletter.chainofthought.show
Twitter/X: @ConorBronsdon
LinkedIn: linkedin.com/in/conorbronsdon
YouTube: @ConorBronsdon
More episodes: chainofthought.show

Chapters
0:00 Cold open
0:21 Welcome + guest intro
3:43 250K lines a week, 10–12 parallel agents
7:34 Agent architecture + geo-distributed test rig
9:57 When does AI-generated code become a liability?
14:12 80% tests first: the test harness philosophy
18:24 Dangerously-skip-permissions + testing as code review
19:52 "Normal SDLC is dead in the agentic world"
20:44 Advice for engineers and leaders who feel behind
24:51 Tokens, throughput, and what happens next
26:29 Block layoffs, uneven AI gains, the 25-year Slurm rewrite
32:55 Galileo sponsor break
34:24 When agents go off the rails: sneaky and dumb
37:52 Orchestrator agents vs. focused multi-threading
40:45 Open source, ROCm, AMD's software bet
44:19 "Software is just tokens"
45:24 AMD Developer Program + community contributions
47:09 Where to start with AMD
48:39 Heterogeneous compute
50:13 Outro

Thanks to Galileo. Download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems

Full show notes: newsletter.chainofthought.show

Disclaimer from our host: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of my employer. This account is not affiliated with, authorized by, or endorsed by my employer in any way.

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of Modular. This account is not affiliated with, authorized by, or endorsed by Modular in any way.

FINAL TRANSCRIPT
================
Speakers: Conor Bronsdon, Anush Elangovan
Duration: 50:54
Total Words: 8518
Generated: 2026-04-21

---

[0:00] Anush Elangovan:
Normal SDLC is dead in the agentic world. The new world is here and the new world is just forming. And the only way you keep up with this is you subscribe to "speed is the moat," right? You've got to run. You just got to run faster than, you know, what's trying to catch up.

[0:21] Conor Bronsdon: [OVERLAP]
Welcome back to Chain of Thought, everyone. I am your host once again, Conor Bronsdon, Head of Technical Ecosystem at Modular. My guest today is someone you may be familiar with if you have been listening to Chain of Thought for a while: Anush Elangovan, Corporate VP of AI Software at AMD. I'm delighted to have him back with us. Anush, as many of you know, founded Nod.ai, where his team built SHARK and became a primary contributor to Torch-MLIR. And when AMD acquired Nod in, what, 2023? God, three years ago now, Anush began leading AI software strategy across the company's full silicon portfolio. Before Nod, he shipped the graphics stack on the first ARM Chromebook and helped lead the Chrome OS migration. He's been building at the intersection of compilers, hardware, and AI for over two decades, and he's coding more than ever today with the help of AI agents, which I'm sure he's very excited to talk about. Many of you may know him for going viral for building a pure Python AMD GPU user space driver with Claude Code without opening an editor once, which I'm sure we'll talk a bit about. And as anyone who has listened to him before may know, he will probably tell us that speed is the moat and we need to use our agents to accelerate ourselves, because they are a great equalizer in software today. In fact, as we're recording this, we are seeing that play out with Claude Code's internal source code leaking onto the internet and people rapidly rebuilding it with their agents into Rust and Python, changing that TypeScript package. It's going to be a fascinating few months. Anush is running six or more AI coding agents in parallel, generating 250,000 lines of code in a single week, like I think many of us are starting to do. And, as he said, consuming 4.2 billion tokens weekly, he is rewriting 25-year-old projects overnight. Anush, it's so good to see you. Welcome back to Chain of Thought.

[2:22] Anush Elangovan: [OVERLAP]
Great to be here. Glad to have this conversation. Super excited to talk about all of the impact AI is having on our software development methodologies and on people in general.

[2:35] Conor Bronsdon: [OVERLAP]
Oh my goodness, it is a crazy time. It is bedlam out there.

[2:38] Anush Elangovan: [OVERLAP]
Yeah.

[2:38] Conor Bronsdon:
I know you and I talked, what, eight months ago now, something like that. And I think we kind of said, oh, well, we'll talk in a year and see where things are. Well, there's a reason we've had to shorten that timeframe, because there is a lot going on. And one of the ways you can protect yourself from everything going on is with our presenting sponsor, Galileo, for reliable AI evals and guardrails. Maybe you don't want your source code to leak onto the internet. Well, check them out at galileo.ai. But we're not talking about guardrails today. We are talking about how to go fast, and we may talk guardrails and testing a little bit. So what happens when a VP of AI software at a major chip company goes all in on AI agents for his own team's work? And what does that reveal about where software development is actually heading? Anush, you told me before this call that you've been generating 250,000 lines of code a week with AI agents. Obviously lines of code isn't a metric of success, but it is an indicator of throughput. Walk me through what your typical day looks like, given that I expect you are in meetings all day, you are a CVP at AMD, and it sounds like you have a bunch of agents running in parallel.

[3:43] Anush Elangovan:
That's right, Conor. You know, I feel like AI agents and agentic coding have given me 20 years back in my career, right? That's how I feel. Last night, of course all through Claude Code or Codex, right, I just said, okay, look at my GitHub and tell me all my commits, per month. And March, ending today, is the largest amount of code I've ever generated in my entire life, since I signed up for GitHub in 2007 or something, right? And it is fascinating, because I generated all that code without opening the editor once. It is about how you frame the problem and how you understand the harness that you're setting up. The first principles of engineering are still there; it's just that you're not doing it manually. It's like we've been using sticks and stones to dig holes, and suddenly someone shows up with a big excavator with four arms and they're just going at it.

[4:59] Conor Bronsdon:
Yes. You're like, how much dirt did we just pull out here? What the heck? What do we do with it all? We don't even know yet.

[5:06] Anush Elangovan:
Exactly. But now you need to plan, because you're like, oh, if I start digging this, I'm going to get rid of this whole hill over here. You're thinking at the next level, about what the outcome is, rather than, okay, I need to shovel this sand from here to here.

[5:25] Conor Bronsdon:
No, I'm following this. We're changing the software landscape, to your point. We're not just changing the build site. You know, all that code has to go somewhere.

[5:33] Anush Elangovan:
Exactly, exactly. And I can give you an overview of things that I'm working on. They're very complex projects. I have a full virtual GPU ISA simulator written. I rewrote Slurm as Spur, in Rust, just because I wanted a modern, scalable thing. And it set up the CI/CD. It's running across multiple clusters, and now we are deployed in production internally. Then I did this ISA rewriter so that you can target different GPUs on the fly, right? You can start from RDNA and target CDNA, or start from CDNA and target RDNA, but still keep the application the same. And I can keep going. There are like six such projects where, if I were to plan them pre-December last year, I'd be like, oh, that takes six months. Actually, there was a case with the user space driver where, if we were to rewrite it, it was like six people over six months and 2 million in hardware. All I had was Claude Code and a virtual machine rig that could power up Windows and Linux. It just came up with a plan and we executed through the whole thing. And now we have a full open source Windows device driver, a WDDM driver. That was not possible even in December. It would have been like, oh, two years to get someone to write a Windows driver; we're not even going to attempt it. Now, if there is a problem, the first thing I do... my notepad is Claude Code. When someone tells me I have to do something, or I would like to do something, I actually type it into Claude Code and say, okay, go ahead, implement it. Because in the time it takes for humans to come to an agreement, you may have already implemented all the options the humans are going to discuss. And then you just start with, okay, which one do you pick?

[7:34] Conor Bronsdon:
So, uh, how many instances are you running right now while we're on this podcast together? Because I heard a little ping there that I recognize. I forgot to ask you to mute your notifications.

[7:42] Anush Elangovan:
Oh, you heard the ping. Sorry, I don't know, that wasn't supposed to... But there are literally six. Actually, one, two, three, four, five in one system, three in another, and then two on my Mac. So that's 10, 12 agents. And these are the main agents, right? And then they have sub-agents that are...

[8:09] Conor Bronsdon: [OVERLAP]
How many sub-agents are they spawning?

[8:11] Anush Elangovan: [OVERLAP]
For example, the GPU simulator and emulator that we are writing, fully open source: you can model ISAs. It's modeling transpiling from GFX1250 to GFX950, different ones, and each one is a sub-agent, and that agent is just going. All I specify is that the numerics need to check out and it has to meet a performance guarantee. Within that, it can do whatever it wants. When it's done, I'll go review it and see whether it's OK to go or not. But yeah, I have muted my agents now, so I will try to keep the interaction on.

[8:48] Conor Bronsdon:
I have three or four running right now. So I'm not quite at as many as you at the moment, but, uh, I understand.

[8:50] Anush Elangovan: [OVERLAP]
Good, good, good. Isn't this like, what is the new word, token maxing?

[8:57] Conor Bronsdon:
Yes. A good friend of mine is one of the maintainers of OpenClaw. He was complaining about how he was on a flight for a few hours, and he's like, I'm not really able to code effectively here. I feel like I'm trying to get tokens through a straw. I can barely do what I need to do.

[9:15] Anush Elangovan:
So the way I do it is I have a personal Tailscale network with Headscale set up, so that it's my own, you know, VPN setup. And in there, it's redundant power and redundant systems with different variations of AMD hardware. So if an agent is testing Windows on RDNA 4, it'll just spin up; it knows where to log in. And it's geo-distributed. The test bed is set up so that the agents can talk and do whatever they want to keep the progress going, but it's still contained in that environment. And then I can review what goes in and out of that enclosure.

[9:57] Conor Bronsdon:
So, I mean, obviously this is incredible throughput, a step change in how much code we are able to generate. But as anyone who ever listened to me on Dev Interrupted knows, lines of code are not necessarily an indicator of successful code. I think that has changed a little bit, because if you have good iteration loops in place and you can just keep agents churning on a problem, their ability to effectively refactor and learn from their mistakes means that massive throughput, and the increased velocity it can drive, will actually start to drive success over time, as long as you continually refactor and iterate and your infrastructure is well set up. But what would you say to folks who look at this and say, it's great you're generating that much code, but at what volume does AI-generated code become a liability? How do we maintain that?

[10:54] Anush Elangovan:
Yeah. You know, it's a very, very good question, because even since we spoke... I think I said a week was 4.2 billion tokens. I just checked this morning, and it's 6.5 billion tokens consumed, and lines of code I can't keep track of now, right? So yesterday I was trying to explain the six projects that I have at one level lower than what I explained to you, right? Like, okay, this is how we are doing this, how we are hooking up the user space to the kernel space, even the firmware loading is going to user space, blah, blah, blah. I took about 10 minutes and the other person was like, I can't grok this, right? So then we were like, okay, you know what? You take your agent, point it to these commits and these plan documents, and let's meet after your agent digests this, right? So that person pointed their agent at the six repositories plus the six plan documents, and it generated one PDF that they consumed. I looked at it, and that's exactly my vision of what these six projects are doing. They looked at it and they were like, oh, now I get what you're trying to do. And they're generating 100,000 lines of code, too. It's not like they're not AI native; they're going hard at it. But increasingly, what's going to happen is that how you review code, how you access code, how you do anything with your computer is going to be agentic. And that's a bold statement, or a change. In December, my view of agents was like, haha, it's a cron job in a prompt. It's like a prompt in a cron job. That's how I looked at it. Sure, you can ask it to do something here or there. But now it's gotten to a point where I have an agent that just monitors the Spur issue tracker. Any issue someone files, it fixes it, it tests it, it deploys it. If CI is green, it'll ship, right?
And for every commit it makes, it has to write a test for that fix before it's committed. So I just have some rules like that, right? After that, if it decides that it can't read a 10,000-line file, sure, it can break it up. But if the tests pass, ship it.
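The shipping rule described above (the issue-tracker agent may ship a fix only if it adds a test and CI is green) can be sketched as a small gate. This is a hypothetical illustration, not Anush's actual tooling: `Change` and `may_ship` are invented names, and CI status is reduced to a single boolean.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A proposed agent commit (hypothetical model, for illustration only)."""
    description: str
    added_tests: list[str] = field(default_factory=list)  # tests written for this fix
    ci_green: bool = False  # outcome of the full CI run, new tests included


def may_ship(change: Change) -> tuple[bool, str]:
    """Apply the two rules: a fix must add a test, and CI must be green."""
    if not change.added_tests:
        return False, "rejected: every fix must add a test before it is committed"
    if not change.ci_green:
        return False, "rejected: CI is not green"
    return True, "ship it"


fix = Change("fix scheduler crash on empty partition",
             added_tests=["test_empty_partition"], ci_green=True)
print(may_ship(fix))  # (True, 'ship it')
```

Everything else, such as whether the agent splits a 10,000-line file, is left to the agent; the gate only cares about the test and the CI result.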

[13:22] Conor Bronsdon:
Yeah, I think that is increasingly the philosophy of many projects. And we're seeing this bifurcation between projects that are embracing this philosophy and saying, we are going to continually be churning through code, we are just going to focus on good CI/CD principles and good architecture, and we're going to have a ton of changes. And I think OpenClaw is probably the biggest example of this, right, where people are just like, yes, we're constantly refactoring, we're constantly changing. And then you look at projects that are still trying to push back on this and say, oh, we don't want AI-generated code, we're having these extensive reviews. And they are starting to fall behind from a velocity perspective. I know you have a thesis on this. What's your perspective on velocity in this current era?

[14:12] Anush Elangovan:
Yeah, so my philosophy is, even before agentic coding, I used to advise people, friends, and colleagues that 80% of your time should be spent on tests and writing tests. Even before you write your first line of code, you should have a test framework. What are you going to test? How are you going to do it? That's just first-principles engineering: okay, I want to make sure I've covered everything. Even when you get to hardware, right? I have some I2C lines, and I want to simulate each of them so that the driver is not going to die when you fuzz something. Now it's like 99% of the time is your test bench, even with Claude Code. So when I start with a plan, I explicitly ask it, okay, can we write down more tests? For this ISA rewriter, for example, I said, hey, go get all of the kernels that are pre-compiled and use that as a test set. Every one of them should pass in every case for every commit that you make. But if it passes every one of these tests, I don't really care; the ability for a human to read it takes a lower priority. Yes, of course, at some point we'll be like, okay, the robots are taking over and we need to go check the code. That's fine; we'll deal with that at that time. But right now, in my personal space, I treat all code as ephemeral. Every line of code is an iteration loop towards where you're going, and you're generating reference points along the way for your agents to learn from and for other agents to learn from. For example, last night... A couple of weeks ago, with another colleague, we had implemented HIP remote. So you can actually run ROCm on any client, even on a Mac. It will mount a GPU, and you can use ROCm across the network and treat it like a local GPU, modulo network latency and bandwidth, of course. So that was a good project; we were making good progress. Then there was a requirement yesterday: okay, hey, we want to use this capability for debug, where we capture what goes into the GPU and replay it on a different system, in the loop, and ensure it's OK. Until now, ROCm didn't have that capability. Well, until 12 hours ago it didn't have that capability. Last night it went to like 3 a.m., but anyway, last night we said, okay, here is the pointer to the scratchpad of how HIP remote was written. That had however many thousands of lines of code to do remoting of the HIP protocol. Now, learn from it and implement the record and replay protocol. Right. So it could record any GPU handoff and then replay it. And that works now. So now one of my agents is generating a random sample of ML workloads, inference workloads, and it's trying to record and replay them.
And the test set says: record and replay on Linux, on Windows, on X number of machines with different hardware, and ensure all of them pass. That's your test harness. Now, when it's ready, if someone says, hey, this is silly, you have this, this, and this, or you should rewrite it in Rust, I would literally say, take this thing and rewrite it in Rust. And all my projects now are written in Rust. You know why?

[18:06] Anush Elangovan:
Because I don't know Rust. Because I never want to know Rust. I hear good things, but great, sure, that's my agent's problem. I'm never opening a .rs file. But all my current projects are Rust projects.

[18:24] Conor Bronsdon:
So am I right to assume that all of these agents are operating in dangerously skip permissions mode?

[18:31] Anush Elangovan: [OVERLAP]
100%. All of them are on dangerously skip permissions.

[18:32] Conor Bronsdon: [OVERLAP]
And then, I mean, clearly from your test harness, you are treating testing as the new code review.

[18:39] Anush Elangovan:
Yep, testing is the code review, and testing is kind of the gate of your sandbox. If you have a complaint about anything in the box, fix the sandbox, right? If you say, hey, this thing is not done right, it's suboptimal, blah, blah, blah, that's fine. It means your test harness should have a performance metric. In my virtual GPU simulator, I said each dispatch has to execute in 50 microseconds of simulator time, or some metric like that. And that's my sandbox, right? So whenever this agent tries to make changes, if that test fails, it'll be like, oh, I have to figure out something else, because that's a hard requirement, right? And then it'll iterate, it'll refactor. I don't have to tell it when it's time to refactor. I don't have to argue with agents about when it's time to refactor. Can I maintain the SLA of the dispatch? Can I maintain the correctness of the outcomes? And can I ship, right? Then refactoring is just, okay, whenever the agent feels like it can do that, that's fine.
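The "fix the sandbox" approach amounts to encoding hard requirements as checks the agent cannot negotiate with. Here is a minimal sketch, assuming a hypothetical simulator that reports each dispatch's outputs and its simulated duration; the 50-microsecond budget comes from the conversation, while `DispatchResult`, `check_dispatches`, and the numeric tolerances are invented for illustration.

```python
import math
from dataclasses import dataclass

# Hard requirement from the conversation: each dispatch must complete within
# 50 microseconds of simulator time.
DISPATCH_BUDGET_US = 50.0


@dataclass
class DispatchResult:
    """Hypothetical record of one dispatch run on a GPU simulator."""
    name: str
    outputs: list[float]
    expected: list[float]
    sim_time_us: float


def check_dispatches(results: list[DispatchResult],
                     rel_tol: float = 1e-5, abs_tol: float = 1e-8) -> list[str]:
    """Return every violated requirement; an empty list means the change may proceed."""
    failures: list[str] = []
    for r in results:
        # Correctness gate: outputs must match the reference within tolerance.
        numerics_ok = len(r.outputs) == len(r.expected) and all(
            math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol)
            for got, want in zip(r.outputs, r.expected)
        )
        if not numerics_ok:
            failures.append(f"{r.name}: numerics mismatch")
        # Performance gate: the dispatch budget is a hard limit.
        if r.sim_time_us > DISPATCH_BUDGET_US:
            failures.append(
                f"{r.name}: {r.sim_time_us:.1f} us exceeds the {DISPATCH_BUDGET_US:.0f} us budget"
            )
    return failures
```

Because the agent only sees pass or fail, any refactor that keeps `check_dispatches` returning an empty list is acceptable, which is why there is no need to argue with it about when to refactor.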

[19:46] Conor Bronsdon:
Do you think it would be fair to call this agentic test-driven development, or some variation of that?

[19:52] Anush Elangovan:
I think it's moving so fast that I don't think I have the terms yet for what we're experiencing. You can't coin something and be like, okay, you know... Normal SDLC is dead in the agentic world. The new world is here, and the new world is just forming. So there are going to be assumptions and institutional structures that get toned down, which is part of change, right? And the only way you keep up with this is you subscribe to "speed is the moat," right? You've got to run. You just got to run faster than what's trying to catch up.

[20:44] Conor Bronsdon:
So if I'm an engineering leader or an engineer who's listening to this and I'm using AI, but maybe it's not at the speed that you're describing, how do I take that next step to truly running, as you put it?

[21:04] Anush Elangovan:
I see an engineer aspect to this, which is: how do you move fast? How do you learn the new craft? How are you going to give up your shovels and start learning how to drive this excavator? That's a skill we have to invest time in, and there's no other way than time. It literally has to be hands on keyboard. You know, spend time; I started with two hours on a Sunday for this in December. So I would highly encourage everyone in the space, whether you're an IC, a manager, or a leader, to spend the time yourself. Just yesterday I taught my HR partner how to use Claude Code, and he's now fixing engineering issues. I was like, what? So he's like, when I'm not doing my HR work, I think I need to send my agent towards fixing some bugs. I was like, great.

[22:04] Conor Bronsdon: [OVERLAP]
Amazing.

[22:04] Anush Elangovan: [OVERLAP]
You know, the great equalizer. Traditionally this would have been like, oh, no, you're an HR person, why are you doing this? Talk to the manager who's going to do that. And then the manager will be like, I have a nine-month plan that's going to be implemented in 12 months. So there's going to be the IC angle. For the ICs, the engineers: you just have to go founder mode on your own skills development. You just go in and learn what this thing is. There is no other way. Nobody's going to teach you what this is, right? It's like having pre-internet computers, and suddenly someone shows you Google. It suddenly went from the mainframe to Google, and you're like, what do you mean you sent them an email? It's that level of transformation. So you've got to experience the internet, is all I'm saying. Then on the leader's side, there's a responsibility in terms of how you upskill and bring people along on this journey. This transformation is going to be so huge and impactful that we want to make sure we bring as many of us as possible into it. It could take different amounts of time, but as leaders, we want to learn the trade and then teach the trade, or the craft. So you're like, okay, this is how I use agentic coding. I actually have very small sessions with teams that I may not necessarily work with daily. I reach out and we're like, okay, let's just do 10 people, and I show them each one of my agents and say, okay, you tell me one bug that you're working on. I'm going to help fix that bug for you. I know nothing about that bug, but I know I will fix it by the end of this meeting. And they'll say, okay, great. What platform is it on? What is the ticket number, or whatever, right? I'll point it, I'll ask it to download the artifacts and run it, and da-da-da. And more often than not, it solves the issue.
And then at the end of the meeting, people are like, OK, wait, I need to know what this thing is. It's so transformational. So I think for leaders, you've got to do two things. First, you've got to be relevant in the future, which means you've got to upskill yourself. Put your own oxygen mask on first before you try to help someone with theirs. And then second, you go and make sure you can help everyone learn the new tricks of the trade. And the new tricks are growing day by day and the trade is changing.

[24:51] Conor Bronsdon:
How do you see it continuing to change over the next couple of months, as the acceleration effect continues but tokens also potentially get more expensive, since we're seeing signals that may be something that occurs? What do you think is going to happen?

[25:06] Anush Elangovan:
Yeah, I don't want to act like an oracle that knows everything. I've barely gotten my bearings on what this thing is, and I can deploy it to some level, but there are engineers on our teams with 10x more throughput; they go to sleep and their tokens are maxed out. So I'm like, okay, there's another level of unlock that I don't recognize yet. I'm old school in that sense. I have a Byobu terminal with six agents that I'm talking to. Six agents, and they have sub-agents. But there are orchestrators and agent things and higher-level, intent-driven mechanisms that I haven't yet understood. So to your first question, on making a prediction two months from now: I don't know where the industry will be, but I'm going to try to run as fast as possible to be relevant in that industry.

[26:09] Anush Elangovan:
On the second one, yeah, I think it's similar, right? It's moving so fast that I don't think you can make a statement on where it's going to be or what we're going to do. You just soak it in and run.

[26:29] Conor Bronsdon:
Yeah, I mean, we're seeing these large layoffs at certain orgs; Block is one great example. I wrote a long essay about that on my Substack for anyone who wants to check it out. But there is a transformation occurring, and orgs are looking for people to be AI native. If you do not make that transition, if you do not align to it, your skills are not safe in this coming era because of the throughput expectations being put out there. And we're seeing, I mean, CircleCI did some great research on this back in January that refactoring.fm published. I highly recommend Luca's Substack, as an aside. They talked about the fact that we are seeing this acceleration effect where the top 10% of engineering teams at the time were showing these massive gains from AI coding, and a lot of folks throughout the system were not feeling it. This is where you see all these articles saying, oh, a large percent of AI projects aren't delivering yet. It's because the gains are very unevenly distributed. And I think that's starting to change, as it's becoming obvious to folks who are at all involved with the cutting edge of AI that these tools are accelerating, the models are getting better, and the harnesses are improving. What you're able to do has vastly increased. Like, I am a bad engineer. I have never been a full-time engineer. I have been a hobbyist. I've worked with engineers a ton, and I can now ship projects overnight.

[28:10] Anush Elangovan: [OVERLAP]
Yeah.

[28:10] Conor Bronsdon: [OVERLAP]
You, as someone with deeper experience, did a refactor of a 25-year-old project, Slurm,

[28:17] Anush Elangovan: [OVERLAP]
Right,

[28:18] Conor Bronsdon: [OVERLAP]
turning it into Rust,

[28:18] Anush Elangovan: [OVERLAP]
right, right,

[28:19] Conor Bronsdon: [OVERLAP]
for

[28:19] Anush Elangovan: [OVERLAP]
right,

[28:19] Conor Bronsdon: [OVERLAP]
Spur. And that's

[28:21] Anush Elangovan: [OVERLAP]
right.

[28:21] Conor Bronsdon: [OVERLAP]
a massive accumulated codebase that you rebuilt in a single night. What does that say about how we should be thinking about both the value of software development skills today and the value of legacy codebases?

[28:34] Anush Elangovan:
Yeah, so I think, you know, I'm a big believer in Jevons paradox. It's going to get

[28:49] Anush Elangovan:
There's going to be so much more demand, both in terms of GPU compute and people doing things. We should just embrace that. Everyone's going to want to have this. Take the example I used of the HR person: their efficiency has probably gone up tenfold, right? He ran into some issue with Claude Code and had Claude Code fix it for him, and he was just like, okay, I don't need anything else now, right? I'm self-sufficient with Claude Code. So I think there's going to be that transformation, but the transformation is also going to unlock so many more opportunities that I'm not worried about it in terms of, you know, a loss of jobs or something like that, right? I think there's going to be so much more to do. And there may be the top end of the K, right? Those folks have taken off; they'll be generating like 2 million lines of code a day, orchestrating agents, and agents orchestrating agents. The main thing is they need to be headed towards a goal, because they could go off into the wilderness and never show up, and they'll just be burning tokens, right? So that's one band. But then there's the mid tier that we want to make sure software engineering leaders, et cetera, enable and steer. They may move slowly, but when that transition happens, it'll be fast, right? I'm sure we've all noticed, or at least the older folks like me have noticed, the BlackBerry transition and

[30:36] Anush Elangovan:
similar transitions, where people were like, oh, this is all we did. And I think there was

[30:47] Anush Elangovan:
enough room for technological innovation to absorb what we can do. And if you look at it at a meta level, what's happening is we are automating out repetitive, redundant work that used to be a huge cognitive overload, right? So if you just look at it from that perspective, it's not like we want all of the engineers to only be in the top band, right? Take what a normal engineering flow would have been: git rebase my 22 patches on a diverged branch. That would have been like, I'm going to spend two hours; halfway through I rebased it, then rebased it again. And then, oh man, this thing doesn't work now. I need to figure out the last point where it worked. And then there's a commit that broke it, and that's from somebody else, and I don't have context on what they're doing. Now I've got to go learn what they're doing, and they changed some underlying API that broke my thing, and now I've got to go figure this out. All of that is net

[31:54] Anush Elangovan:
burdensome on a human, because they had innovated on something; they just couldn't get the innovation hitched onto the bandwagon. And in the past two, three months, I've never touched Git manually, right? I just ask Claude to, like, okay, rebase this, open it up for review. And I look at it like, okay, fine, roughly this is what I want to do. Most of my time goes into the plan document and the test harness. Everything in between, I treat like: that's V1. Sure, we'll do V2. Update the client, update the test, right? V3, right? So even if you look at it as efficiency improvements for

[32:37] Anush Elangovan:
developers, that's going to be huge, right? It's not that they even have to do multi-agent or some crazy flows or anything like that; it just makes their life easier. So there's a lot that's going to touch everyone, and it'll be exciting.
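The "figure out the last point where it worked" step Anush describes is what `git bisect` automates: a binary search over a commit range. A minimal Python sketch of that search, assuming a linear list of commits ordered oldest to newest, a known-good first commit, a known-bad last commit, and exactly one good-to-bad flip. The commit names and the `is_good` check are hypothetical stand-ins for building and testing each commit:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest -> newest, commits[0] is good,
    commits[-1] is bad, and the history flips from good to bad exactly once.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # the breakage landed after mid
        else:
            hi = mid  # the breakage landed at or before mid
    return commits[hi]


# Hypothetical stack of 22 patches where patch 15 introduced the regression.
patches = [f"patch-{i:02d}" for i in range(22)]
print(first_bad_commit(patches, lambda c: int(c.split("-")[1]) < 15))  # patch-15
```

For 22 patches this takes five checks rather than walking the stack one commit at a time; against a real repository, `git bisect run <script>` drives the same loop with your test script as the check.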

[32:55] Conor Bronsdon:
Thanks to Galileo for sponsoring this episode. Their new 165-page comprehensive guide to mastering multi-agent systems is freely available on their website at galileo.ai and provides the lens you need to understand when multi-agent systems add value versus single-agent approaches, how to design them efficiently, and how to build reliable systems that work in production. Download it for free at the link in the show description to discover how to continuously improve your AI agents, identify and avoid common coordination pitfalls, master context engineering for agent collaboration, measure performance with multi-agent metrics, and much more. Yeah, you mentioned when things go wrong in that answer too, and I'd be curious to get your perspective, because we've talked a lot about the successes and what's possible here. You brought up refactoring, and we've talked about that a bit already, but I think back to, God, was it 18 months ago now, when there was this big article from Amazon saying, oh, we've done all this refactoring with our Amazon Q tool. Part of that was obviously marketing for Amazon Q. But I think at that time, folks were kind of like, oh, no way; we were kind of pooh-poohing it. And now we're a year plus on, and this is just obvious. Of course that's possible. You can do that in an evening now. You can do that in a couple hours

[34:24] Anush Elangovan: [OVERLAP]
Yeah.

[34:24] Conor Bronsdon: [OVERLAP]
to get Claude Code to port it into a new language and put it up in a GitHub repo. And yet there are plenty of examples where AI goes off the rails or spins off in the wrong direction. How do you account for that? And what do you do when mistakes are made?

[34:43] Anush Elangovan:
It's a very, very good question. This is the part where we want to separate plain agentic prompting from the craft of coding or engineering something with the assist of an agentic layer. What I mean by that is, if I didn't have any priors, I would just go in and say, implement the HIP remote protocol, right? And it may ask me some things where I'm like, I don't know, I just want to run my ROCm apps on my Mac. That's my high level, right? And then it's going to say something, something, here's an example. If you're not dialed into what those are, you could diverge so much that you burn tokens in the evaluation, and the search space may be so broad that you never converge on the solution you're looking for, right? Versus if I did it in a way that taps into the experience of, oh, I've spent time building this, looking at this, and I say, hey, scope it narrowly. All I want is a network wire transport protocol and a shim layer that shows the HIP layer works on the Mac side. And let's start with tests to make sure, you know, A, B, C work, and then there are a few other things. For example, I was doing something with XNACK, which is like demand paging on GPUs, and Claude was like, I'm going to go implement the feature and do everything, right? I was like, hold on, hold on, hold on. I want you to sit down and write, I literally said, write 100 tests that test various combinations of demand paging, with corner cases, this case, that case, this case. It came up with literally 100 cases. Some of them were repetitive; it changed some parameters without really changing the test much, but there were 100 tests. When I looked at the tests, I was like, okay, if this thing passes, it's good, right? So you should look for... I think one of my friends put it right:

[37:01] Anush Elangovan:
AI agents are sneaky,

[37:07] Anush Elangovan:
and they're dumb, or, you know, they're sneaky and dumb, right? They try to just reward hack their way towards that test. So all you have to think about is making sure they cannot reward hack it, right? You're putting up gates that they have to pass through. It's like a marathon where at every mile you're like, okay, you better cross that mile, and that mile, and that mile, and then come back to the starting point. If you went through all those miles, you roughly hit 26 miles anyway, right? So that's one way to look at how you deal with agents and their shenanigans.
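Anush's approach of having the agent write a large matrix of corner-case tests before implementing anything can be sketched as a cross product of parameters plus an all-or-nothing gate. The parameter axes, the `PagingCase` type, and the toy fault-count oracle below are hypothetical illustrations, not XNACK's real semantics or any AMD test suite:

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PagingCase:
    page_size: int
    pattern: str
    resident_fraction: float
    num_pages: int


def build_matrix() -> List[PagingCase]:
    # Hypothetical axes: 3 page sizes x 4 access patterns x 4 residency levels x 2 sizes = 96 cases.
    page_sizes = [4 * 1024, 64 * 1024, 2 * 1024 * 1024]
    patterns = ["sequential", "strided", "random", "reverse"]
    residency = [0.0, 0.25, 0.5, 1.0]
    num_pages = [1, 64]
    combos = itertools.product(page_sizes, patterns, residency, num_pages)
    return [PagingCase(*combo) for combo in combos]


def expected_faults(case: PagingCase) -> int:
    # Toy oracle: each page not already resident faults exactly once on first touch.
    return case.num_pages - int(case.num_pages * case.resident_fraction)


def gate(impl: Callable[[PagingCase], int]) -> List[PagingCase]:
    """Return every case the implementation gets wrong; an empty list means it passes."""
    return [case for case in build_matrix() if impl(case) != expected_faults(case)]


# A "sneaky" implementation that ignores residency passes the easy cases but not the gate.
sneaky = lambda case: case.num_pages
print(len(build_matrix()), len(gate(sneaky)))  # 96 48
```

In this toy only residency and page count affect the expected result; the other axes just show how quickly the matrix grows. A real harness would check behavior that depends on every axis, which is what makes it hard for an agent to special-case its way through.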

[37:52] Conor Bronsdon:
Are you pulling principles from Gas Town and other approaches where you have review and mayor agents managing all these other sessions? I mean, obviously you're leveraging subagents significantly, but as anyone who has used a lot of subagents in Claude Code will rapidly find out, sometimes subagents do really dumb things.

[38:12] Anush Elangovan:
Like

[38:17] Anush Elangovan:
I mentioned, there's another stratum of AI coding gurus that I look up to, and I see their setup and I'm like, I don't know what you did, but just give me that setup; it would look cool on my three monitors. But I wouldn't go so far as to just let an agent go off in the weeds and do something, right? My agents, for example the one that looks at Spur, just monitor for filed issues and fix them. I know exactly what that agent is doing. It's going to wake up at 5 a.m. and check. If there is an issue, it'll figure it out and go back to sleep. That's its job, right? So I'm not going to maintain Spur until it finds an issue it can't fix, and then I'll go, okay, I need to take care of this thing. But I think the key part is that I multi-thread a lot, or it's multi-processed, and I don't necessarily give free rein to go spawn your own thread pool, right? What I mean by that is, and this is another analogy I use, I don't really care about, like, /fast, 2x the tokens, because that means I'm in the loop, right? It's like CPU IPC: do I need 5.2 gigahertz of this? Because that means I'm doing something so latency sensitive that I'm in the loop and have to go type something, right? My general use is more like the strengths of a Threadripper, right? It's got 96 cores of things that are just going to chug along. And my job is to make sure, okay, there's one multi-threaded server running with like 10 things; it'll go do this, five will do this. I'm not trying to orchestrate the whole world. I

[40:11] Anush Elangovan:
view it like my personal workstation, which has 96 cores. I don't use all of them all the time; I have two or three things that I multi-thread, but it's contained. It's not like, hey, here's a computer, you can execute instructions, just go spawn and fork whatever you want and make some progress. I think that would just be too much noise and too little signal to extract. So I constrain it to focused efforts that I care about.
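The "contained" multi-threading Anush describes, a fixed number of workers rather than agents forking freely, maps onto a bounded worker pool. A minimal Python sketch, assuming each task is a callable standing in for one focused agent session (real sessions would more likely be separate processes or terminals); `run_contained`, `square_slowly`, and the three-worker cap are hypothetical:

```python
import functools
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_contained(tasks: List[Callable[[], int]], max_workers: int = 3) -> Tuple[List[int], int]:
    """Run tasks on a fixed-size pool.

    Returns results in submission order plus the peak number of tasks that
    were running at the same time (never more than max_workers).
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrap(task: Callable[[], int]) -> int:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(wrap, tasks))
    return results, peak


def square_slowly(i: int) -> int:
    time.sleep(0.01)  # stand-in for a long-running, independent job
    return i * i


jobs = [functools.partial(square_slowly, i) for i in range(10)]
results, peak = run_contained(jobs, max_workers=3)
print(results)    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(peak <= 3)  # True
```

The cap is the design choice: ten jobs are queued, but at most three run at once, so the amount of work you have to review stays bounded no matter how many tasks get submitted.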

[40:45] Conor Bronsdon:
You've spent a lot of your career betting on open source, from what you did at Nod.ai, to open-sourcing ROCm, to SHARK, to what you're doing today. As you think about this new era where everyone is coding agent-first, agents are driving massive new throughput, and speed is the moat, as you put it, what does this all mean for AMD and AMD's software stack?

[41:13] Anush Elangovan:
So the first thing: open source, since day one of my career, in the early 2000, 2001 timeframe, I

[41:25] Anush Elangovan: [OVERLAP]
was a big proponent of Gentoo Linux, and I still run Gentoo on my Framework laptop here. So open source is key to innovation, in my opinion: open source, open models, open ecosystems. That's why AMD has been a real wavelength match for me, because that's how I think AMD is set up, right? We want to innovate, but we also want everyone else to innovate so that you're not limited by our ability to innovate, right? We want everyone to be able to move as fast as possible. AMD's software stack is fully open source, right? In fact, there was one little blob, about 50 KB, that wasn't. Finally, I got that one open-sourced, right? So people are happy with that. That allows an immense ability to innovate. I literally have a Strix Halo with the entire ROCm codebase checked out. I could make one change and say, go make this change across the entire stack, and if we wanted, that could be part of tomorrow's ROCm release, right? Which is amazing. And it's also amazing because you could do it. Like, Conor, you could be the one who checks it out on your Strix Halo and says, fix my hipBLASLt something-something for my new workload that came in, and we don't have to be in the loop, right? And now with the speed of generating this code, you just point it at what you have to do or what is bothering you, right? And in the next few weeks and months, we're going to increasingly expose an agentic kernel layer and all that, right? So that we don't have to be in the loop. If you find an issue, you have the entire source tree. You could patch that issue, or your agent could patch it, and you could move forward and submit the PR, right? Which is just mind-blowing, because otherwise you're saying, sure, there's an issue, file a ticket, do this, do that, and then it's escalation, SLA. But you can do all of that yourself.
But also your agent is like, you know what? I think I know where the issue is. I'm just going to fix it, and I'm going to patch your, you know, little library that does this. Here's a workaround, and you're unblocked. As long as it works for you, you're okay, right? Of course, we'll review it, test it, formalize it, et cetera. So I don't think there is any software moat left. I really don't, and that's a very broad statement, of course.

[43:56] Conor Bronsdon: [OVERLAP]
I mean, we saw it this morning, right? People have been loving Claude Code as a harness. And again, we're dating ourselves in time, since this won't come out for a couple of weeks, but the code leaked and within hours people had multiple forks of it in different languages that were completely rewritten and seem to be working. I think you're right.

[44:14] Anush Elangovan:
Yeah, it is just, I mean,

[44:19] Anush Elangovan:
the way I'd summarize it is, as a software engineer, you bring the skills and the first principles thinking,

[44:29] Anush Elangovan:
you should get unlimited tokens. Then the rest is just how you spend your tokens, right? The product will be made: you're going to consume the tokens, apply your critical thinking, and there's a product. The assumptions about how big a team needs to be to do this and all of that are going to be rewritten, because you yourself have so much agency to execute at a scale that we couldn't before, right? Going back to the excavator, now you could be moving 10 tons a day, versus shoveling with Vim and VS Code if you are not agentic. So it's just a transition that we have to embrace, and we have to help everyone come along.

[45:24] Conor Bronsdon:
So how does this change AMD's philosophy towards things like your developer program and how you're training and enabling developers, or things like community contributions? I saw in the recent ROCm release that the Strix Halo support was a community contribution, and it looked like client platforms on Windows had a community contribution too. How are you continuing to lean into enabling that?

[45:49] Anush Elangovan:
Yeah, so

[45:52] Anush Elangovan:
we uniquely had the opportunity, since we're fully open source, that anyone could contribute. But the codebase is obviously large, with different layers, and it's hard to understand, so contributions were manual and limited in how much impact they could have. Now we literally see commits with far-reaching impact. Initially there was a little bit of, oh, AI slop, AI slop. We just had to tighten the knobs a little: tests have to pass and CI has to be green before we even look at the code, right? So we expect that to accelerate AMD to a point where, you know, suddenly we have jet fuel and everyone can fly. It's like, let's go, let's see what we have to enable, and let's focus on the outcomes you need. I think AMD provides a very solid hardware platform, and now software is just tokens.
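The tightened contribution policy Anush describes (tests pass and CI is green before a human looks at the code) is essentially a triage filter in front of the review queue. A minimal sketch; the `PullRequest` fields, status strings, and PR numbers are hypothetical and not ROCm's actual CI configuration:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PullRequest:
    number: int
    tests_passed: bool
    ci_status: str  # hypothetical values: "success", "failure", "pending"


def triage(prs: List[PullRequest]) -> Tuple[List[int], Dict[int, List[str]]]:
    """Split PRs into those ready for human review and those bounced, with reasons."""
    ready: List[int] = []
    bounced: Dict[int, List[str]] = {}
    for pr in prs:
        reasons = []
        if not pr.tests_passed:
            reasons.append("tests failing")
        if pr.ci_status != "success":
            reasons.append(f"CI {pr.ci_status}")
        if reasons:
            bounced[pr.number] = reasons
        else:
            ready.append(pr.number)
    return ready, bounced


queue = [
    PullRequest(101, True, "success"),
    PullRequest(102, False, "failure"),
    PullRequest(103, True, "pending"),
]
ready, bounced = triage(queue)
print(ready)    # [101]
print(bounced)  # {102: ['tests failing', 'CI failure'], 103: ['CI pending']}
```

Only PRs in `ready` reach a maintainer, so the cost of filtering noisy, agent-generated submissions falls on the automated gate rather than on reviewers.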

[47:09] Conor Bronsdon:
Software is just tokens. That is a really good closing note for us, and maybe it'll become the title of the episode; we'll see where we get to with the edit. But Anush, it's been fantastic catching up with you. I really appreciate you sharing your thoughts on the future of software, how you're approaching it today, and what it all means for AMD. For folks listening who want to check out AMD's ecosystem or learn more about your approach, where should they start?

[47:38] Anush Elangovan:
Definitely amd.com for our hardware portfolio, right? And github.com/ROCm, and the best place in there is TheRock, which is, you know, just git clone, build, get an AMD device, and start tinkering. You'll be amazed. You just set up Claude Code and say, go get TheRock project, build it, and run it on my Strix Halo. That's all you need to tell Claude Code, and it's going to go do everything. Of course, it takes some time to compile a compiler and all that good stuff. But after all of that, you own every bit of it, which means you can make changes across the stack. And we are here to help. I think AMD had some growing up to do and a journey to get to where we are with our software, but I think we are there, and, you know, it's great to see that software is just tokens now.

[48:39] Conor Bronsdon:
Absolutely. I think we are very close to a future of heterogeneous compute where, instead of being locked into NVIDIA's tech stack, as many folks are, you are enabled to leverage compute across devices. Obviously that's something we're very excited to be partnering on, on the Modular front, with AMD

[49:00] Anush Elangovan: [OVERLAP]
Yes.

[49:00] Conor Bronsdon: [OVERLAP]
around much of this. This episode isn't about that, though, so I'm not going to dive into it too deep. But I

[49:07] Anush Elangovan: [OVERLAP]
We

[49:07] Conor Bronsdon: [OVERLAP]
think.

[49:07] Anush Elangovan: [OVERLAP]
love Modular, by the way.

[49:09] Conor Bronsdon: [OVERLAP]
We'll have a lot of fun, I think, later

[49:11] Anush Elangovan: [OVERLAP]
Yes.

[49:11] Conor Bronsdon: [OVERLAP]
this year. It's going to be a good time. Anush and I can't tease too much of that, though. So I think there are very exciting things ahead of us. And the fact that AMD has leaned so hard into open source provides this incredible acceleration opportunity to have speed as a moat, as you've talked about and, I think, have been hammering home for years now. And then there are the testing opportunities that provides, the acceleration opportunities that provides, the simple awareness for developers, and the trust that it builds. We are very much seeing a future where you can leverage other devices, and leverage them concurrently. I'm super excited to get some local coding models running on AMD hardware in the next few weeks so I can try them out, compare them, and run them in parallel with some of my Claude Code harness work. So it is a very exciting future, and I can't wait to see how you and the rest of the team over at AMD continue to develop it. Thank you so much for coming on the show. It's been great chatting with you.

[50:12] Anush Elangovan:
Thank you, Conor. It's a pleasure.

[50:13] Conor Bronsdon:
And for folks who are listening and want to dive deeper into all the topics Anush has discussed, and everything happening with the Claude Code copying, with AMD's acceleration, and with everything else in software development and AI today, make sure to check out the Chain of Thought Substack at newsletter.chainofthought.show. We'd love to hear your thoughts about our next essays, so keep giving us your feedback. We always appreciate it when you give us a rating and review or drop a comment. And thank you so much for listening. We'll see you next week.