Oxide and Friends

Raja Koduri joined Bryan and Adam to answer a question sent in from a listener: what are the differences between a CPU, GPU, FPGA, and ASIC? And after a walk through the history of hardware, software, their intersection, and relevant companies, we ... almost answered it!

In addition to Bryan Cantrill and Adam Leventhal, our special guest was Raja Koduri.

Some of the topics we hit on, in the order that we hit them:

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!

Creators and Guests

Host

Adam Leventhal

Host

Bryan Cantrill

What is Oxide and Friends?

Oxide hosts a weekly Discord show where we discuss a wide range of topics: computer history, startups, Oxide hardware bringup, and other topics du jour. These are the recordings in podcast form.
Join us live (usually Mondays at 5pm PT) https://discord.gg/gcQxNHAKCB
Subscribe to our calendar: https://calendar.google.com/calendar/ical/c_318925f4185aa71c4524d0d6127f31058c9e21f29f017d48a0fca6f564969cd0%40group.calendar.google.com/public/basic.ics

Bryan Cantrill: 00:00

Okay. Raja's here. It's great. And so, Adam, let me give a little intro here.

Raja Koduri: 00:05

I don't Can you hear me okay?

Bryan Cantrill: 00:06

We can hear you, Raja. Oh,

Raja Koduri: 00:08

okay. Okay. Sorry.

Bryan Cantrill: 00:10

Yeah. No worries. Well, so, Raja, first of all, you should know that audio problems are a part of this podcast. And, those of you who'd like to send us complaints about the audio, well

Adam Leventhal: 00:23

Yeah. We know. We know. We know.

Bryan Cantrill: 00:25

We also we also, like, you know what? Like, hey, the the you don't complain about the ads on the podcast. Don't hear me complain about that. So it's like, you know, sometimes it's like look. And also, you know,

Adam Leventhal: 00:35

we do we do AI generated transcripts too. So, you know, if you don't like the audio.

Bryan Cantrill: 00:39

The the go through those. Yes. Yeah.

Adam Leventhal: 00:41

So Go read it up.

Bryan Cantrill: 00:42

And also, it could be way worse.

Bryan Cantrill: 00:44

Like, go to that Showstopper episode, and go to Twitter Spaces, any Twitter Space. And Yeah. Get, you know, go through the flying blind episode with a clarinet solo in the back. I mean, it's like, we can just never come up. There's so many.

Bryan Cantrill: 00:55

We have come a long way.

Adam Leventhal: 00:57

Go to episode 1, which sounded like we were holding 2 tin cans with a string between

Bryan Cantrill: 01:03

Totally. Right. Right. So we've come so, Raja, you may be thinking to yourself, god, this podcast can't get any worse. In fact, we know it can get worse because it started much, much worse.

Bryan Cantrill: 01:12

So I would like to say that

Bryan Cantrill: 01:13

in terms

Bryan Cantrill: 01:13

of in terms of, so we, but we got some, some feedback. So so the the feedback that we often get on the podcast well, first of all, I would say most people would say, I love the podcast. Even the baseball episode. I've heard that a couple of times. You don't have to say that.

Bryan Cantrill: 01:27

You just Yeah. Qualify that.

Adam Leventhal: 01:29

Just leave it there. Right?

Bryan Cantrill: 01:30

Just leave it there. The but a lot of positive feedback on the podcast. Then we get the the audio problems set, which is, like, I get it. I understand. You're upset about the audio, and fine.

Bryan Cantrill: 01:40

Sometimes those complaints are well founded. Sometimes those complaints are a lot less well founded. I we had a yeah. I'm sure you read all do you read the YouTube comments? Adam, you must read all all

Bryan Cantrill: 01:48

YouTube.

Adam Leventhal: 01:48

Yeah. Yeah. I actually Right. I they email me with every comment.

Raja Koduri: 01:51

I I

Bryan Cantrill: 01:52

feel like We we did a was just like the way apparently, like, my de-esser is not working properly on this setup, and, like, the way I pronounce s is just, like, causes him to have a a, you know, a nervous breakdown. I'm so sorry. That's just not gonna change. So anyway, the, so we and then but we we've started to get a little bit of new feedback that I that I kinda love. We've taken requests.

Bryan Cantrill: 02:13

So people asking us for subjects that they would like us to tackle on the podcast, which is not brand new. We this has happened historically, but I I really like this, I have to say. If people want us to to there are certain topics you want us to take on, you, you should let us know. And because we do very poor advance planning, you may find that our latency from your suggestion to an Oxide and Friends episode may be very, very low. We may have exceptional latency.

Bryan Cantrill: 02:41

So as we do in in this case, so, this is a a listener of the podcast who is, an industry analyst. So someone who who follows the podcast, but is not necessarily a software engineer, but is very, but is is very technically proximate, and having to deal with a lot of these things. And, I was sent a in particular, and I'm gonna drop this link into the chat, a Forbes article that this industry analyst had come across about vector databases, which don't worry. We're not gonna talk about vector databases. This is not a pop quiz, Adam.

Bryan Cantrill: 03:17

Not that you I sorry. That that's not much purported. I don't, like like, I don't know. There's no

Adam Leventhal: 03:22

vector databases. I'm I'm not looking on the Wikipedia article right now.

Bryan Cantrill: 03:26

Right. The, but this Forbes article talked about kinda it's trying to talk about CAP trade-offs for vector databases. And in particular, there is there's a sentence in here that the analyst was really latching on to, which is, at this point in technical evolution, it is impossible for vector databases to guarantee all three of cost effectiveness, accuracy, and performance, at the same time. So consider that vector databases are compute heavy, and can run on different types of hardware. For processing, CPUs are slowest, GPUs are faster, and FPGAs and ASICs are fastest.

Bryan Cantrill: 04:05

And Raja, I'm sorry if you just threw up in your mouth a little bit. I'm just reading the text here. So I just wanna be just Not an endorsement. This is not an endorsement of this. And so this analyst asks the very reasonable question of like, what are these things?

Bryan Cantrill: 04:20

What are like, I have basically not had to deal with I've been dealing with software folks for my career, in terms of the industry, like, could you please do an Oxide and Friends that explains the difference between CPUs and GPUs and FPGAs and ASICs? That is a really good idea. And I'm like, who I who's the who's the right person for that? Because you need to get someone who really understands these things really well, can speak to a bunch of these different kinds of things. And then, Raja, you had this great tweet over the weekend, really interesting tweet over the weekend.

Bryan Cantrill: 04:51

And I'm, like, Raja is the right person for this. So Raja Koduri has spent an entire career in silicon, and has been in many different interesting places, and really thinks about all these things holistically. So, Raja, I hope you don't I I I think that you are the perfect person to, to answer this question or to take us kinda through this. And if you don't mind, I would kinda like to start with if I I'd I'm gonna channel some On the Metal here a little bit. On the Metal, Raja, was our podcast that we started when we first started the company that is, the kind of doing these long form interviews with folks.

Bryan Cantrill: 05:29

And I wanna channel that a little bit, because if you don't mind, I would like to get into kind of how you got your start in silicon and in chip design, because it was on the the graphics side of things. Right? I mean, you you started working for S3, and, you know, I often have to explain that Sun Microsystems was a computer company. I don't know how often S3 comes up in conversation for you, but you must be like, no. Not the object store.

Bryan Cantrill: 05:59

No. There's a company.

Raja Koduri: 06:02

Yes. Yeah. You know, thanks, Bryan, for, inviting me to this, this chat. Yeah. You know, I, I became a silicon person by accident, you know, frankly.

Raja Koduri: 06:16

So, you know, I got smitten by computer graphics when I was in grad school back in India. Okay? Because I saw that, you know, hey. With math, you can draw pictures. Right?

Raja Koduri: 06:30

And this was so fascinating. Right? It was, and, but there weren't any computer job any computer graphics jobs. Actually, there weren't any much computer jobs back there in India, but whatever they were, they're in, you know, kind of databases and COBOL programming, you know, if anybody knows what COBOL is.

Bryan Cantrill: 06:52

And, you know, now that's even dating, you know, a

Raja Koduri: 06:57

few decades back then s 3. So so, you know, I I, you know, did various kind of miscellaneous stuff. And, the first time I had an opportunity to come to Silicon Valley to work for a compute computer company called Tandem Computers. And Tandem you were Tandem? I didn't know that.

Raja Koduri: 07:17

Yeah. Yeah. Yeah. Yeah. So I worked on their parallel transaction database system, and, you know, that's where I got, you know, first encounter with networking and, you know, parallel, you know, things.

Raja Koduri: 07:31

Right? How to, you know, code things to be asynchronous, parallel, and all. But my kind of, you know, heart has always been computer graphics. And, you know, back in the early nineties, there were 16, you know, graphics hardware startups in the Bay Area. Okay?

Raja Koduri: 07:49

Wow. And, it was like AI back then. Okay? Like, you know, everybody was building a 3D chip, 2D acceleration chip, 3D acceleration chip. And so applied for a job, you know, to several companies, but because I had no background, right, other than some academic stuff, I got into, finally, I got into a game developer support engineer role.

Raja Koduri: 08:12

Right? A junior role. So I kinda, you know, went backwards in my career that was, you know, blooming on the networking side to take an entry level job, you know, and and it was fun because it's like, hey. You know, that these games were super cool back then, and you get to work on, you know, helping game developers optimize the games for your, you know, hardware. Right?

Raja Koduri: 08:35

So that kind of was my first job. And, yeah. So then you start working with really this, you know, the the game programmers back then. They were like geniuses. Right?

Raja Koduri: 08:46

You know, because the game programmers This is

Bryan Cantrill: 08:48

this is the John Carmack era of Yeah.

Raja Koduri: 08:50

Exactly. Yeah. John John Carmack.

Bryan Cantrill: 08:52

Right. Yeah.

Raja Koduri: 08:53

Yeah. So I met John Carmack and, you know, learned a great deal from him. But, you know, I was bringing back to the hardware team. Right? Hey.

Raja Koduri: 09:01

We need to do this. We need to do this. You need to fix this. You need to do this. You need to do you know?

Raja Koduri: 09:06

So I I'm the I was the translation layer between John Carmack and my, silicon designers. And, you know, that was kind of, you know, my my my my my job. And one day, the guy, you know, who's running the architecture team said, hey. You seem to be pretty good at this. Why don't you join us?

Raja Koduri: 09:24

Rather than, you know, the guy who complains and,

Bryan Cantrill: 09:27

you know, constantly, you know, you know, and demeans us.

Raja Koduri: 09:31

And, that's the slippery slope. Right? You know, I joined the team, and I said, oh, cool. Like, I get to build this stuff. And then, you know, here I am almost, 20, you know, whatever, 6 years later.

Raja Koduri: 09:44

Right? They call me Silicon guy, but, no, I was, you know, a game a game developer support engineer. That's how I started before, you know, getting set. Yeah. And this stuff.

Raja Koduri: 09:54

So that's kind of my my my journey into I I came from, you know, the software side of things. Even though, you know, my, undergrad is is on electrical engineering, electronics, communications, and all, I, you know, I kinda, you know, wandered off into networking software and computer graphics.

Bryan Cantrill: 10:14

Because that's where the jobs were and then kinda bided your time. And yet that's that is wild. And so the this is a really interesting kind of era. Because you said it was explosive. There were a bunch of different I I mean, I think it's a really interesting parallel of kind of the AI boom.

Bryan Cantrill: 10:28

You've got a bunch of different companies out there. This is like the era of 3dfx. Right? I

Raja Koduri: 10:33

mean Yes. Yes.

Bryan Cantrill: 10:34

The Yeah. And Adam, have you watched that Computer History Museum panel on 3dfx, which, yeah, Ariane pointed us to? It's outstanding, about the 3dfx folks. Raja, have you seen this? I mean, you would have very similar memories of kind of that era and the the Yeah.

Raja Koduri: 10:54

I'll I'll you know, sometimes I I, you know, I I see those history, things. And, you know, first off, I, you know, I don't like them because it makes me feel like I'm all, you know, like I'm in the history channel, like, you know, and this stuff because I grew up with all of these guys and and this stuff. And and to me, all of this seemed like yesterday. Right? You know, that was that a long time ago.

Raja Koduri: 11:16

So yeah. And and as you know, Bryan, when you live through it and, you know, when people summarize history and all, like, you know, it it gets, simplified or, you know, some of the things are not you know, it's it's kind of like, you know, it feels like ChatGPT-isms sometimes. Right? It's like, yeah. Kinda right, but not quite.

Raja Koduri: 11:34

Right? You know, the stuff. Right? But, anyway, so I I try and avoid seeing the recent history. Right?

Raja Koduri: 11:40

Yeah. Yeah.

Bryan Cantrill: 11:41

And it's it is so funny because, like, when you've lived it, and then people kind of turning to you as an authority be like, okay. You lived it. And I'm like, well, actually, there are a couple of different narratives that are out there, and I'm not even sure. Like, I was there, and I'm not sure which one is the right one. You know?

Bryan Cantrill: 11:54

I did they've got, and there's so many you but that yeah. That is that's why by the way, we just Adam and I just embrace the fact that we're living fossils, by the way.

Raja Koduri: 12:02

We're not

Bryan Cantrill: 12:03

we we we're not we're not extinct. Yeah. No. No.

Raja Koduri: 12:06

But but what what's common in all the themes of history, which I think we'll kinda get into, right, is this hardware software interplay, right, is the, has always been, you know, kinda the the catalyst of change. Right? Like, you know, whoever got that right or whoever, were on the right side of that equation. Right? By by luck or, you know, or or or by planning, right, ended up surviving.

Raja Koduri: 12:36

And, no matter how great you were, like, how wonderful and, you know, 3dfx just, like, you know, blew people away. And and if you just look at how long the duration was, it was less than 24 months. K? It's like, you know, back then, it's like 1 year of dominance, 2 years of dominance seemed like a long period. Right?

Bryan Cantrill: 12:56

And things

Raja Koduri: 12:57

like yeah. Yeah. Yeah. Leadership changed very fast. Right?

Raja Koduri: 13:00

And and especially because, you know, when the Moore's Law was active and the silicon cycles were faster. Right? Like, you know, 12 months to a chip. Right? 12 to 18 months, we could do a completely new thing.

Raja Koduri: 13:17

Now it is 3 to 4 years minimum. Okay? Minimum. For a brand new architecture, it's 4 to 5 years.

Raja Koduri: 13:24

We did brand new architectures every year, throwing away everything and redoing it right back then. So Because the process

Bryan Cantrill: 13:31

changes are so radical. Because this is like Dennard scaling is alive and well.

Raja Koduri: 13:36

Yeah. It was there. Right? So and and if you just miss one cycle, you find yourself, like, you know, 2 x off. Right?

Raja Koduri: 13:42

Like, it's just suddenly. Right? You know, you just misexecuted just one time. So the winners and the losers, you know, the the those things shifted much more, you know, much faster, which was exciting, right, in some ways. Right?

Raja Koduri: 13:57

Okay. That's right. Yeah. Yeah. Yeah.

Raja Koduri: 13:59

Yeah. You see, it wasn't like, oh, yeah. Well, now, right now in the valley, it just shocks me that I mean, you know, that, people can't even think of, like, you know, that, you know, who will beat NVIDIA? Right? Oh, that's like, oh my god.

Raja Koduri: 14:13

Don't don't say that out loud. You know, don't even say what did you say? Beat NVIDIA? What's

Bryan Cantrill: 14:20

oh my god. It's just right in

Raja Koduri: 14:22

the stuff. Right? And this thing, you know, the the evil, lord Sauron will come and, they'll knock and knock on your door. Right? And then this stuff.

Raja Koduri: 14:30

So don't even say that. Even the big $1,000,000,000 VCs don't want to talk about it. Right? Isn't it a holy christian? But that wasn't the valley, right, in the in the nineties.

Raja Koduri: 14:40

It was like everybody was disruptable.

Bryan Cantrill: 14:43

Right. And so and you and and you talked about hardware software interface. Because I actually, honestly, just don't know that much about I mean, very much on the CPU side of that. So I I if the GPUs of that era, I mean, are are they I know that they're getting, like, richer and richer in terms of their own computational power. And I at Sun at the time famously, you ever deal with Leo at at was your graphics card at Sun, Adam?

Bryan Cantrill: 15:09

This is a

Adam Leventhal: 15:10

No.

Bryan Cantrill: 15:11

Absolutely bonkers graphics card that had its own MMU, and and and kind of they had to kind of fork the operating system in a way that was not very good. But they and then that they had a lot of live in the dumps. But the what is the abstraction that that folks are kinda coming to during that era? This is I mean, because this is almost like I I mean, you've got kinda SGI and and Sun, I guess, is kind of high end graphics, and you're kinda coming up to the game side. And, I mean, is it is it what are the abstractions of that era?

Bryan Cantrill: 15:42

Era?

Raja Koduri: 15:42

So everyone practically had their own proprietary abstraction, Bryan. And, and then, you know, SGI proposed OpenGL. Right? But it hadn't quite caught on, and it was kind of, you know, it got a a bit of a branding that it was only for, you know, kind of non real time graphics or workstation stuff at all. Right?

Raja Koduri: 16:06

Like, kind of this high end visualization stuff, but not for, you know, PC gaming. Right? So, and these APIs, and there were some APIs that actually did CPU rendering, not GPU rendering. Right? So, 3dfx came up with their own API called Glide.

Raja Koduri: 16:27

Right? So Glide was actually, like, you know, like the CUDA of the generation. In many ways, it's proprietary, but it was, like, you know, tiny little API on which, like, you know, the, the first, I mean, I still see the the the the first 3dfx demos, in in my dreams, right, the Valley of Ra. And, you know, they would just blow blew people away. Right?

Raja Koduri: 16:51

And then, John Carmack did the the GLQuake on it, on top of Glide. And, so Glide was, a proprietary API, and, you know, people were trying to bring OpenGL to the PC, but but the history was, you know, changed with Microsoft. Right? Microsoft was the big kahuna back then, you know, the dominant PC operating system.

Bryan Cantrill: 17:17

Right.

Raja Koduri: 17:17

So Microsoft formed, you know, this group to build DirectX. Right? So Direct3D was the first 3D API that tried to unify all of us. Right? So they brought 3dfx, Rendition, S3, 3Dlabs, NVIDIA, ATI, you know, Tseng Labs, Diamond Multimedia.

Raja Koduri: 17:44

So I'm I'm you know, there are, 10 other names together. Right? I said, we

Bryan Cantrill: 17:49

Wow.

Raja Koduri: 17:49

Yeah. We're going to have a, you know, one common API called DirectX. You know? And, and it was kind of messy in the beginning because all our hardware were, you know, not all of equal capability. Right?

Raja Koduri: 18:02

So they introduced these things called caps bits. Okay. So you declare your capabilities, what your GPU is capable of, in these caps bits. And, it was a caps bit nightmare for game developers. Right?

Raja Koduri: 18:16

You know, which caps? Right? You know, they were it wasn't it wasn't very if I go back and look at all the, you know, code we made people write and all, it was messy, messy, messy. And, but Microsoft was, like, you know, a key dominant factor. Right?
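To make the caps-bit idea concrete, here is a minimal sketch in Python, assuming a simplified model in which each driver reports one bitmask of supported features and the game branches on it. The flag names and render paths are hypothetical illustrations, not the actual Direct3D `D3DCAPS` fields; the point is only that every feature a game wants to use becomes another branch it has to test and support.

```python
from enum import IntFlag


class Caps(IntFlag):
    """Hypothetical capability bits; real Direct3D exposed its own caps structures."""
    NONE = 0
    BILINEAR_FILTER = 1 << 0
    ALPHA_BLEND = 1 << 1
    FOG = 1 << 2
    MULTITEXTURE = 1 << 3


def choose_render_path(caps: Caps) -> str:
    """Pick a rendering path based on what the driver claims to support."""
    if Caps.MULTITEXTURE in caps and Caps.ALPHA_BLEND in caps:
        return "single-pass lightmapped"
    if Caps.ALPHA_BLEND in caps:
        return "two-pass lightmapped"
    return "unlit fallback"


# Two hypothetical cards reporting different capability sets.
card_a = Caps.BILINEAR_FILTER | Caps.ALPHA_BLEND | Caps.MULTITEXTURE
card_b = Caps.BILINEAR_FILTER | Caps.FOG

print(choose_render_path(card_a))  # single-pass lightmapped
print(choose_render_path(card_b))  # unlit fallback
```

With only four hypothetical bits there are already 16 possible combinations a developer might meet in the field, which hints at why Raja calls it messy.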

Raja Koduri: 18:31

And it was, API standard that will be available on Windows. And, 3dfx was, you know, brimming in success with their proprietary API, so they dragged their feet to get onto DirectX. And, Microsoft iterated DirectX every year, actually twice a year some you know, sometimes. And by the time they got to the 5th version, it was pretty good. And what we were doing as hardware guys, we were actually building the, hardware for the API.

Raja Koduri: 19:01

Actually, if you just pause on that statement, that is one of the big differences between, you know, a CPU and a GPU guy. Right? You know, I I tell people, like, you know, the GPU people think in terms of APIs. CPU people think in terms of ISA. Right?

Raja Koduri: 19:17

What's my instruction set architecture? Right? Yeah. Yeah.

Raja Koduri: 19:20

You know, and the GPU guys think in terms of what is the API. And that API contract became DirectX. And, NVIDIA, and S3 and ATI were the first to build native API compatible accelerators. Right? And, you know, that was actually the reason why 3dfx lost.

Raja Koduri: 19:43

Right? Because the, they weren't very compatible with the, you know, Microsoft DirectX.

Bryan Cantrill: 19:50

Right. Exactly.

Raja Koduri: 19:51

And, yeah. And NVIDIA and ATI eventually, kind of... and S3, you know, wandered off into you know, just like, you know, when shiny objects happen, when when hype cycles happen, you know, people go after hype cycles. S3 went after the hype cycle called web without kind of, you know, being core focused on, you know, what what what they were doing. Whereas, you know, NVIDIA and ATI stayed focused on doing graphics and, you know, so kind of, you know, history fell by the wayside.

Bryan Cantrill: 20:23

Yeah. So companies that are contemplating a pivot to AI, treat this as a cautionary tale.

Raja Koduri: 20:28

Exactly. Yeah. Yeah. Yeah. Yeah.

Raja Koduri: 20:29

Hype, hype, hype. Yeah. Hype versus basics. Right? You know, that, that's that's kind of, you know, one lesson learned.

Bryan Cantrill: 20:36

Yeah. Interesting. So S3 decides it's a web company, it wanders off and and fades into the background.

Raja Koduri: 20:41

Because I I remember the the interim CEO we got said, see, at that time, just I think, you know, NVIDIA got the Xbox design and their, you know, their IPO and their stock popped and, you know, it was it was pretty good. But relative to all the Internet companies, their PE ratios and the stock was low. Right? So he gave this famous talk. He said, hey.

Raja Koduri: 21:03

If you guys wildly succeed, you know, the best you'll be is at Nvidia, which was, like, some $20 stock or something. Right? And, you know, and we were at, like, 53 or 4. And then he said, but if you go to this Internet stuff right? You know, look at our neighbor.

Raja Koduri: 21:18

We had this company called Exodus. Actually, you guys coming from the, you know, server and storage world might remember this company called Exodus. That was the first, kind of, you know, AWS ish, thing. Yeah. Yeah.

Raja Koduri: 21:32

And they were our neighbors, and they were, like, you know, $120 stock. Okay? And they, you know, of course, they crashed and burned in in a couple of years after. So our CEO says, you know, if you, you know, if you look at Exodus, that's we should be Exodus, not Nvidia.

Bryan Cantrill: 21:50

Yeah. And this is Exodus Communications, I believe. Right? This is the Yeah. The and yeah.

Bryan Cantrill: 21:55

We actually I actually had a bunch of ex-Exodus people that I worked with after Exodus. So do not recommend. Yeah. Yeah. Yep.

Bryan Cantrill: 22:05

That is well, that God, that is amazing. Isn't it, Raja, that you think like the best you're gonna do is NVIDIA? By the way, if you could be like, hey, I just traveled back from the year 2024. I've got a super funny story for you about NVIDIA. And so the, and then just god, that is so telling.

Bryan Cantrill: 22:25

And and we're trying to to to catch. And, obviously, like, an S3 is not going to be I mean, you are a you you're a graphics hardware company in your DNA. You're not gonna become a web company. Like, it's it's meaningless for you to pivot to you're not

Raja Koduri: 22:39

gonna Yeah. He sold you know, they sold, the graphics IP to a company called VIA. You remember VIA used to make chipsets? They were the most you know, VIA is a Taiwanese company. And, you know yeah.

Raja Koduri: 22:53

And said the rest of it will, you know, build, you know, interesting, you know, web devices and all that stuff. And, yeah. Anyway, you know, you can write, chapters and chapters on that.

Bryan Cantrill: 23:06

Right. So at some point in here, you're like, I think I'm leaving this company, and you went to Yeah. ATI happened shortly thereafter. Yeah. So and so ATI, which I mean, it should be said, the 2 survivors of that era are ATI at AMD and NVIDIA, really.

Bryan Cantrill: 23:23

Right? I mean, are there any other survivors from No.

Raja Koduri: 23:26

AMD. It was just really ATI and NVIDIA because, AMD didn't have graphics. Right? AMD did not Yeah. Right.

Raja Koduri: 23:33

Right. Just to see and Intel had a graphics team. Right? So, you know, I I gave this talk at, Stanford, in January and also in, you know, recent. I was in China and I gave a similar talk, you know, Changge Jiao Tech there, you know, what's considered the MIT of China.

Raja Koduri: 23:51

I said, you know, every 5 years or so, Bryan, in the graphics world, we we declared, like, you know, there was a big board meeting saying, this is it. This is the end of the discrete GPU. You know, we should go do something else. So, you know, one of those, very early on is when Intel said it is going to integrate graphics into the chipset. Okay?

Raja Koduri: 24:13

And, oh my god, in the PC by default, like, you know, you have to buy into the chipset and into a CPU. So why would anybody buy a discrete GPU on top? Right? You know, it's dead. Right?

Raja Koduri: 24:25

So the the the declared death of the discrete GPU was, like, in '97, '98, and everybody was looking for kind of, you know, alternatives. Right? And then NVIDIA also got into the chipset business. If you remember, right, you know, they did the nForce and, you know, it's like they did the the best AMD chipset for a while was from NVIDIA.

Bryan Cantrill: 24:49

Right. And so and it's so this is actually interesting because this is gonna be a theme that we're gonna see as we kind of advance in time, where you have a general purpose CPU, general purpose thing. Say, hey, I can actually that special purpose functionality, I can actually deliver that as part of this general purpose vehicle. And

Raja Koduri: 25:06

Exactly.

Bryan Cantrill: 25:07

Yeah. Yeah. Yep. And I mean, this is what I think what you called generality in in Yeah. Yeah.

Bryan Cantrill: 25:13

And kind of that the power of that generality. And I mean, I remember, I mean, I, during that era, my observation myself was that general purpose compute trumps special purpose compute every time, which, of course, isn't exactly it doesn't it's not true right now anyway because general purpose I I I I feel that that's true in the limit, and it's so

Raja Koduri: 25:34

much easier. You know, what's interesting, Bryan, right, and they all it's always kind of easier connecting the dots when you look backwards. Right? You know, it's it's hard to connect dots forward, but you can connect the dots Right. Backwards.

Raja Koduri: 25:47

Yeah. So the way I would frame it is, see, when your problem space, right, you know, can scale more and more with computation. Okay? You know, for example, graphics was one such thing because, you know, we kept getting an increase in the number of pixels on the screen. Right?

Raja Koduri: 26:09

You know, when I started my graphics career, you know, games were shipping at 320 by 240. Okay? I saw the transition to 512 by 384 to 640 by 480 to 800 by 600, 1024 by 768. Right? 1280 by 960.

Raja Koduri: 26:27

You know? And when we were doing the first 1920 by 1080, it felt like we were in, you know, immersive reality. Right? You know, that's 1080p. Now we are at 4K.

Raja Koduri: 26:37

Right? So the pixels kept growing up, and then the amount of computation on each pixel also kept, like, you know, exponentially growing up. Right? So the scale like, you know, if I if I can throw a bigger GPU at a at a at a game, you got better quality, better performance, better stock. Right?
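The pixel-count side of that scaling is easy to check. A quick Python calculation over the resolutions Raja lists, assuming "4K" means 3840 by 2160 (UHD), shows how much more work each frame represents before any growth in per-pixel computation is counted:

```python
# Resolutions mentioned in the conversation; "4K" is assumed to mean 3840x2160 (UHD).
RESOLUTIONS = [
    (320, 240), (512, 384), (640, 480), (800, 600),
    (1024, 768), (1280, 960), (1920, 1080), (3840, 2160),
]


def pixel_growth(resolutions):
    """Return (width, height, pixels, multiple of the first resolution) for each entry."""
    base = resolutions[0][0] * resolutions[0][1]
    return [(w, h, w * h, w * h / base) for (w, h) in resolutions]


for w, h, px, mult in pixel_growth(RESOLUTIONS):
    print(f"{w}x{h}: {px:>9,} pixels ({mult:.1f}x)")
```

Under that assumption, 4K carries 8,294,400 pixels per frame, 108 times the 76,800 pixels of 320 by 240, and the per-pixel shading work Raja mentions multiplies on top of that.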

Raja Koduri: 26:55

So that's a scalable problem. Whereas take an example of, image encoder or a decoder, like video encoder decoder. K? See, after I hit 30 frames per second or 60 frames per second or 1080p or 4K, you giving me a bigger, larger encoder and decoder doesn't make any difference for me. Right.

Raja Koduri: 27:15

Right? It's completely waste. Right? It's, so, you know, so that's a contained problem. Right?

Raja Koduri: 27:22

You know, once you hit a particular quality level, right, at the, at the at the time interval level, you don't need more. So I think we have to divide problems into those into those categories. Right? So, you know, graphics and now parallel computation and now this AI computation. Right?
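The "contained problem" point can be sketched with a deliberately simplified model in Python, assuming decode cost is proportional to pixels per second and ignoring bitrate, codec complexity, and memory bandwidth (all assumptions for illustration). Real-time playback sets a fixed target rate; a decoder with more capacity than that delivers no extra frames, so the remaining design freedom is power and area.

```python
def required_pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second a decoder must sustain to play video in real time."""
    return width * height * fps


def delivered_fps(decoder_pixels_per_sec: int, width: int, height: int, fps: int) -> float:
    """Frame rate actually shown, capped at the source frame rate: once every
    frame is decoded on time, extra decode capacity has nothing left to do."""
    achievable = decoder_pixels_per_sec / (width * height)
    return min(achievable, float(fps))


# Target: 4K (assumed 3840x2160) at 60 frames per second.
target = required_pixel_rate(3840, 2160, 60)
for capacity in (target // 2, target, target * 2, target * 4):
    print(capacity, delivered_fps(capacity, 3840, 2160, 60))
```

In this model, half the target capacity yields 30 frames per second, while two or four times the target still yields 60, which is why a fixed-function block sized exactly to the target is the efficient choice.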

Raja Koduri: 27:40

You know, larger and larger, you know, neural nets. Right? And, and more parameters. Right? Seems to not end.

Raja Koduri: 27:48

If that starts tapering off, like, you know, say somebody says, you are not going to get any more fidelity beyond, you know, 1,000,000,000,000 parameters. Let's just

Bryan Cantrill: 27:57

say

Raja Koduri: 27:57

and I'm just throwing out a number, right, and the stuff. Like, it's just diminishing returns after that. It's plateaued after that. There is nothing more. You know, then you could say that, oh, yeah.

Raja Koduri: 28:06

Okay. Right? You know, something that is fixed. Right? Non scalable, right, fixed is the right answer.

Raja Koduri: 28:14

Right? And, you know, we could, get, you know, 10 x efficiency, on building something that is fixed.

Bryan Cantrill: 28:22

Right. But it's still

Raja Koduri: 28:22

a problem. Yeah. Yeah.

Bryan Cantrill: 28:23

Yeah. Well, that that when you because I think it's a really interesting way to think about it. And I was thinking about, like, the in, like, the the decoders as, like, encoder, decoder is an example of that where once you do hit that point where it's like, no. No. There's no point in getting better absolute performance.

Bryan Cantrill: 28:39

Then you do get to drive for efficiency, and these other kinds of things, like power and economics.

Raja Koduri: 28:43

Yeah. Exactly. You can drive those smaller and smaller. Now we have, like, amazing 4K and 8K encoders and decoders sitting on your phone.

Raja Koduri: 28:53

Right?

Bryan Cantrill: 28:53

On the phone. It's crazy.

Raja Koduri: 28:54

Yeah. Yeah. Right. It's crazy. Right?

Raja Koduri: 28:56

So yeah.

Bryan Cantrill: 28:57

At such low power, you know, I mean, like

Raja Koduri: 29:00

Yeah.

Bryan Cantrill: 29:00

I mean, every time I kinda complain to myself about the battery life of my phone, part of me just wants to punch myself in the mouth. I wanna conjure my past self, who can come up and be, like, what? You're complaining because your battery died in the grocery store while you were having a real-time video chat?

Bryan Cantrill: 29:20

Like, oh, I'm sorry. Your 4K phone.

Adam Leventhal: 29:24

Right. Multi-gigahertz processor with a gigabyte of memory. Really? I'm sorry.

Raja Koduri: 29:30

Yeah. You know, Bryan, the first time I saw a real-time, DVD-quality video stream of a live event, okay, I literally cried. Right?

Raja Koduri: 29:42

You know, there were tears in my eyes, saying, oh my god. Because I was involved in the video pipeline too. Right? Like, at the start, you know, how difficult it was to just play video locally. Right?

Raja Koduri: 29:53

Like, you know, when we first got DVD playing on a PC without dropping a frame, okay, we celebrated like, oh my god. It was a major achievement. And now you're doing that across the Internet.

Raja Koduri: 30:08

Right? You know, camera capture, encoding, streaming, and decoding, all at DVD quality. Right? I mean, DVD was the big thing back then. Right?

Raja Koduri: 30:20

You know, we hadn't even hit 1080p back then. Right?

Bryan Cantrill: 30:23

Well, it is. And really, that's the kind of stuff that has meaningfully changed people's lives. I mean, I know, yes, the Internet is awful in plenty of ways, but it is worth a moment of being like, that is so amazing. Think about how amazing it is that a grandparent who lives thousands of miles away can watch a recital that their grandchild is in.

Bryan Cantrill: 30:49

You know? And, like, holy smokes. And it's a lot of technology at every step: at the edge, in the Internet, and of course at both ends of this. And as you say, there's a lot of silicon that

Raja Koduri: 31:02

is... And then getting audio right on that, like we were talking about at the beginning. Right? You know, people tend to say, oh, yeah. Audio.

Raja Koduri: 31:11

It's like, well, whatever, a little bit rate and this and that. But getting audio right is another challenge too. But, yeah, it's amazing that they all come together, and it's an example of a contained, non-scalable problem, in the sense that just blindly throwing more compute at it isn't going to make it any better. Right?

Raja Koduri: 31:37

It's other things that you have to do to deliver audio and video over the Internet. Right? I mean, compute is important. But, like, you know, these fixed-function algorithms. Right?

Raja Koduri: 31:52

You know, once we got standards, H.264, H.265, AV1, MPEG, once you have a standard, I can then do a very efficient hardware implementation of that standard. Right?

Bryan Cantrill: 32:07

Right.

Raja Koduri: 32:07

And that's one of the things I ask the AI hardware companies. It's like, you know, I am doing this great AI chip. Okay, so what's your programming model, or what's the API? We don't have one, but we are flexible.

Raja Koduri: 32:27

We are programmable. And we can attach to whatever. And I say, oh my god. Okay.

Raja Koduri: 32:35

You know, that's the problem. Right? You don't actually have an interface to your architecture. And, I mean, this is the point I made at the, you know, the docker house talk on Saturday. Right, I said, you know, just kinda switching gears a bit, Bryan.

Raja Koduri: 32:53

You know, I read this statement, I think it was in 2004 or 2005, in a small data parallel programming handbook, a collection of papers by eminent people from HPC. And there was a statement that said: in the history of parallel computation, there is no architecture that was successful whose programming model was different from its execution model. Okay. Again, repeating: whose programming model was different from its execution model.

Raja Koduri: 33:28

Okay. So the execution model is how the hardware executes a stream of instructions, how it fetches data, its cache hierarchies, all of those details. The programming model is how a programmer expresses the sequence of computation they want the hardware to perform. Right? And I didn't understand that statement.

Raja Koduri: 33:49

Right? It felt strange. It felt very interesting, but I couldn't quite put my finger on it. And, you know, again, coming from the graphics world, I wasn't an expert on the history of HPC, Cray supercomputers, and all of the stuff that happened in that world. Right?

Raja Koduri: 34:12

We were coming from the PC side of things, so I didn't live the parallel computing world. But the reason I say that is, you know, the reason CUDA is so successful is that the CUDA programming model and the hardware execution model, how the hardware actually executes it, are a one-to-one mapping.

Bryan Cantrill: 34:32

Right.

Raja Koduri: 34:33

You can read the CUDA code and say, oh, yeah, this is what is going to happen inside the chip. Right? So when people tried to implement CUDA-like languages on a different execution model, they made it work. See, making it work is one thing.

Raja Koduri: 34:54

Making it work fast is a completely different thing.

Bryan Cantrill: 34:57

Is that the way to think about it? Okay. So, yeah, it's a super interesting statement. And I just want to pull on it a little bit, because I kind of think that the general-purpose CPU running systems workloads, or even x86, may be a counterexample to that, because the programming model and the execution model don't exactly match.

Bryan Cantrill: 35:21

And just to play this out a little bit: when Dennard scaling ended, we had this memory wall that felt impenetrable, that we worked around. I mean, on the one hand, we went to multiple cores, for certain, so that's a degree to which the programming model and the execution model do line up. But speculative execution is an example where the programming model and the execution model don't exactly line up, and it hasn't been a source of real problems

Raja Koduri: 35:50

for us. Yeah. Exactly. So, you know, we can talk about the programming models and the execution model. I'd call the most successful programming model in the history of computing, right, the single-threaded C program.

Raja Koduri: 36:05

Right? Right. We built the entire stack with C. Right? I still think C is, to me, the language of computer guards.

Raja Koduri: 36:15

Right? And when people ask me, Raja, what language are you most comfortable with? You know, English, or my mother tongue, Telugu? I often say, half jokingly, C.

Raja Koduri: 36:31

And, you know, I think in C. I may sometimes talk in C. But if I look at C code, the reason I like C code is that I can see, for every statement, what it turns into.

Raja Koduri: 36:45

Right? Yes. Into instructions; you can clearly see it. Right?

Raja Koduri: 36:51

Like, you know, I'm not talking about libraries and other stuff. The core C. Right? Like, declaring a variable, operations on the variable. Right?

Raja Koduri: 37:00

You know? And if-then-else clauses and all that stuff. You can see the load-compute-store

Bryan Cantrill: 37:07

Yeah. Paradigm. The assembly underneath.

Raja Koduri: 37:10

Yeah. You can peel back to the assembly underneath. Right? And then compilers came and made the job of getting efficiency out of a processor a little easier than me writing assembly.

Raja Koduri: 37:24

Right? And then speculative out-of-order execution was this, oh, you know, again, if I could be executing single-threaded code really fast, if frequency was at 1,000 gigahertz by now or something like that, we wouldn't actually do speculative execution. Right?

Raja Koduri: 37:46

It wasn't. Right? So we put more parallel computing slots even on a CPU, to run more than one instruction at a time. Right? You know, we invented this out-of-order thing.

Raja Koduri: 38:01

Right? You know, we tried VLIW and then,

Bryan Cantrill: 38:04

like, you know,

Raja Koduri: 38:05

which is where we said, let the compiler figure out what can be executed in parallel, and that never worked out. Right? And then out-of-order came, and the rest is kind of history. But one example, just stepping back to the original question, Bryan: I don't know if I told you this analogy the last time we met, but I've dealt with the CPU versus GPU versus FPGA versus ASIC question for the last 15 years. Right?

Raja Koduri: 38:34

Like, you know, it comes up, and if you ever work at a company where you have all of these as options, man, it's kind of like you're fighting religious wars between these entities. I unfortunately happened to work at one such company where you had all these processing units. In fact, I jokingly say the reason NVIDIA won is because they had only one thing. Right? They didn't have all these options. Right?

Raja Koduri: 38:58

If they even had a CPU, they would be screwed. Right? They would be fighting over so many things. So the... That's an interesting point, actually.

Bryan Cantrill: 39:08

That's a very interesting point. And just to give other people context: you were the chief architect at Intel, where, I mean, it's true of AMD too; you've been at both AMD and Intel, and you've got all these different things under the same roof. And

Raja Koduri: 39:24

Yeah.

Bryan Cantrill: 39:24

It is. It's distracting. And it also does not result in, like, the best things. You think you've been

Raja Koduri: 39:30

No. No. So I'll give you an analogy, which I think you and all the audience will very much connect with. Okay? I would switch computation to transportation.

Raja Koduri: 39:44

Okay? Let's use a transportation analogy. Okay? So I live here in the South Bay, in San Jose. Right?

Raja Koduri: 39:54

And I want to go to San Francisco. Right? You know, I have a bunch of different transportation options. Right? Or I want to transfer a package, an Amazon package, from point A to point B.

Raja Koduri: 40:05

There are a bunch of different options. One is a car. Right? I can go, you know, sit in a car and drive. Right?

Raja Koduri: 40:12

Or I can drive up to the nearby Caltrain station, get on Caltrain, and go. Alright.

Bryan Cantrill: 40:21

I apologize. Adam and I have spent so much of our lives together on Caltrain that any mention of Caltrain is very much...

Adam Leventhal: 40:27

And go and maybe if you can go on

Bryan Cantrill: 40:29

Caltrain and maybe eventually you'll get there. Right?

Raja Koduri: 40:32

Yeah. Yeah. So so then you also have buses. Right? So, like so think of, CPUs as cars.

Raja Koduri: 40:40

Okay? So from point A to point B, I can get right to my destination in a car. Okay? Yeah. I can carry 4 people, maybe 5. A minivan,

Raja Koduri: 40:51

I can carry 6 people. Right? But if I have 1,000 people that need to go from point A to point B, okay, a train is far more efficient. Right? You'd agree.

Raja Koduri: 41:03

Right? If I have 1,000 people. And in between is a bus. Right? A bus is kind of an in-between step.

Raja Koduri: 41:10

Right? And the train is kind of far more efficient. So CPUs, when you have small payload, you know, CPUs are really efficient. Right? You don't actually even want to think about a GPU.

Raja Koduri: 41:23

Okay? You know? And it takes you to the end destination. And GPUs, the early-generation GPUs, all the way through the first generation of CUDA GPUs before we introduced Tensor Cores, are kind of more like these big buses. Right?

Raja Koduri: 41:40

And when we introduced Tensor Cores, I call it they became more like train stations, right, in this stuff. But you still need to get to the train station.

Bryan Cantrill: 41:49

Train station.

Raja Koduri: 41:50

Right? You know, you need to get in your car to the train station. So you can't build a computer without a CPU. Right? No matter what anybody says.

Raja Koduri: 41:57

Right? You know, a car is still needed. Even after I get off the train on the other end, right, I need a car to get to my home in San Francisco.

Raja Koduri: 42:08

Right? So, again, the transportation analogy works very well. And in this scheme of things, you might ask, okay, Raja, I understand cars, buses, and trains. Okay?

Raja Koduri: 42:18

So CPUs, vector GPUs, and now tensor GPUs. Now where do FPGAs fit in this?

Bryan Cantrill: 42:25

Can I guess? Because this is obviously what I've been thinking about during the analogy; I'm like, where do FPGAs go? And I'm like, FPGA is by foot. FPGA goes absolutely anywhere.

Bryan Cantrill: 42:34

I wanna go in the High Sierra? I can go wherever I want with an FPGA. Uh-huh. But, I mean, I think part of it is that FPGAs are not necessarily fast. I mean, they are fast.

Raja Koduri: 42:46

Like, if you wanna get to Mount Whitney. Yeah. They're not going to be fast on the roads and the train tracks. Right? But they may be faster than a car or a train on a pile of sand.

Raja Koduri: 42:58

Right? Or, like, you know or over the or the top of

Bryan Cantrill: 43:00

Mount Whitney. You're not taking a car to the top of Mount Whitney. You can Yeah. Yeah. Off 10 AM.

Bryan Cantrill: 43:04

Driving to the top of Mount Whitney. I will beat you there on foot.

Raja Koduri: 43:06

Yeah. Yeah.

Bryan Cantrill: 43:07

I mean,

Raja Koduri: 43:07

FPGAs have two personalities. One is, I call them, these construction vehicles that you see. Right? They look like transportation vehicles, but they're only used for building the roads. Right?

Raja Koduri: 43:22

You know, building construction, building stuff. Right? You know, they're they're very useful. They're very, very important. Yeah.

Raja Koduri: 43:27

You can use them to drive on the road. Okay? Yeah. In fact, they go on the road from one transportation site to another transportation site, but you don't want to commute on them. Right?

Raja Koduri: 43:37

You don't want to use them for your daily use. Right? So if you do

Bryan Cantrill: 43:41

have an analogy: the FPGAs are construction vehicles? Like, dozers and so on are FPGAs? Which actually

Raja Koduri: 43:47

is more. Yeah. Okay. That's how I look at it.

Bryan Cantrill: 43:49

Right. Right. Yeah.

Raja Koduri: 43:49

Okay. There is another interesting aspect of FPGAs as well, Bryan. Right? That analogy works for many things. But FPGAs also have another interesting property.

Raja Koduri: 44:02

Right? Just imagine the Lego car, you know, the super cool Lego car you can build. Okay? Right. With its own custom stuff. It's a good prototyping mechanism for a new bus or a new car, where I buy all these kinds of robotic car parts and build my own car from the parts.

Raja Koduri: 44:26

As a prototyping vehicle, all these parts are very customizable, so I can build a car and test it out. Like, does everything work? It won't hit my finished car's cost or performance or energy efficiency, but I can prove out a lot of things.

Raja Koduri: 44:48

Right? And especially if I only need to build a dozen of them to try it out, an FPGA is the best way to do that. So you don't have to design and tape out a chip and all that stuff. Right?

Raja Koduri: 45:00

So they're also good LEGO kits,

Bryan Cantrill: 45:03

right, for CPUs, GPUs, and ASICs. So that's kind of

Raja Koduri: 45:08

the way I see it. CPUs and all of these are transportation mechanisms, and they run on different infrastructure. Right? And people ask me, like, hey, why couldn't the GPU win on AI?

Raja Koduri: 45:24

Sorry. CPU win on AI. Right? Like, you know, why couldn't you just deliver more bandwidth and all that stuff? I said, you could.

Raja Koduri: 45:29

There's nothing architecturally special about all of this stuff. But remember that the infrastructure, the road, sets your speed limit. Right? How fast you can go depends on the road infrastructure, versus a train infrastructure. Right?

Raja Koduri: 45:49

So your infrastructure is kinda like your memory subsystem, your bandwidth; all those things also need to be high performance for you to deliver GPU-like throughput. So you can't just get GPU-like throughput, or train-like throughput, on a road.

Bryan Cantrill: 46:09

Right. Right? And I think this emphasis on memory is really important, because it's also a difference between some of these workloads. Like, these LLM workloads are really quite memory intensive. Right? They really balance memory and compute in a way that you don't see in graphics-oriented workloads, for example.
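The "road sets your speed limit" point corresponds to what is usually called the roofline model, which the speakers don't name: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). A minimal sketch; the accelerator figures below are hypothetical, and the roughly 1 FLOP/byte figure is a common back-of-envelope estimate for batch-1 LLM decoding with 16-bit weights (each 2-byte weight feeds one multiply-add).

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOPs per byte) above which a kernel is compute-bound."""
    return peak_flops / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 2 TB/s memory bandwidth.
    peak, bw = 100e12, 2e12
    print(ridge_point(peak, bw))  # 50.0
    # ~1 FLOP/byte: memory-bound, only 2% of peak compute is reachable.
    print(attainable_flops(peak, bw, 1.0) / peak)
    # 200 FLOP/byte: compute-bound, capped at peak.
    print(attainable_flops(peak, bw, 200.0) == peak)
```

Below the ridge point, adding compute units does nothing; only more bandwidth (a better "road") raises the ceiling.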

Raja Koduri: 46:33

Yep. Yep. Yep. Literally.

Raja Koduri: 46:36

Right? Take the 1,000 people example. If 4 people can fit in a car, you need 250 cars. Right?

Raja Koduri: 46:45

You know, going. Right? You know, going. And you can fit them in one train. Right?

Raja Koduri: 46:52

A thousand people. Right? So that's basically what's happening with these LLMs and these big AI workloads: the number of packages that you need to move across is so large that you can't just use cars. Right? Right.
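The car-versus-train arithmetic is just ceiling division of payload by vehicle capacity. A tiny sketch using Raja's illustrative numbers (4 seats per car, a 1,000-person train); the function name is hypothetical.

```python
def trips_needed(passengers: int, capacity: int) -> int:
    """Vehicle loads needed to move everyone (integer ceiling division)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return -(-passengers // capacity)


if __name__ == "__main__":
    print(trips_needed(1000, 4))     # 250 car loads
    print(trips_needed(1000, 1000))  # 1 train load
```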

Raja Koduri: 47:11

You know, now, to do a small, quick-and-dirty prototype with a small number of parameters, the CPU is awesome. Right? You can see Andrej Karpathy's llama2.c, you know, the thing that he did. Right? In fact, his single-threaded C code runs on the CPU.

Raja Koduri: 47:31

And it's actually the most beautiful thing for understanding the whole algorithm. Right? You know? Because we are much more comfortable with getting in a car and going from point A to point B. Like, we live in California.

Raja Koduri: 47:44

Like, we understand we can't live without a car. Now, just continuing on this analogy: okay, now there are NVIDIA GPUs, and there are AMD GPUs, Intel GPUs. Right?

Raja Koduri: 47:56

And then some other AI hardware that tries to mimic NVIDIA-like stuff. Why haven't they been successful? Why CUDA? If they are all trains and buses, how come nobody could compete with NVIDIA? So for that, I switch the analogy.

Raja Koduri: 48:16

Instead of moving people, let's just say they are shipping packages. Right? Like Amazon packages. Okay? So now, the shape of your package.

Raja Koduri: 48:30

Right? Let's say your trains are all rectangular compartments of a given size. Right? Each compartment is a given volume and is rectangular. And let's say all my packages are rectangular.

Raja Koduri: 48:45

Okay? I can pack my train compartment with 100% occupancy, right, with no air gaps. Right? And, you know, in in one shot, I can, you know, transport n number of those packages. But let's say my packages are weird shapes or some cylindrical shapes or some other kind of shapes.

Raja Koduri: 49:06

And when I load the same train up, I have air gaps. Right? I'm not utilizing it fully. Right? Here's a more concrete example.

Raja Koduri: 49:18

Right? The compartment size for the AMD architecture is a 64-wide packet, and NVIDIA is 32 wide. So when you have packages that come from CUDA, even if they were translated through HIP and ROCm, they run at half utilization on AMD unless you go repack
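The half-utilization claim follows from simple lane counting. A minimal sketch, assuming work is organized in groups sized for 32-wide warps and executed on hardware that issues 64 lanes in lockstep; it ignores divergence, occupancy, and memory effects, and only models the 64-wide case Raja describes (some AMD GPU generations can also run 32-wide wavefronts). The function name is hypothetical.

```python
def lane_utilization(threads_per_group: int, hw_width: int) -> float:
    """Fraction of SIMD lanes doing useful work.

    A group of threads_per_group threads is issued in chunks of hw_width
    lanes that execute in lockstep; lanes in a partially filled chunk idle.
    """
    chunks = -(-threads_per_group // hw_width)  # ceiling division
    return threads_per_group / (chunks * hw_width)


if __name__ == "__main__":
    print(lane_utilization(32, 32))  # 1.0: 32-wide work on 32-wide hardware
    print(lane_utilization(32, 64))  # 0.5: same work on 64-wide hardware
    print(lane_utilization(64, 64))  # 1.0: after "repacking" to 64
```

The "repack" Raja mentions corresponds to changing the grouping so each 64-wide issue slot is full, which is a code change rather than something a translation layer can do on its own.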

Bryan Cantrill: 49:40

The code. Yeah. Interesting.

Raja Koduri: 49:42

Right? So while you can make it work, unless you ask the developer to repack, your utilization is inefficient. Right? And that's the challenge with the popularity of CUDA.

Raja Koduri: 49:57

I call it the CUDA virus. Right? It's spread all across. A lot of developers don't even write CUDA code; they're writing PyTorch code or something.

Raja Koduri: 50:07

But the implicit packing of your data and compute that comes through various libraries and frameworks is biased towards NVIDIA's sizes. Right? And that's why it is not straightforward to just attain performance. And performance matters.

Raja Koduri: 50:30

Right? It's not just compatibility, because if I'm parallelizing a workload, if I'm putting in all the effort to get to a GPU, I'm doing it for performance. I'm not doing it for compatibility, because if I just cared about compatibility, I would never even write anything in parallel. Right? I would just write nice serial code and be happy with it.

Raja Koduri: 50:51

And so when I put that effort in, if I don't get performance, I'm disappointed. So that's the reason for CUDA's stickiness and popularity: beyond being the best-architected hardware-software abstraction, it also taught people to pack things in a particular way. Well, and

Bryan Cantrill: 51:20

just as you said, the execution model matches the programming model.

Raja Koduri: 51:23

Yeah. Yeah. Yeah. So if your

Bryan Cantrill: 51:24

if your execution model has a mismatch with that programming model... And it does remind me of the non-x86 x86 parts that existed for a while. And, yeah, ultimately they didn't deliver. Like,

Adam Leventhal: 51:38

the Transmeta

Bryan Cantrill: 51:40

stuff. I was thinking that you had the Transmetas, for sure, and then you had those folks that were also doing binary translation. And with all of that, you're undermining the economics of it, or you're undermining the performance of it. Yeah.

Bryan Cantrill: 51:57

And you really have to be desperate to run on the... you know, it's like, actually, I really wanna run on SPARC or MIPS or POWER. Yeah. And it doesn't feel like a long-term kind of thing.

Raja Koduri: 52:12

Yeah. So that's what led me, Bryan, to this equation. Right? That, hey, it has always been the same, whether it's CPUs, GPUs, whatever. Right?

Raja Koduri: 52:23

It was about this blend of performance and generality. Right? So that's why the architectures that won were the ones that kept that performance-and-generality equation; they led. Right? People said x86 was going to die many times, right, even through the eighties and nineties.

Raja Koduri: 52:41

But they had the generality covered with the Microsoft partnership. Right? The most popular operating system. And then, of course, Linus did Linux on x86. And because Dennard scaling was active, with the compatibility, they kept the performance going up 2x every 18 months or so.

Raja Koduri: 53:11

Who who who

Bryan Cantrill: 53:13

I mean, the presence of AMD really... I mean, Intel did fumble to Opteron in, what, the 2004, 2005, 2006 kind of time frame. Intel got very distracted with IA-64 and VLIW and the kind of magical compilers, as you alluded to earlier, that can go anywhere.

Raja Koduri: 53:36

Yeah. Yep. Yep.

Bryan Cantrill: 53:37

And it was really AMD that refocused them on x86 and, you know, Yamhill. Right? Intel developing Yamhill, which was effectively the implementation of AMD64.

Raja Koduri: 53:52

By the way, that's a very interesting point in history. Right? Because of the end of Dennard scaling, we couldn't just increase performance without impacting generality. See, we tried keeping generality the same, in the sense of, hey, make a single-threaded program go faster and faster, and the main tool we had was frequency. Okay?

Raja Koduri: 54:20

So frequency scaling ended in the early 2000s. Right? We've been flat on frequency since that time frame. Right? Really, we hit that 3 gigahertz point.

Raja Koduri: 54:32

Yeah. Yeah. Yeah. Down even. Right?

Raja Koduri: 54:34

And so Intel said, oh, yeah, how do I get more performance? Let me go impact generality with a VLIW architecture and make the compiler handle it, but still hide it from the developer. Right? So don't let developers change anything.

Raja Koduri: 54:49

Okay? My magic compiler will exploit the parallelism. Right?

Bryan Cantrill: 54:55

My magic compiler that, by the way, I can't tell you about and don't yet have. My magic

Raja Koduri: 54:59

is a

Bryan Cantrill: 54:59

forthcoming compiler. Right.

Raja Koduri: 55:01

And then AMD and, you know, the Microsoft side said, well, it is time for us to impact generality: the developer needs to contribute here. Right? The software guy needs to come into play. So multi-core was introduced, and that's how AMD went ahead. Multi-core needed software changes.

Raja Koduri: 55:28

Right? It didn't magically help all code. It only helped new code.
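Why multi-core "only helped new code" is often quantified with Amdahl's law, which the conversation does not name: only the fraction of runtime that has been parallelized benefits from extra cores. A minimal sketch; the function name and example fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup on `cores` cores when only `parallel_fraction`
    of the runtime can be spread across them."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    print(amdahl_speedup(0.0, 8))  # 1.0: existing single-threaded code
    print(amdahl_speedup(1.0, 8))  # 8.0: code rewritten to be fully parallel
    print(amdahl_speedup(0.9, 8))  # in between: the serial 10% dominates
```

Unmodified single-threaded code corresponds to a parallel fraction of zero, so it sees no speedup no matter how many cores are added.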

Bryan Cantrill: 55:35

Well, okay. Actually, on that, to your point about the programming model matching the execution model: I actually don't think that's quite right, because what had also happened is the rise of SMP in the late nineties.

Bryan Cantrill: 55:48

And part of the reason that Sun flourished when lots of other companies perished is because Sun made a very big bet on SMP. Yes. And there was an idea at that time that you cannot make SMP scale. You can't make the operating system scale.

Bryan Cantrill: 56:02

You can't make the database scale. Like, you can't scale this out. And I remember, I'm sorry, I'm just gonna name and shame him: Larry McVoy sent me a mail in 1995 saying, in the confident Larry McVoy way, that operating systems will never scale beyond 8 CPUs. And yeah.

Bryan Cantrill: 56:27

And that wasn't correct. In part, I mean, this is one of these, talk about hardware-software co-design, where the hardware did get a little further ahead of the software, and where we had hardware systems that should have scaled. You know,

Raja Koduri: 56:42

I think the blind spot then, Bryan, and I'm not on the CPU side, I'm just saying this, was user-level parallelism. It's basically all the benchmarks that we had. I made this statement on Twitter that you become what you're measured by. Right? Yeah. We were measuring by single-user benchmarks.

Bryan Cantrill: 57:07

Right?

Raja Koduri: 57:07

That's right. That's right. Like, how can I scale a single-user experience? And from that standpoint, it's still true that there is not much there. But with the whole transaction side, the web and the Internet, we have literally billions of users.

Raja Koduri: 57:23

Right? So user-level parallelism meant that I am presenting multiple computers on a single computer, one to each user. Right? I think our collective imagination didn't take that into account.

Raja Koduri: 57:38

Right? So we underestimated the user scale.

Bryan Cantrill: 57:42

I think so. And I think this is one of those where there was a bit of an impedance mismatch in the industry, because you did have people who were like, no, you don't understand, there are many workloads that don't parallelize. And I remember being at Sun, starting in 1996.

Bryan Cantrill: 57:56

And I'd be like, no, you don't understand: the thing that actually needs to scale is the operating system kernel. If the operating system kernel scales, you can actually run a bunch of single-threaded processes. If you

Bryan Cantrill: 58:09

Provide things that actually need to do a lot of work. And in particular, I think a lot of people didn't realize that Oracle, the dominant commercial database at the time, and this is true for the Sybase environment as well, was actually built from multiple processes that shared a global area. And so they actually had multiple processes sharing memory.

Bryan Cantrill: 58:34

And so Yeah. Not everything I mean, I I mean,

Raja Koduri: 58:36

You know, now that you say that, I think what happened then is we said, okay, operating systems, you don't scale? We don't care. We'll put you underneath a virtual machine and have multiple operating systems running on the same machine. Right?

Bryan Cantrill: 58:48

And I think what we also learned is Jeff Bonwick's observation early on: if we have a machine that has n CPUs and we can get it to be non-pathological at n, it will perform well at n over 2. And so, with the presence of the E10K, a 64-processor machine, by the time the late nineties rolls around, the operating system runs really, really well on 8 CPUs, and on 16, and on 32. And Adam, like, there are the Serengeti systems that we had with 32 CPUs.

Bryan Cantrill: 59:26

Like, and so then you've got this kind of big question: how do we circumvent the memory wall? How do we scale the memory wall? And there's the VLIW idea coming out of IA-64. And then, and Raja, I'd be curious to know your perspective on this, but I think it's the Piranha folks at DEC WRL who are the first ones to take multiple Alpha cores and put them on the same die. And I remember thinking, like, hot damn.

Bryan Cantrill: 59:59

That is a really good idea. Putting multiple cores on the same die is a really good use. And this is before the end of Dennard scaling, too. This is, like, the late nineties or 2000. I'm like, that feels like a direction that makes a lot more sense.

Bryan Cantrill: 01:00:13

Because, to your earlier point, I'm still ruminating on the programming model matching the execution model, and with multiple cores on the die, the programming model matches the execution model. It's like, I've already got this big, you know, 64-CPU E10000 server, and I'm actually just putting that thing onto a die.

Bryan Cantrill: 01:00:33

Like, there's certainly much less magic and magical thinking than VLIW. And, Raja, I've actually got a very funny story for you about VLIW. So I went to a talk at Stanford given by John Crawford at what would have been the Merced hype peak. So I don't know when that would be. Like, maybe '99.

Bryan Cantrill: 01:00:59

I don't I don't think you were with me. I think this would have been

Adam Leventhal: 01:01:02

Yeah. That peak was definitely the late nineties, before I was at Sun.

Bryan Cantrill: 01:01:07

Before you were at Sun. Okay. So I think that this is, like, '99. Was Roger already working on the Merced port when you arrived?

Adam Leventhal: 01:01:14

No. We didn't talk about it by the time I arrived.

Raja Koduri: 01:01:16

You didn't talk

Bryan Cantrill: 01:01:17

about it. Okay. Okay. Yes. Okay.

Bryan Cantrill: 01:01:18

Yeah. So this is, like, '99. Raja, I don't know if you remember Project Monterey. This is, oh god, the collaboration to do UNIX on Merced.

Bryan Cantrill: 01:01:28

Merced being the first implementation of IA-64. And so John Crawford is giving this talk at Stanford, and it's kinda open, free for people to attend. I'm like, oh, this is great. So I went into this packed room at Stanford.

Bryan Cantrill: 01:01:45

And at the time, I'm like, I don't know, VLIW seems interesting? Maybe this is interesting. And, you know, we were working on a port, and he's kind of explaining, you know, doing the classic 8 queens problem. I'm just like, okay. Great.

Bryan Cantrill: 01:01:57

Wow. If I wanna solve 8 queens, this thing seems really cool.

Raja Koduri: 01:02:00

RIP.

Bryan Cantrill: 01:02:01

Yeah. It's just amazing. Unfortunately,

Bryan Cantrill: 01:02:05

I don't know how frequently I need to solve the 8 queens problem. But, boy, when I do, you're my first call for sure. And he's showing all of the SPEC benchmarks, just to your point of, like, you are what you measure. And IA-64 is doing really well on all of them except for one, which was the GCC benchmark. And it's doing terribly on the GCC benchmark.

Bryan Cantrill: 01:02:29

And I'm thinking, like, you know, I'm just kind of a sprout over here. I'm a young Turk. But doesn't it feel odd, the GCC result? And there's this voice at the back of the room who's like, hey, can I stop you on that slide? The only one of those things that's actually a real-world benchmark is the GCC benchmark.

Bryan Cantrill: 01:02:49

That's the only thing that runs into reality.

Raja Koduri: 01:02:51

Yeah.

Bryan Cantrill: 01:02:51

Yeah. And you were doing terribly on that benchmark. And I was like, oh my god. And Crawford is kind of, like, stammering a little bit, and everyone is kinda turning around, like, who is this voice at the back? And it's John Hennessy.

Raja Koduri: 01:03:05

Yeah. And Nice.

Bryan Cantrill: 01:03:07

And I'm like, I think I have just witnessed a Silicon Valley gangland slaying. I mean, this is like John Hennessy tearing off in a car, you know, later to be the president of Stanford. Am I making that up? Yeah.

Raja Koduri: 01:03:23

You know what's fascinating is it took 20 years from that day for Intel to fix their GCC performance. Right? It took 20 years.

Bryan Cantrill: 01:03:35

20 years. Amazing. Well, and what stuck with me is that this was asked as a question: doesn't this willfully ignore all research on this topic? But, yeah, 20 years to wait.

Bryan Cantrill: 01:03:49

Because I'm like, it's a super sloppy workload. GCC is just pointer chasing, and it's exactly the kind of thing that VLIW is not gonna do well. And VLIW also does very badly at it because

Raja Koduri: 01:04:02

Yeah.

Bryan Cantrill: 01:04:02

Not only

Raja Koduri: 01:04:03

Not only... I mean, they had two issues. Right? There is a technical issue, and there is a political issue with GCC that Intel struggled with for years, which is, hey,

Raja Koduri: 01:04:13

if I contribute optimizations to GCC, right, all my secret sauce will be out to everybody. Right? So this whole open source versus closed source battle went on for a long, long time, inside and out, over the compiler optimizations and contributing them.

Raja Koduri: 01:04:34

So they hoped GCC would die and everybody would use ICC, and that would be the compiler of choice. But then ICC became just a benchmark compiler. Right? It ran benchmarks, and little else that anybody trusted it to work on. Which is pretty tragic.

Bryan Cantrill: 01:04:54

Because it was a good compiler, I think. Anytime I could get the time of day from anyone at Intel, I'd be like, you guys should really open ICC, you know? And they're like, nope. Never gonna happen. I'm like, alright.

Bryan Cantrill: 01:05:03

We're gonna write a lot.

Raja Koduri: 01:05:05

Yeah. Yeah. Yeah.

Bryan Cantrill: 01:05:06

Yeah. Okay. But this is really... I kinda love your casting of history in terms of programming model versus execution model, where these things came into tension and where there's a divide that people were willfully ignoring. So, in terms of where does that put us today... actually, no.

Bryan Cantrill: 01:05:29

Before I get to that, I've got one other question for you, because the other thing that is really interesting to me is that you have these things where there's so much broad demand that, economically, they become ubiquitous. And then something else comes along. It's like, hey, as long as those are economically ubiquitous, I've got another thing I wanna do on those. Right?

Bryan Cantrill: 01:05:53

And this is, like, Adam, this is obviously Flash from a storage perspective,

Raja Koduri: 01:05:57

Mhmm.

Bryan Cantrill: 01:05:58

what we did with Flash at Fishworks. But then it's clearly the GPUs now, with researchers realizing, like, hey, wait a minute, we've got these vastly parallel matrix multipliers out there that have been designed to handle polygons, triangles very quickly. I actually think I could use that to train a neural net. And I mean, that's in the 2012

Bryan Cantrill: 01:06:21

time frame. Right, Raja? I mean, that is a huge breakthrough, but it's a parallel that we've seen throughout history, where someone's like, actually, I've got another use for that hardware, and that may give this thing a second life. You know, who cares about the number of frames per second, except for a couple of gamers, once you've exceeded human perception? But, no, wait a minute. Actually, now we are gonna use this thing to train neural nets, and now it takes off again.

Bryan Cantrill: 01:06:53

I mean, first of all, is that a fair approximation of history? How

Raja Koduri: 01:06:57

Yeah. Yeah. Yep. Yep. Because So, you know, the so the early, so what happened was basically right?

Raja Koduri: 01:07:04

So the gaming, 3D graphics, survived. Right? And in 2002, okay, we moved to introducing floating-point arithmetic. Okay? And the first floating point we did was an FP24 format.

Bryan Cantrill: 01:07:25

The first floating point in a GPU was only in 2002?

Raja Koduri: 01:07:28

It's 2002. Right? So integer before that. What's that? It was all, yeah.

Raja Koduri: 01:07:35

It was all integer or fixed point.

Bryan Cantrill: 01:07:37

Fixed

Raja Koduri: 01:07:37

point. Yeah. Yeah.

Bryan Cantrill: 01:07:38

Fixed points. Yeah.

Raja Koduri: 01:07:40

Yeah. Right. So the first floating point was actually done by ATI; we shipped the first one. FP24, and Microsoft did DirectX 9 with the programmable shaders.

Raja Koduri: 01:07:53

Right? And then in 2005, we did FP32, the IEEE FP32 format. Right? And so, you know these things called game consoles. Right?

Raja Koduri: 01:08:06

You know, the the the Sony and, Microsoft, Xbox, PlayStation, Nintendo stuff.

Bryan Cantrill: 01:08:11

So you must be aware of these things. Yes.

Raja Koduri: 01:08:13

Yeah. So whenever a new generation of game console launches, right, it gets a lot of hype and a lot of excitement, and PC GPU sales drop. Okay? So there's a pattern: the game console launches, and the next year you get a dip, right, in the PC GPUs.

Raja Koduri: 01:08:36

And then in the circa 2005 time frame, we were kind of experiencing that dip after the Xbox 360 and PS3 launches. Right? And both us and NVIDIA were like, hey, we need to... And then integrated graphics was also occupying a whole bunch of the market. Mobile laptops had become very popular.

Raja Koduri: 01:09:00

Right? They have very high volumes. So these big honking discrete GPUs that go into your desktop PCs, right, were experiencing a downturn. Interesting. Right?

Raja Koduri: 01:09:12

So we were looking for, hey, what else can we do beyond gaming on these GPUs? Right? So it was kind of a solution looking for a problem.

Raja Koduri: 01:09:27

Yeah. Interesting. And then we saw the Stanford guys. Right? Mike Houston and Ian Buck and Pat Hanrahan's guys were the first ones that were like, hey,

Raja Koduri: 01:09:38

we can do matrix multiplies. We can do some fluid simulations on GPUs. And they coined the term GPGPU, general-purpose GPU. And they even did the first general-purpose GPU language, called Brook. And GPGPU became a paper-minting machine.

Raja Koduri: 01:09:58

Right? Basically, take any classic algorithm that had been out in the world for the last, you know, 10 decades, port it to the GPU, and you can get your paper published. So academia got all over this: take some parallel algorithm, map it to GPGPU, and get a paper published. Right? So that was

Bryan Cantrill: 01:10:20

This is happening, like, in... it sounds like 2005, 2006, 2007. Is that

Raja Koduri: 01:10:25

Yeah. Yeah. Okay. That time frame. Right?

Raja Koduri: 01:10:27

They started in '04, '05, '06. Right? That time frame. But one thing was clear. Right?

Raja Koduri: 01:10:33

The GPU languages, right, they look like C. Right? Even Microsoft's shading languages and all, when you look at the syntax, you'll be very familiar. It looks like C with some weird constraints. Right?

Raja Koduri: 01:10:48

Weird constraints and weird extensions. It's very readable if you're a C programmer, but it wasn't C. The biggest gap to being C, right? GPUs didn't support pointers. Okay?

Raja Koduri: 01:11:04

So CUDA, together with the G80 architecture, was the first GPU language with pointers. It's not just a language issue. The hardware needs to support pointers. Right?

Raja Koduri: 01:11:15

If not, it's a nightmare for you to emulate pointers in the software stack.

Bryan Cantrill: 01:11:20

The programming model needs to match the execution model.

Raja Koduri: 01:11:22

Needs to match. So NVIDIA's G80 architecture that they launched in, you know, 2007, with CUDA, which is the first GPU language with pointer support, was the game changer, because for people coming from the non-graphics world, from the regular CPU programming world, it became like, hey, you can program in C. Right?

Raja Koduri: 01:11:48

Kinda. Right? You still need to express and pack your work differently than writing a for loop in C. Right? These thread groups, concepts like that.

Raja Koduri: 01:12:01

But it was very simple to teach somebody coming from the C world how to take a simple formula and annotate it into a CUDA program. Right? And so that was the credit to NVIDIA. And you couldn't have done that with just a programming language.

Raja Koduri: 01:12:21

You need the GPU to match, right? So CUDA happened in that time frame. But even then, from 2007 till 2012 or even 2013, six years, CUDA was a solution looking for a problem. Right?

Raja Koduri: 01:12:41

It didn't really make much money for NVIDIA. Right? They won some HPC contracts and supercomputer accelerator stuff, but it was, like, in the $100 million range of revenue at most. Right?

Raja Koduri: 01:12:57

It wasn't enough to justify all the investment that was going on around CUDA. And that's another story for another time. I was very closely involved with them because I was at Apple. I was their customer. Right?

Raja Koduri: 01:13:11

And, as customers, we were working very closely with them on a variety of stuff. So that was one thing. The second problem with GPUs, or any accelerator, that classic software people had was, again using the train analogy: I need to move data into the GPU first before I can execute a program on it. Right? And this data needs to move across the PCIe bus.

Raja Koduri: 01:13:42

Right? I need to load the GPU up first and then, like, say, go. Right? Right. And the initial complaint we got from people was: this is great, theoretically; once I load the GPU up and dispatch the computation, it runs fast.

Raja Koduri: 01:14:01

But, you know, I have to pay for the loading cost and then the unloading cost too. Right? I have to bring it back because the main loop is sitting on the CPU. And, so while theoretically, right, you can show micro benchmarks doing great, macro level, it is not that a that big win. Right?

Raja Koduri: 01:14:19

You know, moving data back and forth. So they were kind of dragging their feet, because for you to fully utilize a GPU, it is exactly like committing yourself to traveling by train. Right? You'll have an entire plan, right, for every day. Right?

Raja Koduri: 01:14:35

And how do you get to the train station? How do you get on it? How do you get off it? Right? You know, you have a plan.

Raja Koduri: 01:14:40

But these guys were like, you know, hey. I it's just like a typical Californian. Right? You know, I I love my freedom with the car. Right?

Raja Koduri: 01:14:47

You know? So I can go anywhere. So there was a lot of resistance. So the first people that came and said, we found something, hey, productive. We can go actually ship a commercial product with it are the simulation guys.

Raja Koduri: 01:15:01

Right? Like, you know, the the Dassault systems, you know, the Ansys guys. Right? Ansys that got acquired recently. They were the first people in my I've been

Bryan Cantrill: 01:15:12

guests on Oxide and Friends.

Raja Koduri: 01:15:14

Oh, I see. Yeah. Yeah.

Bryan Cantrill: 01:15:15

Yeah. We actually had a we had a good we were, Ancestor is definitely a partner Voci.

Raja Koduri: 01:15:20

Yeah. So I remember a distinct conversation, and this is actually, in the history, one of the lessons learned. I say, hey, if you're McDonald's, right, and somebody comes to your counter and orders a burger. Okay?

Raja Koduri: 01:15:35

And they're very hungry. Right? You know, I want a burger and a milkshake and some fries. Okay? Don't offer them, hey.

Raja Koduri: 01:15:42

I have something better for you. I have a, you know, a salad which is more healthy for you. You should, you know, get this and all that stuff. Right? You know, as a business, you should never do that.

Raja Koduri: 01:15:50

Right? If your customer wants a burger, give it to him. Right? Or whoever it is. So in this time frame, it's a really fascinating history, because NVIDIA was only a discrete graphics company.

Raja Koduri: 01:16:04

They didn't have a CPU. Right? All they could do was keep building discrete graphics, which sits on the PCIe bus, right, and keep doing things. AMD was in a bit of a confused state at that point. Right?

Raja Koduri: 01:16:18

It had discrete graphics. In fact, when Ansys came to us, we had more memory bandwidth and more flops on the discrete graphics socket than NVIDIA did in that generation. But we said, hey, Ansys, we are building this thing called an APU, where we are integrating these graphics blocks and CPU blocks onto one die, with a unified memory subsystem.

Bryan Cantrill: 01:16:46

Oh my god.

Raja Koduri: 01:16:46

And you don't have to move any data and all these, like, stupid things. And you could have this HSA programming model right here, this new programming model. This CUDA is, like, so crappy; you don't really want to do that. We have something new coming up.

Raja Koduri: 01:17:06

And the Ansys guy, I remember, I was one week back from Apple into AMD. I was just invited to the meeting. I was sitting next to the guy. He's like, yeah. Yeah.

Raja Koduri: 01:17:15

Yeah. Those are all great. Sounds great. You know, that thing is 2 years away. We are telling you today, with your stupid discrete graphics and this whole PCIe loading, we can still take advantage of it, because the matrices are large enough.

Raja Koduri: 01:17:32

And if I pipeline the matrix load into the GPU, the compute, and the next matrix load, okay, I can get full utilization of it. And all we need from you, and hold on to your seats right now, you know what they wanted from AMD? Was: we need you to support Linux.
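The pipelining argument above can be made concrete with a toy timing model. This is a sketch, not anything Ansys or AMD actually shipped: the function names and stage times are hypothetical, and it assumes the load, compute, and unload stages each run on their own resource (for example separate DMA copy engines and a full-duplex PCIe link) so they can overlap once the pipeline is full.

```python
"""Toy timing model: serial vs. pipelined host-to-GPU offload.

All stage times are hypothetical inputs (any consistent unit, e.g. ms).
Assumes load, compute, and unload each run on their own resource, so
they can overlap once the pipeline is full.
"""


def serial_time(n_chunks: int, load: float, compute: float, unload: float) -> float:
    """Each chunk is loaded, computed, and unloaded before the next starts."""
    return n_chunks * (load + compute + unload)


def pipelined_time(n_chunks: int, load: float, compute: float, unload: float) -> float:
    """Three-stage pipeline: while chunk i computes, chunk i+1 loads and
    chunk i-1 unloads. After the first chunk passes through all stages,
    one more chunk finishes every max(load, compute, unload) interval."""
    if n_chunks <= 0:
        return 0.0
    bottleneck = max(load, compute, unload)
    return (load + compute + unload) + (n_chunks - 1) * bottleneck


def gpu_utilization(n_chunks: int, load: float, compute: float, unload: float,
                    pipelined: bool) -> float:
    """Fraction of wall-clock time the GPU spends computing."""
    timing = pipelined_time if pipelined else serial_time
    total = timing(n_chunks, load, compute, unload)
    return 0.0 if total == 0 else n_chunks * compute / total


if __name__ == "__main__":
    # Hypothetical numbers: 100 matrices, 2 ms to load, 5 ms to compute, 2 ms to unload.
    for mode in (False, True):
        label = "pipelined" if mode else "serial"
        print(label, round(gpu_utilization(100, 2.0, 5.0, 2.0, mode), 3))
```

With those made-up numbers the serial version keeps the GPU busy about 56% of the time and the pipelined version about 99%. The gain comes entirely from overlap, and it only holds while per-chunk compute is at least as long as each transfer, which is roughly what "the matrices are large enough" buys you.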

Raja Koduri: 01:17:50

Give us Linux drivers for your GPU; that's all we are asking you to do. And we said, nah, discrete GPUs are for gaming, and for compute we have this APU thing coming up. And, you know, you should, you know, make

Bryan Cantrill: 01:18:08

No hamburgers; have the salad.

Raja Koduri: 01:18:10

Yeah. Yeah. Yeah.

Bryan Cantrill: 01:18:12

Yeah. Salad's been for That's

Raja Koduri: 01:18:13

And I got some great salad.

Bryan Cantrill: 01:18:14

It's gonna I'm still growing it up, Mac. It's gonna be ready in a couple years, but it's gonna be pretty much free now. No.

Raja Koduri: 01:18:19

No. Yeah. And the reason deep learning didn't happen on AMD was actually, you know, a lot related to AMD not having at least an OpenCL-like thing running on Linux. Okay. So Linux was deprioritized for years and years.

Raja Koduri: 01:18:39

And I got it back to some level of priority in 2015, when the AI and deep learning thing was taking off. Right? You had to fix the foundations, the plumbing. If you don't have Linux drivers... and these things take long, Bryan. This is

Bryan Cantrill: 01:18:59

a long time.

Raja Koduri: 01:19:00

Doing your driver. If you don't have a good driver, good memory manager, good stuff... people think, oh, what the heck? You have millions of dollars. Just hire people. Guess what?

Raja Koduri: 01:19:11

Driver programmers, system programmers and all, they don't grow on trees anymore. You can't just buy them, no matter how much money you throw at them. Right? They don't exist anymore. Right?

Raja Koduri: 01:19:20

And so there are very few good people who understand kernel-level stuff. And I know you guys very much appreciate that, with what you do. Right? And you find pretty dumb, stupid things at the BIOS level too.

Raja Koduri: 01:19:35

Right? And the people who understand these things are either gray-haired or bald-headed and, you know, retired, and, you know,

Bryan Cantrill: 01:19:41

this stuff.

Raja Koduri: 01:19:42

So

Bryan Cantrill: 01:19:43

You also have the consequences of... that's really interesting. Of course, it makes total sense that the gaming history of this was really guiding people towards a Windows-only approach.

Raja Koduri: 01:19:54

And Yeah.

Bryan Cantrill: 01:19:54

Yeah. Of course, like, the folks that actually wanna use this as as a general programmable thing on the server side, like, no. No. We're not using sorry. No.

Bryan Cantrill: 01:20:01

We're using open source operating systems. We're not using a proprietary

Raja Koduri: 01:20:04

Well, no. Windows had a Windows had a very weird problem with compute. Okay? It's very fascinating. You know?

Raja Koduri: 01:20:09

Yeah. I mean, people are we could have said, yeah. Why don't you kind of, you know, ship it on Windows? Right? You know, and and kinda, you know, do that.

Raja Koduri: 01:20:16

Right? Windows had a very weird problem: Windows was fundamentally a graphics, GUI operating system, okay, and the GPU device also renders pixels. Okay? So you can't take the GPU away for long.

Raja Koduri: 01:20:38

If you take it away for a long time, right, doing a big long loop, it will throw that Windows timeout exception error and reboot the whole system. Right? So you have to context switch it. You can only give it pockets of work that are less than, you know, some number of hundreds of milliseconds or something like that. Right?

Raja Koduri: 01:21:00

So, yeah, Windows wasn't a very favorable operating system for doing GPU compute. Right? And on Linux, yeah, we could just tie up the GPU for minutes at a time without TDR or whatever it's called.

Raja Koduri: 01:21:17

Yeah.

Bryan Cantrill: 01:21:17

Yeah. Right.

Raja Koduri: 01:21:18

With the

Bryan Cantrill: 01:21:18

with the

Raja Koduri: 01:21:18

You know the

Bryan Cantrill: 01:21:19

The cost of switching workloads makes it especially interesting there.

Raja Koduri: 01:21:22

Yeah. Yeah. No. But, like, these things are all fixable, but the DNA matters a lot.

Raja Koduri: 01:21:28

Right? It's fundamentally a UI-oriented operating system. And this is one of the reasons why Linux on the desktop is still something that hasn't been solved yet. Right? Something that you can live on.

Raja Koduri: 01:21:44

Right?

Bryan Cantrill: 01:21:45

No. Listen. I will I will thank you to acknowledge that after some prolonged experimentation, I finally found a Linux audio configuration that allowed us to not be on this great podcast, but it was not a small amount of work, and it did occur to me that

Raja Koduri: 01:21:57

you're about that.

Bryan Cantrill: 01:22:00

But that's a tangent. So the GPGPU folks are putting, obviously, pressure on this thing to be used in a more general-purpose way, and NVIDIA, it sounds like, was accommodating of that a little bit earlier. And

Raja Koduri: 01:22:14

And for them, you know, they also didn't have other options. Right? AMD had CPUs. Right? So AMD had other things going on, while NVIDIA really had to survive by selling as many discrete GPUs as they could.

Raja Koduri: 01:22:33

Okay? And then they ventured off into Tegra. Everybody believed that you have to get into mobile and all, right, because the days of discrete GPUs were numbered.

Raja Koduri: 01:22:46

Right? But the AI and deep learning thing happened, and the rest is history, of course. Right? And in fact, the problem with integration, Bryan, when you look back, right, is that integration is not scalable. Right?

Raja Koduri: 01:23:01

You know, discrete is a lot more scalable. Right? Because right now, you look at the DGX box. Okay? I have 2 CPUs and 8 GPUs, or now, like, NVL72, 72 GPUs or something like that.

Raja Koduri: 01:23:13

Right? So you can independently scale your compute elements, right, based on the problem of the day. But if I integrated CPU and GPU onto one die, right, the scaling ratio is fixed. You get way more CPU cores than you need. Right?

Raja Koduri: 01:23:35

You know, and the stuff. Right? You know, I I didn't need that. So, you know, discrete has interesting property that you can scale the compute elements independent of each other.

Bryan Cantrill: 01:23:47

It does, but then you do have this problem, especially on these LLM workloads, which are so memory intensive: because it's a discrete GPU, you are now doing a whole lot of work to get to memory as soon as you exceed the HBM limitations.

Raja Koduri: 01:24:05

Yeah. Yeah. Yeah. Yeah. Yeah.

Bryan Cantrill: 01:24:06

Yeah. Yeah. You're hitting the network. Yep. You know, and that's kinda where we're in a nutty spot with this, especially with respect to the power that these things draw.

Bryan Cantrill: 01:24:20

I mean, I think

Raja Koduri: 01:24:22

But I don't really need a bunch of CPU cores to increase my memory capacity. Right? So, the equation that I put out on Twitter over the weekend, right, I said the new performance equation: the numerator has become a function of flops, bandwidth, and capacity.

Raja Koduri: 01:24:41

Right? And by capacity, I mean memory capacity. Right? And by bandwidth, I mean memory bandwidth. But, again, for bandwidth and capacity, you have hierarchies.

Raja Koduri: 01:24:51

Right? All the way from your registers, to L1s, to L2s, to HBM, to LPDDR that might be connected across the bus on the CPU side, to NVLink, to Ethernet, to your storage. Right? NVMe or whatever. Right?

Raja Koduri: 01:25:11

And each hierarchy has different capacity and different bandwidth. Right? And, so you're managing your data through all these levels of hierarchy. Right? And, you know, so that that problem is actually common no matter how you integrate CPU and GPU is what I learned.

Raja Koduri: 01:25:30

Right? You know? Yes. It is. It looks easier, you know, if I am, on a unified memory architecture.

Raja Koduri: 01:25:37

Right? You know, intuitively, it feels easier. But it just for attaining performance. Yeah. But for attaining performance, I'm dealing with NUMA.

Raja Koduri: 01:25:44

Right? Large-scale NUMA. Right? So, I jokingly say the new UMA is NUMA.

Raja Koduri: 01:25:57

Right? Right. So, yeah, NUMA is here to stay, and there's no escaping it, unfortunately.

Raja Koduri: 01:26:08

So

Bryan Cantrill: 01:26:09

Another thing you mentioned in that piece, which I know you and I have talked about in the past as well, is the role of packaging in all this. Mhmm. And you had a note that, like, boy, if I go back to myself 10 years ago and explain how important packaging has become... Yeah. Could you elaborate on that a little bit?

Bryan Cantrill: 01:26:32

Yeah. And why packaging is so important, and how it drives cost as well? Because I think that's one of your one of your Yeah.

Raja Koduri: 01:26:41

Yeah. Yeah. No. So basically, with the end of Moore's Law and node scaling and all. Right?

Raja Koduri: 01:26:48

So now you're looking at system-level opportunities to deliver more performance. One of the talks I used to give when I was at Intel, both inside and outside, I used to call "What is the essence of Moore's Law?" Right? The literal Moore's Law statement was, you know, every 18 months you double the transistor density. Right?

Raja Koduri: 01:27:18

Something like that. But the essence of Moore's Law was that every 18 months or so, you're providing double the performance, double the value to the user. Right? At the same price or whatever. Right?

Raja Koduri: 01:27:32

So once Moore's Law kind of ended, what are the opportunities for you to keep the industry going by providing that increase in performance, right, at similar power and similar cost? So all your optimization opportunities have become what I call picojoules-per-bit optimization. Okay? So you have picojoules per flop. And, you know, power is still the grand unifying limiter for all of us.

Raja Koduri: 01:28:00

Right? You know, power, thermals, and all. How many ops can you do, right? How many picojoules for an operation, and how many picojoules for transferring, or moving, a bit? The picojoules per op is primarily controlled by my transistor.

Raja Koduri: 01:28:20

Right? Which slowed down. I was only getting, like, a 10%, 20% improvement generation to generation. But picojoules per bit had a 10x, maybe 100x opportunity in some cases. Right?
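The split Raja describes can be written as a simple energy budget: total energy is operations times picojoules per op, plus bits moved times picojoules per bit. The sketch below uses made-up workload numbers (10^15 ops and 10^15 bits, 1 pJ/op, 10 pJ/bit) purely to show why a 10x gain on data movement dwarfs a 20% gain on transistors when movement dominates; none of these figures come from the episode.

```python
PICOJOULE = 1e-12  # joules


def energy_joules(ops: float, bits_moved: float, pj_per_op: float, pj_per_bit: float) -> float:
    """Total energy: compute energy plus data-movement energy."""
    return ops * pj_per_op * PICOJOULE + bits_moved * pj_per_bit * PICOJOULE


# Hypothetical workload and per-op/per-bit costs (illustrative only).
OPS, BITS = 1e15, 1e15
baseline = energy_joules(OPS, BITS, pj_per_op=1.0, pj_per_bit=10.0)
better_transistors = energy_joules(OPS, BITS, pj_per_op=0.8, pj_per_bit=10.0)  # 20% better per op
better_movement = energy_joules(OPS, BITS, pj_per_op=1.0, pj_per_bit=1.0)      # 10x better per bit

print(round(baseline), round(better_transistors), round(better_movement))  # 11000 10800 2000
```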

Raja Koduri: 01:28:33

So integration: what was integration doing? If you remember old motherboards, even a PC motherboard, right? I had a CPU, a southbridge, a northbridge. Right?

Raja Koduri: 01:28:44

And memory. So Yeah. And the southbridge and the central CPU were talking through the motherboard interfaces and all. Right? So they're going through analog SerDes and, like, tens-of-picojoules-per-bit interfaces to talk to the other guy.

Raja Koduri: 01:29:00

And when I brought the southbridge and northbridge onto the same die, I took a 10-picojoules-per-bit or higher interface and made it into an on-die interface, which is, like, 0.5 picojoules per bit. Right? More than a 10x reduction. So that's another way to look at why integration was necessary. Right?
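Picojoules per bit translate directly into watts once you fix a bandwidth: power = bits per second × joules per bit. A minimal sketch comparing the ~10 pJ/bit board-level interface with the ~0.5 pJ/bit on-die interface Raja cites; the 1 TB/s link bandwidth is an assumed figure for illustration, not one from the episode.

```python
def interface_watts(bytes_per_second: float, pj_per_bit: float) -> float:
    """Power drawn by a link moving `bytes_per_second` at `pj_per_bit` picojoules per bit."""
    bits_per_second = bytes_per_second * 8
    return bits_per_second * pj_per_bit * 1e-12


BANDWIDTH = 1e12  # assumed: 1 TB/s between the two chips

off_die = interface_watts(BANDWIDTH, pj_per_bit=10.0)  # board-level, SerDes-class link
on_die = interface_watts(BANDWIDTH, pj_per_bit=0.5)    # on-die wiring

print(f"off-die: {off_die:.1f} W, on-die: {on_die:.1f} W, ratio: {off_die / on_die:.0f}x")
# off-die: 80.0 W, on-die: 4.0 W, ratio: 20x
```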

Raja Koduri: 01:29:20

It's to get the energy efficiency of all this stuff. So in that realm, though, I still have limits on how big a die I can make. Right?

Raja Koduri: 01:29:33

I have reticle limits and all. So to continue that integration, there's packaging: if I can put 2 dice next to each other and they can talk at low picojoules per bit and high bandwidth, I get the same benefits of integration without making a big monolithic die. Right? That was the genesis of advanced packaging. Right?

Raja Koduri: 01:29:57

How can I stitch 2 dice together with a very low picojoules-per-bit, high-bandwidth interface so that I am mimicking a large monolithic integrated die? So that's in 2 dimensions. Now imagine three dimensions. If I stack the dice on top of each other, right, the distance... By the way, the energy per bit is also proportional to the distance the bit needs to travel. So even in 2D, if I have to travel long distances, I'm consuming a lot of femtojoules per, you know, I forget the exact number on how many femtojoules per nanometer.

Raja Koduri: 01:30:42

You know, I referenced Bill Dally's talk in my Twitter thread.

Bryan Cantrill: 01:30:46

Oh, that's

Raja Koduri: 01:30:47

a good talk. I love that talk. Yeah. Yeah. He lists the femtojoules per nanometer in the talk; you can compute it.

Raja Koduri: 01:30:54

But when I do 3D stacking, I'm kind of cheating. Right? Cheating in the sense that it's a cool optimization that reduces the distance the signal needs to travel. So that made 3D packaging very interesting. Right?
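Because wire energy scales roughly linearly with distance, the benefit of stacking can be estimated from the ratio of path lengths alone. A minimal sketch under that linear assumption; the per-bit-per-millimeter coefficient and both distances are hypothetical placeholders, not figures from Dally's talk or from the episode.

```python
# Hypothetical placeholder; substitute a measured figure for a real process.
FJ_PER_BIT_PER_MM = 100.0


def wire_energy_fj(bits: float, distance_mm: float,
                   fj_per_bit_per_mm: float = FJ_PER_BIT_PER_MM) -> float:
    """Energy in femtojoules to move `bits` over `distance_mm` of wire (linear model)."""
    return bits * distance_mm * fj_per_bit_per_mm


BITS = 1e9
side_by_side = wire_energy_fj(BITS, distance_mm=10.0)  # assumed: ~10 mm across a 2D package
stacked = wire_energy_fj(BITS, distance_mm=0.05)       # assumed: ~50 um vertical hop in a 3D stack

# Under the linear model the coefficient cancels; only the distance ratio matters.
print(f"2D path costs {side_by_side / stacked:.0f}x the energy of the 3D path")
```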

Raja Koduri: 01:31:10

So both 2D advanced packaging and 3D packaging are mechanisms, or tools, for me to reduce the picojoules per bit while delivering more bits. Right? Delivering more bandwidth. So that's why those tools were important, and why we were working on them for a long time. And, you know, I have, as I call it, my early scars on my back.

Raja Koduri: 01:31:36

Like, if you're the first guy to use a new tool, you pay a lot. Right? You pay for everybody's learning because And this was this

Bryan Cantrill: 01:31:45

was at AMD. Right? I mean, we're

Raja Koduri: 01:31:46

At AMD, I did. Yeah. We did Fiji, HBM1 on a GPU. The first HBM GPU was done by AMD. Right?

Raja Koduri: 01:31:55

So we had to, you know, pave the entire road for it and discovered a lot of things. But now, coming to the cost. Right? There are always 2 aspects of this cost. Right?

Raja Koduri: 01:32:08

One is that until a particular technology and manufacturing methodology scales to high volume, all the initial capex, the capital expenditure you put in, is amortized on these chips. Right? So you end up paying big money whenever you come up with a new technology, which may be awesome physics-wise, but there is an upfront cost that somebody has to pay. And the early adopters end up paying the lion's share of that cost. Right?

Raja Koduri: 01:32:47

So there's that cost aspect. The other cost aspect is time cost. Right? So a standard package took me, you know, 1 hour to package a thing, test it, and put it out, right, whereas advanced packaging took, like, 7 or 8 hours. Okay?

Raja Koduri: 01:33:10

That time in the fab also gets billed to you. Right? Because time is money. Every second, the dollars are ticking on these factories.

Raja Koduri: 01:33:24

So you have to come up with methodologies where the time it takes to package these advanced things is also lower. But the whole advanced packaging thing is new. Let's say you look at the latest NVIDIA B200 package. Okay. You've got 2 big dies and then, like, 8 HBM stacks surrounding them or something like that, and then an NVLink connection on the side. Its layout is very different from MI300 from AMD, and its layout is very different from Ponte Vecchio from Intel.

Raja Koduri: 01:33:58

Right? That's got 47 tiles. So the automation in the assembly line for each of these packages is very different. Whereas the standard package was automated, like, almost 15, 20 years ago. Right?

Raja Koduri: 01:34:17

It's not very different. So that adds cost. It's a long-winded way of saying that the time it takes to assemble these packages is longer than for standard packages, and you pay for it. Okay? And the last cost is actually the physical material cost: you may have an organic substrate and this and that and all that stuff.

Raja Koduri: 01:34:38

You know, AMD has a big interposer underneath, which is an older-generation silicon technology. So you have to pay for that wafer cost. So they all add up. And right now, it's pretty high compared to standard packaging cost. But, as I said in my Twitter note, not all of the cost is physics-justified.
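The three cost components Raja lists (amortized up-front capex, assembly time on the line, and materials such as substrates and interposers) can be combined into a simple per-unit model. A minimal sketch; every dollar figure, volume, and hourly rate below is hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PackagingCost:
    upfront_capex: float       # one-time tooling/process investment, dollars
    volume: int                # units the capex is amortized over
    assembly_hours: float      # time each package spends on the assembly line
    line_rate_per_hour: float  # what an hour of line time is billed at, dollars
    materials: float           # substrate, interposer, etc., dollars per unit

    def per_unit(self) -> float:
        amortized = self.upfront_capex / self.volume
        time_cost = self.assembly_hours * self.line_rate_per_hour
        return amortized + time_cost + self.materials


# Hypothetical numbers: a mature standard package vs. a new advanced package.
standard = PackagingCost(upfront_capex=0.0, volume=1_000_000,
                         assembly_hours=1.0, line_rate_per_hour=10.0, materials=5.0)
advanced = PackagingCost(upfront_capex=100_000_000.0, volume=1_000_000,
                         assembly_hours=8.0, line_rate_per_hour=10.0, materials=200.0)

print(standard.per_unit(), advanced.per_unit())  # 15.0 380.0
```

Raising `volume` shrinks only the amortized term, which is one way to see why early adopters carry the most cost.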

Raja Koduri: 01:35:02

A lot of it is, yeah, a lot of it is because AI is in this big hype cycle and NVIDIA is making, you know, 90% margins. Everybody in the ecosystem says, hey, why am I not getting some of it? And they jacked up prices. Everybody jacked up prices.

Raja Koduri: 01:35:18

Right? And it's just ridiculous right now that every component involved in building one of these GPU-type things with HBM is out of reach for regular stuff. And I think that's actually unsustainable because, for AI to reach 7 billion people, the infrastructure cost needs to come down to, like, regular general-purpose cloud costs, or even lower, right, frankly. And I'm optimistic that it's going to happen. I think this hype cycle needs to... there'll be a crash for a little bit.

Raja Koduri: 01:36:04

Yeah. Where, oh, yeah, we don't need any more GPUs. No. No.

Raja Koduri: 01:36:08

It's because, you know, if I want to build a 1 megawatt data center right now with NVIDIA hardware, Bryan, it costs me $35,000,000, okay, for 1 megawatt. Okay.

Bryan Cantrill: 01:36:19

Right. Yeah. And that's assuming that you can find the megawatt. I mean, there's a real power infrastructure problem here.

Bryan Cantrill: 01:36:28

I mean, that problem is quite real.

Raja Koduri: 01:36:32

That's interesting. Right? It depends on the region. Right? So I'm working with a few entities in India that have, you know, some of these big, you know, gig... no.

Raja Koduri: 01:36:43

Gigawatt solar farms. Okay?

Bryan Cantrill: 01:36:44

Okay. Well, there you go. Yeah. Great. Okay.

Bryan Cantrill: 01:36:46

Yeah.

Raja Koduri: 01:36:46

Yeah. Gigawatt solar farms. And their daily wasted power is 100 megawatts because the grid

Bryan Cantrill: 01:36:51

can't They may have the opposite problem. They may have the problem that you had, you know, back in the day of, like, hey, we've got to find a way to spend all these gigawatts coming off the solar arrays.

Raja Koduri: 01:37:00

Yeah. Yeah. Yeah. Yeah. Because, you know, transmitting power is more expensive than transmitting bits.

Raja Koduri: 01:37:06

Right? So, actually, it makes a lot of sense for these data centers to be colocated with the solar energy. Right? So I think there is still power left, but that's an entire 2-hour episode on energy, oil, coal burning, and everything that's happening in the US. You know?

Raja Koduri: 01:37:26

Oil power is still cheaper here than alternative energy. Right? But that is, I think, more a US statement than a rest-of-the-world statement.

Bryan Cantrill: 01:37:36

Well, that's very interesting: the cost of transporting the bits versus colocating with the solar arrays. Alright. So we've got the spare gigawatts, so we can just, like, go with them.

Bryan Cantrill: 01:37:46

But as you point out, just the cost of buying the chips, buying the Yes. This is a very expensive capital outlay. Yeah. Yeah. You do wonder.

Bryan Cantrill: 01:37:58

It's like, boy, does that make sense? Is that the right kind of economic distribution? Is that the right use of our scarce resources?

Raja Koduri: 01:38:08

Yeah. Yeah. Yeah. Yeah. I think we need to get it to dollar per watt.

Raja Koduri: 01:38:12

Right? Yeah. Something like that. Right? You know?

Raja Koduri: 01:38:15

Yeah. I think that's a good goal. Right? A dollar per watt. So that'd be $1,000,000 for a megawatt.

Raja Koduri: 01:38:22

Right? You know, versus 35. Right? Yeah.

Bryan Cantrill: 01:38:25

Yeah. A dollar per watt. So you mean, like, a dollar per watt in capital cost?

Raja Koduri: 01:38:34

The compute capital cost should be a dollar per watt.
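The arithmetic behind the two figures in this exchange: Raja's $35,000,000 per megawatt works out to $35 per watt of capital cost, and his off-the-cuff $1-per-watt goal would put a megawatt at $1,000,000, a 35x reduction. A short sketch of that conversion (the function and constant names are illustrative):

```python
WATTS_PER_MEGAWATT = 1_000_000


def dollars_per_watt(capex_dollars: float, watts: float) -> float:
    """Capital cost per watt of deployed compute."""
    return capex_dollars / watts


today = dollars_per_watt(35_000_000, WATTS_PER_MEGAWATT)  # figure quoted in the episode
goal = 1.0                                                # the $1/W goal floated in the conversation
goal_capex_per_mw = goal * WATTS_PER_MEGAWATT

print(today, goal_capex_per_mw, today / goal)  # 35.0 1000000.0 35.0
```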

Bryan Cantrill: 01:38:40

Interesting. Right. Interesting. Yeah. That is, that's aggressive, isn't it?

Bryan Cantrill: 01:38:46

That's aggressive. I mean, you know, put a man on it. I'm like, boy, that's gonna be a big price break on an Oxide rack. Oh, boy.

Raja Koduri: 01:38:57

Yeah. Yeah. I think I don't know if we can do it with just the metal around these things. Right? Forget the No.

Bryan Cantrill: 01:39:02

I don't think... yeah. I'm pretty sure our investors just started throwing up in a trash can. That's an interesting goal to think about. Yeah. No.

Bryan Cantrill: 01:39:15

By by the way,

Raja Koduri: 01:39:15

I just came up with that goal right now. I hadn't thought about it before.

Bryan Cantrill: 01:39:20

You know? Oh, thank god.

Raja Koduri: 01:39:21

Sounds good. Oh my god. Dollar per watt.

Bryan Cantrill: 01:39:24

God. You made that up. Thank god. I was really concerned that that was gonna be the... I mean, it is interesting, though. Right now, we do have this disproportionate amount of CapEx going to compute for this one use case around LLMs.

Bryan Cantrill: 01:39:40

Or at least there's a lot of froth and fervor around that. It's unclear how much of it is actually gonna land. Because, you know, Adam, this is something you and I have talked about many times over the years: the need to think about our systems holistically in terms of hardware and software, and in terms of the actual watts Yeah. the kilowatts and the megawatts of our workloads.

Raja Koduri: 01:40:03

Right. Right. But by the way, Bryan, that reminds me of an interesting segue. Right? I recall reading that railroads were a big deal in, what, the 1920s here?

Raja Koduri: 01:40:15

Like, all the billionaires were railroad tycoons and all that stuff. Right? It feels like this big GPU infrastructure we're building Right. might become the next railroad in America. Right?

Raja Koduri: 01:40:28

And then, once we built the highways and all, they gathered dust. Right? And right now, they're largely used for, you know, slow Amtrak for retirees, you know, having

Bryan Cantrill: 01:40:43

time or something like that. Right? And listen. Yeah. Well, no.

Bryan Cantrill: 01:40:46

That's right. I mean, certainly in the 19th century, fortunes were won and lost many times over in the railroads. It is interesting when you think about all that infrastructure. And, Adam, I keep thinking back to the episode we had with Simon Willison, where (and, Raja, I don't know if you've listened to anything from Simon; he's extraordinary) we were talking about the pressure to get similar efficacy with much smaller models and

Raja Koduri: 01:41:18

Mhmm.

Bryan Cantrill: 01:41:19

And how that would be really economically fruitful, getting some of these sizes down a little bit, which I think would take some of this pressure off. I think it is bad for humanity. Raja, as you were kind of saying, when these things are so expensive and so unobtainable, that is not sustainable, and that's kinda not good for the future. The future compute use cases are gonna be found because someone's got these LLMs in front of them and can experiment with them.

Bryan Cantrill: 01:41:48

And Yeah. We don't want that to only be via a megawatt, even if that's only $1,000,000 in Raja's world. We want to have ways for people to experiment with this stuff.

Raja Koduri: 01:42:03

Yeah. Well, I mean, if you take inspiration from mobile computing right now. Right? I mean, if you just look at, right?

Raja Koduri: 01:42:11

You know, even an Apple M1, for example. Right? You know, we know the cost of it. We know the energy efficiency of it and all that stuff. Right?

Raja Koduri: 01:42:18

I mean, I did this math before. Right? You scale up a mobile computing element to megawatt scale, and, as I recall, it's at least, you know, 5 to 10x better than an H100 or a B100 or, you know, Sapphire Rapids-type stuff. Right?

Raja Koduri: 01:42:39

So we have existence proof that there are these highly energy-efficient things in the mobile world, and highly energy-inefficient things in the server world right now. Right? And my bet is the disruption is going to come from scaling up from the mobile world. And I think the small models and other stuff are an orthogonal element there.

Raja Koduri: 01:43:15

But just design-principles-wise. Right? These things are unnecessary. I mean, there's so much low-hanging fruit, energy efficiency and cost efficiency, sitting on these big honking things.

Adam Leventhal: 01:43:30

It's a great prediction. That's sort of how NVIDIA had disrupted folks like SGI back in the day. So Yeah.

Raja Koduri: 01:43:37

Yeah. Yeah.

Adam Leventhal: 01:43:38

Yeah. Low power and volume. Yeah.

Raja Koduri: 01:43:40

Yeah. It comes from the small stuff. Right? And we see that, actually.

Raja Koduri: 01:43:45

I, you know, if Apple were a merchant silicon company, right, shipping chips. Right? It would be a different discussion altogether.

Raja Koduri: 01:43:55

Right? Yeah. You know, put together a whole bunch of M1s or M-series Ultras, right, ripped out of a Mac mini or a Mac Studio.

Raja Koduri: 01:44:06

And you can do the calculation. Right? Like, at less than 100, or 80 watts, they have 192 gigs and 1 terabyte per second. Right? Of infrastructure.

Raja Koduri: 01:44:17

Okay? So, you know, 192 gigs is about how much memory the B200 has.

Bryan Cantrill: 01:44:24

Right.

Raja Koduri: 01:44:25

Right? And, yeah. It's LPDDR, so they ganged it up to get 1 terabyte per second, but you could see that there's a path to 2, maybe 4, with the low-power interfaces right there.

Raja Koduri: 01:44:43

Right? Eighty watts versus, you know, 1 kilowatt.

Bryan Cantrill: 01:44:47

Or short kilowatt. Yeah.
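Using only the round numbers from this exchange (about 192 GB of memory on each side, roughly 80 W for the Apple part versus roughly 1 kW for the B200), memory capacity per watt differs by about 12.5x. A minimal sketch of that comparison; the figures are the speakers' approximations, not datasheet values, and the part labels are descriptive only.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    memory_gb: float
    watts: float

    def gb_per_watt(self) -> float:
        return self.memory_gb / self.watts


# Approximate figures as quoted in the conversation.
apple_soc = Part("Apple SoC (Mac Studio class)", memory_gb=192, watts=80)
b200 = Part("NVIDIA B200", memory_gb=192, watts=1000)

advantage = apple_soc.gb_per_watt() / b200.gb_per_watt()
print(f"{apple_soc.name} holds {advantage:.1f}x more memory per watt than {b200.name}")
```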

Raja Koduri: 01:44:48

Yeah. Yeah. Yeah. In fact, that's the other kind of math I was doing. Right?

Raja Koduri: 01:44:55

You know, NVIDIA claims 2 petaflops. Right? It's 0.7 picojoules per bit, sorry, per flop. So 2 petaflops would consume 1,400 watts just for flops. So the TDP specs that they give assume you're running at less than a 50 percent duty cycle on your compute.

Raja Koduri: 01:45:15

Right? So if you actually turn on all the flops, you don't have enough power on those sockets. Right? So you need 2 kilowatts to run it in its full glory.

Raja Koduri: 01:45:31

Crazy. Yeah. Yeah. It's crazy. Right?

Raja Koduri: 01:45:33

You know, one chip, you know, burning so much. So yeah.
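Raja's back-of-the-envelope check multiplies throughput by energy per operation: 2 × 10^15 flop/s × 0.7 pJ/flop ≈ 1,400 W for arithmetic alone. The sketch below reproduces that and shows what fraction of peak the compute units can sustain under a given power budget; the 700 W compute budget is a hypothetical stand-in, since the episode does not break the TDP down.

```python
def flop_power_watts(flops_per_second: float, pj_per_flop: float) -> float:
    """Power needed to sustain `flops_per_second` at `pj_per_flop` picojoules each."""
    return flops_per_second * pj_per_flop * 1e-12


def sustainable_duty_cycle(budget_watts: float, full_power_watts: float) -> float:
    """Fraction of peak FLOP rate that fits within `budget_watts` (capped at 100%)."""
    return min(1.0, budget_watts / full_power_watts)


full = flop_power_watts(2e15, 0.7)          # figures quoted in the episode
duty = sustainable_duty_cycle(700.0, full)  # hypothetical 700 W left for compute

print(f"{full:.0f} W at full rate; {duty:.0%} duty cycle within 700 W")
# 1400 W at full rate; 50% duty cycle within 700 W
```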

Bryan Cantrill: 01:45:36

Burning so much. Yeah. You're right. And so the disruption is happening on the efficiency side of this. To the point you made earlier, there's gonna be some settling that happens around efficiency and cost, and that has not happened yet on the GPGPUs, which is kind of a drag.

Raja Koduri: 01:45:55

Yeah. Yeah. Yeah. Yeah. Yeah.

Raja Koduri: 01:45:57

Yep. And that's going to... I mean, the opportunity is definitely there; the dollars are there now. Right? So I think it's a question of who and when, as they always say.

Raja Koduri: 01:46:09

Right? It's not a question of if. It's who and when.

Bryan Cantrill: 01:46:14

That's right. Well, Raja, I also just love... there's so much to learn here for folks, especially folks earlier in their career, honestly, from having seen so many boom and bust cycles. The booms were always a little exuberant. But I also think that something your experience tells us is that the busts were overly pessimistic. Ultimately, a new kind of revolution would come along and find different ways to use these things.

Bryan Cantrill: 01:46:45

Yeah. And that has kinda been... I mean, change is the ever-present constant here, which makes

Raja Koduri: 01:46:53

it what You know, the amazing thing, though. Right? I mean, again, for the nerds in this space and the enthusiasts and all that, it's just amazing that, over the 30 years I've been in it, right, performance and performance efficiency.

Raja Koduri: 01:47:11

Right? You know, when I say performance, not just absolute. Right? You know, performance in a given constraint. Right?

Raja Koduri: 01:47:16

Given a power constraint, a cost constraint, and all. It's one constant theme. Right? It never got old. Right?

Raja Koduri: 01:47:24

You know? Right. That right? It's just yeah. Keep giving me more performance within certain, you know, constraints.

Raja Koduri: 01:47:31

And there is always a market. Right? A friend of mine who worked at Apple, Dave Conroy is his name, amazing guy, made this statement once at Apple when we were debating, and somebody was saying, hey, why do we need to put in more CPU, more GPU?

Raja Koduri: 01:47:45

Right? You know, everything runs great on an iPhone 4 or iPhone 5, I remember, in this stuff. Right? It's like, you know, we already had Angry Birds. Right?

Raja Koduri: 01:47:53

You know, it's you remember the game Angry Birds? It's like peak in Zoom session. Yeah. Who wants more graphics than this stuff? This is, like, the most, like, you know, compelling game, and everybody's playing.

Raja Koduri: 01:48:02

What more do you want to do with this stuff? And Dave looked at that room and said, in the history of computing, anyone who bet against performance died. Right? I don't know how that's been. And don't ask why you need more performance.

Raja Koduri: 01:48:21

Ask the question: why not? Right? So he just said, if you can do more performance, do it, but within the constraints. Right? Which is like because I did

Bryan Cantrill: 01:48:31

But not at any price. And right now, we are getting performance at a high CapEx price and a high power price. And

Raja Koduri: 01:48:40

You know, I just by the way, fascinating, just funny story. I just came back from China. Okay? I went there after 5 years. Okay?

Raja Koduri: 01:48:47

So, you know, there are all these sanctions going on against China. Right? And then NVIDIA had to do this lower-spec H-series part, right, to sell into China. And they didn't actually reduce the price on it. Right?

Raja Koduri: 01:49:01

It's just a lower-spec H100. And the fascinating thing is, because it's lower spec, I need more of them to run the same model. Right? Okay. So it actually boosted their sales, right, like crazy.

Bryan Cantrill: 01:49:16

Oh my god.

Raja Koduri: 01:49:18

Right? Because I need more. Right? I was just laughing at the whole thing. It's like, wait.

Raja Koduri: 01:49:27

You're buying more.

Bryan Cantrill: 01:49:29

You're buying more. Right. I don't think that was the intent. Yeah, exactly. I think

Raja Koduri: 01:49:32

you always have some unintended side effects. Well, I

Bryan Cantrill: 01:49:35

mean, good for me, I guess. Yeah. Yeah. It's great. Yeah.

Raja Koduri: 01:49:39

They landed inside this beautiful thing where the less utilized it is, the more you need to buy. Right? So yeah. They landed on a problem that has low utilization per GPU, but it still scales with the number of GPUs.
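The effect Raja describes is ceiling division: if a model needs a fixed amount of throughput, cutting per-GPU throughput raises the GPU count proportionally. A minimal sketch with hypothetical throughput units (the episode gives no specific numbers):

```python
def gpus_needed(required_throughput: int, per_gpu_throughput: int) -> int:
    """Smallest number of GPUs whose combined throughput meets the requirement."""
    return -(-required_throughput // per_gpu_throughput)  # ceiling division on integers


# Hypothetical units: the model needs 100; a full-spec GPU delivers 10, a cut-down one 4.
print(gpus_needed(100, 10), gpus_needed(100, 4))  # 10 25
```

If the price per GPU stays the same, spend scales with that count, which is the unintended sales boost discussed here.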

Raja Koduri: 01:49:59

Right? It's

Bryan Cantrill: 01:50:00

just awesome. Boy, it's good to be the king, I guess. But I think we can fully expect that, just as you said, to change. And I would also add, actually, it's funny you mentioned Angry Birds. I am almost certain my 12-year-old daughter has never heard of Angry Birds. I know my older boys have.

Bryan Cantrill: 01:50:26

I know the 19-year-old and the 17-year-old have. I don't think the 12-year-old has. So I just love the idea of Angry Birds capturing this total moment in time. Right. And another thing that's very funny about Angry Birds is Marc Andreessen's piece, "Why Software Is Eating the World," in 2011.

Bryan Cantrill: 01:50:46

That piece, by the way: I know he's hailed as a prophet because of it, which is kinda ridiculous, because it's like, great, software is relevant. Fine. Software is important. Okay. I guess that was a breakthrough in 2011.

Bryan Cantrill: 01:50:58

The actual companies he cites include Rovio. Rovio, the maker of Angry Birds, is gonna be one of the new kings. It's gonna be Rovio, and it's gonna be LivingSocial, and it's gonna be Groupon, and it's gonna be Foursquare. Those are literally the companies that he names in that piece, and he does not name NVIDIA or AMD. And I think what's interesting is, as you mentioned earlier, NVIDIA really understood software-hardware codesign, and NVIDIA is very much a software company in addition to a hardware company.

Bryan Cantrill: 01:51:34

And I think the future still lies at that interface. That interface remains a very important one, and don't let anyone tell you that there's no innovation to be had at the hardware-software interface because, Raja, I'm sure you'd agree, that remains a very rich

Raja Koduri: 01:51:56

No. No. It's a hugely rich thing. And that's why I'm a firm believer that there is a disruption in front of us. Right?

Raja Koduri: 01:52:06

Because the most dominant programming abstraction became Python and its associated libraries. Right? And this last year I've been digging into it, because, you know, what is the future? Right? And so much of it is Python.

Raja Koduri: 01:52:27

And for us hardware guys, our notion of abstraction stopped at C/C++. Right? So yeah. My bet is that somebody is going to come from the Python side of the world. The marriage of Python and memory, as I call it, could be where the next disruption comes from.

Bryan Cantrill: 01:52:53

Interesting. Well, this has been a great conversation. Raja, thank you so much. I think I came in with high expectations, and you overdelivered. I love the analogies.

Bryan Cantrill: 01:53:06

I think we more than answered the question, and I think we've given folks a lot to think about in terms of the past, present, and future of heterogeneous computing. And if nothing else, hopefully, folks... I mean, Raja, part of why I love talking with you is that you and I, and Adam, share a real excitement for not only where we've been, but where we're going. And there's a lot that's still possible out there, which is what makes it interesting and exciting.

Raja Koduri: 01:53:37

Yep. Thank you. Thank thank you for having me. Always, you know, fun to nerd out on this topic. You know?

Bryan Cantrill: 01:53:44

No. It's It is always fun to nerd out. It is always fun to nerd out. And well, again, thank you very much. Really, really appreciate it.

Bryan Cantrill: 01:53:51

And I think we've teed up, like, a couple more in-depth episodes we're gonna have to have you back for. So this won't be the last for sure, but thank you very much for joining us. Yep. A lot of fun. And for listeners, if you've got other ideas that you wanna hear, please let us know, because we love the requests.

Bryan Cantrill: 01:54:09

And you can complain about the audio problems too, or the lack of intro music. Alright. Well, thanks, everyone. Thanks, Adam. Thanks again, Raja.
