A podcast about the business and market of semiconductors
Ben Bajarin (00:01.54)
Hello everyone, welcome to another episode of The Circuit. I am Ben Bajarin.
Jay (00:07.178)
Hello world, I'm Jay Goldberg.
Ben Bajarin (00:09.636)
Well, we have an excellent guest today. We have Jeremy Werner from Micron, and everybody has constantly asked us to have more folks talk memory on this podcast, so your wishes and demands are fulfilled. Jeremy, thanks for coming on. We'll just have you give a brief highlight and quick intro on yourself and we'll jump into a discussion.
Jeremy Werner (SVP/GM CDBU) (00:30.83)
That's a lot of pressure if I have to carry everyone's expectations here today. I'm Jeremy Werner. I lead the Core Data Center Business Unit at Micron. We deliver SSDs and DRAM products to data centers all around the world.
Ben Bajarin (00:47.962)
I think our listeners are following closely. There we go. Jay's gonna kick us off with the first sort of question and then we'll roll with it.
Jay (01:03.422)
So I've heard memory is doing well.
Jeremy Werner (SVP/GM CDBU) (01:08.248)
Not bad. Can always do better.
Jay (01:12.273)
Well, so that's my question. Memory is famously, stereotypically, hugely, hugely cyclical. And this seems like you all just have to wake up every morning and pinch yourselves and go, is this really happening? So you've been in memory a long time. Can you help us think through, compare this cycle to past ones? Is this one different? How is it different?
Jeremy Werner (SVP/GM CDBU) (01:36.975)
Well, there's a lot of interesting things going on. I'm a little scared to give you the answer after the way you talked about TSMC's earnings call. But it is different, because AI is driving a real change in the way that memory adds value to the data center. And I'm sure we'll talk a lot more about that today.
Memory has become the key strategic asset in breaking the bottleneck for enabling inference in the data center and also enabling the training of the world's most advanced models. So it's a really incredible time for memory and I don't see that trend slowing down.
Ben Bajarin (02:31.671)
I'm curious, were you guys surprised by this? Or was it, look, we know what we do is going to unlock something at some point? Or did you see AI and deep learning coming early on? Because obviously a lot of people didn't necessarily see AI coming, but especially on the memory side of this, I'm just curious: was it, yeah, we kind of knew this, or was it, we caught this early on?
Just give us some context on how this evolved.
Jeremy Werner (SVP/GM CDBU) (03:04.568)
Well, I think the whole world was a little bit surprised, right? When the models got advanced enough, the LLMs got released, and everyone figured out what the capabilities were. So along with everyone, I think we were surprised by the capabilities of the technology taking such an incredible leap at once.
Compute and training capabilities got kind of over the hump. Once we saw that, and I think, you know, ChatGPT, their launch and seeing those capabilities, that really woke the world up a few years ago to what was happening. But for sure, I don't think that we, and I'm sure we didn't, anticipate just how explosive the demand growth would be. So we knew the technical detail of how important memory and storage were going to be, but we didn't anticipate how fast it would grow.
Jay (04:23.506)
I mean, Jensen has actually credited your CEO for being an early advocate. So you guys probably knew about it before. Is that safe to say?
Jeremy Werner (SVP/GM CDBU) (04:39.63)
We knew early on, and we've been working on a lot of technologies that support AI, from closest to the GPU to farther from the GPU, whether it's our HBM, LPDDR5 and the SOCAMMs we're talking about, our high performance SSDs, our high capacity SSDs. These products take three to five years plus to develop and productize. So it's not like we woke up one day and decided, hey, let's just throw this thing together, we heard AI is a good idea. And I think Sanjay for a long time has done an amazing job of building an infrastructure that allows us to have a long-term vision, plan for the future, explore technologies, build huge scale operations, execute on our roadmap, and work deeply with our customers. All of those things come together to enable Micron to enjoy a lot of success in terms of helping to enable an incredible time.
Ben Bajarin (05:57.004)
And I think one of the things I keep trying to bring back into the context of this conversation, of what is different, is that it's not just TSMC, right? TSMC has sort of said this, all the foundries have, but you guys as well, and other players in memory, are investing in capacity increases that you just would not do if the belief is that this simply returns to deep cyclicality. And even just the numbers, right? As Jay and I have constantly talked about on the podcast, if this semiconductor industry does go to, let's just say, a trillion dollar market this year or next year, which will probably happen for sure, and then probably to two trillion very quickly after that, it doesn't just go back down to a $700 billion market and then back up. The floor has changed, the shape of the industry has changed. Those dollars are now a larger TAM. And that's what everybody is planning for, right? Scaling and growing into it.
Because the belief is that it is sustainable, right? AI is sustainable innovation for the entirety of the industry.
Jeremy Werner (SVP/GM CDBU) (07:02.894)
Yeah, it's a sustainable innovation, and I truly believe we're just scratching the surface of where we're going with AI. So for the last few years, training these models to be more complex was really the biggest driver of the infrastructure build-out in the data center. Of course, there were a lot of users using AI, people using it in interesting ways at work. You know, help me answer this question faster, edit my document. But now, I mean, just in the last six months, I've had to completely re-form my sense of what's possible, and how fast, in terms of using these models. And Jay, I was listening to your podcast and you were talking about running 20 Claude agents in parallel, right? Or only five when you're on the show, but 20 normally.
Jay (08:04.446)
Yeah, right now I have zero, because it's early.
Jeremy Werner (SVP/GM CDBU) (08:07.502)
Okay, you ran out of your token budget or something.
Jay (08:12.008)
I was up late last night, I had a lot running. So yeah.
Jeremy Werner (SVP/GM CDBU) (08:14.094)
Okay. I mean, now with the introduction of agentic AI, individuals are using the inference, companies are just starting to figure out how to use it and what's possible with it, and physical AI hasn't even begun really in a huge, meaningful way. And that's a roadmap that paves many years in the future of incredible expansion and transformation of human society. I mean, the world is going to look totally different in 10 or 20 years. I don't know, maybe it'll be like an Asimov novel. You know, I think if you want to look at the future, look at the past, look at what he wrote. Obviously, I don't think that we're going to have a robot living in the middle of the moon.
Ben Bajarin (09:00.813)
Yeah
Jeremy Werner (SVP/GM CDBU) (09:13.624)
But it is prescient in terms of what's possible, from autonomous vehicles, flying autonomous vehicles, robots that do everything, incredibly automated production and manufacturing. I mean, you just have to look at the capabilities of this technology.
Jeremy Werner (SVP/GM CDBU) (09:42.719)
We're just at the beginning of scratching the surface of the transformation that's going to happen in the world. And all of that is going to drive a lot of demand for memory and it's going to generate an incredible amount of data that needs to be accessed rapidly, which bodes well for our SSD products as well.
Ben Bajarin (09:48.909)
Yeah, yeah, absolutely.
Ben Bajarin (10:07.181)
Yes. All right. So let's get into the inference point you made, because I think this is probably the best observation and framing for what's happened. We're coming out of the training era, and all of the infrastructure, silicon decisions, and architectures around ASICs and GPUs were largely built with the training era in mind. And now we're seeing this disaggregate into the inference era. Good examples: even Nvidia is starting to talk about inference-specific products, and with the Groq LPUs, you know, Google launching a dedicated inference ASIC with its TPUs, these are all things we think will happen. You will see training architecture designs and you'll see inference ones, but the inference one has a very different memory problem. There's a memory wall, as Google calls it. So maybe walk us through the workload and why these inference ASICs or inference accelerators put a different demand on memory,
and how that could play out for the memory industry.
Jeremy Werner (SVP/GM CDBU) (11:08.216)
Sure, sure. So training uses memory to learn and then forget, and then it spits out a model. But inference uses memory to remember. And inference you can very broadly think of in two key stages. One is the prefill stage, and that's where all the prompt tokens are processed. And then you have the decode stage. The reality is a little more complex than that, there are a number of decode stages, but in the decode stage each token is iterated to get to a better answer. And in the decode stage, everything that happened in the past, all the knowledge, you really want to be able to feed that in to get the best result, the best answer, and that's where a lot of the intelligence comes from. So in the decode stage, that's where the memory wall of inference comes in, because there are really kind of two ways to do this. Okay, and so until now, with the traditional architectures, there's something called KV cache. KV cache is more like a concept, which is, as you go through the decode stage, you calculate your tokens, you save the KV cache in memory, and then you feed it back in, and then you calculate the token again, and you iterate in that way many times. Now, the longer the context window, and for those not familiar with a context window, you can think of it as the amount of information that you've been putting into your question, think of it as the length of your chat, for instance, the more iterations you need. And so, if you don't have enough memory to store what happened in the past, then you have to start at the beginning and recompute everything. And so if you think of every cycle recomputing, what that means is that you get an exponential growth in the amount of compute that you need to go through. In other words, every cycle takes as much compute as all the prior cycles combined, effectively. Whereas if you're able to save
Jeremy Werner (SVP/GM CDBU) (13:35.993)
the state from the last one, each cycle is linear, it's just one more. So if you don't have enough memory to store enough context, then you're taking n-squared times as much compute. And so having enough memory that's fast enough to feed the compute and to store enough context matters. And it's not just context, also the size of the models, the number of parameters, has grown, and that's important for having a more intelligent model. And the more parameters, the more tokens per iteration. And then the number of concurrent users, the agentic AI running per GPU, has grown. So all of that drives an incredible amount of KV cache needed per GPU in order to avoid having to go back and recompute everything. And if you're successful at delivering enough memory and storage, then in theory, you can get n-squared the amount of compute out of a GPU than you otherwise would if you were recomputing every time.
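(To make the recompute-versus-cache trade-off described above concrete, here is a minimal toy sketch. It is not Micron's or any inference engine's actual code; the function names and token counts are illustrative. Without a KV cache, every decode step reprocesses the whole history, so total work grows roughly quadratically; with the cache, each step only handles the newest token, so work grows linearly.)

```python
# Toy cost model of decoding with and without a KV cache.
# Names and numbers are illustrative only, not any vendor's implementation.

def decode_cost_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Recompute keys/values for the entire history at every decode step."""
    ops = 0
    for step in range(new_tokens):
        ops += prompt_len + step   # re-process everything generated so far
    return ops                     # grows roughly with n^2

def decode_cost_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Store keys/values once (the KV cache) and only process the newest token."""
    return prompt_len + new_tokens  # one-time prefill, then one unit per new token

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} new tokens: "
              f"no cache = {decode_cost_without_cache(1_000, n):,} ops, "
              f"with cache = {decode_cost_with_cache(1_000, n):,} ops")
```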
Ben Bajarin (14:51.073)
Mmm. Mmm.
Jay (14:53.447)
Maybe, can we take a step back? Because we have a fair number of non-memory, non-technical listeners. It's not that I'm completely ignorant about memory, but maybe for those people, could you just walk us through the memory hierarchy writ large? Because there are lots of different kinds, and just lots of acronyms.
Jeremy Werner (SVP/GM CDBU) (15:07.394)
No, no.
Jeremy Werner (SVP/GM CDBU) (15:15.438)
Sure. There's a lot going on there. So there's the GPU or TPU or XPU or whatever you want to call it, that does the AI calculations. Very close to that is the memory that's probably gotten the most press. That's called HBM, or high bandwidth memory. And it's used for training, but also for token generation in the inference stack. The typical amount of KV cache that's stored very close to the GPU in HBM is maybe 10 to 100 gigabytes.
If you don't have enough HBM to store everything you need, then the KV cache moves a little bit further away from the GPU. So then it goes out to what we would call main memory. Main memory typically is attached to a CPU. So in, like, an H100 system, it was oftentimes attached to an Intel or AMD x86 CPU. If you look at something from NVIDIA now, with Grace Blackwell, that main memory is attached to Grace, the CPU. And the size of that memory might be anywhere from 4 to 20 times the size of the HBM that you have on the GPU. So you get more memory there, but it's slower, it's further away.
As you go down the stack, beyond that, to date, that's pretty much where KV cache has stopped for inference. If you don't have enough memory there, then you start recomputing. But now that inference is getting so complicated and the context window is growing, people are looking for ways to expand that memory footprint. And so...
Jeremy Werner (SVP/GM CDBU) (17:33.987)
Then moving down the stack, we have a number of established or new ideas that are coming into play. So right after main memory, there's a concept called expansion memory. This hasn't really been deployed in any meaningful way in production yet. But the idea is that you would take a lot of memory,
Jeremy Werner (SVP/GM CDBU) (18:01.592)
typically, let's say, very high capacity DIMM modules, and you would connect them maybe through optics in a separate box connected to all the GPUs. And so if you run out of main memory, you can go there for some very fast storage. If you don't have enough there, and this is one of the most exciting parts, Jensen talked about this this year,
then you go into what's called context memory storage. And that's where people start using SSDs to store more. So now when you get to context memory storage, your latency is longer, your bandwidth is less, but you're getting a thousand times the amount of capacity that you have, say, in HBM.
Ben Bajarin (19:00.024)
Mmm.
Jeremy Werner (SVP/GM CDBU) (19:00.834)
And then finally at the very bottom, you just have exabytes of networked data lakes in the data center, built of these massive SSDs. So there's this whole hierarchy of where we're going with this.
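(As a rough mental model of the hierarchy just described, the sketch below fills successive tiers with KV-cache data and spills whatever does not fit to the next tier down. The capacity and bandwidth figures are placeholder orders of magnitude loosely echoing the conversation, tens of gigabytes of HBM, a few times that in main memory, larger expansion memory, SSD below it; they are not Micron product specifications.)

```python
# Illustrative tiered placement of KV-cache data, loosely mirroring the hierarchy
# described above: HBM -> CPU-attached main memory -> expansion memory -> SSD.
# Capacities and bandwidths are placeholder orders of magnitude, not product specs.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: float      # how much KV-cache data the tier can hold
    bandwidth_gbps: float   # rough relative access speed, for context only
    used_gb: float = 0.0

def place_kv_cache(tiers: list[Tier], request_gb: float) -> list[tuple[str, float]]:
    """Fill tiers in order, spilling whatever does not fit to the next tier down."""
    placement, remaining = [], request_gb
    for tier in tiers:
        take = min(tier.capacity_gb - tier.used_gb, remaining)
        if take > 0:
            tier.used_gb += take
            placement.append((tier.name, take))
            remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        # Nothing left to hold it: this is the "go back and recompute" case.
        placement.append(("recompute (no room left)", remaining))
    return placement

hierarchy = [
    Tier("HBM (on-package)",           capacity_gb=100,     bandwidth_gbps=4000),
    Tier("Main memory (CPU-attached)", capacity_gb=1_000,   bandwidth_gbps=500),
    Tier("Expansion memory",           capacity_gb=4_000,   bandwidth_gbps=200),
    Tier("Context storage (SSD)",      capacity_gb=100_000, bandwidth_gbps=50),
]

print(place_kv_cache(hierarchy, request_gb=2_500))
# -> spills from HBM through main memory into expansion memory
```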
Jay (19:19.948)
And so when we see people today complaining or up against constraints, what are we talking about? Where are we seeing the tightest part of all this? What's really the bottleneck at this moment?
Jeremy Werner (SVP/GM CDBU) (19:38.2)
from a technology perspective or from a production perspective or what's...
Jay (19:44.716)
Well, yeah, where is the pain most acute today?
Jeremy Werner (SVP/GM CDBU) (19:52.055)
I mean, both DRAM and SSD, all through the stack, I think you see people want more, people need more. As soon as we can release a product, they'll consume it. As soon as we increase the capacity and the performance of the products, they'll find a way to deploy it. So it truly is all through the stack.
Ben Bajarin (20:21.923)
So the thing that boggles my mind in all of this, coming from training and now moving to inference, is that in a training solution, if you're a frontier lab or a hyperscaler or somebody, it's a small number of people who are just throwing training workloads at X, Y, and Z rack of compute, right? And to your point, that happened very distinctly in memory. The goal was just train so that I've got an output that I can then go serve. Where this really boggles my mind, and I'm setting this up to ask what specifics of this memory hierarchy help solve the problem, is that we're about to move to tens of millions, eventually hundreds of millions, and then billions of people attempting to use context windows at the exact same time. We're talking many, many people simultaneously needing massive context windows to go do the things they're going to do. And I get that this is why, like with TPU 8i, right, we're building very different architectures to do this. But to me it's like, okay, I've got a rack full of chips and there's this memory, and then, as you pointed out, we might have memory appliances. Sometimes I just feel like even that feels hard, to solve this problem where tens of millions, billions of people literally are just going to need everything they're interacting with to have this crazy amount of memory about what they're doing. So what helps solve that in the memory hierarchy, or what is the demand from that kind of simultaneous load from all of these people and their workloads?
Jeremy Werner (SVP/GM CDBU) (22:10.066)
So, speed, right? If the bottleneck is not flops and it's really memory bandwidth, then we need to increase the bandwidth so that all of that history can make it to the GPU. So a lot of it comes down to speed. That's why we've been innovating really rapidly in, for instance, our HBM products. We just announced the HBM4 product. It's over two times the bandwidth of our HBM3E product, which was cutting edge a year before. And of course, when you deliver all that performance, you do start to run into other
Ben Bajarin (22:41.849)
Mm.
Jeremy Werner (SVP/GM CDBU) (23:05.838)
bottlenecks, especially at the data center level. And this is where I think most people's heads have been at for a long time on the AI rollout, which is power. Can you get enough power to drive all this compute capacity? And can you make the most out of the compute capacity? Can you make the most out of the power? So if you deliver twice as much
Ben Bajarin (23:16.654)
Right.
Jeremy Werner (SVP/GM CDBU) (23:32.751)
performance but you use twice as much power, and you have a fixed amount of power, then you don't really deliver more to users, right? So really what it's about is delivering, if ultimately the amount of power that we have is limiting the growth, then we have to find a way at the data center level to deliver much more efficient performance in a fixed power budget. And that's where so much of our innovation comes in. So it's delivering more performance while finding ways to do it without scaling the power consumption equivalently. And, you know, look, in memory, people talk about bandwidth a lot, just kind of how much data is moving at what rate. The reality is that it's getting more complex than that, and that's where there are some really interesting dynamic shifts happening in the stack with Micron. Because in order to become truly more efficient in your power use, you have to look into the depths of what's happening and how the inference is being processed. And when you look deeply into it, it drives a significant amount of co-design requirement and opportunity, of where you do certain things, in the GPU, on the memory side, and at different layers of the stack. So that is really...
Ben Bajarin (25:22.744)
Mm.
Jeremy Werner (SVP/GM CDBU) (25:30.71)
a necessity for driving innovation, in an area that is one of the other kind of transformative elements of the memory industry at this time.
Jay (25:48.908)
Let me ask a question. Something sort of piqued my interest when we were preparing for this earlier, and you've touched on it here. I get the demand for memory, things closer to the GPU, as much bandwidth as possible for the calculation. But it sounds like there are also big shortages in storage
writ large, not just on the compute, not on the GPU tray, but across the whole data center ecosystem. Why are we seeing that?
Jeremy Werner (SVP/GM CDBU) (26:27.15)
Sure. AI.
It does a few things. First of all, it generates a lot of data itself. Anyone who uses Grok or is on X will know how much image generation is happening with AI, much faster than even the most adept meme creator would be able to create images. And all of that data... you know, I always like to say that most people are digital hoarders. We tend not to throw our bits away, right? Maybe I'm going to in 10 years. And the interesting thing is that not only are all of these AI models enabling all of us who maybe
Jay (27:14.635)
Guilty. Guilty. Guilty as charged.
Ben Bajarin (27:15.171)
Mm. Mm.
Jeremy Werner (SVP/GM CDBU) (27:31.545)
didn't have the physical capability to bring our creative ideas to fruition, but now we have this incredibly powerful tool to make our thoughts reality, or at least digital reality. It is a creative revolution for people. So we're creating a lot more data, and then businesses as well are able to create and use their data. And one of the most important things for getting the most out of AI is actually having all of your data in a place that's accessible. And so AI doesn't just create a lot of data, it accesses that data to provide insights, solve problems, give better answers, like you said, Jay.
Ben Bajarin (28:01.495)
Right.
Jeremy Werner (SVP/GM CDBU) (28:30.73)
And so what that means, when you need to access all that data, is what we call warming, warming of the data. So, you know, we have this term in storage, sometimes we call some data hot and some data cold. And hot just means someone's likely to come look at this thing soon. And cold is, hopefully, your tax returns from 10 years ago. No one wants to see that, right? But with AI,
Jeremy Werner (SVP/GM CDBU) (29:00.834)
you ask it a question, it's looking through everything to try to get an answer. So the things that used to be cold, they're getting warmer. Everything's getting warmer. And when things get warmer, then you need a lot faster storage because you're accessing it a lot more.
Jay (29:21.236)
Yeah, I mean
Jeremy Werner (SVP/GM CDBU) (29:21.262)
That's the past. And there is kind of this exciting future growth as well, which is, we don't have enough memory for all the KV cache. And so now there's a big future growth of SSDs in the data center to store query scheduling and all the multi-turn workflows that, in the past,
Ben Bajarin (29:21.561)
All this.
Jeremy Werner (SVP/GM CDBU) (29:49.287)
If we tried to do it with today's architecture, you'd have to go through that recompute cycle.
Ben Bajarin (29:55.321)
Mm.
Jay (29:57.259)
Yeah, I'll note that in a lot of the AI stuff I'm playing with, I'm spending a lot of time reading up on different harnesses. Do I use OpenClaw? There's Hermes. There are all these new things coming out all the time. And a big part of that is load balancing across models. But an even more important part of it is creating persistent memory for the AI.
Because one of the big problems now is you use an AI agent and it doesn't remember from session to session what you're doing.
Jeremy Werner (SVP/GM CDBU) (30:29.614)
Wouldn't that drive you crazy? It's like teaching a class of students and every day you have to start from the first page of the text.
Jay (30:37.342)
Yeah. Yeah. And so many of the tools out there that people are saying you need to use, this one or that one, it all comes down to creating a file structure that gives your AI agent memory. They're all hacks around that. It's super interesting.
Jeremy Werner (SVP/GM CDBU) (30:56.236)
And so what'll happen is, you know, if you're giving it memory, then your history's sitting there in your memory. If you go away and do something for a week, either you have to leave it there, and if you do a lot of things, then you run out, or you can send it down to SSD and then bring it back when you pick back up on the task.
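(In application terms, that parking-and-resuming pattern looks something like the sketch below: write the agent's accumulated history out to local storage when the task goes idle and reload it when work resumes. The file name and JSON format are invented for illustration; a real serving stack would persist KV-cache tensors or session state rather than a chat log.)

```python
# Minimal sketch of parking an agent's accumulated context on local storage (an SSD)
# and restoring it when the task resumes, instead of keeping it resident in memory.
# The file name and JSON layout are invented for illustration only.
import json
from pathlib import Path

STATE_FILE = Path("agent_context.json")   # hypothetical location on a local SSD

def park_context(history: list[dict]) -> None:
    """Write the session history out so faster memory can be freed for other work."""
    STATE_FILE.write_text(json.dumps(history))

def resume_context() -> list[dict]:
    """Bring the saved history back when the user picks the task up again."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return []

history = resume_context()
history.append({"role": "user", "content": "pick up where we left off"})
park_context(history)
```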
Jay (31:23.97)
Yeah, yeah, yeah. So I have multiple tasks and projects in the sidebar of Claude Cowork, and I'm very deliberate now: I'm doing this thing, I need to switch to the right project or the right context window, because I don't want to blend them together. And all those things have to be stored discretely. And then I'm worried that each one of those at some point is going to max out the context window, and I'm constantly trying to juggle that around. And I have conversations with people, all this week I've been hearing, you've got to use this tool or that tool or superpowers or whatever. And I think where this will lead is that someone is going to create a harness that manages this a little less bespoke, that sort of automatically manages it. And everyone will love that. It'll be fantastic because it'll be so much simpler to use. But it's going to come at the cost of lots of extra memory.
Right? Because right now I can sort of manage it on a bespoke basis and I'm very efficient, but I would very, very gladly give up that bespoke-ness just to have something that manages it for me. And I know that will come at the cost of me needing a lot more memory. It'll just be less efficient.
Jeremy Werner (SVP/GM CDBU) (32:42.776)
So what we find is the context length, what you're talking about, is growing right now at 30 times per year.
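(To give a feel for what 30x-per-year context growth means for memory, here is a back-of-the-envelope KV-cache sizing sketch. The per-token formula, 2 x layers x KV heads x head dimension x bytes per value, is generic transformer arithmetic; the model shape, starting context length, and growth horizon below are hypothetical, not figures from Micron or any specific model.)

```python
# Back-of-the-envelope KV-cache sizing as context length grows.
# The formula is standard transformer arithmetic: 2 (keys and values) * layers
# * KV heads * head dimension * bytes per value, per token. The model shape,
# starting context, and 30x growth horizon are hypothetical illustrations.

def kv_cache_gb(context_tokens: int,
                layers: int = 80,
                kv_heads: int = 8,
                head_dim: int = 128,
                bytes_per_value: int = 2) -> float:   # fp16/bf16 values
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 1e9

context = 100_000   # hypothetical starting context length in tokens
for year in range(4):
    print(f"year {year}: {context:>13,} tokens -> "
          f"{kv_cache_gb(context):>12,.1f} GB of KV cache per session")
    context *= 30   # the ~30x per year growth rate mentioned above
```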
Ben Bajarin (32:58.553)
Jeez. The other thing that boggles my mind about this conversation we're having is that right now, my usage with AI, and let's say that's a chat with ChatGPT or with Claude, is really fragmented memory with fragmented storage. It's only looking at a folder. To use your terminology, what's warm and what's cold?
Jay (33:00.554)
That sounds like a challenge.
Ben Bajarin (33:27.545)
It's context related. I don't have the entirety of my data available to me. And I recently learned that I can't live on a computer without a terabyte of storage, because that's how much stuff I have on my device. And it's mostly cold, right, in an AI context. I would love to be able to just say, hey, what was this, go find it, but it just can't do that yet. And so, like you said, when enterprises bring all of their data online, both with access to the edge,
this feels like even another step function in AI's value, when you really do have all of that data available to you. Which is, again, simultaneously a memory and a storage problem that we still have to solve a lot of problems for.
Jeremy Werner (SVP/GM CDBU) (34:13.23)
So along those lines, for enterprise scale data, and talking about the power consumption:
one of the new products that we just announced is a 245 terabyte SSD. This thing is barely bigger than a deck of cards.
So when we introduce a product like that, we're able to massively bring down the amount of power that's going into storage at the data center enterprise scale, and also reduce the footprint of that storage by over 80%. So that allows...
Ben Bajarin (35:09.497)
Mm.
Jeremy Werner (SVP/GM CDBU) (35:12.494)
You know, with one of the big bottlenecks being power, and one of the big bottlenecks being the square footage of data centers, getting data closer to the GPU, that is one of the big future trends: unlocking the capabilities of the data center, giving more performance, more storage, closer to the GPU, in a
Ben Bajarin (35:19.789)
Right.
Jeremy Werner (SVP/GM CDBU) (35:39.596)
lower power footprint. So that's one of the things I'm really excited about in kind of networked file and object storage.
Ben Bajarin (35:48.206)
Hmm. On the power point, is that just an innovation that happened in storage, or is it because you can put so much more in one footprint versus having to divide that up across two or three different spots? Maybe just talk a little bit about what goes on there from a power standpoint, because I think that's the most important point, right? When you look at what I think everybody has sort of consensually bought into
with Jensen, it's essentially: if you have a power budget, you want the most compute within it, and that includes all the infrastructure we're talking about. So obviously any innovation that brings down power, in performance per watt, matters. So maybe just talk a little bit about what it is exactly about this that's leading to some of that improvement.
Jeremy Werner (SVP/GM CDBU) (36:34.776)
Yeah, sure. So SSDs are inherently a lot more power efficient in terms of delivering performance than a hard drive. From a read perspective, you can get a thousand times the read performance, depending on your workload. So there's this inherent power benefit right off the bat: there are no moving parts in an SSD. But then another huge benefit, when you think about the power consumption, is what it takes to deliver all that capacity in the data center. So like I said, 245 terabytes in a space that's about a quarter of a hard drive. And the hard drive capacities that are being deployed today are in the low 30 terabyte range. So that means you need a lot less networking, a lot less connectivity, a lot fewer boxes and power supplies and fans and everything that goes around deploying ten times the amount of stuff.
Ben Bajarin (37:58.714)
Hmm.
Jeremy Werner (SVP/GM CDBU) (38:01.166)
It has a real cost, a real power consumption. You just eliminate all that waste, you consolidate down, and now you're just paying for the performance that you need, and that performance is delivered at a much more efficient gigabyte per watt.
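(A rough consolidation calculation along those lines, using hypothetical round numbers for per-drive power since none were given here, shows where the gigabyte-per-watt gain comes from: far fewer devices, and therefore fewer enclosures, fans, and network ports, for the same capacity.)

```python
# Rough consolidation math: 245 TB SSDs vs ~30 TB hard drives for the same pool.
# The per-drive wattages below are hypothetical round numbers, not product specs.

target_capacity_tb = 10_000                  # a 10 PB storage pool

ssd_capacity_tb, ssd_watts = 245, 25         # assumed per-device figures
hdd_capacity_tb, hdd_watts = 30, 10

ssds_needed = -(-target_capacity_tb // ssd_capacity_tb)   # ceiling division
hdds_needed = -(-target_capacity_tb // hdd_capacity_tb)

for label, count, watts in (("SSD", ssds_needed, ssds_needed * ssd_watts),
                            ("HDD", hdds_needed, hdds_needed * hdd_watts)):
    gb_per_watt = target_capacity_tb * 1000 / watts
    print(f"{label}: {count:>4} drives, ~{watts:>5} W, ~{gb_per_watt:,.0f} GB per watt")

# Fewer devices also means fewer enclosures, fans, power supplies, and network
# ports, which the per-drive wattage above does not even count.
```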
Ben Bajarin (38:17.197)
Right. Yeah. I like gigabytes per watt, actually, as a framing for the analysis. That's interesting. So you guys have done this. Let's talk a little bit more about the innovations that you guys are working on across the stack. And I want to look at this two ways. One, as we talked about a little historically: do you feel the cadence of innovation for memory, like there's pressure to increase that? Because I had always been under the expectation and understanding, and you can correct me if I'm wrong, that memory was historically, and maybe to some degree storage, a little bit more conservative in pushing the limits, right? Because so much had to go right, be consistent. You couldn't have failure rates on wafers. Now that doesn't go away. I'm not saying that all of a sudden you become logic and you're like, sure, we're happy with 50% yield. No, that doesn't happen.
But I feel like this is one of those moments where we feel we need to put our foot on the gas. And I'm just curious how you guys think about this and the areas where you can go solve problems. And again, coming from what I think must also be super interesting: we're in a period of time where, to be an engineer, especially at a company like yours, there are problems all over the place, just a giant map board of problems
that we have to go solve, particularly for memory and storage. So how do you guys think about that, the problem at large, accelerating innovation, and some of those areas you guys are focusing on, really across the whole stack?
Jeremy Werner (SVP/GM CDBU) (39:58.827)
It is accelerating. Our timelines are accelerating. The pace that we need to innovate is accelerating. The amount of intelligence we need to build into our products is accelerating. The complexity is accelerating. The rate that we're building mega fabs is accelerating. And it's a real challenge, but it's exciting. I mean, who doesn't want to come to work every day and at the end of the day look at the clock and say, wow, I barely breathed today? We're running at an incredible pace, and I think it's fun and it's energizing for people at Micron. And honestly, one of the things that we're doing is embracing the AI technology. So how do we go faster? Well, there's this incredible tool that's been... I don't want to say dropped in our laps, because we helped build it. We've had a meaningful role in creating this incredible technology, and we're using it to accelerate what we can do and run faster: improve yields faster, design and develop faster, find issues faster. So all those traditional problems that we face, they still exist now. We just have to do everything faster, better, more efficiently. Turn chips faster, go through process technology faster, ramp and install tools and more fabs around the world faster, faster, faster, faster. And, you know, there are only seven days in a week. So at some point you have to find ways to innovate, and we've always done it.
Now AI is an amazing tool that I think we're doing some really interesting things with to take us to the next level.
Ben Bajarin (42:07.917)
Yeah. And on that point, you did talk about the co-design that's happening. I can imagine customers are like, guys, we need you to do this, can you do this? And you're just engaged in helping solve their problems two, three years out, in that kind of deep co-optimization across both memory and storage, Micron alongside their partners there.
Jeremy Werner (SVP/GM CDBU) (42:31.512)
Yeah, the depth of engagement. We've always had a really solid, deep, technical engagement with our customers. And one of the interesting things about what we do is we kind of have to work with everyone, from software providers to CPU and GPU hardware designers, the fab technology, the system builders, the data center builders. So we get to work with everybody up and down the stack.
The depth of the engagement with what's happening now is beyond anything that we've ever seen before. And so that deep, deep co-design is really something that, coming back to the beginning of the discussion, is different this time.
Jay (43:32.798)
So.
Jay (43:36.426)
That's a lot. That's a lot. What do you think people, the market in general,
Jeremy Werner (SVP/GM CDBU) (43:40.918)
You guys don't have memory guys on right? We're going too deep.
Ben Bajarin (43:44.427)
No, no, no, this is perfect. This is perfect.
Jay (43:45.018)
it's perfect,
Jay (43:50.314)
What are we still missing? What do you think the market still misunderstands about what's happening?
Jeremy Werner (SVP/GM CDBU) (44:00.802)
I think.
People are seeing the increase in capex from the big CSPs and other data center companies, and they're worried about where they're going and whether it's sustainable. And I think that
those businesses are also going through a tremendous revolution that, as I mentioned earlier, is gonna transform human society in so many ways, is gonna solve so many problems. People without access to medicine or surgeons are gonna be able to get it. We're gonna innovate at a much faster rate. We're gonna be able to automate production,
and that's gonna raise the quality of life for billions of people around the globe. I think...
Jeremy Werner (SVP/GM CDBU) (45:14.526)
it's not all obvious necessarily at all times, exactly when and how the monetization of these models translates into revenue streams, and, you know, are they spending more than their capability? And I would say no. I think the potential is still greater than what most people can imagine.
It's funny, because in Silicon Valley, I feel people are so excited about this. It's easy to get in your kind of bubble here, where everyone really kind of probably understands the depth of the technology, and they hear about all these cool things that people are working on. When I talk to a lot of my friends who aren't necessarily in the industry, or are in other industries,
Ben Bajarin (45:52.003)
Mm-hmm.
Ben Bajarin (45:55.853)
Right.
Jeremy Werner (SVP/GM CDBU) (46:12.824)
you know, a few of them are saying, wow, I can do incredible things in my industry with AI, and they are innovating. I have a friend who's doing some incredible stuff in an industry, I can't say too much about it. But other friends, they see what's happening with ChatGPT and LLMs, they're not necessarily...
they see what happened in the stock market, but they're not in tune with the potential of what's going to happen over the next 20 years, which I fully believe in.
Jay (46:56.531)
So let me ask the flip side of that question. I think we would all agree there's something here, but the broad number of people in the world don't quite see it yet. So that implies that over time they'll catch up. They'll start getting more informed. They'll realize ChatGPT is more than just coming up with silly memes, you can do real work with it.
Demand keeps accelerating. But in the near term, are we gonna be able to keep up? Let me be more specific, is the memory industry gonna be able to keep up? Or do we get to a point somewhere next year where we're just like, that's it, guys, we've built everything we could build, the new fab isn't ready yet, just hang out for six months until we get there?
Jeremy Werner (SVP/GM CDBU) (47:47.407)
Well, we're already there. From a keeping-up-with-production perspective, yeah, unfortunately, we didn't build enough fab capacity in the world. And it turns out building fabs is not the easiest thing. I'll give you some context. We're building five fabs around the world right now.
Ben Bajarin (48:07.597)
Hmm.
Jeremy Werner (SVP/GM CDBU) (48:17.368)
We've announced in Boise, Idaho, we are building a 600,000 square foot clean room. We've announced construction in upstate New York. So we're bringing memory to the U.S., augmenting our memory fab in Virginia, and that's gonna make the U.S. a huge producer of memory. And each of those
fabs in Boise and New York, and we have plans beyond the first one, as soon as we finish one we start on the next one, each of those is the size of 10 football fields. That's how big the factory floor is, just to give you some context. And then we're building fabs elsewhere too. We have announced groundbreaking on the NAND fab in Singapore to expand our production there. We've announced
expansion of our DRAM facilities as well in Japan. We just bought a fab from PSMC in Taiwan. So we're doing a lot of construction to bring things online. So right now the industry is construction and clean room space limited. And it'll probably be that way
Ben Bajarin (49:41.176)
Yeah.
Jeremy Werner (SVP/GM CDBU) (49:45.273)
for a while when we talk about being able to meet demand. That is the biggest challenge that we have. We do a lot of other things to push out more technology, running more efficiently. We move to newer technology that gives us more bits per square foot of fab. But ultimately, we've already
Ben Bajarin (50:11.555)
Mm.
Jeremy Werner (SVP/GM CDBU) (50:15.032)
failed to keep up with demand.
Ben Bajarin (50:16.921)
As everybody else has, right? You've got Intel on earnings, you've got Nvidia, you've got TSMC going, look guys, we're at capacity, fabs don't grow on trees, like you just pointed out. But, you know, just wrapping up the whole conversation, what's interesting is, one, how much has changed. Even a year ago we weren't having these same conversations, and there's a good chance that a year from now
all of this might be very, very different. None of it changes the demands on compute, but all of these things we're going to solve problems for. And even just coming back to who's adopting AI, it's just going to get more capable because of more compute, more memory, more storage. You're going to be able to do a whole lot more. I forget who said this, but somebody I like on Twitter said that what you're using today is the dumbest AI is ever going to be, and it feels pretty darn smart.
So all of this, and the problems you guys are solving, are part of what makes this more useful, more valuable. And the economics come back to the hyperscalers: they monetize it as it gets better, as more people can use it and find value, and memory and storage is a huge part of that.
Jeremy Werner (SVP/GM CDBU) (51:32.93)
Yeah. Well said.
Ben Bajarin (51:34.307)
Yeah, thank you. Well, Jeremy, we really appreciate your time and for coming on The Circuit. You're a welcome guest anytime you wanna come on and talk memory. Thanks for listening, everybody. We hope you found this insightful and we will talk to you next week.
Jay (51:52.085)
Thank you everybody. Tell your friends and your agents.