1
00:00:00,020 --> 00:00:02,920
Josh:
A couple of weeks ago, we covered the Claude Mythos release,

2
00:00:03,100 --> 00:00:06,140
Josh:
the model that found decade-old security flaws overnight and scared the hell

3
00:00:06,140 --> 00:00:08,620
Josh:
out of basically anyone who is following the AI story.

4
00:00:08,860 --> 00:00:11,460
Josh:
So much so that the federal government is involved. But the part that we didn't

5
00:00:11,460 --> 00:00:14,780
Josh:
get into is the backend that powered this model.

6
00:00:14,960 --> 00:00:19,280
Josh:
Mythos was built on a chip from March 2024 that Jensen pulled out of his pocket

7
00:00:19,280 --> 00:00:21,940
Josh:
on stage at GTC, which was the Blackwell chip.

8
00:00:22,120 --> 00:00:25,460
Josh:
It had 208 billion transistors. Everyone treated it like the future had arrived.

9
00:00:25,580 --> 00:00:29,820
Josh:
And yet it took two years of fabrication for us to get the first manifestation

10
00:00:29,820 --> 00:00:31,720
Josh:
of that, which is Claude Mythos.

11
00:00:32,100 --> 00:00:35,380
Josh:
24 months from keynote to a working model. It happened with Hopper,

12
00:00:35,500 --> 00:00:39,160
Josh:
it happened again with Blackwell, and it's going to happen again with our future models.

13
00:00:39,260 --> 00:00:43,460
Josh:
But the difference is we have a series of future models that exist today that

14
00:00:43,460 --> 00:00:47,040
Josh:
we can kind of map out to where we're going to be heading based on this trajectory

15
00:00:47,040 --> 00:00:48,280
Josh:
that we've seen with the previous chips.

16
00:00:48,440 --> 00:00:51,880
Josh:
And it's pretty awe-inspiring to see where we are going to go considering there

17
00:00:51,880 --> 00:00:55,880
Josh:
are three generations of chips that have already been announced since Blackwell.

18
00:00:55,960 --> 00:00:58,600
Josh:
We have Vera Rubin, Rubin Ultra, and Feynman.

19
00:00:58,960 --> 00:01:02,440
Josh:
Each one, many multiples more powerful than the last. And when you look at what

20
00:01:02,440 --> 00:01:06,280
Josh:
Blackwell already produced in the very first version, it gets impossible to

21
00:01:06,280 --> 00:01:10,420
Josh:
imagine a world where we don't reach AGI on hardware that's already been designed.

22
00:01:10,840 --> 00:01:14,220
Josh:
Everything that's been announced that is going into production almost certainly

23
00:01:14,220 --> 00:01:17,420
Josh:
is going to produce models indistinguishable from AGI. At least that's what

24
00:01:17,420 --> 00:01:18,540
Josh:
it seems like on a surface level.

25
00:01:18,800 --> 00:01:26,400
Ejaaz:
Yeah, so the story here in a single sentence is AGI-like AI models are already here.

26
00:01:26,920 --> 00:01:31,460
Ejaaz:
We just haven't distributed it because we haven't powered up the GPUs that enable

27
00:01:31,460 --> 00:01:34,140
Ejaaz:
it. So everyone is obsessed with AI models.

28
00:01:34,320 --> 00:01:36,940
Ejaaz:
We talk about our favorite models, how we prompt them, how intelligent they

29
00:01:36,940 --> 00:01:39,380
Ejaaz:
are. But very few people are talking about the fact that

30
00:01:39,760 --> 00:01:42,880
Ejaaz:
the hardware is the thing that powers these things. They train these things.

31
00:01:43,000 --> 00:01:43,740
Ejaaz:
They inference these things.

32
00:01:43,900 --> 00:01:48,440
Ejaaz:
And it's still about 70% of the influence of how intelligent your model is.

33
00:01:48,660 --> 00:01:52,440
Ejaaz:
And the prime example, most recent example of that has been Anthropic's Mythos

34
00:01:52,440 --> 00:01:55,680
Ejaaz:
release, right? You just mentioned it. It's discovered a bunch of different cybersecurity flaws.

35
00:01:55,880 --> 00:02:00,160
Ejaaz:
It is this all-powerful thing that the governments around the world, including the U.S.

36
00:02:00,260 --> 00:02:03,080
Ejaaz:
Government, Federal Reserve, they're sharing meetings with the top banks to

37
00:02:03,080 --> 00:02:06,220
Ejaaz:
talk about the craziness of this model and how we must prepare.

38
00:02:06,380 --> 00:02:08,760
Ejaaz:
There's a lot of doomer news out there in the future.

39
00:02:09,760 --> 00:02:14,320
Ejaaz:
Little do you know that this was powered by a GPU or this was trained by a GPU

40
00:02:14,320 --> 00:02:19,700
Ejaaz:
that was built 20 months ago. So we're talking about almost two years ago.

41
00:02:19,860 --> 00:02:23,920
Ejaaz:
It's called Blackwell. And I want to give you guys an idea of the timeline of what this looked like.

42
00:02:24,060 --> 00:02:28,760
Ejaaz:
So in March 2024, NVIDIA GTC, which is like their developer conference,

43
00:02:28,920 --> 00:02:33,360
Ejaaz:
Jensen Huang comes on stage and he presents this gargantuan scrap of metal.

44
00:02:33,560 --> 00:02:37,120
Ejaaz:
It looks very pretty, by the way. And he goes, this is Blackwell,

45
00:02:37,320 --> 00:02:40,620
Ejaaz:
GB200, GB300, a brand new GPU.

46
00:02:40,860 --> 00:02:44,080
Ejaaz:
We can train frontier models on it. Everyone gets so excited.

47
00:02:44,240 --> 00:02:46,960
Ejaaz:
Their stock price absolutely ascends, right?

48
00:02:47,540 --> 00:02:52,360
Ejaaz:
The thing is, people couldn't get their hands on this until exactly a year later.

49
00:02:52,580 --> 00:02:56,760
Ejaaz:
So to give you guys an idea of the timeline, he announces it in March 2024.

50
00:02:56,980 --> 00:03:00,420
Ejaaz:
Then by the middle of the year, they discover there's like a bit of a design

51
00:03:00,420 --> 00:03:01,760
Ejaaz:
flaw and they amend that.

52
00:03:02,100 --> 00:03:07,140
Ejaaz:
And then by the end of 2024, early 2025, they start shipping these units of

53
00:03:07,140 --> 00:03:10,560
Ejaaz:
Blackwell GPUs out to the top frontier AI labs.

54
00:03:10,680 --> 00:03:15,280
Ejaaz:
But there's an important nuance here, which is it's just the GPU sitting in a data center.

55
00:03:15,420 --> 00:03:20,400
Ejaaz:
They aren't actually powered up. It's not until 6 to 12 months after that fact

56
00:03:20,400 --> 00:03:22,700
Ejaaz:
that these GPUs were finally powered up,

57
00:03:23,000 --> 00:03:27,860
Ejaaz:
used to train models, which is why we now start to see these new AGI-like models

58
00:03:27,860 --> 00:03:31,140
Ejaaz:
like OpenAI SPUD and Claude Mythos come to fruition.

59
00:03:31,280 --> 00:03:36,420
Ejaaz:
So the point is, there is a long gap between the frontier GPUs being announced

60
00:03:36,420 --> 00:03:40,000
Ejaaz:
and rolled out to them actually being powered to train the models.

61
00:03:40,040 --> 00:03:42,960
Ejaaz:
We've talked about Elon Musk and xAI a lot on this show before.

62
00:03:42,960 --> 00:03:47,860
Ejaaz:
They actually have the largest arsenal of these Blackwell GPUs.

63
00:03:47,960 --> 00:03:49,120
Ejaaz:
They bought about a million of them.

64
00:03:49,380 --> 00:03:53,580
Ejaaz:
The crazy part about this now is there are not, like, one, two, but three new NVIDIA

65
00:03:53,580 --> 00:03:56,800
Ejaaz:
GPU models that have been announced in the recent NVIDIA GTC.

66
00:03:57,100 --> 00:04:03,140
Ejaaz:
So there is a major lag between frontier hardware and the new AI models that are being released.

67
00:04:03,360 --> 00:04:05,640
Ejaaz:
And people don't understand this. And we want to tell you the story.

68
00:04:05,960 --> 00:04:11,220
Josh:
You just remember GPT-4, how long ago that was and how that felt like the huge,

69
00:04:11,220 --> 00:04:13,420
Josh:
most pivotal model that OpenAI ever released.

70
00:04:13,520 --> 00:04:17,120
Josh:
I mean, that was the big one right after ChatGPT came out. That was trained

71
00:04:17,120 --> 00:04:20,600
Josh:
using the Hopper chips. You know, the most recent model.

72
00:04:20,740 --> 00:04:22,680
Ejaaz:
Hopper's a word I haven't heard in a while, Josh.

73
00:04:22,860 --> 00:04:26,640
Josh:
Yeah, well, you know, GPT-5.4, the most recent model that we're using every

74
00:04:26,640 --> 00:04:30,480
Josh:
single day on ChatGPT. That was also trained on Hopper chips.

75
00:04:30,820 --> 00:04:35,540
Josh:
The same chips are training models from GPT-4 to GPT-5.4.

76
00:04:35,620 --> 00:04:40,840
Josh:
And it's a testament to how the efficiency gains of software can actually increase

77
00:04:40,840 --> 00:04:41,920
Josh:
the throughput of hardware.

78
00:04:42,160 --> 00:04:47,520
Josh:
And I think I want to use that as an example because what we just got recently

79
00:04:47,520 --> 00:04:52,300
Josh:
with Mythos through Anthropic, that seems to be the first real implementation process

80
00:04:52,630 --> 00:04:56,110
Josh:
of a true Blackwell model. And rumors are that SPUD, the new OpenAI model,

81
00:04:56,210 --> 00:05:00,590
Josh:
is going to kind of be the same in terms of power that is coming as it relates

82
00:05:00,590 --> 00:05:01,670
Josh:
to the first Blackwell model.

83
00:05:01,810 --> 00:05:06,110
Josh:
And even if we don't actually iterate on the hardware, the amount of progress

84
00:05:06,110 --> 00:05:10,210
Josh:
we're going to get from Blackwell models alone seems like it is going to be

85
00:05:10,210 --> 00:05:13,190
Josh:
difficult to imagine it doesn't become some sort of an AGI, right?

86
00:05:13,350 --> 00:05:16,490
Josh:
It's like when you think about the difference of intelligence between GPT-4

87
00:05:16,490 --> 00:05:22,470
Josh:
and GPT-5.4 and how far we've come, that applied to Blackwell at this new scale,

88
00:05:22,630 --> 00:05:25,450
Josh:
seems crazy. But that's not even the crazy part, because

89
00:05:25,450 --> 00:05:28,310
Josh:
we have an entire roadmap of these three generations of

90
00:05:28,310 --> 00:05:31,250
Josh:
chips that are coming that we can very clearly map to

91
00:05:31,250 --> 00:05:34,090
Josh:
the gains that we're going to see. And I think that's when things

92
00:05:34,090 --> 00:05:37,010
Josh:
get, like, particularly disturbing, because on the chart that we're looking at on

93
00:05:37,010 --> 00:05:41,090
Josh:
screen now, we have Blackwell. That's where we are right now. Blackwell is a significant

94
00:05:41,090 --> 00:05:46,270
Josh:
improvement over the previous model. But then we have Vera Rubin, which jumps

95
00:05:46,270 --> 00:05:50,670
Josh:
from 20 petaflops to 50 petaflops. That's a two-and-a-half to five times multiple

96
00:05:50,670 --> 00:05:53,310
Josh:
on the compute. Then we have Rubin Ultra,

97
00:05:53,610 --> 00:05:56,450
Josh:
which is scheduled for the second half of 2027.

98
00:05:56,450 --> 00:05:59,050
Josh:
That is a 14 times multiple.

99
00:05:59,350 --> 00:06:05,770
Josh:
And then we have Feynman in 2028, which is an estimated 30 to 50 times multiple

100
00:06:06,300 --> 00:06:11,780
Josh:
on the current chip stack that we have today, assuming that we get no software progress at all.

101
00:06:12,060 --> 00:06:16,820
Josh:
And what we saw with the Hopper chips is that we got a tremendous amount of

102
00:06:16,820 --> 00:06:18,220
Josh:
progress just from software.

103
00:06:18,340 --> 00:06:23,000
Josh:
So when you combine this 30 to 50 times multiple with maybe another 100 times

104
00:06:23,000 --> 00:06:25,600
Josh:
multiple on software, if we make another breakthrough, we're looking at some

105
00:06:25,600 --> 00:06:30,180
Josh:
pretty insane improvements here that like are really hard to wrap your head around.

106
00:06:30,380 --> 00:06:33,980
Ejaaz:
I want to point out that these improvements, these multiples that you just mentioned

107
00:06:33,980 --> 00:06:39,360
Ejaaz:
are just on the speed and power of these hardware modules, right?

108
00:06:39,740 --> 00:06:44,120
Ejaaz:
So it's going to work 3x harder or 14x harder, but it's also going to cost you

109
00:06:44,120 --> 00:06:47,340
Ejaaz:
a lot less to be able to train the same type of intelligence or model.

110
00:06:47,480 --> 00:06:51,980
Ejaaz:
So the intelligence per density, which is a unit that we completely made up,

111
00:06:52,080 --> 00:06:55,880
Ejaaz:
and we don't know if it exists, but it somehow rhymes in my head at least,

112
00:06:56,200 --> 00:07:00,380
Ejaaz:
is improving and it's going to be cheaper with each successive model.

113
00:07:00,540 --> 00:07:04,520
Ejaaz:
But if you want to get a bit of context as to like what that looks like in terms

114
00:07:04,520 --> 00:07:07,520
Ejaaz:
of like the models that you use today and what it's going to look like tomorrow,

115
00:07:07,520 --> 00:07:11,120
Ejaaz:
we have this other table here, which kind of like maps it out.

116
00:07:11,240 --> 00:07:16,700
Ejaaz:
So with Blackwell today, you get about a two to three X more intelligent, crazier model, right?

117
00:07:16,760 --> 00:07:19,800
Ejaaz:
That's what Claude Mythos is supposedly meant to be. It's like a larger size.

118
00:07:19,920 --> 00:07:21,140
Ejaaz:
It's trained on these Blackwells.

119
00:07:21,320 --> 00:07:24,600
Ejaaz:
You're going to see a bunch of similar models come out from OpenAI and xAI over

120
00:07:24,600 --> 00:07:25,540
Ejaaz:
the next couple of months.

121
00:07:25,860 --> 00:07:29,860
Josh:
Just to pause you there, these are already models deemed too dangerous to release to the public.

122
00:07:30,000 --> 00:07:35,620
Ejaaz:
Yes. Just some emergency meetings literally being called by the Fed chair, top banks.

123
00:07:36,180 --> 00:07:42,820
Ejaaz:
Actually, I read something yesterday that the NSA is using or conferring or

124
00:07:42,820 --> 00:07:46,560
Ejaaz:
re-engaged with Anthropic, as well as the Pentagon and the U.S.

125
00:07:46,660 --> 00:07:51,720
Ejaaz:
Defense Department, after banning and blacklisting Anthropic because it's so powerful.

126
00:07:52,040 --> 00:07:53,160
Josh:
And that's where we are today.

127
00:07:53,500 --> 00:07:56,800
Ejaaz:
That's today. So that's right here. 2026, two to three X, right?

128
00:07:57,020 --> 00:08:01,840
Ejaaz:
Yeah, crazy. Now, you might notice that by next year, we have a larger multiple

129
00:08:01,840 --> 00:08:03,200
Ejaaz:
on the original multiple.

130
00:08:03,440 --> 00:08:08,080
Ejaaz:
By next year, we're going to have a 10 to 15x improvement purely through Vera

131
00:08:08,080 --> 00:08:09,840
Ejaaz:
Rubin GPUs. Now, I must emphasize...

132
00:08:10,250 --> 00:08:14,570
Ejaaz:
This does not include post-training. This doesn't include all the fine,

133
00:08:14,710 --> 00:08:18,610
Ejaaz:
fancy techniques that AI labs themselves will implement to make a smart model.

134
00:08:18,750 --> 00:08:21,010
Ejaaz:
This is just the hardware.

135
00:08:21,330 --> 00:08:24,950
Ejaaz:
It's like buying the hardware and training a model today versus next year,

136
00:08:25,110 --> 00:08:28,390
Ejaaz:
you're gonna get a 10 to 15x more intelligent model, but it gets even scarier.

137
00:08:28,790 --> 00:08:31,090
Ejaaz:
2028, 30 to 50x.

138
00:08:31,650 --> 00:08:37,530
Ejaaz:
2029, 100 to 200x. Now, I haven't seen these multiples in any other industry

139
00:08:37,530 --> 00:08:39,930
Ejaaz:
for any kind of performance or hardware improvement.

140
00:08:39,930 --> 00:08:45,890
Ejaaz:
So I can't wrap my head around this because it looks like just a few small numbers

141
00:08:45,890 --> 00:08:48,750
Ejaaz:
that are getting larger, but these are multiples of its predecessor,

142
00:08:49,050 --> 00:08:51,710
Ejaaz:
which means that we're probably going to get AGI,

143
00:08:53,130 --> 00:08:55,090
Ejaaz:
honestly, by the start of next year.

144
00:08:55,230 --> 00:08:58,810
Ejaaz:
And they're trained on hardware that currently exists and is rolling out.

145
00:08:59,030 --> 00:09:01,650
Ejaaz:
I don't know. I'm just kind of scared reading all of this, to be honest,

146
00:09:01,730 --> 00:09:04,350
Ejaaz:
because what happens if we have universal access to this?

147
00:09:04,650 --> 00:09:07,750
Ejaaz:
There's going to be a load of malicious actors which can use these models for

148
00:09:07,750 --> 00:09:10,670
Ejaaz:
various different things. But also, I don't know what these models are going

149
00:09:10,670 --> 00:09:13,030
Ejaaz:
to be capable of. They're going to be so much smarter than humans themselves.

150
00:09:13,330 --> 00:09:17,550
Josh:
The disturbing thing is that this technology is here. Like this is,

151
00:09:17,710 --> 00:09:21,470
Josh:
it's no longer an engineering problem or a physics problem necessarily.

152
00:09:21,650 --> 00:09:25,530
Josh:
It's just a matter of actually producing the thing and plugging it into an outlet and putting it online.

153
00:09:25,770 --> 00:09:30,070
Josh:
And this is coming. Like there are no novel breakthroughs required to make this a reality.

154
00:09:30,230 --> 00:09:33,930
Josh:
Now, what that looks like on the other side, I don't know, but I think it's

155
00:09:33,930 --> 00:09:37,870
Josh:
safe to assume the velocity of improvement we're going to get is certainly not

156
00:09:37,870 --> 00:09:42,250
Josh:
slowing down. It is starting to more closely resemble a vertical line than anything else.

157
00:09:42,410 --> 00:09:47,790
Josh:
And I think it begs the question, like, at what point do we reach AGI and how do we even define that?

158
00:09:47,970 --> 00:09:50,570
Josh:
Because I'm not sure we spoke about that much on the show, but Ejaaz,

159
00:09:50,850 --> 00:09:54,410
Josh:
when you say AGI, what do you mean by AGI? What would you be looking for

160
00:09:55,060 --> 00:09:57,940
Josh:
to declare, okay, we have finally reached AGI?

161
00:09:58,260 --> 00:10:05,740
Ejaaz:
Okay, so this is like my own made-up definition, but it's what will make me go, okay, this is AGI.

162
00:10:06,020 --> 00:10:13,040
Ejaaz:
It would be a single AI model, not many, but a single AI model that advances

163
00:10:13,040 --> 00:10:18,700
Ejaaz:
the frontier of three key major industries autonomously. So I'll pick these

164
00:10:18,700 --> 00:10:20,280
Ejaaz:
industries as examples.

165
00:10:20,720 --> 00:10:24,360
Ejaaz:
Financial industry, so it trades better than the average world.

166
00:10:24,360 --> 00:10:27,240
Ejaaz:
Sorry, than the best hedge fund or investor.

167
00:10:27,880 --> 00:10:31,420
Ejaaz:
It is able to make assessments better than any of the financial analysts,

168
00:10:31,580 --> 00:10:33,240
Ejaaz:
the top experts, et cetera, in that industry.

169
00:10:33,700 --> 00:10:38,680
Ejaaz:
In science, it has discovered a bunch of medical cures for some major diseases

170
00:10:38,680 --> 00:10:41,680
Ejaaz:
such as cancer, Alzheimer's, and stuff like that, that scientists,

171
00:10:41,920 --> 00:10:46,400
Ejaaz:
top scientists at their top level could not figure out. It accelerates their research.

172
00:10:46,820 --> 00:10:49,260
Ejaaz:
And maybe one other industry that I can't think of right now,

173
00:10:49,340 --> 00:10:53,260
Ejaaz:
but it's when these models start doing things that the best of the best humans

174
00:10:53,260 --> 00:10:56,840
Ejaaz:
right now couldn't figure out themselves and couldn't have seen themselves.

175
00:10:57,420 --> 00:10:58,960
Ejaaz:
Do you have a similar definition or?

176
00:10:59,220 --> 00:11:02,500
Josh:
Yeah, I think that sounds right. I think, and again, it's very fuzzy.

177
00:11:02,620 --> 00:11:05,800
Josh:
Everyone kind of has their own custom definition of what they believe AGI is going to be.

178
00:11:05,880 --> 00:11:10,060
Josh:
But for me, it's just AI that's smarter than the smartest human at pretty much

179
00:11:10,060 --> 00:11:12,140
Josh:
any cognitive task that exists.

180
00:11:12,200 --> 00:11:15,000
Josh:
So you can go to this model and it will be better

181
00:11:15,590 --> 00:11:18,850
Josh:
than anyone else who you can ask on planet Earth about anything. And

182
00:11:18,850 --> 00:11:21,630
Josh:
the problem with models today is they're very spiky. Like, you can do this

183
00:11:21,630 --> 00:11:24,450
Josh:
for code, probably, and it can code better than every

184
00:11:24,450 --> 00:11:27,550
Josh:
human on Earth. But if you ask it, you know, a generalized question

185
00:11:27,550 --> 00:11:30,410
Josh:
about something that you really know a lot about, there's a

186
00:11:30,410 --> 00:11:33,750
Josh:
lot of times where it's not completely accurate, or it will respond

187
00:11:33,750 --> 00:11:36,470
Josh:
as if it has the intelligence of a three-year-old. It fails the

188
00:11:36,470 --> 00:11:39,470
Josh:
reasoning tests of a lot of simple things. It still feels like

189
00:11:39,470 --> 00:11:42,290
Josh:
it's this very spiky entity. Once it is fully

190
00:11:42,290 --> 00:11:45,310
Josh:
developed, once it is actually better at every cognitive task,

191
00:11:45,310 --> 00:11:48,510
Josh:
that includes physical things too, that includes, like, understanding physics

192
00:11:48,510 --> 00:11:51,970
Josh:
of the real world, world models, that feels like AGI. And

193
00:11:51,970 --> 00:11:54,870
Josh:
then artificial superintelligence, ASI, feels like

194
00:11:54,870 --> 00:11:58,990
Josh:
it is smarter than all humans combined. So it's like, if we put all of our brains

195
00:11:58,990 --> 00:12:02,570
Josh:
together, no matter how long we tried, we can never come up with the things that

196
00:12:02,570 --> 00:12:06,690
Josh:
artificial superintelligence will come up with. And, I mean, will we get there

197
00:12:06,690 --> 00:12:11,710
Josh:
using this chip architecture? Possibly. I'm seeing a 50x multiple,

198
00:12:12,030 --> 00:12:13,350
Josh:
not including the software multiples.

199
00:12:13,730 --> 00:12:17,110
Josh:
And like those compounding on top of each other at the rate that we're moving,

200
00:12:17,290 --> 00:12:20,290
Josh:
seems like the only real constraint is going to be physical.

201
00:12:20,510 --> 00:12:23,450
Josh:
It's going to be actually rolling out these models and powering them on.

202
00:12:23,730 --> 00:12:27,690
Ejaaz:
Well, another crazy thing is, I think a lot of people, including myself,

203
00:12:28,330 --> 00:12:33,450
Ejaaz:
would assume that with every chip upgrade, it's going to be more expensive,

204
00:12:33,450 --> 00:12:35,210
Ejaaz:
and it's going to be bigger.

205
00:12:35,450 --> 00:12:38,210
Ejaaz:
It's going to be clunkier, right? Like the data centers are going to get bigger,

206
00:12:38,210 --> 00:12:39,370
Ejaaz:
it's going to be more expensive.

207
00:12:40,130 --> 00:12:43,610
Ejaaz:
I wish I had a chart to show this, but it's actually the complete inverse.

208
00:12:43,810 --> 00:12:47,070
Ejaaz:
And I'll give you some examples, some numbers to explain that, right?

209
00:12:47,190 --> 00:12:53,910
Ejaaz:
So a reasoning task that costs $1 on Blackwell costs $0.20 on Vera Rubin,

210
00:12:54,010 --> 00:12:56,890
Ejaaz:
which is rolling out as we speak or later this year.

211
00:12:57,110 --> 00:13:03,150
Ejaaz:
And it'll only cost $0.07 on Rubin Ultra, which starts to get released by the start of next year.

212
00:13:03,330 --> 00:13:05,950
Ejaaz:
So the cost is going down pretty massively.

213
00:13:06,230 --> 00:13:10,290
Ejaaz:
Now, by 2028, Jensen announced the Feynman GPU, right?

214
00:13:10,750 --> 00:13:14,210
Ejaaz:
A single rack of that, so we're talking about, like, just a couple of them

215
00:13:14,660 --> 00:13:19,780
Ejaaz:
stacked on top of each other, will process more compute than was required to

216
00:13:19,780 --> 00:13:23,240
Ejaaz:
train GPT-4 that you mentioned earlier, Josh.

217
00:13:23,520 --> 00:13:29,900
Ejaaz:
So the point is, less is more, but somehow more powerful, but also somehow more

218
00:13:29,900 --> 00:13:32,040
Ejaaz:
cheap relative to the intelligence that you're building.

219
00:13:32,280 --> 00:13:36,280
Ejaaz:
And if you assume this intelligence is going to reach this ASI,

220
00:13:36,400 --> 00:13:39,160
Ejaaz:
AGI-like state, it's going to make you money as well.

221
00:13:39,160 --> 00:13:44,340
Ejaaz:
So you end up just having, I guess, I'm afraid to say this, but the best of all

222
00:13:44,340 --> 00:13:47,500
Ejaaz:
worlds, both worlds. I don't know what humans are going to be doing, but it's great for AI.

223
00:13:47,500 --> 00:13:50,220
Josh:
Basically, yeah, there's no world in which things don't

224
00:13:50,220 --> 00:13:53,260
Josh:
get better. And it feels like right now we're really just constrained by this

225
00:13:53,260 --> 00:13:56,060
Josh:
compute power. There's this great meme that I saw online. It's,

226
00:13:56,060 --> 00:13:59,060
Josh:
it said, uh, Mythos is too powerful for public release,

227
00:13:59,060 --> 00:14:02,060
Josh:
but the reality is that they're just completely out of compute, and

228
00:14:02,060 --> 00:14:04,940
Josh:
Anthropic can't actually supply the tokens required to give

229
00:14:04,940 --> 00:14:07,820
Josh:
Mythos to the world. These optimizations, these cost structures...

230
00:14:07,820 --> 00:14:10,840
Josh:
Yeah, there it is. We got it on screen now. Great meme,

231
00:14:10,840 --> 00:14:13,760
Josh:
great meme. But these cost structures that are

232
00:14:13,760 --> 00:14:18,380
Josh:
going to incur from these new models are going to completely destroy that factor

233
00:14:18,380 --> 00:14:22,200
Josh:
at least for now, until whatever that next generation of model is that is so

234
00:14:22,200 --> 00:14:25,740
Josh:
powerful that it's constraining GPUs. And the interesting thing is that OpenAI

235
00:14:25,740 --> 00:14:29,040
Josh:
has the same exact thing going on. All these models are kind of converging

236
00:14:29,040 --> 00:14:33,080
Josh:
on the same spot, but they all seem to be compute constrained.

237
00:14:33,600 --> 00:14:37,260
Ejaaz:
I think what critics will push back on though, Josh, for everything that we've

238
00:14:37,260 --> 00:14:39,700
Ejaaz:
said so far is, okay, cool.

239
00:14:39,800 --> 00:14:43,020
Ejaaz:
You can buy these new hardware things, but why would you do that if you could

240
00:14:43,020 --> 00:14:45,960
Ejaaz:
just wait a few months or six months and buy the next thing?

241
00:14:46,300 --> 00:14:49,280
Ejaaz:
Jensen's just shipping out these products. He's making a load more money.

242
00:14:49,520 --> 00:14:52,200
Ejaaz:
It doesn't make sense. These things are depreciating assets.

243
00:14:52,340 --> 00:14:56,000
Ejaaz:
By the time you've bought the first one and you've ramped that up with power

244
00:14:56,000 --> 00:14:59,420
Ejaaz:
and training your next model, there's already three other new chip architectures.

245
00:14:59,540 --> 00:15:02,560
Ejaaz:
And he would be right, that critic would be right,

246
00:15:03,070 --> 00:15:06,470
Ejaaz:
except that they're massively, massively wrong. And we have proof for that,

247
00:15:06,570 --> 00:15:10,210
Ejaaz:
right? GPUs have now become this anti-depreciation machine.

248
00:15:10,470 --> 00:15:14,850
Josh:
One of the most amazing things about this phenomenon, and it feels like a narrative

249
00:15:14,850 --> 00:15:19,890
Josh:
violation, is the idea that the GPUs that were released three years ago are

250
00:15:19,890 --> 00:15:23,330
Josh:
actually more valuable today than they were at the time they launched,

251
00:15:23,370 --> 00:15:25,130
Josh:
which is a pretty bizarre idea.

252
00:15:25,130 --> 00:15:27,650
Josh:
We have this artifact on screen that shows a chart.

253
00:15:27,770 --> 00:15:32,470
Josh:
And an H100 from NVIDIA cost $30,000 when it launched in 2023.

254
00:15:32,470 --> 00:15:35,390
Josh:
At its peak, because of the scarcity, because everyone

255
00:15:35,390 --> 00:15:38,630
Josh:
needs these things, it was selling for a four times multiple at $120,000

256
00:15:38,630 --> 00:15:41,530
Josh:
per H100. This is kind

257
00:15:41,530 --> 00:15:44,250
Josh:
of outrageous. It was a little exorbitant. We don't need to

258
00:15:44,250 --> 00:15:47,630
Josh:
be paying that much money. But now that they are old, they're not depreciated,

259
00:15:47,630 --> 00:15:51,830
Josh:
but there's much better hardware out there. They're still holding their price

260
00:15:51,830 --> 00:15:56,950
Josh:
at $30,000. In fact, you can see a rebound that happens in late 2025 where the

261
00:15:56,950 --> 00:16:01,690
Josh:
cost of these H100 GPUs actually ticks upwards. And I think a lot of the people,

262
00:16:02,030 --> 00:16:04,990
Josh:
Michael Burry most famously, who is the guy behind The Big Short,

263
00:16:05,590 --> 00:16:09,970
Josh:
he created an entire short thesis around the idea that the depreciation schedule

264
00:16:09,970 --> 00:16:13,530
Josh:
of these GPUs wasn't aggressive enough and they were actually going to lose

265
00:16:13,530 --> 00:16:17,430
Josh:
their value and therefore the market was going to deflate because the companies

266
00:16:17,430 --> 00:16:18,730
Josh:
weren't marking these down properly.

267
00:16:19,010 --> 00:16:22,470
Josh:
The reality is that not only are they not going down, they're starting to

268
00:16:22,470 --> 00:16:27,010
Josh:
trend back up because the incremental cost for a token is so low with these

269
00:16:27,010 --> 00:16:29,790
Josh:
and everyone's so desperate for compute that they're like, well,

270
00:16:29,970 --> 00:16:31,250
Josh:
might as well spend some extra money,

271
00:16:31,630 --> 00:16:34,150
Josh:
get the H100s and start generating inference tokens with them.

272
00:16:34,150 --> 00:16:35,710
Josh:
It's this pretty amazing phenomenon that's happening.

273
00:16:36,070 --> 00:16:40,570
Ejaaz:
Yeah, so if you're wondering why this is happening, explicitly it's AI demand

274
00:16:40,570 --> 00:16:43,250
Ejaaz:
is growing faster than chip supply can expand.

275
00:16:43,470 --> 00:16:47,670
Ejaaz:
We don't have enough fabs or the manufacturing prowess or the energy grid to

276
00:16:47,670 --> 00:16:52,530
Ejaaz:
support creating and generating more GPUs to satiate the demand that we're seeing

277
00:16:52,530 --> 00:16:56,770
Ejaaz:
in AI across all these different industries, right? It's a very pervasive bit of technology.

278
00:16:57,270 --> 00:17:01,610
Ejaaz:
Now, the data that we're showing you on the screen right now isn't siloed to

279
00:17:01,610 --> 00:17:04,750
Ejaaz:
like a few research papers. This is happening in the market right now,

280
00:17:04,810 --> 00:17:06,870
Ejaaz:
and it's incredibly liquid.

281
00:17:07,130 --> 00:17:13,030
Ejaaz:
So a new phenomenon of companies in AI whose stocks have all skyrocketed are

282
00:17:13,030 --> 00:17:15,230
Ejaaz:
these things called neoclouds, right?

283
00:17:15,370 --> 00:17:20,130
Ejaaz:
So these are like, think of it as like AWS. They supply compute to train your

284
00:17:20,130 --> 00:17:23,370
Ejaaz:
AI models by setting up their own data centers, and they kind of like provide

285
00:17:23,370 --> 00:17:26,330
Ejaaz:
it to you in like a cloud or data center specific structure.

286
00:17:26,590 --> 00:17:32,650
Ejaaz:
An example would be CoreWeave. The idea here is that, for these data centers or these GPU providers,

287
00:17:33,470 --> 00:17:38,530
Ejaaz:
70% of the GPUs that they're running are old GPUs that we're showing you on our screen right now.

288
00:17:38,890 --> 00:17:43,950
Ejaaz:
And they're booked out, I'm not exaggerating, 6 to 12 months in advance.

289
00:17:44,230 --> 00:17:48,730
Ejaaz:
In fact, that's done through contracts, and the same providers renew the contracts

290
00:17:48,730 --> 00:17:51,930
Ejaaz:
three months before the contract needs to be renewed just to make sure that

291
00:17:51,930 --> 00:17:54,310
Ejaaz:
they get access to these older GPUs.

292
00:17:54,390 --> 00:17:56,670
Ejaaz:
So the point I'm trying to make, and you mentioned this just now,

293
00:17:56,770 --> 00:18:02,830
Ejaaz:
Josh, is all that matters is can I get AI tokens generated to do the thing that

294
00:18:02,830 --> 00:18:06,310
Ejaaz:
my company needs or answer the prompt that I have?

295
00:18:06,430 --> 00:18:10,370
Ejaaz:
And if the answer is yes, and it's for a reasonable price, I'm down to go for

296
00:18:10,370 --> 00:18:14,110
Ejaaz:
that because the value that you can build and earn on top of that is enormous,

297
00:18:14,310 --> 00:18:15,890
Ejaaz:
right? They can have a large markup on that.

298
00:18:16,010 --> 00:18:19,990
Ejaaz:
So it makes sense that these assets are kind of like in high demand.

299
00:18:20,170 --> 00:18:24,030
Ejaaz:
And to your earlier point, Michael J. Burry like shorted the entire market saying

300
00:18:24,030 --> 00:18:26,390
Ejaaz:
that these are depreciating assets and he got that completely wrong.

301
00:18:26,590 --> 00:18:31,090
Ejaaz:
And his thesis specifically was based on the idea that they can't train frontier models.

302
00:18:31,230 --> 00:18:32,050
Ejaaz:
And he's actually right.

303
00:18:32,250 --> 00:18:36,050
Ejaaz:
The older GPUs can't train frontier models. But what they are being used for

304
00:18:36,050 --> 00:18:40,750
Ejaaz:
is one thing very specifically, inference, which is if someone has a question,

305
00:18:40,950 --> 00:18:43,550
Ejaaz:
how do I get them the answer? How do I process the prompt?

306
00:18:43,870 --> 00:18:47,550
Ejaaz:
That's what the older GPUs are being used for. And they're really damn good at it.

307
00:18:47,630 --> 00:18:50,770
Ejaaz:
And the reason why it's important and essential for AI labs specifically who

308
00:18:50,770 --> 00:18:54,110
Ejaaz:
are training models, who you might think would want the expensive GPUs, is that

309
00:18:54,110 --> 00:18:55,630
Ejaaz:
they have a ton of inference.

310
00:18:55,930 --> 00:19:00,210
Ejaaz:
They use inference to even train the new models. So it's this new paradigm where

311
00:19:00,210 --> 00:19:05,910
Ejaaz:
all these old GPU architectures are being re-found or repurposed for this really

312
00:19:05,910 --> 00:19:07,270
Ejaaz:
important thing that is inference.

313
00:19:07,490 --> 00:19:11,230
Ejaaz:
So important context to understand if you're investing in some of these companies, for example.

314
00:19:11,810 --> 00:19:14,450
Josh:
Yeah. And why is it so valuable? Well, it's a testament to the software improvements,

315
00:19:14,570 --> 00:19:17,850
Josh:
right? So we have those software efficiency improvements that we didn't have three years ago.

316
00:19:17,950 --> 00:19:20,350
Josh:
So that same hardware generates a lot more value.

317
00:19:20,470 --> 00:19:24,450
Josh:
And if we scroll down to the value multiplier section of this artifact, it shows

318
00:19:24,450 --> 00:19:29,630
Josh:
that the cost of a chatbot inference in 2023 was $3 an hour, and now

319
00:19:29,630 --> 00:19:33,850
Josh:
autonomous agents completing these complex tasks are $30 to $300 per hour.

320
00:19:34,540 --> 00:19:40,120
Josh:
The value that you can charge for these tokens is significantly higher than it was in the past.

321
00:19:40,340 --> 00:19:44,080
Josh:
And the amount of tokens that you're able to generate efficiently at higher

322
00:19:44,080 --> 00:19:45,460
Josh:
quality is much higher as well.

323
00:19:45,560 --> 00:19:50,000
Josh:
So there's all these converging forces that are just making the market desperate for compute.

324
00:19:50,260 --> 00:19:54,440
Josh:
Nobody has the compute that they want. And NVIDIA is trying to put

325
00:19:54,440 --> 00:19:56,480
Josh:
it online as fast as they can, but it's not fast enough.

326
00:19:56,640 --> 00:20:00,940
Josh:
And I assume as we go through this, we're going to continue to see varying bottlenecks

327
00:20:00,940 --> 00:20:03,340
Josh:
and the efficiencies will move to where there are bottlenecks,

328
00:20:03,340 --> 00:20:07,060
Josh:
which creates new bottlenecks. Right now we're seeing some convergence around

329
00:20:07,060 --> 00:20:10,360
Josh:
CPUs, and CPUs seem like they're going to be hitting a

330
00:20:10,360 --> 00:20:13,340
Josh:
shortage somewhat soon, because we're out of GPUs, so let's move to CPUs.

331
00:20:13,340 --> 00:20:16,340
Josh:
And it's this really interesting dynamic. But that is the idea

332
00:20:16,340 --> 00:20:19,280
Josh:
of this NVIDIA episode, or just the chip episode in

333
00:20:19,280 --> 00:20:22,160
Josh:
general: that it is hard to imagine a world in which we don't reach

334
00:20:22,160 --> 00:20:25,980
Josh:
AGI given the currently announced infrastructure. It

335
00:20:25,980 --> 00:20:29,340
Josh:
doesn't require any breakthroughs. It's just that if NVIDIA does

336
00:20:29,340 --> 00:20:32,180
Josh:
what they announced on stage through Jensen Huang, through these next three

337
00:20:32,180 --> 00:20:36,080
Josh:
chips, it is almost impossible to imagine what the world of intelligence is going

338
00:20:36,080 --> 00:20:39,700
Josh:
to look like. And I think what's important to understand is that Mythos was trained

339
00:20:39,700 --> 00:20:44,020
Josh:
on a two-year-old chip, and no one's really talking about that. So it blew my

340
00:20:44,020 --> 00:20:47,760
Josh:
mind. Hopefully it blew yours as well, or you at least found it a little bit fascinating.

341
00:20:47,760 --> 00:20:50,880
Josh:
And that is our episode today. Thank you guys so much for watching. We really appreciate it.

342
00:20:50,880 --> 00:20:53,700
Ejaaz:
And I know some of you are probably thinking, oh, there's a bunch of challenges

343
00:20:53,700 --> 00:20:57,480
Ejaaz:
here, and Josh actually just mentioned one of them, which is, like, you've got CPUs,

344
00:20:57,480 --> 00:21:01,180
Ejaaz:
we don't have enough energy, we don't have enough memory.

345
00:21:01,340 --> 00:21:04,260
Ejaaz:
And that's like another episode that we can get into.

346
00:21:04,480 --> 00:21:08,000
Ejaaz:
So all of those things, we assume, will be leveled out at some point.

347
00:21:08,160 --> 00:21:11,760
Ejaaz:
And we're gonna see all those industries grow versus being constrained.

348
00:21:12,600 --> 00:21:15,240
Ejaaz:
People are throwing trillions of dollars into this industry.

349
00:21:15,680 --> 00:21:17,920
Ejaaz:
So all of those problems should theoretically be fixed.

350
00:21:18,060 --> 00:21:21,340
Ejaaz:
But rest assured, we will be the first show to cover it and give you those thoughts

351
00:21:21,340 --> 00:21:22,700
Ejaaz:
before it happens, by the way.

352
00:21:22,900 --> 00:21:26,800
Ejaaz:
And Intel is a sneaky one to get into. But we'll talk about that another time.

353
00:21:26,880 --> 00:21:30,120
Ejaaz:
Thank you so much for listening. If you are not subscribed to us, please subscribe.

354
00:21:30,560 --> 00:21:34,540
Ejaaz:
It helps us out massively. We are having banger weeks on YouTube,

355
00:21:34,740 --> 00:21:36,740
Ejaaz:
Spotify, Apple, and wherever you listen to us.

356
00:21:36,860 --> 00:21:40,160
Ejaaz:
Please rate us. Leave us a comment. We love hearing your feedback.

357
00:21:40,520 --> 00:21:44,320
Ejaaz:
There are like thousands of newbies that are listening to the show. Welcome.

358
00:21:44,740 --> 00:21:47,980
Ejaaz:
And also give us feedback about stuff that we may not be covering that you want

359
00:21:47,980 --> 00:21:49,460
Ejaaz:
to hear more of. We're always open to feedback.

360
00:21:49,920 --> 00:21:51,780
Ejaaz:
But until then, I guess we'll see you on the next one.