1
00:00:03,040 --> 00:00:06,860
Ejaaz:
All right, Josh, the AI nerds are

2
00:00:06,860 --> 00:00:09,940
Ejaaz:
fighting again. This past weekend, there was

3
00:00:09,940 --> 00:00:14,620
Ejaaz:
a very prestigious competition called the International Math Olympiad, which

4
00:00:14,620 --> 00:00:18,360
Ejaaz:
hosts some of the brightest, smartest mathematicians of our time, and they're

5
00:00:18,360 --> 00:00:22,300
Ejaaz:
typically high schoolers. And basically, they come together and they take a really

6
00:00:22,300 --> 00:00:27,360
Ejaaz:
hard math test. This is like four to five hours, and those that score the highest get medals.

7
00:00:27,540 --> 00:00:31,680
Ejaaz:
You can get bronze, silver, and the highest scorers get gold medals.

8
00:00:31,860 --> 00:00:33,220
Ejaaz:
So what's this got to do with AI?

9
00:00:33,580 --> 00:00:37,820
Ejaaz:
Well, recently, over the last couple of years, the organizers of this International

10
00:00:37,820 --> 00:00:43,020
Ejaaz:
Math Olympiad decided to start inviting AI models to participate as contestants.

11
00:00:43,500 --> 00:00:49,760
Ejaaz:
And they did terribly. Like, no one's come even near the human geniuses.

12
00:00:50,140 --> 00:00:53,820
Ejaaz:
Except this year, Josh, where they came to play and not one,

13
00:00:53,960 --> 00:01:00,700
Ejaaz:
but two AI models achieved not silver, but gold medals, which is just an insane thing, right?

14
00:01:01,340 --> 00:01:04,780
Ejaaz:
So it should be all fun and games, right? What a fairytale story.

15
00:01:05,280 --> 00:01:11,100
Ejaaz:
Well, unfortunately, OpenAI and Google got into an online spat where they started

16
00:01:11,100 --> 00:01:13,000
Ejaaz:
accusing each other of cheating.

17
00:01:13,440 --> 00:01:17,460
Ejaaz:
Now, remember, these are trillion dollar companies. So essentially,

18
00:01:17,580 --> 00:01:20,720
Ejaaz:
Josh, I was teleported this weekend back to my high school days where I felt

19
00:01:20,720 --> 00:01:25,200
Ejaaz:
like the teacher had to come in, separate the kids from arguing over some kind

20
00:01:25,200 --> 00:01:28,320
Ejaaz:
of random homework problem and get them to chill out.

21
00:01:28,480 --> 00:01:31,880
Josh:
We will look back at this episode and laugh at it like it's a joke because these

22
00:01:31,880 --> 00:01:35,820
Josh:
AIs, they're competing against high schoolers. That's so lame.

23
00:01:35,960 --> 00:01:38,540
Josh:
Only high schoolers? Like, come on, and you're just barely getting gold.

24
00:01:38,740 --> 00:01:42,420
Ejaaz:
Well, in their defense, Josh, these are some pretty smart high schoolers,

25
00:01:42,520 --> 00:01:44,580
Ejaaz:
man. Like I was looking at some of these math problems.

26
00:01:44,720 --> 00:01:48,300
Ejaaz:
I don't know if you can see my screen here. I'm sharing the official site.

27
00:01:48,300 --> 00:01:53,120
Ejaaz:
And if you look at some of these problems, here we go.

28
00:01:53,340 --> 00:01:56,460
Ejaaz:
And then like, okay, so they have basically, they host this competition in a

29
00:01:56,460 --> 00:01:57,560
Ejaaz:
different country each year.

30
00:01:57,800 --> 00:02:01,040
Ejaaz:
And you can kind of like download the test yourselves after the fact to see

31
00:02:01,040 --> 00:02:05,260
Ejaaz:
how well you could do it. I had a look at this one, Josh, from the Afrikaans.

32
00:02:07,000 --> 00:02:10,020
Ejaaz:
I basically don't understand anything. One second. All right,

33
00:02:10,260 --> 00:02:13,240
Ejaaz:
take a look at that. Take a look at this.

34
00:02:13,780 --> 00:02:16,380
Josh:
That looks like quite a bit of squiggly lines on a page.

35
00:02:17,090 --> 00:02:22,050
Ejaaz:
You know what? That could be mistaken for a piece of art in a gallery if you

36
00:02:22,050 --> 00:02:24,870
Ejaaz:
didn't peer too closely at it. This looks insane.

37
00:02:25,530 --> 00:02:28,290
Josh:
Okay, so I take it back. So the high schoolers are probably pretty smart then.

38
00:02:28,490 --> 00:02:33,130
Josh:
And I guess the AI performing as well as the high schoolers is probably a pretty big deal, right?

39
00:02:33,210 --> 00:02:37,390
Josh:
Because that looks like very complicated math problems that I'm assuming most

40
00:02:37,390 --> 00:02:39,190
Josh:
of the smartest people in the world cannot solve.

41
00:02:39,190 --> 00:02:44,070
Ejaaz:
Exactly. Yeah. This is like something that is technically set for high schoolers

42
00:02:44,070 --> 00:02:49,910
Ejaaz:
and sometimes college kids, but is meant to demonstrate prowess in the field.

43
00:02:50,090 --> 00:02:54,010
Ejaaz:
So there's a lot of university academics, which obviously do math degrees and

44
00:02:54,010 --> 00:02:58,850
Ejaaz:
they do PhDs, but those are in very specific problems. So you kind of like in

45
00:02:58,850 --> 00:03:02,490
Ejaaz:
science, you just need to kind of pick and choose your lane and then dedicate your life to it.

46
00:03:03,110 --> 00:03:07,510
Ejaaz:
High schoolers, or kind of college kids, are kind of like the last point before

47
00:03:07,510 --> 00:03:09,050
Ejaaz:
you jump into your specialization.

48
00:03:09,310 --> 00:03:13,590
Ejaaz:
So really, if you're the best at generalized maths, you're going to compete in this competition.

49
00:03:13,670 --> 00:03:17,650
Ejaaz:
And what's so interesting is typically AI models haven't been able to perform

50
00:03:17,650 --> 00:03:22,810
Ejaaz:
very well because they needed a lot of context beforehand about the problem, Josh.

51
00:03:22,810 --> 00:03:27,410
Ejaaz:
So they needed to know that, you know, there was certain, you know,

52
00:03:27,510 --> 00:03:29,950
Ejaaz:
X equals something and Y equals something.

53
00:03:30,230 --> 00:03:33,270
Ejaaz:
And they had to have defined parameters to kind of figure out the problem.

54
00:03:33,590 --> 00:03:37,190
Ejaaz:
But this was the first time that AI models basically were just given a blank

55
00:03:37,190 --> 00:03:38,810
Ejaaz:
sheet of paper or not a blank sheet of paper.

56
00:03:38,950 --> 00:03:42,630
Ejaaz:
But they stared at the problem just as we just looked at it just now and had

57
00:03:42,630 --> 00:03:47,370
Ejaaz:
to read the words, read the characters, interpret what that meant in the context

58
00:03:47,370 --> 00:03:51,130
Ejaaz:
of that situation and the way that the question was framed and then figure it out themselves.

59
00:03:51,130 --> 00:03:55,530
Ejaaz:
So it's as if the AI models had a camera that looked at a paper,

60
00:03:55,690 --> 00:04:01,050
Ejaaz:
similar way that we look at test papers as kids through our eyes and figure it out themselves.

61
00:04:01,970 --> 00:04:06,150
Josh:
So what changed? What happened in the last year that made it so much better?

62
00:04:06,250 --> 00:04:11,930
Josh:
Because it went from, what, basically zero of six to now six or five of six questions answered.

63
00:04:11,970 --> 00:04:14,310
Josh:
Now it's a gold medalist. So what happened?

64
00:04:14,310 --> 00:04:18,730
Ejaaz:
So listen, I'm not going to try and explain it, but maybe you and I can decipher

65
00:04:18,730 --> 00:04:22,210
Ejaaz:
it through the legends themselves that built these models, right?

66
00:04:22,350 --> 00:04:24,470
Ejaaz:
Okay, so let me paint the scene for you, Josh.

67
00:04:24,890 --> 00:04:26,810
Ejaaz:
It is Saturday evening.

68
00:04:27,250 --> 00:04:30,230
Ejaaz:
You know, normal people are usually out and about. They're having fun.

69
00:04:30,350 --> 00:04:33,450
Ejaaz:
They're probably having dinner, catching up with friends or chilling at home, watching a movie.

70
00:04:34,090 --> 00:04:38,990
Ejaaz:
And this guy called Alexander Wei, who is OpenAI's head of reasoning.

71
00:04:39,810 --> 00:04:43,850
Ejaaz:
Reasoning is basically this new fancy technique that AI models have typically

72
00:04:43,850 --> 00:04:47,570
Ejaaz:
demonstrated, which has brought them up to like the frontier level of AI models.

73
00:04:47,590 --> 00:04:51,730
Ejaaz:
Basically, if your model can do reasoning, it's typically a pretty smart model, right?

74
00:04:52,010 --> 00:04:57,350
Ejaaz:
And he posts this tweet saying, I'm excited to share that our latest OpenAI

75
00:04:57,350 --> 00:05:01,950
Ejaaz:
Experimental Reasoning LLM has achieved a longstanding grand challenge in AI,

76
00:05:02,230 --> 00:05:06,810
Ejaaz:
a gold medal level performance on the world's most prestigious math competition,

77
00:05:07,110 --> 00:05:08,730
Ejaaz:
the International Math Olympiad.

78
00:05:09,100 --> 00:05:14,640
Ejaaz:
And he goes on to describe, you know, how the model basically took on each problem

79
00:05:14,640 --> 00:05:19,440
Ejaaz:
in its own regard and solved it and how this is a massive success and win for

80
00:05:19,440 --> 00:05:21,260
Ejaaz:
AI models and how, most importantly.

81
00:05:21,820 --> 00:05:25,420
Ejaaz:
OpenAI was the first ever model to complete this.

82
00:05:25,640 --> 00:05:31,200
Ejaaz:
And not too long after he posts that tweet, Josh, Sam Altman jumps in here, right?

83
00:05:31,320 --> 00:05:34,420
Ejaaz:
And he goes, again, he kind of echoes similar thoughts. We achieved gold medal

84
00:05:34,420 --> 00:05:38,300
Ejaaz:
level performance on the 2025 IMO competition with general purpose reasoning.

85
00:05:38,800 --> 00:05:41,840
Ejaaz:
And then he kind of like shills GPT-5 at the end. Basically,

86
00:05:41,840 --> 00:05:44,240
Ejaaz:
it's like a promotive thing for OpenAI.

87
00:05:44,720 --> 00:05:49,140
Ejaaz:
And I will say that this is really cool because what they've achieved is something

88
00:05:49,140 --> 00:05:53,180
Ejaaz:
that hasn't been done before, right? So very impressive feat.

89
00:05:53,360 --> 00:05:58,760
Ejaaz:
And in terms of how this works specifically, Sheryl Hsu here gives a really good breakdown.

90
00:05:58,980 --> 00:06:04,000
Ejaaz:
She says, the model solves these problems without tools like coding or Lean,

91
00:06:04,120 --> 00:06:05,080
Ejaaz:
which is another coding tool.

92
00:06:05,420 --> 00:06:09,200
Ejaaz:
It just uses natural language. So as I said earlier, it kind of reads the paper

93
00:06:09,200 --> 00:06:12,140
Ejaaz:
and just kind of interprets what it thinks it means.

94
00:06:12,520 --> 00:06:17,220
Ejaaz:
And it also has the same amount of time to do the test as other kids, so 4.5 hours.

95
00:06:17,640 --> 00:06:22,060
Ejaaz:
And she says, we see the model reason at a very high level, trying out different

96
00:06:22,060 --> 00:06:26,860
Ejaaz:
strategies, making observations from examples, and testing different hypotheses out.

97
00:06:27,060 --> 00:06:32,040
Ejaaz:
And she says, it's crazy how we've gone from 12% on the AIME test,

98
00:06:32,200 --> 00:06:38,200
Ejaaz:
which is what GPT-4o, which is OpenAI's early model, got, to IMO gold,

99
00:06:38,400 --> 00:06:41,340
Ejaaz:
International Math Olympiad gold medal in 15 months.

100
00:06:41,520 --> 00:06:45,280
Ejaaz:
So just to set that in context, Josh, that is a crazy leap in 15 months.

101
00:06:45,400 --> 00:06:50,040
Ejaaz:
Imagine going from eighth grade level math to the best

102
00:06:51,200 --> 00:06:55,100
Ejaaz:
mathematician in the world in 15 months. It's a pretty insane thing.

103
00:06:55,300 --> 00:06:58,980
Ejaaz:
Yeah, I'd say so. So essentially the breakthrough that Sheryl is highlighting

104
00:06:58,980 --> 00:07:02,840
Ejaaz:
here is number one, the model didn't need any context.

105
00:07:03,120 --> 00:07:08,400
Ejaaz:
Number two, it used really high level reasoning to figure out the problems from first principles.

106
00:07:08,880 --> 00:07:14,000
Ejaaz:
And number three, it was able to test out multiple hypotheses at the same time

107
00:07:14,000 --> 00:07:15,680
Ejaaz:
instead of trying to one shot the problem.

108
00:07:15,920 --> 00:07:19,220
Ejaaz:
Typically in the past when AI models have been given a prompt or a problem,

109
00:07:19,400 --> 00:07:23,280
Ejaaz:
it tries to just like give it its best shot and give you one solution, Josh.

110
00:07:23,600 --> 00:07:27,000
Ejaaz:
Whereas what these models, these reasoning models do really well is they are

111
00:07:27,000 --> 00:07:30,520
Ejaaz:
able to hypothetically entertain many different scenarios and then pick the

112
00:07:30,520 --> 00:07:32,280
Ejaaz:
best one of which it thought it was an answer.

113
00:07:32,420 --> 00:07:34,440
Ejaaz:
And it ended up with the gold medal, which is insane, right?

114
00:07:34,560 --> 00:07:37,920
Ejaaz:
But it wasn't entirely without a few glitches here and there, Josh.

115
00:07:38,080 --> 00:07:42,440
Ejaaz:
So if you look at this post from Jasper, he read through the entire kind of

116
00:07:42,440 --> 00:07:48,840
Ejaaz:
like problem set that OpenAI's model went through, and he points out some weird anomalies.

117
00:07:49,040 --> 00:07:52,120
Ejaaz:
So he kind of like talks about like how it kind of like analyzed and a bunch of things.

118
00:07:52,400 --> 00:07:55,460
Ejaaz:
And he goes, however, the write-up is kind of messy. He goes,

119
00:07:55,660 --> 00:07:58,680
Ejaaz:
it overuses shorthand and sentence fragments.

120
00:07:58,940 --> 00:08:04,820
Ejaaz:
It introduces new terms without definitions, for example, forbidden and sunny partners.

121
00:08:04,860 --> 00:08:10,300
Ejaaz:
I have no idea what either of those terms could mean, but it was just apparently

122
00:08:10,300 --> 00:08:13,860
Ejaaz:
just interspersing these phrases during its analysis.

123
00:08:13,860 --> 00:08:17,840
Ejaaz:
And so as a reviewer, or as an examiner, they were reading this,

124
00:08:17,960 --> 00:08:20,180
Ejaaz:
they were like, sorry, wait, what is it talking about?

125
00:08:20,240 --> 00:08:23,220
Ejaaz:
It got to the right answer, but what is it talking about, right?

126
00:08:23,380 --> 00:08:29,720
Ejaaz:
The other key point from this post is it was unable to solve one problem, problem six.

127
00:08:29,940 --> 00:08:33,520
Ejaaz:
And I'm not even gonna try and get into why it failed on that problem,

128
00:08:33,640 --> 00:08:35,920
Ejaaz:
but it was just particularly hard for it to figure out.

129
00:08:36,000 --> 00:08:40,000
Ejaaz:
But it still scored a high enough percentage that it got a gold medal.

130
00:08:40,140 --> 00:08:44,880
Ejaaz:
So it's basically a win for OpenAI, but that's when the drama starts unfolding.

131
00:08:44,880 --> 00:08:51,260
Ejaaz:
So I've got this post up from Mikhail Samin, which kind of like sparks this entire fight, Josh.

132
00:08:51,520 --> 00:08:55,280
Ejaaz:
He goes, according to a friend, the IMO, which is the International Math Olympiad,

133
00:08:55,700 --> 00:09:01,320
Ejaaz:
asked AI companies not to steal the spotlight from kids and to wait a week after

134
00:09:01,320 --> 00:09:03,760
Ejaaz:
the closing ceremony to announce the results.

135
00:09:04,380 --> 00:09:09,220
Ejaaz:
OpenAI instead announced the results before the closing ceremony. Yeah.

136
00:09:09,400 --> 00:09:13,440
Ejaaz:
And then he goes on to basically say how this is essentially like some kind

137
00:09:13,440 --> 00:09:15,800
Ejaaz:
of clout chasing move from OpenAI.

138
00:09:16,020 --> 00:09:20,120
Ejaaz:
And OK, I tried to evaluate this, Josh, from OpenAI's kind of perspective,

139
00:09:20,280 --> 00:09:23,560
Ejaaz:
which is they basically want to steal the limelight,

140
00:09:23,660 --> 00:09:27,620
Ejaaz:
but also say that they were the first AI model to ever achieve gold on this

141
00:09:27,620 --> 00:09:31,760
Ejaaz:
competition, which puts them in a good light and makes users want to choose

142
00:09:31,760 --> 00:09:35,460
Ejaaz:
OpenAI and solidify the branding that OpenAI is the best, right?

143
00:09:35,680 --> 00:09:39,760
Ejaaz:
But on the other side, you know, they're kind of like stealing the spotlight

144
00:09:39,760 --> 00:09:43,540
Ejaaz:
from the kids, as this post says. But that's not actually the main trope.

145
00:09:44,020 --> 00:09:49,920
Ejaaz:
The main trope here, Josh, is OpenAI wasn't the only model to achieve a gold, right?

146
00:09:50,120 --> 00:09:58,060
Ejaaz:
At the same time, during the same testing period, you had Google achieving the exact same score.

147
00:09:58,280 --> 00:10:03,780
Ejaaz:
So then the question becomes, okay, well, it was whoever was ethical about announcing their own result.

148
00:10:04,460 --> 00:10:08,820
Ejaaz:
This post from Demis Hassabis, who is Google's head of AI,

149
00:10:09,000 --> 00:10:14,600
Ejaaz:
basically posts, and I'll note two days later, Official results are in.

150
00:10:14,940 --> 00:10:19,840
Ejaaz:
Gemini, which is their flagship model, achieved gold medal level in the International Math Olympiad.

151
00:10:20,100 --> 00:10:23,040
Ejaaz:
An advanced version was able to solve five out of six problems.

152
00:10:23,180 --> 00:10:25,780
Ejaaz:
So same as OpenAI, same thing, struggled on the sixth problem.

153
00:10:26,300 --> 00:10:29,140
Ejaaz:
Incredible progress. Huge congrats to the team.

154
00:10:29,260 --> 00:10:31,540
Ejaaz:
And a tweet here says that Google

155
00:10:31,540 --> 00:10:34,900
Ejaaz:
basically had to wait for marketing to approve the tweet until Monday.

156
00:10:35,020 --> 00:10:37,640
Ejaaz:
But OpenAI shared theirs first at 1 a.m.

157
00:10:38,040 --> 00:10:40,280
Ejaaz:
on Saturday and stole the spotlight.

158
00:10:40,720 --> 00:10:44,660
Ejaaz:
And we see the screenshot from Demis Hassabis, which, you know,

159
00:10:44,760 --> 00:10:46,380
Ejaaz:
he further clarifies this, basically saying,

160
00:10:46,540 --> 00:10:50,560
Ejaaz:
by the way, as an aside, we didn't announce on Friday because we respected the

161
00:10:50,560 --> 00:10:55,100
Ejaaz:
IMO's board's original request that all AI labs share the results only after

162
00:10:55,100 --> 00:10:56,680
Ejaaz:
the official results have been verified.

163
00:10:57,020 --> 00:10:59,000
Ejaaz:
Now that we've been given permission to share, blah, blah, blah,

164
00:10:59,260 --> 00:11:02,520
Ejaaz:
he shares. So Demis is playing the like good Samaritan here.

165
00:11:02,620 --> 00:11:06,120
Ejaaz:
He's like, ah, you know, we also have the good model, but we,

166
00:11:06,240 --> 00:11:10,620
Ejaaz:
you know, we have some pride and some manners about how we deal with these things.

167
00:11:10,760 --> 00:11:15,260
Ejaaz:
That's where it starts to get a little uglier, Josh, because we have OpenAI

168
00:11:15,260 --> 00:11:21,500
Ejaaz:
chiming in on this tweet, which basically says, and this is some random person commenting

169
00:11:21,500 --> 00:11:23,900
Ejaaz:
on OpenAI and this entire situation.

170
00:11:24,560 --> 00:11:29,640
Ejaaz:
So OpenAI basically has zero advantages except the size of the team,

171
00:11:30,260 --> 00:11:34,440
Ejaaz:
aka the OpenAI team was claimed to be smaller than Google Gemini's team.

172
00:11:34,520 --> 00:11:38,680
Ejaaz:
So what he's implying here is there's no real difference between OpenAI's models

173
00:11:38,680 --> 00:11:41,800
Ejaaz:
and Google Gemini's models. You can pretty much use either or.

174
00:11:42,380 --> 00:11:46,100
Ejaaz:
OpenAI maybe has a smaller team to build that model, but who the hell cares?

175
00:11:46,380 --> 00:11:52,420
Ejaaz:
And then one of the AI model researchers at OpenAI basically comes in and says,

176
00:11:52,520 --> 00:11:55,520
Ejaaz:
well, I think it's also interesting that they, they

177
00:11:55,520 --> 00:11:59,080
Ejaaz:
being Google, curated and provided useful context

178
00:11:59,080 --> 00:12:02,020
Ejaaz:
to the model, which we did not. Feels like

179
00:12:02,020 --> 00:12:07,100
Ejaaz:
taking your tutor's cheat sheet with you into the exam. So shots basically being

180
00:12:07,100 --> 00:12:12,900
Ejaaz:
fired from OpenAI, saying, hey, um, you cheated. You gave context to your model,

181
00:12:12,900 --> 00:12:17,980
Ejaaz:
and that was why it was able to achieve gold. We, OpenAI, didn't provide any of

182
00:12:17,980 --> 00:12:21,080
Ejaaz:
that context, and it was able to reason from first principles. There you have it.

183
00:12:21,260 --> 00:12:26,580
Ejaaz:
But then directly beneath it, Vinay Ramasesh, who is a Google DeepMind AI researcher, responds,

184
00:12:27,020 --> 00:12:32,320
Ejaaz:
it's worth noting actually that a Deep Think system, which is Google's AI system

185
00:12:32,320 --> 00:12:36,900
Ejaaz:
with no access to this corpus, so no context, also got gold.

186
00:12:36,900 --> 00:12:40,880
Ejaaz:
Again, according to the official graders, and he puts this in brackets because

187
00:12:40,880 --> 00:12:43,880
Ejaaz:
OpenAI didn't wait for the official graders to mark their score,

188
00:12:44,160 --> 00:12:45,820
Ejaaz:
with exactly the same score.

189
00:12:45,820 --> 00:12:51,960
Ejaaz:
So basically, this is like a pissing contest between two of the top AI model providers.

190
00:12:52,300 --> 00:12:56,260
Ejaaz:
Here's my take, Josh. And then I really want to kind of lean into what you think

191
00:12:56,260 --> 00:12:57,140
Ejaaz:
about this whole debacle.

192
00:12:57,640 --> 00:12:59,800
Ejaaz:
Number one, this seems so childish to me.

193
00:13:00,320 --> 00:13:05,000
Ejaaz:
Like, eventually, AI models were eventually going to get smarter or smart enough

194
00:13:05,000 --> 00:13:06,880
Ejaaz:
to solve these mathematical problems.

195
00:13:07,020 --> 00:13:10,260
Ejaaz:
And I think you said this earlier on.

196
00:13:10,600 --> 00:13:14,080
Ejaaz:
This is something that they're going to probably laugh about 10 years from now,

197
00:13:14,180 --> 00:13:17,860
Ejaaz:
right? That they were able to solve whatever, the most complex mathematical problems

198
00:13:17,860 --> 00:13:19,640
Ejaaz:
for humans, mere humans.

199
00:13:19,780 --> 00:13:24,220
Ejaaz:
And now AI is off creating wonderful scientific discoveries for us that we would

200
00:13:24,220 --> 00:13:26,800
Ejaaz:
have never comprehended or figured out ourselves, right?

201
00:13:27,060 --> 00:13:31,140
Ejaaz:
So firstly, you're arguing over something that's so silly.

202
00:13:31,200 --> 00:13:36,100
Ejaaz:
But number two, this kind of seems desperate on the OpenAI side.

203
00:13:36,240 --> 00:13:38,860
Ejaaz:
And maybe I'm being biased, but I'm just going to give you my take.

204
00:13:39,240 --> 00:13:43,180
Ejaaz:
OpenAI has kind of had a series of stumbles recently.

205
00:13:43,180 --> 00:13:45,980
Ejaaz:
They claimed that they were going to release GPT-5, which

206
00:13:45,980 --> 00:13:49,080
Ejaaz:
is their brand new frontier model, but they've delayed it many months

207
00:13:49,080 --> 00:13:52,620
Ejaaz:
now. Um, they got outperformed by

208
00:13:52,620 --> 00:13:55,600
Ejaaz:
Grok 4 from xAI, uh, so now

209
00:13:55,600 --> 00:13:58,720
Ejaaz:
they have a new benchmark that they need to beat, a new model that they basically

210
00:13:58,720 --> 00:14:02,880
Ejaaz:
need to outcompete. Uh, they claimed that they were going to release a new open

211
00:14:02,880 --> 00:14:07,200
Ejaaz:
source model and then delayed it after a Chinese open source model was released

212
00:14:07,200 --> 00:14:11,300
Ejaaz:
and had one trillion parameters and outperformed not just their model,

213
00:14:11,400 --> 00:14:13,380
Ejaaz:
but any other open source model out there.

214
00:14:13,960 --> 00:14:16,180
Ejaaz:
And so I feel like they're looking

215
00:14:16,180 --> 00:14:21,000
Ejaaz:
for a win, right? They released their agent this week or last week.

216
00:14:21,200 --> 00:14:24,000
Ejaaz:
And so, you know, that had mixed review, mixed feedback.

217
00:14:24,220 --> 00:14:26,580
Ejaaz:
So I feel like Sam is desperate for a win.

218
00:14:26,860 --> 00:14:32,140
Ejaaz:
People are criticizing consistently their moat, asking what has OpenAI got?

219
00:14:32,360 --> 00:14:35,000
Ejaaz:
They've lost a ton of researchers to Meta and other companies.

220
00:14:35,240 --> 00:14:37,260
Ejaaz:
I feel like their back's against the wall.

221
00:14:37,680 --> 00:14:41,100
Ejaaz:
Sam's scared and he basically needs to grab any kind of win.

222
00:14:41,220 --> 00:14:42,600
Ejaaz:
So it reeks of desperation.

223
00:14:43,600 --> 00:14:44,540
Ejaaz:
What's your take, Josh?

224
00:14:44,820 --> 00:14:49,500
Josh:
I do empathize with the team. They've been coming under fire from every single angle.

225
00:14:49,780 --> 00:14:54,720
Josh:
I mean, you have Zuck poaching all of their talent, and then all of the other

226
00:14:54,720 --> 00:14:57,040
Josh:
open-source AI models are beating them at their own game.

227
00:14:57,360 --> 00:15:00,540
Josh:
And they're just kind of, they're really getting beat up now.

228
00:15:00,580 --> 00:15:04,440
Josh:
And I think that they're looking to get some footing. I'm sure this probably plays a role in it.

229
00:15:04,760 --> 00:15:09,560
Josh:
But I'm sure behind the scenes, they're really trying to fight hard to put their

230
00:15:09,560 --> 00:15:13,240
Josh:
feet back on stable ground, to get GPT-5 out the door, to build Project Stargate

231
00:15:13,240 --> 00:15:14,560
Josh:
and make this big infrastructure network.

232
00:15:14,720 --> 00:15:18,340
Josh:
They need some wins. So sure, this was probably an attempt to get ahead,

233
00:15:18,480 --> 00:15:20,660
Josh:
make them look good, win over some more hearts and minds.

234
00:15:21,240 --> 00:15:25,160
Josh:
But I think the most interesting part of the whole story is less the drama and

235
00:15:25,160 --> 00:15:28,480
Josh:
more the fact that these models were able to accomplish a really impressive

236
00:15:28,480 --> 00:15:29,980
Josh:
feat over such a short period of time.

237
00:15:30,500 --> 00:15:34,840
Josh:
From what I understand, previously when they attempted to solve these problems,

238
00:15:35,080 --> 00:15:37,820
Josh:
they used a custom training data set.

239
00:15:37,960 --> 00:15:42,660
Josh:
They used custom tool sets. It was mostly a model trained on solving mathematical problems.

240
00:15:42,840 --> 00:15:49,360
Josh:
And with this version, both the OpenAI version and the Gemini models,

241
00:15:49,520 --> 00:15:51,140
Josh:
they were both general purpose models.

242
00:15:51,260 --> 00:15:54,960
Josh:
They were not trained specifically with the intention of solving mathematical problems.

243
00:15:55,080 --> 00:15:58,020
Josh:
These are the general models that people day to day are using.

244
00:15:58,160 --> 00:16:02,200
Josh:
They're just now able to solve these math problems using this new general intelligence.

245
00:16:02,200 --> 00:16:05,240
Josh:
So it's a really interesting breakthrough that I think we get from reinforcement

246
00:16:05,240 --> 00:16:11,780
Josh:
learning that now there is not so much of an advantage to training a model specific

247
00:16:11,780 --> 00:16:14,180
Josh:
to one skill set when you could just make it great at everything.

248
00:16:14,820 --> 00:16:19,540
Josh:
There was one thing that I noticed that some people call it cheating, other people don't.

249
00:16:19,860 --> 00:16:23,640
Josh:
But so with the mathematical, with the actual test that high schoolers had

250
00:16:23,640 --> 00:16:26,540
Josh:
to take, they're not allowed to use tools and they have a limited amount of

251
00:16:26,540 --> 00:16:27,820
Josh:
time per question to answer.

252
00:16:28,420 --> 00:16:32,680
Josh:
The models that, the OpenAI model and the Gemini model, they had infinite amount

253
00:16:32,680 --> 00:16:34,420
Josh:
of time to answer and they were allowed to use tools.

254
00:16:34,620 --> 00:16:37,320
Josh:
So there still are small differences in these.

255
00:16:37,580 --> 00:16:39,020
Ejaaz:
Were they allowed to like use the internet?

256
00:16:39,340 --> 00:16:42,200
Josh:
I don't know the specifics. I would imagine at least calculators,

257
00:16:42,380 --> 00:16:46,120
Josh:
at most probably the full repertoire of what we have currently available to

258
00:16:46,120 --> 00:16:49,600
Josh:
us, which is full internet search, code writing abilities. They could do their

259
00:16:49,600 --> 00:16:50,540
Josh:
own mathematical checks.

260
00:16:51,000 --> 00:16:54,220
Josh:
So I would just assume the minimum amount of constraints possible.

261
00:16:54,540 --> 00:16:58,520
Josh:
So there were far fewer constraints on the models, but they did solve the questions.

262
00:16:58,680 --> 00:17:01,680
Josh:
And I think that's super impressive. They got five out of six right,

263
00:17:02,350 --> 00:17:06,530
Josh:
which was gold and better than almost every student, if I'm not mistaken.

264
00:17:06,730 --> 00:17:09,830
Josh:
Only a few students got the six out of six completely correct.

265
00:17:10,030 --> 00:17:12,710
Josh:
It's just cool to see the rate of progress of these models getting better.

266
00:17:12,910 --> 00:17:17,350
Josh:
That over the course of the last 15 months or so, they went from horrible and

267
00:17:17,350 --> 00:17:20,590
Josh:
narrowly trained to incredible and generally trained.

268
00:17:20,750 --> 00:17:24,530
Josh:
And as long as that trend keeps going, I think the drama matters less than the

269
00:17:24,530 --> 00:17:28,550
Josh:
output, which is models are getting really good at solving really hard math problems.

270
00:17:28,690 --> 00:17:30,950
Josh:
And original ones too, that the world has never seen before.

271
00:17:31,490 --> 00:17:35,070
Ejaaz:
Yeah, well, that last point is actually the main takeaway that I had,

272
00:17:35,190 --> 00:17:39,210
Ejaaz:
Josh, which is it's original, never-before-seen problems.

273
00:17:39,390 --> 00:17:43,790
Ejaaz:
Typically, these AI models are trained on things that they've seen before, as you said, right?

274
00:17:44,090 --> 00:17:46,530
Ejaaz:
They're trained on data sets. So they've already seen the problem,

275
00:17:46,570 --> 00:17:49,350
Ejaaz:
and then they have to work out, they know the answer, and they have to work

276
00:17:49,350 --> 00:17:52,310
Ejaaz:
out how to get there, right? So they kind of have a leading factor.

277
00:17:52,750 --> 00:17:55,570
Ejaaz:
Here, it's just kind of like completely unknown.

278
00:17:56,290 --> 00:18:01,130
Ejaaz:
The other thing is, this is kind of like the culmination of a trend,

279
00:18:01,310 --> 00:18:06,950
Ejaaz:
Josh, which is these AI models are really good at doing kind of binary tasks.

280
00:18:07,190 --> 00:18:12,990
Ejaaz:
And I don't want to reduce mathematics to binary tasks, but technically it's

281
00:18:12,990 --> 00:18:17,910
Ejaaz:
numbers, sequential formulas, that kind of stuff, right?

282
00:18:18,090 --> 00:18:23,730
Ejaaz:
So if you can run enough compute at a thing, and if you can get that AI model

283
00:18:23,730 --> 00:18:28,410
Ejaaz:
to consider all different decision paths, it's going to eventually get to the answer, right?

284
00:18:28,610 --> 00:18:32,010
Ejaaz:
But it's always a specific answer at the end of that, right?

285
00:18:32,210 --> 00:18:36,990
Ejaaz:
Whereas when it comes to more subjective things, more human experiential things,

286
00:18:37,550 --> 00:18:39,450
Ejaaz:
AI has typically struggled to

287
00:18:40,150 --> 00:18:43,390
Ejaaz:
improve at the same rate that it has for, like, all these different scientific

288
00:18:43,390 --> 00:18:47,710
Ejaaz:
and math problems. So I'm glad that we've reached this pinnacle feat. I think

289
00:18:47,710 --> 00:18:53,690
Ejaaz:
AI models are really good at one thing and not so great at other things,

290
00:18:53,690 --> 00:18:57,870
Ejaaz:
and I'm excited to see how, like, they kind of try to start leapfrogging

291
00:18:57,870 --> 00:18:59,490
Ejaaz:
each other over the next couple of years.

292
00:18:59,490 --> 00:19:02,250
Josh:
Yeah, it's that directional progress that we like.

293
00:19:02,250 --> 00:19:05,030
Josh:
Math is clearly the first because you can write down

294
00:19:05,030 --> 00:19:08,910
Josh:
proofs and you can check your work, and there is an actual verifiable solution.

295
00:19:08,910 --> 00:19:12,110
Josh:
And I think that's why we're seeing a lot of the progress start early

296
00:19:12,110 --> 00:19:15,450
Josh:
in math and then hopefully go on to these other places. But

297
00:19:15,450 --> 00:19:18,590
Josh:
what we are seeing is these first signs of

298
00:19:18,590 --> 00:19:21,830
Josh:
new knowledge breakthroughs, where it's solving a

299
00:19:21,830 --> 00:19:24,710
Josh:
new and novel problem that hasn't been

300
00:19:24,710 --> 00:19:28,470
Josh:
released before based on its previous data set.

301
00:19:28,470 --> 00:19:31,370
Josh:
So it's not just pattern matching, like you mentioned earlier, where it has

302
00:19:31,370 --> 00:19:34,830
Josh:
this data set of questions, it's kind of finding the right examples and

303
00:19:34,830 --> 00:19:37,710
Josh:
then applying that logic to the question. It's actually

304
00:19:37,710 --> 00:19:40,910
Josh:
reasoning, and it's reasoning in many instances, and

305
00:19:40,910 --> 00:19:44,110
Josh:
then it's comparing its work and it's coming to a conclusion.

306
00:19:44,110 --> 00:19:46,950
Josh:
And we saw this with the Grok Heavy model last week too, when

307
00:19:46,950 --> 00:19:49,810
Josh:
it released, where I think the new

308
00:19:49,810 --> 00:19:52,610
Josh:
meta is many instances solving hard

309
00:19:52,610 --> 00:19:55,930
Josh:
problems and then comparing, so you lower that error rate more

310
00:19:55,930 --> 00:19:59,210
Josh:
and more and more each time. And what we're seeing is great progress. So,

311
00:19:59,210 --> 00:20:04,130
Josh:
I mean, although OpenAI and Google are fighting again, they're both

312
00:20:04,130 --> 00:20:08,630
Josh:
fighting over exciting progress. And sure, maybe one tried to sweep in and

313
00:20:08,630 --> 00:20:13,610
Josh:
steal the valor, but they both did an excellent job in actually completing these

314
00:20:13,610 --> 00:20:19,170
Josh:
problems and placing gold in a test that was previously not possible for an AI model to do.

315
00:20:19,170 --> 00:20:21,250
Ejaaz:
You know who the real winners are here out of this, Josh?

316
00:20:21,250 --> 00:20:23,570
Josh:
Who's that?

317
00:20:24,160 --> 00:20:27,640
Ejaaz:
High school kids, who now have an AI model that can do all their math homework for them.

318
00:20:28,100 --> 00:20:30,800
Josh:
Isn't that incredible? Like, man, think about it.

319
00:20:30,940 --> 00:20:31,540
Ejaaz:
I wish I had that.

320
00:20:31,780 --> 00:20:35,780
Josh:
You have an AI model that is as smart as the smartest people on planet Earth

321
00:20:35,780 --> 00:20:38,440
Josh:
in high school. If it could solve those math problems, it could solve anything.

322
00:20:38,700 --> 00:20:43,000
Ejaaz:
It sounds human as well, Josh. So, like, your teacher is going to struggle unless

323
00:20:43,000 --> 00:20:48,820
Ejaaz:
they use AI themselves to figure out whether you just did that yourself or completely

324
00:20:48,820 --> 00:20:51,920
Ejaaz:
just ran that through GPT, your mom's GPT subscription.

325
00:20:51,920 --> 00:20:54,620
Josh:
It really forces you to re-evaluate the school model, right?

326
00:20:54,840 --> 00:20:59,520
Josh:
Because now that this information is so readily accessible, it's so easy to solve these problems.

327
00:20:59,680 --> 00:21:04,700
Josh:
Is that the actual thing worth learning? Or is it how to use these tools that's

328
00:21:04,700 --> 00:21:06,220
Josh:
more important to get to the answer?

329
00:21:06,220 --> 00:21:09,040
Josh:
And there's this dual-pronged approach, and we see

330
00:21:09,040 --> 00:21:12,000
Josh:
developers and programmers talk about this a lot, where as soon

331
00:21:12,000 --> 00:21:14,900
Josh:
as they start to rely too heavily on the tools, they start

332
00:21:14,900 --> 00:21:18,340
Josh:
to lose their touch, they start to lose their ability to deeply

333
00:21:18,340 --> 00:21:22,260
Josh:
understand how it reaches conclusions. But

334
00:21:22,260 --> 00:21:25,980
Josh:
is that worth it in exchange for getting to the answer much quicker and then

335
00:21:25,980 --> 00:21:29,620
Josh:
being able to seek many more answers? I don't know. It's a weird dynamic. If I was

336
00:21:29,620 --> 00:21:32,840
Josh:
a teacher, I'd be worried, because, I mean, similar to what we saw with the calculator,

337
00:21:32,840 --> 00:21:38,380
Josh:
it just replaces the thinking process and just yields you an answer.

338
00:21:38,380 --> 00:21:42,140
Ejaaz:
And the thing with the calculator is, like, you're

339
00:21:42,140 --> 00:21:44,940
Ejaaz:
using the calculator, so it figures out the answer for you, but you kind of

340
00:21:44,940 --> 00:21:48,400
Ejaaz:
loosely understand how it is working, right? You

341
00:21:48,400 --> 00:21:52,880
Ejaaz:
know what numbers it's crunching to get to that answer, and then typically you

342
00:21:52,880 --> 00:21:57,300
Ejaaz:
do a few things on a calculator and then you get to your eventual answer for

343
00:21:57,300 --> 00:22:01,540
Ejaaz:
whatever the original question was. The issue, or the concern that you're

344
00:22:01,540 --> 00:22:07,080
Ejaaz:
highlighting here with AI is it's doing really complex problems,

345
00:22:07,080 --> 00:22:11,740
Ejaaz:
which kids don't even need to understand in the first place just to get an answer,

346
00:22:11,980 --> 00:22:15,940
Ejaaz:
which they can then give to their teacher, get a grade and then go to university.

347
00:22:15,940 --> 00:22:19,060
Ejaaz:
But the kids don't actually learn actively in that process.

348
00:22:19,700 --> 00:22:24,360
Ejaaz:
And it's going to be a concerning trend if we see kids just trying to go from

349
00:22:24,360 --> 00:22:28,360
Ejaaz:
zero to 100% without understanding anything in between.

350
00:22:29,460 --> 00:22:30,580
Ejaaz:
A trend to watch.

351
00:22:31,160 --> 00:22:34,220
Josh:
This is our episode from a few weeks ago. Is AI making you dumber?

352
00:22:34,480 --> 00:22:37,480
Josh:
Yes. And I think that's just going to continue to be the question.

353
00:22:37,720 --> 00:22:41,080
Josh:
Oh, God. And I think the answer is it's all dependent on how you choose to use

354
00:22:41,080 --> 00:22:41,940
Josh:
the tools that you're given.

355
00:22:42,220 --> 00:22:46,820
Josh:
And if you use these tools as further leverage. So I'm sure these math olympiads

356
00:22:46,820 --> 00:22:50,940
Josh:
who can actually complete the problems would love to have this model to check

357
00:22:50,940 --> 00:22:53,740
Josh:
the problems and to work through the problems and to figure out shortcuts on

358
00:22:53,740 --> 00:22:54,440
Josh:
solving these problems.

359
00:22:54,620 --> 00:22:57,880
Josh:
Where if you deeply understand it, then this becomes an amazing tool to check

360
00:22:57,880 --> 00:22:59,380
Josh:
your work, to generate new questions for you.

361
00:22:59,560 --> 00:23:04,560
Josh:
It's a great study buddy. Or if you are not an Olympiad and you still want

362
00:23:04,560 --> 00:23:07,500
Josh:
to get to the answer, well, you just kind of cheat your way through and you just

363
00:23:07,500 --> 00:23:11,060
Josh:
ask it for exactly what you want. So it's that split again, and it's

364
00:23:11,060 --> 00:23:14,620
Josh:
up to the person to take their own agency, solve their own problems, and try to

365
00:23:14,620 --> 00:23:19,380
Josh:
use these as tools of leverage instead of just problem-solving machines.

366
00:23:19,380 --> 00:23:23,780
Ejaaz:
That actually reminds me of this tweet I saw yesterday, Josh. So what you're looking

367
00:23:23,780 --> 00:23:28,700
Ejaaz:
at here is a tweet from Dave White. Dave White is a very prestigious investment

368
00:23:28,700 --> 00:23:32,720
Ejaaz:
slash research advisor at this fund called Paradigm,

369
00:23:32,800 --> 00:23:37,580
Ejaaz:
which basically it's a crypto fund, but it is one of the wealthiest funds out there.

370
00:23:37,740 --> 00:23:42,220
Ejaaz:
So a lot of the investments they made were massive wins. And a lot of the reasoning

371
00:23:42,220 --> 00:23:44,620
Ejaaz:
of those wins was from Dave White's analysis.

372
00:23:44,900 --> 00:23:50,700
Ejaaz:
He is a deeply thoughtful mathematician at his core, and he is famed for doing

373
00:23:50,700 --> 00:23:56,060
Ejaaz:
a lot of analyses on companies, mathematical analyses that have ended up, you know,

374
00:23:57,000 --> 00:24:01,060
Ejaaz:
determining whether a fund puts $100 million in a company or zero, right?

375
00:24:01,120 --> 00:24:04,100
Ejaaz:
So a very important job worth hundreds of millions of dollars, right?

376
00:24:04,400 --> 00:24:08,600
Ejaaz:
And what he says here, basically, is him having an identity crisis,

377
00:24:08,600 --> 00:24:13,000
Ejaaz:
because he has looked up to the IMO, the International Math Olympiad.

378
00:24:13,220 --> 00:24:18,500
Ejaaz:
And he goes on to say in this tweet that subconsciously, whenever he's met a

379
00:24:18,500 --> 00:24:22,640
Ejaaz:
gold medalist IMO champion, he's always subconsciously thought that they were

380
00:24:22,640 --> 00:24:25,680
Ejaaz:
smarter than him, that he is more respectful of them.

381
00:24:25,920 --> 00:24:30,240
Ejaaz:
And now with this news that AI models basically can do his job for him,

382
00:24:30,400 --> 00:24:35,140
Ejaaz:
can reason better than him at some of these math problems, he now has an identity crisis.

383
00:24:35,280 --> 00:24:39,880
Ejaaz:
He doesn't know kind of where to go from this. And if people like Dave White

384
00:24:39,880 --> 00:24:45,400
Ejaaz:
are having this kind of, like, disillusioned sentiment from how smart AI is,

385
00:24:45,520 --> 00:24:49,360
Ejaaz:
you can imagine how this is going to happen for everyone else in all of the

386
00:24:49,360 --> 00:24:50,620
Ejaaz:
other sectors, Josh, right?

387
00:24:50,840 --> 00:24:54,180
Ejaaz:
It doesn't matter if you're a mathematician or an investment research advisor,

388
00:24:54,180 --> 00:24:59,140
Ejaaz:
you could be a technician in some kind of engineering industrial role,

389
00:24:59,260 --> 00:25:02,560
Ejaaz:
or you could be a teacher, or you could be a kid or a high schooler.

390
00:25:02,920 --> 00:25:06,780
Ejaaz:
I think this disillusionment is going to spread. And I think it's super important

391
00:25:06,780 --> 00:25:09,540
Ejaaz:
for people to kind of like evolve their thinking, like you said,

392
00:25:09,640 --> 00:25:13,140
Ejaaz:
Josh, and learn how to leverage these tools versus just consume.

393
00:25:13,640 --> 00:25:16,520
Josh:
Yeah, this is, I mean, this is crazy. There's a lot of people that are going

394
00:25:16,520 --> 00:25:21,840
Josh:
to have to adapt to this new world order of intelligence, where if you build

395
00:25:21,840 --> 00:25:25,720
Josh:
up your entire identity around being intelligent, well, perhaps you're going to have to alter the way

396
00:25:26,370 --> 00:25:29,970
Josh:
you present yourself as intelligent, because the meaning of intelligence is becoming

397
00:25:29,970 --> 00:25:34,550
Josh:
commoditized among these tools that are now reduced down to a single chat box.

398
00:25:34,970 --> 00:25:37,450
Ejaaz:
Yep. Benchmarks are going to have to reset themselves completely.

399
00:25:37,750 --> 00:25:42,130
Ejaaz:
But folks, that is the end of this episode. Thank you so much for tuning in again.

400
00:25:42,710 --> 00:25:46,070
Ejaaz:
Josh and I are going hammer and tongs at Limitless.

401
00:25:46,290 --> 00:25:51,250
Ejaaz:
Our goal is to get you the hottest and trending topics and news fresh out the

402
00:25:51,250 --> 00:25:55,550
Ejaaz:
door, give you our commentary, our thoughts, and hopefully some useful insights for you.

403
00:25:55,550 --> 00:25:58,510
Ejaaz:
If you enjoyed this episode, if you enjoyed any of our previous episodes, please

404
00:25:58,510 --> 00:26:02,690
Ejaaz:
continue to share and spread them with all your friends and family and whoever

405
00:26:02,690 --> 00:26:05,930
Ejaaz:
you think might be interested in this. We are getting tons of feedback from you

406
00:26:05,930 --> 00:26:09,990
Ejaaz:
guys, and with every episode that we release, we're getting better. So please remember

407
00:26:09,990 --> 00:26:13,790
Ejaaz:
to like, subscribe, follow us. It's hugely appreciated and helpful for us, and

408
00:26:13,790 --> 00:26:14,530
Ejaaz:
we'll see you on the next one.