1
00:00:00,020 --> 00:00:03,700
Josh:
A few months ago, we told you to use Claude. Now we're telling you to switch

2
00:00:03,700 --> 00:00:06,720
Josh:
back because for those of you who aren't familiar, well, over Christmas break,

3
00:00:06,820 --> 00:00:09,820
Josh:
there was a major vibe shift where AI coding went from this like fun tool to

4
00:00:09,820 --> 00:00:12,400
Josh:
things that developers actually use when they're shipping code.

5
00:00:12,600 --> 00:00:15,460
Josh:
And even if you're not a developer, the amount of use cases and applications

6
00:00:15,460 --> 00:00:18,000
Josh:
that were created around that time were really strong.

7
00:00:18,200 --> 00:00:21,360
Josh:
And since then, Anthropic has gone on this generational run of shipping these

8
00:00:21,360 --> 00:00:24,920
Josh:
incredible products seemingly every single day that has turned Claude Code into

9
00:00:24,920 --> 00:00:28,480
Josh:
this supercharged super app that is the place that, Ejaaz,

10
00:00:28,540 --> 00:00:33,220
Josh:
I know you've gone to, I've gone there too, in order to get all of our AI progress done.

11
00:00:33,360 --> 00:00:35,340
Josh:
Any work that we have, we've gone to Claude Code.

12
00:00:35,500 --> 00:00:39,000
Josh:
Now, OpenAI has woken up. And over the last few weeks, Codex has shipped more

13
00:00:39,000 --> 00:00:41,440
Josh:
features than most companies ship in a year.

14
00:00:41,500 --> 00:00:44,480
Josh:
And I bet, I guarantee that you haven't heard of some of these features that

15
00:00:44,480 --> 00:00:45,860
Josh:
we're going to talk about in this episode.

16
00:00:46,060 --> 00:00:49,820
Josh:
The pendulum has fully swung back, or at least I believe, because I'm totally Codex-pilled.

17
00:00:49,980 --> 00:00:53,200
Josh:
And in this episode, we're going to kind of walk through the differences between

18
00:00:53,200 --> 00:00:57,760
Josh:
these two and why the model that you're using today probably won't be the model you're using tomorrow.

19
00:00:58,000 --> 00:01:00,840
Josh:
And I don't think we're going to convince you, but maybe we could show you why

20
00:01:00,840 --> 00:01:03,740
Josh:
you might want to consider using something else here.

21
00:01:03,920 --> 00:01:06,880
Ejaaz:
I just want to talk through some of the crazy stats here because the script

22
00:01:06,880 --> 00:01:11,660
Ejaaz:
has genuinely flipped. A few months ago, Claude Code was all anyone could talk about.

23
00:01:11,820 --> 00:01:17,520
Ejaaz:
And every software engineer was using Claude Code. Every enterprise was installing it. It was crazy.

24
00:01:17,660 --> 00:01:21,200
Ejaaz:
But just over the last couple of weeks, specifically by the end of April,

25
00:01:21,730 --> 00:01:26,250
Ejaaz:
ChatGPT 5.5 was released, and that was plugged into the coding AI model.

26
00:01:26,390 --> 00:01:27,570
Ejaaz:
It's all one and the same.

27
00:01:27,950 --> 00:01:31,190
Ejaaz:
And OpenAI went on this code red run where they focus on nothing but building

28
00:01:31,190 --> 00:01:33,990
Ejaaz:
the best coding AI model and the best LLM.

29
00:01:34,150 --> 00:01:36,030
Ejaaz:
And the numbers show that it's worked.

30
00:01:36,210 --> 00:01:41,950
Ejaaz:
Over the last week, Codex has been downloaded over or installed over 46 million times.

31
00:01:42,290 --> 00:01:46,890
Ejaaz:
Claude Code, under 500,000 times. Now, that is crazy to say, because if you look

32
00:01:46,890 --> 00:01:51,310
Ejaaz:
at the historical data, Claude Code downloads and installs have absolutely dwarfed

33
00:01:51,310 --> 00:01:53,810
Ejaaz:
Codex, but something changed over the last couple of weeks.

34
00:01:53,930 --> 00:01:58,930
Ejaaz:
That something was OpenAI putting out just a better model. You mentioned that

35
00:01:58,930 --> 00:02:00,030
Ejaaz:
you were Codex-pilled, Josh.

36
00:02:00,190 --> 00:02:04,750
Ejaaz:
I think so am I. I've spent the last couple of days playing around with Codex.

37
00:02:04,830 --> 00:02:09,350
Ejaaz:
This morning, we prepped a bunch of really cool demos, and it has just completely flipped the script.

38
00:02:09,530 --> 00:02:13,090
Ejaaz:
But it's one thing saying it. It's another thing actually showing the direct

39
00:02:13,090 --> 00:02:15,990
Ejaaz:
comparison. So we created this visual artifact to kind of

40
00:02:16,370 --> 00:02:18,890
Ejaaz:
give you the scoreboard. And you can see it at the top here.

41
00:02:19,070 --> 00:02:23,710
Ejaaz:
It's OpenAI Codex at 11 and Anthropic Claude at two. But let me explain why.

42
00:02:23,910 --> 00:02:26,290
Ejaaz:
Okay. So number one, computer use.

43
00:02:26,830 --> 00:02:30,370
Ejaaz:
Codex and Claude Code can use your computer.

44
00:02:30,570 --> 00:02:33,290
Ejaaz:
It can take over your desktop and it can like move your cursor around.

45
00:02:33,410 --> 00:02:37,950
Ejaaz:
Now, Claude pioneered it. They were the first ones there, but it was super slow.

46
00:02:38,170 --> 00:02:41,610
Ejaaz:
It kind of runs into a bunch of obstacles and you have to kind of like handhold

47
00:02:41,610 --> 00:02:43,510
Ejaaz:
it and prompt it to do a bunch of different things.

48
00:02:43,950 --> 00:02:47,770
Ejaaz:
Codex is not only quicker than me, it's quicker than the average person.

49
00:02:47,910 --> 00:02:50,670
Ejaaz:
In fact, I can actually see the cursor move around so quickly.

50
00:02:50,710 --> 00:02:55,890
Ejaaz:
And it's like using a computer, but it's a superhuman and it can run pretty much 24-7 at this point.

51
00:02:56,350 --> 00:03:01,570
Ejaaz:
Long horizon autonomy. Codex can work for longer in a much more intelligent

52
00:03:01,570 --> 00:03:05,750
Ejaaz:
manner versus Claude Code, which is, again, crazy to say because literally a

53
00:03:05,750 --> 00:03:07,790
Ejaaz:
month ago, it was the inverse of this.

54
00:03:07,950 --> 00:03:11,630
Ejaaz:
Claude right now can run for a decent number of times or amount of time,

55
00:03:11,730 --> 00:03:13,810
Ejaaz:
but not as long as Codex can.

56
00:03:14,250 --> 00:03:17,090
Ejaaz:
And then the last two that I want to talk about here is browser use.

57
00:03:17,290 --> 00:03:22,010
Ejaaz:
So Codex can take over your browser. It can do a lot more intentional things.

58
00:03:22,070 --> 00:03:25,670
Ejaaz:
It understands what it's looking at, very importantly. Previously, it could not do that.

59
00:03:25,970 --> 00:03:29,130
Ejaaz:
Claude can do the same, but not as intelligently. And then finally,

60
00:03:29,670 --> 00:03:35,510
Ejaaz:
ChatGPT Images 2.0 got released, what was it, like two weeks ago now? Oh, it's so good.

61
00:03:35,610 --> 00:03:40,070
Ejaaz:
Yeah, it's the image generation model from OpenAI, and it is absolutely astounding.

62
00:03:40,170 --> 00:03:44,330
Ejaaz:
In fact, it beat all the other predecessors, including Google's,

63
00:03:44,330 --> 00:03:49,970
Ejaaz:
what is it, Nano Banana 2.0 Pro, which previously held the lead.

64
00:03:50,110 --> 00:03:51,890
Ejaaz:
It beat it across every single benchmark.

65
00:03:52,370 --> 00:03:54,970
Ejaaz:
Anthropic, on the other hand, doesn't even have an image gen model.

66
00:03:55,110 --> 00:03:57,270
Ejaaz:
So, so far, it's crushing. Yeah.

67
00:03:57,650 --> 00:04:01,030
Josh:
Yeah, I think a lot of the best is now bundled into Codex. The image gen for

68
00:04:01,030 --> 00:04:04,590
Josh:
anyone who uses any sort of visual work is unbelievable. And being able to use

69
00:04:04,590 --> 00:04:07,050
Josh:
that directly in your software is awesome.

70
00:04:07,390 --> 00:04:10,630
Josh:
One thing that you mentioned is the long horizon autonomy. I think that needs

71
00:04:10,630 --> 00:04:14,410
Josh:
a double clicking on because it's really impressive how well it works.

72
00:04:14,830 --> 00:04:17,090
Josh:
Traditionally, there's been this thing called a Ralph loop that we use.

73
00:04:17,190 --> 00:04:20,230
Josh:
It's actually named after the character from The Simpsons who is very persistent.

74
00:04:20,470 --> 00:04:25,130
Josh:
And it's basically a planning mode where you give the AI a goal and it will

75
00:04:25,130 --> 00:04:27,550
Josh:
continue to iterate towards that goal until it accomplishes it.

76
00:04:27,630 --> 00:04:32,030
Josh:
So like, let's say you want to build a Lego car or something and you give it the exact parameters.

77
00:04:32,130 --> 00:04:35,150
Josh:
It will go and go and go until it solves that problem and gives you exactly

78
00:04:35,150 --> 00:04:37,470
Josh:
what you want in a way that other AI models haven't.

79
00:04:37,730 --> 00:04:41,090
Josh:
Codex did that. And this is the only native implementation that you can get

80
00:04:41,090 --> 00:04:44,950
Josh:
of this long horizon thinking where it actually will go for days on end.

81
00:04:45,030 --> 00:04:48,610
Josh:
I've seen screenshots of some thinking for as long as 36 hours to accomplish the goal.

82
00:04:48,710 --> 00:04:52,590
Josh:
So if you have really difficult tasks, Codex is going to be really good at solving those.

83
00:04:52,670 --> 00:04:55,390
Josh:
Now, continuing to scroll down, there was another feature that was just released

84
00:04:55,390 --> 00:04:56,950
Josh:
this week called auto review.

85
00:04:57,170 --> 00:05:01,190
Josh:
And a huge pain in the ass for people who are creating code or working on complex

86
00:05:01,190 --> 00:05:05,170
Josh:
projects, whatever it may be, is you're constantly having to sit there and approve

87
00:05:05,170 --> 00:05:07,750
Josh:
things because the permission system is a little finicky, right?

88
00:05:07,810 --> 00:05:09,370
Josh:
You don't want to give it full access to your computer.

89
00:05:10,180 --> 00:05:13,160
Josh:
You also don't want to sit there approving every time it wants to use Chrome

90
00:05:13,160 --> 00:05:14,740
Josh:
or every time it wants to access your files.

91
00:05:15,200 --> 00:05:19,940
Josh:
So Codex created auto review and they rolled it out last week where the agent is kind of smart.

92
00:05:20,100 --> 00:05:24,980
Josh:
It knows which things are going to possibly be systemic existential threats

93
00:05:24,980 --> 00:05:26,060
Josh:
and which approvals aren't.

94
00:05:26,140 --> 00:05:29,900
Josh:
And it will just automatically approve all the things that aren't going to get you in a lot of trouble.

95
00:05:30,020 --> 00:05:33,120
Josh:
It creates a much easier user interface where you can just kind of walk away

96
00:05:33,120 --> 00:05:35,940
Josh:
from the computer for a little while and come back and things get done.

97
00:05:35,940 --> 00:05:38,680
Josh:
Memory and context is pretty strong, I'd say.

98
00:05:38,680 --> 00:05:41,520
Josh:
The one thing, and we haven't mentioned many Claude winners: the

99
00:05:41,520 --> 00:05:44,780
Josh:
place where Claude wins currently is on their OpenClaw capability. Funny

100
00:05:44,780 --> 00:05:49,060
Josh:
enough, because OpenAI bought OpenClaw. But Dispatch is the mobile app feature

101
00:05:49,060 --> 00:05:53,480
Josh:
for Claude in which you can actually engage with Claude Code remotely. That doesn't

102
00:05:53,480 --> 00:05:57,020
Josh:
currently exist on Codex, and while the team has promised to ship that, you don't

103
00:05:57,020 --> 00:06:01,160
Josh:
actually have that today. Claude has that. Also, in terms of the personality

104
00:06:01,160 --> 00:06:04,440
Josh:
and UI, Claude is just so much better. I think we're going to get into our

105
00:06:04,720 --> 00:06:08,180
Josh:
personal takes, but whenever you're using an LLM versus an actual

106
00:06:08,180 --> 00:06:10,980
Josh:
tool set or a harness, Claude is pretty great, and the

107
00:06:10,980 --> 00:06:13,740
Josh:
UI is very warm. So there are some

108
00:06:13,740 --> 00:06:17,140
Josh:
instances in which Claude is better, but for the most part Codex is

109
00:06:17,140 --> 00:06:20,220
Josh:
really just kind of crushing it, and I've really enjoyed using it. One of the

110
00:06:20,220 --> 00:06:25,400
Josh:
fun things is pets. I mean, just recently they released pets, and Claude also released

111
00:06:25,400 --> 00:06:28,800
Josh:
pets, but these pets are a little bit different. This is an example of Angry Dario

112
00:06:28,800 --> 00:06:32,260
Josh:
we're seeing on the screen, and it's fun because you have this persistent character

113
00:06:32,260 --> 00:06:34,740
Josh:
that exists throughout your computer use.

114
00:06:34,940 --> 00:06:38,440
Josh:
And as you're engaging with Codex, it'll just kind of chat with you in the background

115
00:06:38,440 --> 00:06:39,920
Josh:
so you can see your progress, see where you're at.

116
00:06:40,040 --> 00:06:43,220
Josh:
It's fun, it's playful, and it just shows that they kind of care about the user experience.

117
00:06:43,380 --> 00:06:47,240
Josh:
Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz.

118
00:06:47,300 --> 00:06:48,920
Josh:
And you were just telling me about Chronicle and how cool it is,

119
00:06:49,000 --> 00:06:52,080
Josh:
how it kind of monitors your screen as you go. This seems like novel technology

120
00:06:52,080 --> 00:06:53,100
Josh:
that we haven't seen yet.

121
00:06:53,380 --> 00:06:58,340
Ejaaz:
Yeah, so one of the earliest episodes that we did here on Limitless was an interview

122
00:06:58,340 --> 00:07:00,740
Ejaaz:
with the folks at OpenAI that created...

123
00:07:01,240 --> 00:07:03,880
Ejaaz:
Something called, what was it called, Josh? Do you remember?

124
00:07:04,080 --> 00:07:07,720
Ejaaz:
It was like agent mode or personal mode, something like that.

125
00:07:08,240 --> 00:07:10,060
Josh:
Yes. It thought overnight for you, right? Yes.

126
00:07:10,200 --> 00:07:15,240
Ejaaz:
It basically took all the conversations that you'd had with ChatGPT the night

127
00:07:15,240 --> 00:07:20,020
Ejaaz:
before or the day before or the week before, and it created important context

128
00:07:20,020 --> 00:07:22,860
Ejaaz:
around you in the form of something called memories.

129
00:07:22,860 --> 00:07:27,480
Ejaaz:
This is where AI memory was birthed from OpenAI themselves, from the OpenAI

130
00:07:27,480 --> 00:07:31,720
Ejaaz:
team. And what it would do is it would feed you a report in the morning that

131
00:07:31,720 --> 00:07:35,340
Ejaaz:
would update you on information that it thought you would be interested to read about.

132
00:07:35,520 --> 00:07:37,900
Ejaaz:
So say, for example, you were interested in the stock market,

133
00:07:37,920 --> 00:07:40,900
Ejaaz:
it'll give you an update on a bunch of advancements that had happened overnight

134
00:07:40,900 --> 00:07:42,820
Ejaaz:
or over the last week or whatever it might be.

135
00:07:43,140 --> 00:07:47,580
Ejaaz:
Now, fast forward to today, memory is embedded across every single AI model

136
00:07:47,580 --> 00:07:50,780
Ejaaz:
and tool. The reason why is context is so important.

137
00:07:50,940 --> 00:07:54,300
Ejaaz:
It's one thing a user asking for something explicitly and directly.

138
00:07:54,300 --> 00:07:58,160
Ejaaz:
It's a complete other thing for an AI to actually understand what you mean,

139
00:07:58,400 --> 00:08:01,420
Ejaaz:
the nuance in the sentence that you've created, and even better,

140
00:08:01,660 --> 00:08:03,320
Ejaaz:
to predict what you want.

141
00:08:03,460 --> 00:08:07,760
Ejaaz:
But there was still an obstacle, which was you needed to feed it the context

142
00:08:07,760 --> 00:08:10,780
Ejaaz:
and say, hey, Claude, hey, ChatGPT, can you remember this?

143
00:08:11,060 --> 00:08:18,340
Ejaaz:
OpenAI recently released a feature called Chronicle, where it observes what you scroll through,

144
00:08:18,700 --> 00:08:23,740
Ejaaz:
what you click on, what you type, and it builds its own context and memories

145
00:08:23,740 --> 00:08:28,120
Ejaaz:
around you without you needing to feed it, which actually led to a really cool

146
00:08:28,120 --> 00:08:31,340
Ejaaz:
prompt that you pointed out, Josh, or that you found, which was,

147
00:08:31,570 --> 00:08:35,170
Ejaaz:
what have I been doing very inefficiently on my computer, according to Chronicle,

148
00:08:35,510 --> 00:08:38,230
Ejaaz:
which is this new memory feature, make some recommendations,

149
00:08:38,490 --> 00:08:41,890
Ejaaz:
be direct, tell me what I need to hear. That's, that's pretty awesome.

150
00:08:42,170 --> 00:08:45,270
Josh:
Yeah. So this is alpha because I don't think a lot of people recognize that

151
00:08:45,270 --> 00:08:49,010
Josh:
this is a possibility because Codex and OpenAI didn't do a good job of explaining this.

152
00:08:49,110 --> 00:08:52,550
Josh:
When they released Chronicle, they said it's a way for the system to review your

153
00:08:52,550 --> 00:08:55,830
Josh:
code as you've gone because it's been taking sequential screenshots.

154
00:08:55,990 --> 00:08:58,130
Josh:
But the reality is that it's much bigger than this.

155
00:08:58,170 --> 00:09:01,710
Josh:
And I suspect they didn't market it this way because it could be a bit of a privacy issue,

156
00:09:01,710 --> 00:09:05,210
Josh:
but it's essentially constantly monitoring your screen and taking screenshots

157
00:09:05,210 --> 00:09:09,130
Josh:
of what's happening on your screen and interpreting it so it understands your

158
00:09:09,130 --> 00:09:11,530
Josh:
habits, the way that you work, the things that you do.

159
00:09:11,750 --> 00:09:14,750
Josh:
And then you can ask it, what have I been doing very inefficiently on my computer?

160
00:09:14,890 --> 00:09:17,190
Josh:
According to Chronicle, make some recommendations, be direct,

161
00:09:17,350 --> 00:09:18,150
Josh:
tell me what I need to hear.

162
00:09:18,250 --> 00:09:21,110
Josh:
And it'll actually evaluate how you've been using your computer,

163
00:09:21,310 --> 00:09:24,570
Josh:
how long you've been scrolling on Twitter, perhaps, how long you haven't been

164
00:09:24,570 --> 00:09:27,130
Josh:
doing the things you're supposed to be working on, or just generally how to

165
00:09:27,130 --> 00:09:31,890
Josh:
improve your workflow and give you real feedback based on your actual actions that it's seen.

166
00:09:31,990 --> 00:09:35,370
Josh:
And I think this is a super powerful thing, currently only available to Pro members.

167
00:09:35,450 --> 00:09:39,490
Josh:
So if you pay for the $100, $200 a month subscription, you get access to this.

168
00:09:39,590 --> 00:09:42,970
Josh:
But I suspect this is the early signs of a very important feature they're going

169
00:09:42,970 --> 00:09:48,470
Josh:
to roll out, which is that entire computer monitoring system to improve your

170
00:09:48,470 --> 00:09:51,650
Josh:
system and also probably train the models to get better at engaging with your system.

171
00:09:51,750 --> 00:09:54,910
Josh:
But I found Chronicle to be one of those kind of secret features that not a

172
00:09:54,910 --> 00:09:57,830
Josh:
lot of people know about, but has a lot of upside if you use it to your advantage

173
00:09:57,830 --> 00:10:01,070
Josh:
and let it monitor what you're doing and improve your workflow on a day-to-day basis.

174
00:10:01,650 --> 00:10:05,110
Ejaaz:
So the point is, from both of these companies, Anthropic and OpenAI,

175
00:10:05,290 --> 00:10:09,470
Ejaaz:
we are getting feature releases every single week. In fact, every single day.

176
00:10:09,910 --> 00:10:13,010
Ejaaz:
And it's becoming, I'm being bombarded by this.

177
00:10:13,150 --> 00:10:16,630
Ejaaz:
And it's hard to keep track of all of this. So what is the number one litmus

178
00:10:16,630 --> 00:10:19,350
Ejaaz:
test for both of these models and products and companies?

179
00:10:19,670 --> 00:10:23,090
Ejaaz:
It's to actually use the thing. It's to build the thing.

180
00:10:23,330 --> 00:10:27,470
Ejaaz:
And we have two special demos that we have prepared for you that we're about

181
00:10:27,470 --> 00:10:30,550
Ejaaz:
to jump into. Now, Josh, can you guess what my first demo is about?

182
00:10:30,690 --> 00:10:33,430
Josh:
The theme. First one's a game. We're gamers, man. I want to play a game.

183
00:10:33,570 --> 00:10:34,810
Josh:
I want to see how well it does on a game.

184
00:10:34,970 --> 00:10:38,630
Josh:
I know we did this demo in the past months ago. It left a lot to be desired.

185
00:10:38,710 --> 00:10:42,550
Josh:
So I'm curious to see the current up-to-date status as it relates to Claude Code

186
00:10:42,550 --> 00:10:45,410
Josh:
versus Codex. Who's winning on the one-shot game prompt?

187
00:10:45,750 --> 00:10:49,690
Ejaaz:
Indeed. Okay. So I am a nostalgic kind of guy. And so I was like,

188
00:10:49,810 --> 00:10:51,830
Ejaaz:
oh, back in the day, I loved Mario.

189
00:10:52,070 --> 00:10:58,110
Ejaaz:
So I want you, both of these models, to create the best Mario type or inspired

190
00:10:58,110 --> 00:11:00,110
Ejaaz:
game, a side scroller, but make it futuristic.

191
00:11:00,780 --> 00:11:03,560
Ejaaz:
Maybe add a little bit of neon, sprinkle a bit of neon in there,

192
00:11:04,020 --> 00:11:06,960
Ejaaz:
create levels. I want game design. I want there to be enemies.

193
00:11:07,080 --> 00:11:08,200
Ejaaz:
I want there to be pitfalls.

194
00:11:08,380 --> 00:11:11,560
Ejaaz:
And I also want there to be a scoreboard and also tell me how to do this thing.

195
00:11:11,640 --> 00:11:13,020
Ejaaz:
I want, give me the whole package.

196
00:11:13,140 --> 00:11:17,940
Ejaaz:
Basically, I fed this prompt or idea into ChatGPT and Claude.

197
00:11:18,020 --> 00:11:21,740
Ejaaz:
And I said, can you create a detailed prompt that I can then feed into your coding models?

198
00:11:21,920 --> 00:11:25,300
Ejaaz:
I then set each of the coding models to their highest settings.

199
00:11:25,480 --> 00:11:28,200
Ejaaz:
So what you're about to see is the best of the best for the most

200
00:11:28,200 --> 00:11:30,960
Ejaaz:
detailed prompt that they came up with. And let's see what they

201
00:11:30,960 --> 00:11:33,960
Ejaaz:
did. So step number one, or example number

202
00:11:33,960 --> 00:11:37,260
Ejaaz:
one, is Claude Opus 4.7. So

203
00:11:37,260 --> 00:11:40,440
Ejaaz:
this is Claude Code at the highest setting with their latest model.

204
00:11:40,440 --> 00:11:46,240
Ejaaz:
Um, okay, it took the prompt pretty literally. It's titled this: Neon Plumber Moon

205
00:11:46,240 --> 00:11:49,940
Ejaaz:
Base Run, which is obviously Mario-inspired. And it said, hey, this is a demo edition,

206
00:11:49,940 --> 00:11:53,160
Ejaaz:
by the way, this is not production ready. What I like about this is it's giving

207
00:11:53,160 --> 00:11:57,500
Ejaaz:
me the instructions. But how does the game actually play out? Let's see. It looks

208
00:11:57,500 --> 00:11:59,420
Ejaaz:
good. Can you see me here, Josh?

209
00:11:59,420 --> 00:12:01,580
Josh:
I can. Yes, I can. And it looks like...

210
00:12:01,580 --> 00:12:04,300
Ejaaz:
The animations are pretty good. I'm jumping around. I think

211
00:12:04,300 --> 00:12:07,380
Ejaaz:
I'm like a little robot. I can see my feet pitter-pattering. Now

212
00:12:07,380 --> 00:12:10,980
Ejaaz:
I'm guessing this thing is about to kill me, so let's see if I can jump. Oh, I

213
00:12:10,980 --> 00:12:16,420
Ejaaz:
can jump. There we go. That's awesome. Um, one bit... Can I kill this guy? Oh, yes, I

214
00:12:16,420 --> 00:12:22,180
Ejaaz:
can. Now, one bit of feedback I've noticed is, uh, I can't double jump, and it told

215
00:12:22,180 --> 00:12:23,460
Ejaaz:
me in the menu that I could double

216
00:12:23,460 --> 00:12:26,800
Ejaaz:
jump. So that's weird. So the physics hasn't really paid off. Can I die?

217
00:12:28,650 --> 00:12:30,590
Josh:
Oh, it certainly looks like you could die.

218
00:12:30,770 --> 00:12:35,650
Ejaaz:
I can die. Great. Okay. So that is Claude's attempt at it. What's your feedback

219
00:12:35,650 --> 00:12:37,350
Ejaaz:
on this, Josh? I think the graphics are pretty good.

220
00:12:37,470 --> 00:12:41,090
Josh:
The graphics are great. For one shot, I mean, granted, this is only one single

221
00:12:41,090 --> 00:12:43,310
Josh:
prompt. So for one prompt, it created great graphics.

222
00:12:43,470 --> 00:12:46,670
Josh:
It had sound design that actually sounds pretty accurate to what you would expect in the game.

223
00:12:46,810 --> 00:12:49,990
Josh:
It has similar principles. It's following gaming principles.

224
00:12:50,130 --> 00:12:52,710
Josh:
You kind of understand what looks dangerous, what doesn't.

225
00:12:52,910 --> 00:12:55,130
Josh:
You knew that those spikes were going to hurt you and they hurt you.

226
00:12:55,130 --> 00:13:00,210
Josh:
The logic seems to be a little bit flawed. I think it's having problems with gravity, or at least that

227
00:13:00,430 --> 00:13:03,310
Josh:
double jump functionality because it looks like those coins that you probably

228
00:13:03,310 --> 00:13:06,390
Josh:
want to collect, you can't actually reach because you can't do the double jump.

229
00:13:06,390 --> 00:13:11,190
Josh:
So in terms of logic, not so hot. In terms of visuals, aesthetics, in terms of, I

230
00:13:11,190 --> 00:13:14,590
Josh:
mean, how good this game is from one shot, very impressive.

231
00:13:14,590 --> 00:13:20,030
Ejaaz:
Yeah. I think it's important to understand that I started from zero. It literally asked

232
00:13:20,030 --> 00:13:23,430
Ejaaz:
me to give it a folder to build in and the folder was completely empty.

233
00:13:23,670 --> 00:13:27,470
Ejaaz:
So all the visual renderings, all the graphics, the animation style,

234
00:13:27,630 --> 00:13:32,410
Ejaaz:
the scoring system, the way that the avatar moves and looks was created from

235
00:13:32,410 --> 00:13:34,890
Ejaaz:
scratch from a bunch of characters from this AI model.

236
00:13:35,010 --> 00:13:39,110
Ejaaz:
So this is Claude Code's current best attempt and it is way better than what

237
00:13:39,110 --> 00:13:42,870
Ejaaz:
we tested out and honestly demoed on this show about a month ago.

238
00:13:42,970 --> 00:13:49,430
Ejaaz:
But now let's see what OpenAI's ChatGPT 5.5 Codex at the highest possible setting cooked up.

239
00:13:49,430 --> 00:13:52,350
Josh:
Okay, and this is using the same prompt, correct? So you just fed the model the

240
00:13:52,350 --> 00:13:56,510
Josh:
same prompt, identical? Identical, right. Oh God, I'm excited. I hope Codex did

241
00:13:56,510 --> 00:13:59,930
Josh:
well, because now that I'm a fan, I'm gassing it up. It better perform here.

242
00:13:59,930 --> 00:14:02,630
Ejaaz:
Okay. So this is GPT 5.5's attempt. Now, you

243
00:14:02,630 --> 00:14:05,610
Ejaaz:
might notice that this isn't the entire browser. That's because

244
00:14:05,610 --> 00:14:08,770
Ejaaz:
Codex has a very unique feature, which is not only

245
00:14:08,770 --> 00:14:11,390
Ejaaz:
can it do all the coding in a single app for you, but it

246
00:14:11,390 --> 00:14:14,110
Ejaaz:
has an in-app browser, so it can

247
00:14:14,110 --> 00:14:17,770
Ejaaz:
live test the thing in the app without you needing to go to Google Chrome or

248
00:14:17,770 --> 00:14:22,710
Ejaaz:
whatever. But anyway, we have the starting screen here. It has also called it

249
00:14:22,710 --> 00:14:27,210
Ejaaz:
Neon Plumber Moon Base Run. It looks a little more rudimentary from the start,

250
00:14:27,210 --> 00:14:31,110
Ejaaz:
but I do like the background animation. Josh, we didn't get this in the previous

251
00:14:31,110 --> 00:14:33,150
Ejaaz:
one, or at least not this side-scrolling thing. Well, let's...

252
00:14:33,740 --> 00:14:34,120
Ejaaz:
Oh.

253
00:14:34,500 --> 00:14:35,480
Josh:
Oh, this is nice.

254
00:14:35,740 --> 00:14:38,680
Ejaaz:
This is nice. I think this has good logic.

255
00:14:39,360 --> 00:14:43,640
Ejaaz:
Wait, but there is no music. There's no music. I can't double jump.

256
00:14:44,100 --> 00:14:46,320
Ejaaz:
Might be a skill issue. Might be a prompt issue.

257
00:14:46,580 --> 00:14:48,620
Josh:
Let's have a look. Did it say you can double jump?

258
00:14:49,140 --> 00:14:50,380
Ejaaz:
That's a good question, actually.

259
00:14:50,500 --> 00:14:52,940
Josh:
This is a fully playable game.

260
00:14:53,120 --> 00:14:57,960
Ejaaz:
Yes. And I like that it's like zoomed in. There's like... Oh,

261
00:14:58,040 --> 00:15:00,820
Ejaaz:
we got the boost. I can jump on the platforms. Let's see if I can kill this guy.

262
00:15:02,520 --> 00:15:03,720
Josh:
Yes. Nice.

263
00:15:03,720 --> 00:15:05,720
Ejaaz:
Okay. And can I jump the gap? There's a scoring system.

264
00:15:05,720 --> 00:15:09,040
Josh:
You could see your hearts. Oh dude, this is way better.

265
00:15:09,040 --> 00:15:13,300
Ejaaz:
Power-up! Wait, oh my God, I want the power-up. I'm still gonna go back. Double jump.

266
00:15:13,300 --> 00:15:16,540
Josh:
You can, you could go back. Go back to the last platform.

267
00:15:16,540 --> 00:15:19,000
Ejaaz:
Oh God, I died. I'm going, I'm going to the last platform. Here we go.

268
00:15:19,000 --> 00:15:22,380
Josh:
It looks like they're sequentially gaining height, which is interesting.

269
00:15:23,000 --> 00:15:25,920
Josh:
Oh, but okay, so if I'm comparing these two, I'm actually, I'm not feeling very

270
00:15:25,920 --> 00:15:29,320
Josh:
let down. This is good. Aside from the music not existing, which we may not have

271
00:15:29,320 --> 00:15:33,520
Josh:
explicitly asked for, um, it looks like the logic plays better. The actual gameplay

272
00:15:33,520 --> 00:15:38,660
Josh:
is usable. This is a full... I don't know if it's glitching or if this is you glitching.

273
00:15:38,660 --> 00:15:40,460
Ejaaz:
No, no. That is, it's glitching. It's glitching a bit.

274
00:15:40,460 --> 00:15:43,300
Josh:
Okay, so there are still some edge-case errors, yeah. But

275
00:15:43,300 --> 00:15:46,600
Josh:
this is different in the sense that you have your hearts clearly projected, you

276
00:15:46,600 --> 00:15:49,920
Josh:
have a score system that's clearly in place, you're able to get these power-ups.

277
00:15:49,920 --> 00:15:53,620
Josh:
They work, they function. I mean, this is a very clean and functional game. So I

278
00:15:53,620 --> 00:15:58,880
Josh:
would give this to Codex. I think the experience, perhaps the design, of Claude

279
00:15:58,880 --> 00:16:01,440
Josh:
was better. And perhaps the music,

280
00:16:01,600 --> 00:16:03,820
Josh:
I mean, music was definitely better versus none, but Claude,

281
00:16:04,000 --> 00:16:07,000
Josh:
in terms of just, or Codex, in terms of just coding logic and making a better

282
00:16:07,000 --> 00:16:09,860
Josh:
game, I give, I give this Codex. Do you have a take?

283
00:16:10,580 --> 00:16:15,600
Ejaaz:
Yeah. So on the build side of things, I had a much more pleasant experience

284
00:16:16,470 --> 00:16:19,230
Ejaaz:
using Codex as well. So I think Codex wins

285
00:16:19,230 --> 00:16:22,230
Ejaaz:
on this. Um, I one-shotted it in the true sense,

286
00:16:22,230 --> 00:16:24,930
Ejaaz:
where I just gave it a single prompt and Codex didn't ask for

287
00:16:24,930 --> 00:16:28,410
Ejaaz:
any permissions. It just kind of went on and did the thing. I saw

288
00:16:28,410 --> 00:16:34,930
Ejaaz:
its thinking, and at points where it was unsure, it thought amongst itself

289
00:16:34,930 --> 00:16:39,110
Ejaaz:
and then made the decision to progress forwards, whereas with Claude Code it would

290
00:16:39,110 --> 00:16:43,610
Ejaaz:
come to me. Now, that might just be a developer or engineer's preference, right? Like,

291
00:16:43,610 --> 00:16:45,770
Ejaaz:
if you're building a production ready app for like, I don't know,

292
00:16:46,050 --> 00:16:49,890
Ejaaz:
a big company that you work for, you probably want to have more hands-on involvement.

293
00:16:50,070 --> 00:16:52,670
Ejaaz:
Whereas if you're just building a game like we did today, where I don't really

294
00:16:52,670 --> 00:16:56,710
Ejaaz:
care what it ends up looking like or what it does, then the hands-off preference

295
00:16:56,710 --> 00:17:00,770
Ejaaz:
is probably something that you would use Codex for. But I think Codex wins this.

296
00:17:01,290 --> 00:17:04,890
Josh:
So for our second demo, we have this handwritten piece of paper that I actually

297
00:17:04,890 --> 00:17:06,090
Josh:
wrote and took a picture of.

298
00:17:06,470 --> 00:17:10,150
Josh:
I didn't. It's GPT Image Gen 2.0, but it looks like it's handwritten.

299
00:17:10,250 --> 00:17:11,350
Josh:
The handwriting was too nice.

300
00:17:11,450 --> 00:17:12,250
Ejaaz:
Josh. That was the giveaway.

301
00:17:12,250 --> 00:17:15,730
Josh:
Yeah, my handwriting is far sloppier than this. But the idea is that you can

302
00:17:15,730 --> 00:17:19,370
Josh:
even write things on the back of a napkin and you could turn that into an application.

303
00:17:19,570 --> 00:17:23,230
Josh:
So what we did here is we just asked for it to create a generic Limitless dashboard

304
00:17:23,230 --> 00:17:27,590
Josh:
application on the back of a piece of paper, fed it into the model, and this is what we got.

305
00:17:27,770 --> 00:17:29,930
Josh:
So it looks like it did a pretty good job.

306
00:17:30,170 --> 00:17:33,970
Josh:
I could tell this is Claude before you even tell me which model it is because

307
00:17:33,970 --> 00:17:35,630
Josh:
it has the standard design principles.

308
00:17:36,150 --> 00:17:39,950
Josh:
Claude design is so basic and

309
00:17:39,950 --> 00:17:42,670
Josh:
it's so predictable where like okay i've seen this

310
00:17:42,670 --> 00:17:45,430
Josh:
dashboard before it looks like it was a mission success there's a

311
00:17:45,430 --> 00:17:48,050
Josh:
lot of text on this page a lot of stuff going on a lot

312
00:17:48,050 --> 00:17:51,370
Josh:
of graphics i give a lot of credit for kind of inferring what

313
00:17:51,370 --> 00:17:54,090
Josh:
we would want to be seeing from something like this where we have a

314
00:17:54,090 --> 00:17:59,050
Josh:
proper trip budget i don't think we asked for a trip budget um but okay i think

315
00:17:59,050 --> 00:18:02,990
Josh:
it looks like it made it did a lot of inferring right like it kind of made a

316
00:18:02,990 --> 00:18:05,250
Josh:
lot of assumptions but in the end of the day it did take what we had on the

317
00:18:05,250 --> 00:18:10,910
Josh:
napkin and it turned it into a pretty generic dashboard of sorts based on very

318
00:18:10,910 --> 00:18:12,130
Josh:
limited information that we gave it.

319
00:18:12,830 --> 00:18:15,670
Ejaaz:
I think the issue with this is we asked for something

320
00:18:15,670 --> 00:18:19,110
Ejaaz:
completely different. It created a dashboard, um, but

321
00:18:19,110 --> 00:18:22,170
Ejaaz:
we asked for it to be based around the Limitless podcast and

322
00:18:22,170 --> 00:18:24,790
Ejaaz:
it created a travel planning board. So I don't know

323
00:18:24,790 --> 00:18:27,650
Ejaaz:
whether that was a prompt issue or whether we just fed

324
00:18:27,650 --> 00:18:30,510
Ejaaz:
it the wrong image, but here we go, here is where we're

325
00:18:30,510 --> 00:18:37,690
Ejaaz:
at. Um, now let's take a look at what OpenAI did. Okay, so here we have the same

326
00:18:37,690 --> 00:18:43,770
Ejaaz:
prompt fed into GPT 5.5, and it's funny, I can instantly tell this is GPT 5.5

327
00:18:43,770 --> 00:18:48,570
Ejaaz:
because it's cleaner and it's not neon and it's not trying to go for some futuristic spin.

328
00:18:48,930 --> 00:18:53,590
Ejaaz:
It looks very simplistic. This is actually a website or app that I would probably

329
00:18:53,590 --> 00:18:55,430
Ejaaz:
be more inclined to engage with.

330
00:18:55,630 --> 00:18:58,110
Ejaaz:
It's also more visually perceptive to me, right?

331
00:18:58,370 --> 00:19:03,490
Ejaaz:
Like, what do I have at the front here? It's this five-day trip that I want to go on.

332
00:19:03,610 --> 00:19:06,210
Ejaaz:
It's giving me the basic information that I need to know at the start.

333
00:19:06,210 --> 00:19:08,310
Ejaaz:
It has a bunch of different tabs as well.

334
00:19:08,410 --> 00:19:12,110
Ejaaz:
But again, it isn't what I specified on the napkin. So I think this might be

335
00:19:12,110 --> 00:19:15,390
Ejaaz:
a skill issue on our side, Josh. But otherwise, like, look at these graphics.

336
00:19:15,610 --> 00:19:18,630
Ejaaz:
They're like really good. One thing I've noticed is stylistically,

337
00:19:19,030 --> 00:19:26,010
Ejaaz:
although both models create very different looking things, the animation style looks the same.

338
00:19:26,270 --> 00:19:29,250
Ejaaz:
Have you noticed that even with the game previously that we just demoed,

339
00:19:29,370 --> 00:19:31,030
Ejaaz:
the avatar looked the same.

340
00:19:31,170 --> 00:19:35,970
Ejaaz:
It was given the same sort of title and the objects interacted in the same way

341
00:19:35,970 --> 00:19:41,670
Ejaaz:
we're seeing this here. So maybe it's just a change in quality. I actually prefer GPT 5.5 on this one.

342
00:19:41,670 --> 00:19:44,550
Josh:
Yeah this is crazy i'm just going to suspect

343
00:19:44,550 --> 00:19:48,690
Josh:
there was a prompt issue there where yes like we clearly we asked for something

344
00:19:48,690 --> 00:19:51,950
Josh:
that we didn't actually want but here it is i think if you're just comparing

345
00:19:51,950 --> 00:19:56,850
Josh:
them apples to apples, uh, ChatGPT and Codex is like a no-brainer, 10 times better.

346
00:19:56,850 --> 00:20:01,010
Josh:
i far prefer this if you look at the original napkin photo this is much more

347
00:20:01,010 --> 00:20:04,150
Josh:
accurate to what the design looked like on that original piece of paper.

348
00:20:04,330 --> 00:20:07,410
Josh:
And then if you also just compare the general design, this is far easier to understand.

349
00:20:07,670 --> 00:20:11,290
Josh:
It's just a lot less dense. It's designed better. I wouldn't even say this is really...

350
00:20:12,190 --> 00:20:15,870
Josh:
a fair comparison. It seems like Codex just, like, completely crushed this, and

351
00:20:15,870 --> 00:20:19,770
Josh:
it has all the functionality built in. It looks good. I am giving another win

352
00:20:19,770 --> 00:20:20,930
Josh:
to Codex here. That's two for two.

353
00:20:20,930 --> 00:20:25,650
Ejaaz:
Wow look i've got like a re-optimization uh toggle at the top and it actually

354
00:20:25,650 --> 00:20:29,050
Ejaaz:
updated i wonder where it's pulling that data from it's already hooked

355
00:20:29,050 --> 00:20:31,130
Josh:
Into data look at that yeah impressive stuff.

356
00:20:31,130 --> 00:20:36,210
Ejaaz:
Very, very cool. Now, one major reason why both of these models have advanced so

357
00:20:36,210 --> 00:20:41,290
Ejaaz:
rapidly over the last couple of months is something known as the AI model harness.

358
00:20:41,290 --> 00:20:45,490
Ejaaz:
Now, you have the AI model, which is something that you and I have interacted with quite a lot.

359
00:20:45,670 --> 00:20:48,210
Ejaaz:
It's via ChatGPT or Claude itself.

360
00:20:48,450 --> 00:20:51,470
Ejaaz:
But there's an added layer that you can put on top of this model,

361
00:20:51,590 --> 00:20:56,530
Ejaaz:
which comes in the form of prescripted prompts that are engineered to make the

362
00:20:56,530 --> 00:20:58,570
Ejaaz:
model act in a particular way.

363
00:20:58,690 --> 00:21:01,670
Ejaaz:
But it's also the environment that the model works in.

364
00:21:01,730 --> 00:21:05,930
Ejaaz:
It's also the policies that you set to make sure that the model acts and behaves

365
00:21:05,930 --> 00:21:08,310
Ejaaz:
and sounds in a particular way.

366
00:21:08,410 --> 00:21:09,610
Ejaaz:
That's why we talked about Claude's

367
00:21:09,610 --> 00:21:14,170
Ejaaz:
personality earlier being better than ChatGPT. It all plays into the harness.

368
00:21:15,970 --> 00:21:19,510
Ejaaz:
What we figured out was it's an entirely new product category on its own.

369
00:21:19,630 --> 00:21:25,050
Ejaaz:
In fact, Cursor had some news over the last couple of days where they made their

370
00:21:25,050 --> 00:21:28,370
Ejaaz:
harness, Cursor SDK, available via API.

371
00:21:28,490 --> 00:21:33,130
Ejaaz:
And the reason why this is such a big deal is critics criticized Cursor for

372
00:21:33,130 --> 00:21:37,190
Ejaaz:
being an AI wrapper, which meant that Cursor doesn't have a model of its own.

373
00:21:37,310 --> 00:21:44,970
Ejaaz:
It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPT.

374
00:21:45,210 --> 00:21:49,430
Ejaaz:
And so people would say, Cursor isn't actually special. Turns out the wrapper

375
00:21:49,430 --> 00:21:52,670
Ejaaz:
or the harness actually made these models way more intelligent.

376
00:21:52,810 --> 00:21:58,310
Ejaaz:
In fact, if you added Cursor's harness on top of GPT 5.5 and Claude Opus 4.7

377
00:21:58,310 --> 00:22:02,610
Ejaaz:
right now, you end up with a smarter, more intelligent, more efficient model

378
00:22:02,610 --> 00:22:04,370
Ejaaz:
than the actual base models themselves.

379
00:22:04,510 --> 00:22:07,790
Ejaaz:
Now, remember, AI Labs spent hundreds of millions of dollars to train these

380
00:22:07,790 --> 00:22:10,350
Ejaaz:
models and to create the best thing and put their best foot forward.

381
00:22:10,470 --> 00:22:13,030
Ejaaz:
And still you have a startup which is worth, what is it now,

382
00:22:13,130 --> 00:22:16,750
Ejaaz:
$10 billion right now, potentially being acquired by xAI for $60 billion,

383
00:22:17,270 --> 00:22:18,910
Ejaaz:
creating a better model on top.

384
00:22:19,090 --> 00:22:23,170
Ejaaz:
So the harness and the AI model are arguably one and the same at this point.

385
00:22:23,290 --> 00:22:26,290
Ejaaz:
And it's just a valuable note to point out that these models aren't just better

386
00:22:26,290 --> 00:22:30,670
Ejaaz:
at coding because of the base model itself. It's because of this thing known as a harness.

387
00:22:31,110 --> 00:22:35,230
Josh:
Yeah. And the harness is the difference maker when it comes to building this super app.

388
00:22:35,350 --> 00:22:38,190
Josh:
It's like every single company is trying to build the super

389
00:22:38,190 --> 00:22:41,010
Josh:
app, the all-in-one application that kind of serves

390
00:22:41,010 --> 00:22:43,850
Josh:
as your operating system. Anytime you need to engage with AI,

391
00:22:43,850 --> 00:22:47,050
Josh:
this is the place that you could do it, and it's all-encompassing, it's

392
00:22:47,050 --> 00:22:50,090
Josh:
all in one. Now, one of the best applications we've seen for this in the early

393
00:22:50,090 --> 00:22:53,930
Josh:
days has been something like OpenClaw, where it's this extension of what an

394
00:22:53,930 --> 00:22:58,850
Josh:
operating system could look like, starting with AI at the foundation, and

395
00:22:58,850 --> 00:23:02,170
Josh:
OpenClaw did a really amazing job of that. Now, in some news this week, you can now

396
00:23:02,170 --> 00:23:05,990
Josh:
use your ChatGPT account to generate tokens with OpenClaw.

397
00:23:06,110 --> 00:23:09,810
Josh:
So previously you had to use the API, whether you were using Anthropic or OpenAI

398
00:23:09,810 --> 00:23:13,150
Josh:
or any of the other models, and it was pretty expensive. It costs a lot of money.

399
00:23:13,390 --> 00:23:16,890
Josh:
Now, thanks to Sam Altman this week announcing, you can actually use your account

400
00:23:16,890 --> 00:23:20,270
Josh:
connected with it. And I think this is the beginning of a multi-step plan.

401
00:23:20,800 --> 00:23:25,280
Josh:
To really integrate OpenClaw directly into Codex in a way that Anthropic can't.

402
00:23:25,360 --> 00:23:28,200
Josh:
Because if you'll remember, OpenAI owns OpenClaw.

403
00:23:28,380 --> 00:23:31,120
Josh:
They bought Peter and granted OpenClaw will stay open source forever,

404
00:23:31,360 --> 00:23:34,460
Josh:
but they have the ability to actually integrate directly into their products.

405
00:23:34,580 --> 00:23:35,920
Josh:
And I suspect that's what we're going to see.

406
00:23:36,180 --> 00:23:39,960
Josh:
In fact, we even got some confirmation from another post from one of the Codex

407
00:23:39,960 --> 00:23:44,460
Josh:
developers who replied to a post that was saying, Codex only needs a native

408
00:23:44,460 --> 00:23:47,020
Josh:
editor, an iOS app, a full browser, and OpenClaw.

409
00:23:47,180 --> 00:23:50,060
Josh:
And the developer, Tibo, said all of this and more

410
00:23:50,060 --> 00:23:52,900
Josh:
is coming, which Sam Altman retweeted. So we are

411
00:23:52,900 --> 00:23:55,580
Josh:
indeed getting OpenClaw inside of Codex. We're getting a

412
00:23:55,580 --> 00:23:58,500
Josh:
mobile iOS app so that you can access it remotely, and soon

413
00:23:58,500 --> 00:24:01,280
Josh:
there's going to be no reason to really use a different app

414
00:24:01,280 --> 00:24:04,680
Josh:
because it's going to be all-encompassing. Now, are there still downfalls? Yes.

415
00:24:04,680 --> 00:24:08,820
Josh:
Computer use 20 faster on Codex, but yesterday I was playing around with it. I

416
00:24:08,820 --> 00:24:13,560
Josh:
told it to increase the volume of my music, and it took 10 minutes to do it because

417
00:24:13,560 --> 00:24:17,200
Josh:
it tried to increase the slider on Spotify, even though it was maxed, without actually

418
00:24:17,200 --> 00:24:21,200
Josh:
increasing my system audio. So it's still a little dumb, but it is getting better.

419
00:24:21,360 --> 00:24:25,440
Josh:
And I think this leads me to this post that I really love, the vanilla maxing

420
00:24:25,440 --> 00:24:27,000
Josh:
post we have to talk about.

421
00:24:27,500 --> 00:24:32,140
Josh:
Which starts by saying, you should 100% be vanilla maxing. Just use the tools

422
00:24:32,140 --> 00:24:34,000
Josh:
as they're handed to you. That's it.

423
00:24:34,550 --> 00:24:37,110
Josh:
Because a lot of people, and I've found this personally, and in fact,

424
00:24:37,230 --> 00:24:40,430
Josh:
I've been caught by this personally, is that you try to get caught up in using

425
00:24:40,430 --> 00:24:42,270
Josh:
all these different repos and these skills and these plugins,

426
00:24:42,470 --> 00:24:46,530
Josh:
when the reality is that if you just wait, the AI labs are shipping fast enough,

427
00:24:46,630 --> 00:24:48,610
Josh:
they'll just integrate it into your own native application.

428
00:24:48,850 --> 00:24:50,810
Josh:
So I'm vanilla maxing. You, Ejaaz?

429
00:24:50,970 --> 00:24:54,870
Ejaaz:
I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw,

430
00:24:55,130 --> 00:25:00,010
Ejaaz:
when it was hyped up, was incredibly impressive and still is incredibly impressive.

431
00:25:00,170 --> 00:25:05,030
Ejaaz:
It opened up an entirely new product market and segment. That's why OpenAI acquired them.

432
00:25:05,190 --> 00:25:08,430
Ejaaz:
But something's majorly changed over the last couple of months,

433
00:25:08,550 --> 00:25:12,130
Ejaaz:
which is OpenClaw has kind of fallen off. No one talks about it anymore.

434
00:25:12,450 --> 00:25:16,090
Ejaaz:
People who were complaining about the errors and bugs that they were facing have

435
00:25:16,090 --> 00:25:18,710
Ejaaz:
kind of gone silent because they've just grown bored and they don't want to

436
00:25:18,710 --> 00:25:20,030
Ejaaz:
put their energy and effort into it.

437
00:25:20,150 --> 00:25:24,470
Ejaaz:
And the reason why is because although these tools are very frontier level,

438
00:25:24,710 --> 00:25:27,570
Ejaaz:
they can't actually be scaled to a practical use.

439
00:25:27,650 --> 00:25:31,410
Ejaaz:
You don't feel safe integrating OpenClaw into your desktop where you have personal

440
00:25:31,410 --> 00:25:35,510
Ejaaz:
files. I've seen horror stories where they access credit card data and expose

441
00:25:35,510 --> 00:25:39,530
Ejaaz:
that or where they deleted old wedding photos and the wife was super angry,

442
00:25:39,830 --> 00:25:40,690
Ejaaz:
a bunch of stuff like that.

443
00:25:40,870 --> 00:25:45,170
Ejaaz:
If you are given access to a tool that comes under a branded

444
00:25:45,170 --> 00:25:52,270
Ejaaz:
reputation, such as ChatGPT, Codex, or Claude Cowork, where it kind of like

445
00:25:52,270 --> 00:25:54,970
Ejaaz:
takes over your computer, but in a sandboxed environment.

446
00:25:54,970 --> 00:25:58,510
Ejaaz:
I know that NVIDIA also released NemoClaw, which is like the enterprise-grade

447
00:25:58,510 --> 00:26:00,590
Ejaaz:
secure version of OpenClaw.

448
00:26:00,890 --> 00:26:03,870
Ejaaz:
You're vanilla maxing. That is the way to do it. And there's no need to rush

449
00:26:03,870 --> 00:26:09,230
Ejaaz:
ahead and lose all your data as a consequence. So that's basically it for the episode.

450
00:26:09,410 --> 00:26:15,970
Ejaaz:
We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.

451
00:26:16,090 --> 00:26:18,970
Ejaaz:
There's a lot of numbers in there, but basically the best coding models from

452
00:26:18,970 --> 00:26:20,830
Ejaaz:
both sides to see which is better.

453
00:26:21,050 --> 00:26:25,970
Ejaaz:
And the truth is, there isn't a clear winner right now. I would say it's probably

454
00:26:25,970 --> 00:26:31,210
Ejaaz:
Codex GPT 5.5, but the narrative switched so recently that maybe,

455
00:26:31,430 --> 00:26:34,870
Ejaaz:
maybe Claude can still catch up. And the only reason why I say that,

456
00:26:34,950 --> 00:26:35,730
Ejaaz:
Josh, is that

457
00:26:36,020 --> 00:26:39,600
Ejaaz:
there's a model that we haven't discussed or demonstrated yet because we can't.

458
00:26:39,680 --> 00:26:40,960
Ejaaz:
It's called Claude Mythos.

459
00:26:41,300 --> 00:26:45,620
Ejaaz:
It was kind of pseudo-released a few weeks ago.

460
00:26:45,800 --> 00:26:49,900
Ejaaz:
And on all benchmarks, it is technically better than 5.5.

461
00:26:50,080 --> 00:26:53,240
Ejaaz:
But the reason why we can't demo it is we can't get access to it.

462
00:26:53,340 --> 00:26:56,900
Ejaaz:
And the reason cited by Anthropic was because it's too dangerous.

463
00:26:57,060 --> 00:26:59,980
Ejaaz:
It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it.

464
00:27:00,120 --> 00:27:03,400
Ejaaz:
It was Pete Hegseth of the US Department of War also saying this,

465
00:27:03,540 --> 00:27:05,160
Ejaaz:
right? So there's concerns around that.

466
00:27:05,460 --> 00:27:09,060
Ejaaz:
OpenAI has created a Mythos-level model here, but has made it available

467
00:27:09,060 --> 00:27:12,220
Ejaaz:
to everyone. And so the argument could be made that it's just because Anthropic

468
00:27:12,220 --> 00:27:13,220
Ejaaz:
doesn't have enough compute.

469
00:27:13,360 --> 00:27:16,200
Ejaaz:
So there's a lot of rumors around this, but I'm excited to get my hands on the

470
00:27:16,200 --> 00:27:18,280
Ejaaz:
best models from each of these and compare them directly.

471
00:27:18,600 --> 00:27:21,360
Josh:
Yeah. And the compute's actually been degrading. So I think I want to wrap this

472
00:27:21,360 --> 00:27:23,800
Josh:
up on like, what do you actually currently use?

473
00:27:23,880 --> 00:27:26,780
Josh:
What is the Limitless production stack? How are we using these AI models?

474
00:27:26,920 --> 00:27:29,760
Josh:
And for me, at least, it's not even close. I'm Codex-pilled.

475
00:27:29,920 --> 00:27:35,480
Josh:
I'm fully switched over. I am Codex superior domination. It's going to be the month of Codex.

476
00:27:35,760 --> 00:27:39,080
Josh:
Maybe Anthropic will have a comeback, but that's not happening until at least

477
00:27:39,080 --> 00:27:41,980
Josh:
June, July, because this month is Codex month.

478
00:27:42,100 --> 00:27:46,160
Josh:
So I've been using Codex for basically everything, all of the difficult tasks that I need.

479
00:27:46,300 --> 00:27:50,900
Josh:
What I have found is that GPT 5.5 as an LLM, as a language model,

480
00:27:51,000 --> 00:27:55,200
Josh:
as a chatbot is a little bit inferior to Opus 4.7, which I believe to be the

481
00:27:55,200 --> 00:27:57,320
Josh:
better model if you're just chatting with an AI.

482
00:27:57,580 --> 00:28:00,720
Josh:
I like its personality. It's warmer, it's more precise, it normally

483
00:28:00,720 --> 00:28:03,640
Josh:
gets the idea of what I want. So if I am building a complex

484
00:28:03,640 --> 00:28:06,940
Josh:
project, Opus 4.7 is the orchestrator and

485
00:28:06,940 --> 00:28:11,720
Josh:
Codex is the actual implementer, the executor of this code, of this plan. I've

486
00:28:11,720 --> 00:28:16,240
Josh:
also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think

487
00:28:16,240 --> 00:28:20,760
Josh:
this is another piece of alpha here: I actually use Opus 4.6 whenever I'm doing

488
00:28:20,760 --> 00:28:25,160
Josh:
anything relating to writing or word ingestion. So one of the projects I've been doing recently

489
00:28:25,900 --> 00:28:29,400
Josh:
is Andrej Karpathy, he created this, like, wiki for

490
00:28:29,400 --> 00:28:32,120
Josh:
your own person, where it ingests files and it kind of writes

491
00:28:32,120 --> 00:28:35,880
Josh:
these summaries for you, and it creates a personal knowledge wiki. I use Opus

492
00:28:35,880 --> 00:28:40,920
Josh:
4.6 exclusively for that because Opus 4.7, I think, is far inferior at summarizing

493
00:28:40,920 --> 00:28:44,420
Josh:
and kind of rewriting these topics that I use in my Obsidian. So that's kind

494
00:28:44,420 --> 00:28:48,200
Josh:
of my stack. I use Opus for LLMs, Codex for everything else. Ejaaz, what are you

495
00:28:48,200 --> 00:28:49,880
Josh:
currently optimizing for? What

496
00:28:49,880 --> 00:28:52,940
Ejaaz:
are you planning? So it's two things. When I have

497
00:28:52,940 --> 00:28:56,180
Ejaaz:
a, uh, my stack is actually way more diverse when

498
00:28:56,180 --> 00:28:59,460
Ejaaz:
it comes to just, like, the research side of things, only because I'm

499
00:28:59,460 --> 00:29:02,300
Ejaaz:
using the AI that's, like, available readily wherever I am,

500
00:29:02,300 --> 00:29:05,320
Ejaaz:
right? So if I'm on X a lot and I see breaking news, I'm

501
00:29:05,320 --> 00:29:09,100
Ejaaz:
just tapping Grok, because honestly its recent model, I think it's like, what

502
00:29:09,100 --> 00:29:13,380
Ejaaz:
was it, 4.3 at this point, uh, is actually pretty good, and they have multiple agents

503
00:29:13,380 --> 00:29:17,120
Ejaaz:
that are kind of, like, running at this, right? But for the core bulk of the work,

504
00:29:17,120 --> 00:29:22,400
Ejaaz:
I've started shifting towards GPT 5.5 for the research, because 5.5 researches

505
00:29:23,100 --> 00:29:26,760
Ejaaz:
things for so much longer. And it has a much more in-depth discussion.

506
00:29:26,880 --> 00:29:32,380
Ejaaz:
In fact, I tested it out today because I was curious about the AI power stack

507
00:29:32,380 --> 00:29:36,520
Ejaaz:
and what stocks I should be investing in to get exposure to the power grid lines

508
00:29:36,520 --> 00:29:38,680
Ejaaz:
that are currently constraining AI data centers, right?

509
00:29:38,740 --> 00:29:41,720
Ejaaz:
And I was like, all right, I gave a detailed prompt to both Claude Opus 4.7

510
00:29:41,720 --> 00:29:46,480
Ejaaz:
and 5.5 and 5.5 completely cooked 4.7.

511
00:29:46,620 --> 00:29:51,200
Ejaaz:
And it gave good reasoning why, whereas 4.7 did not.

512
00:29:51,420 --> 00:29:56,400
Ejaaz:
I had to ask it more questions. So all in all, I think 5.5 is my preference right

513
00:29:56,400 --> 00:29:58,880
Ejaaz:
now. I still use 4.7 because of the personality.

514
00:29:59,120 --> 00:30:02,620
Ejaaz:
It's like less of an AI type of voice versus GPT 5.5.

515
00:30:02,760 --> 00:30:06,000
Ejaaz:
But again, I feel like OpenAI is on a generational run right now,

516
00:30:06,020 --> 00:30:09,900
Ejaaz:
and they might just kind of fix this in the next couple of hours at this point.

517
00:30:10,420 --> 00:30:13,140
Josh:
Yeah, it's coming. It's coming quick. And I think now is a good time to kind

518
00:30:13,140 --> 00:30:15,260
Josh:
of get familiar with Codex to understand the way it works.

519
00:30:15,400 --> 00:30:19,740
Josh:
And as they implement these features, you'll be able to adopt them within the hour, within the day.

520
00:30:19,940 --> 00:30:23,640
Josh:
It's pretty amazing. And it's been fun to just experiment. It's been fun to try something new.

521
00:30:23,940 --> 00:30:27,940
Josh:
And it's, again, competition is just better for everyone. So the end winner

522
00:30:27,940 --> 00:30:31,360
Josh:
of this is the user, because for as low as $20 a month, you get access to all

523
00:30:31,360 --> 00:30:33,440
Josh:
this frontier intelligence, all these capabilities.

524
00:30:33,760 --> 00:30:38,580
Josh:
And it's just, it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.

525
00:30:38,720 --> 00:30:42,620
Josh:
If you have not tried both of them, I encourage you to give it a try.

526
00:30:42,900 --> 00:30:45,460
Josh:
Test the prompts against one another. If you have any type of work that you

527
00:30:45,460 --> 00:30:49,280
Josh:
need, if you're working on a computer at all, chances are you can use AI to

528
00:30:49,280 --> 00:30:50,680
Josh:
help you do your job even better.

529
00:30:50,780 --> 00:30:54,160
Josh:
Or you could just use it to help you do hobbies and side projects that you've

530
00:30:54,160 --> 00:30:55,520
Josh:
always wanted to do. So give it a try.

531
00:30:56,100 --> 00:30:58,860
Josh:
Let us know your preference, Codex or Claude Code. Which one is it going to be?

532
00:30:59,720 --> 00:31:02,400
Josh:
I think that's probably it for the episode. Thank you guys so much for watching.

533
00:31:02,540 --> 00:31:04,720
Josh:
If you enjoyed it, please don't forget to share with your friends.

534
00:31:05,100 --> 00:31:08,740
Josh:
Let them know which model they picked. And also don't forget to rate it five

535
00:31:08,740 --> 00:31:12,860
Josh:
stars on your favorite podcast listening platform. Any final thoughts, Ejaaz, before we go?

536
00:31:13,220 --> 00:31:16,520
Ejaaz:
No, that's it. Thank you guys so much for listening and we'll see you on the next one.