Limitless: An AI Podcast

Anthropic's Claude Opus 4.6 and OpenAI's Codex 5.3 have come out back to back, so we dive in and compare their shocking capabilities and implications for AI development. 

We compare Claude's orchestration skills against Codex's superior coding efficiency through live demos, revealing the potential impact on job automation in tech. Try them out, see which one you prefer, and let us know!

------
🌌 LIMITLESS HQ ⬇️

NEWSLETTER:    https://limitlessft.substack.com/
FOLLOW ON X:   https://x.com/LimitlessFT
SPOTIFY:             https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE:                 https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED:           https://limitlessft.substack.com/

------
TIMESTAMPS

0:05 AI Showdown: Claude vs. Codex
0:43 Live Demo of Coding Models
4:13 Comparing Model Outputs
4:47 Codex vs. Claude Performance
6:15 Exploring the Models' Features
8:58 The Future of Work with AI
9:32 Building a Stock Analysis Tool
11:44 Technical Demos Unveiled
14:41 Self-Improving AI Models
17:19 Automating Complex Tasks
18:46 The Competitive Landscape
20:32 Investor Perspectives on AI
22:36 Major Updates from OpenAI
23:43 Real-Time Quality Assurance Testing
29:07 Creating a Stock Dashboard
35:45 Conclusion and Future Insights

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

What is Limitless: An AI Podcast?

Exploring the frontiers of Technology and AI

Ejaaz:
48 hours ago, Anthropic dropped Claude Opus 4.6, the world's most powerful AI model. And literally 20 minutes later, OpenAI dropped Codex 5.3, which is not only better, but also built itself. Now, to say both of these models are powerful would literally be the understatement of the century. By the time I'd eaten breakfast yesterday, one of the models had discovered 500 security flaws that no one had found before. And by lunchtime, a bunch of software stocks were down hundreds of billions of dollars out of fear that these models would replace entire teams. And it's actually already happened. These models can replace a team of 50 software engineers, rebuild Pokémon from scratch, and so much more. And in this episode, we're going to be doing a live demo side by side to show you which model is the best.

Josh:
Yeah, this is pretty cool. I wanted to spend a lot of time this episode introducing people to these models, what they can do, and how they work, through demos that we're going to perform ourselves. These are definitely two frontier models, but I think, more importantly, they're frontier coding models. And when people hear that, I think a lot of them get turned away, because it seems like this complicated thing, like you need to be a developer in order to use them. We are here to tell you that is not the case. As one non-technical person to another: I fed this model a prompt, I fed it some assets, and then I pressed play, and what I got is a side-scrolling game, which was exactly what I asked for. So on the screen now, you're seeing the one-shot prompt I fed this model, asking it to create a side-scroller like Mario that we can actually play. So it has coins, and I don't think the gravity quite works, but what you're seeing is that it understands physics, it is able to generate graphics, and it plays like a pretty solid side-scroller. And I created this in five minutes, with one prompt, and it actually works.

Ejaaz:
What was the prompt that you used, Josh?

Josh:
Yeah, so I'll pause playing this game to actually show you the prompt. It was very simple. It was this one paragraph: "I want you to make a game. You can use Python or C++, whatever you find the most convenient. A 2D platformer that closely resembles Super Mario. Use the attached background image and sprites found in the asset folder. Take into account that the sprites don't come with transparent backgrounds but pink ones, so you need to filter the background." For those who are watching, you can actually see the sprites on my screen. They were just a series of assets, with no context given as to what each one of them was, but the model reasoned through it, removed the background, and actually generated a pretty good representation of that. Now, this was built one-shot in Codex, which is the new OpenAI Mac application that just released this week, and I wanted to compare it to Claude. So I have another instance here on the screen with Claude, using Opus 4.6, the newest frontier model that they just released this week, and I want to do an exact one-to-one comparison. So I'm going to launch the same exact prompt. We had that cook in Codex, and now we're going to have it cook in Claude Code. And in the meantime, maybe we can talk more about what these models do and how they work.

Ejaaz:
Well, before we do that, actually, as you set this game up, I ran it on Claude Opus 4.6 as well, but with a slight twist.

Josh:
Okay, let's see your output. What do we have?

Ejaaz:
Okay, uh, I don't know if you can see my screen, but it is the exact game that you just created. But I don't know if those characters look kind of familiar to you. We have the hero protagonist character, which is my beautiful face and my beautiful person, Ejaaz, and we have, who's this enemy over here that looks a lot like the bear guy? And listen, we can double jump here, Josh, and I think, yep, I can crush you. But, I mean, jokes aside, this is insane. This took me around three minutes to build end to end. I used the exact same prompt that you gave me. And we didn't have sprites ready-made of ourselves, right? We didn't have cartoon images of ourselves. So I uploaded a photo we had taken, I don't know, like six months ago, and said, "Hey, can you make game avatars out of this?" It did it in 20 seconds. And then I said, "Could you add these to the game and replace the enemy with Josh and the protagonist with Ejaaz?" And it did it in a minute.
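The prompt Josh read out asks the model to strip the pink (not transparent) backgrounds from the sprites. Neither model's generated code is shown in the episode, so the following is only a minimal sketch of that chroma-key step in Python. It assumes the "pink" is pure magenta `(255, 0, 255)` and that pixels arrive as a flat list of RGB tuples; the function name `remove_pink_background`, the key colour, and the tolerance of 40 are illustrative assumptions, not what Codex or Claude produced.

```python
from typing import List, Sequence, Tuple

# Hypothetical pixel types: the episode doesn't say how the sprites were loaded.
RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

# Assumption: "pink" means pure magenta, a key colour commonly used in sprite sheets.
MAGENTA: RGB = (255, 0, 255)


def remove_pink_background(
    pixels: Sequence[RGB],
    key: RGB = MAGENTA,
    tolerance: int = 40,
) -> List[RGBA]:
    """Chroma-key a flat list of RGB pixels into RGBA.

    A pixel whose red, green and blue channels are each within `tolerance`
    of `key` is treated as background and made fully transparent (alpha 0);
    every other pixel keeps its colour and is made fully opaque (alpha 255).
    """
    result: List[RGBA] = []
    for r, g, b in pixels:
        is_background = all(
            abs(channel - key_channel) <= tolerance
            for channel, key_channel in zip((r, g, b), key)
        )
        result.append((r, g, b, 0 if is_background else 255))
    return result


if __name__ == "__main__":
    # A made-up 4-pixel strip alternating background and sprite pixels;
    # (250, 10, 245) is slightly off-magenta but still within tolerance.
    strip = [(255, 0, 255), (200, 40, 30), (250, 10, 245), (30, 30, 30)]
    print(remove_pink_background(strip))
    # [(255, 0, 255, 0), (200, 40, 30, 255), (250, 10, 245, 0), (30, 30, 30, 255)]
```

The per-channel tolerance is there because resized or compressed sprite sheets rarely keep the key colour exactly; a real game would apply the same idea to an image loaded through an imaging library and then draw the sprite using the resulting alpha channel.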

Josh:
So here we go. That's pretty amazing. And these are just using standard desktop applications. So what you're using right here, this was done in Claude Code, right? You just went onto Claude, the Mac app. You downloaded it. You put in the prompt. You shared some assets. And it built this amazing game from a single prompt. And we're actually going to experiment further in this episode, where we're going to create a trading room that does real-time stock analysis. So as I'm curating the prompts and getting ready for that second demo, maybe we could walk through what makes these models so exceptional.

Ejaaz:
Yeah, well, you might actually notice the first difference on screen right now. If you look closely, my avatar is kind of glitching out, right? And if you compare it to your Codex game that you just coded up, there are no glitches. It runs super smoothly. And the main takeaway here is that Codex 5.3 is a superior coding model to Anthropic's. And that's a sentence I never thought I would say, at least for the next couple of years, because Anthropic has held that prestige and title for so long. But since Code Red was initiated at OpenAI around three months ago, Sam has devoted pretty much all his resources towards building the best coding model. And the benchmarks don't lie. It is a full 12 points ahead of Claude Opus 4.6 on the software engineering benchmark.

Josh:
That's a pretty significant difference.

Ejaaz:
So I've actually pulled up a more general comparison between the two models here, and it summarizes it really well. So if we look at Claude's model, Opus 4.6, what's good about it? Well, they've increased the context window 5x. It's gone up to a million tokens that you can put in a single prompt. If you want to understand how powerful this is: you can just put way more information into your initial prompt. It has much better context and memory, so you can end up cooking up much better products overall, which is very, very impressive and important to have. Number two, I would think about this as an orchestration model. If you look at specific benchmarks, it has beaten OpenAI at GDPval. GDPval is a benchmark where they test a model's performance at a really complex task versus a professional human who would normally do that task. And the question is: would you use the AI model or would you use the human? And in this case, you would choose Claude 4.6 over humans way more often than you would choose OpenAI's latest model. So that's a really important thing. And the point about Claude's latest model is that it doesn't code as well as Codex, but it can orchestrate a bunch of agents and overall activity better than OpenAI. Now, if you look at Codex and OpenAI's new models specifically, it wins on software engineering. It is simply a better software engineer than Claude is, which is a massive flip, and it's a testament to how much resources and fine-tuning OpenAI has been able to put in.

Josh:
And on the note of the quality of the models here, my prompt is done in Claude Code. It's the same one that we used in Codex, and I'm going to run it here for the first time now. You can see it on screen, and we'll see what it looks like. So underneath, we have our Codex version, which looks beautiful. On top, we have our brand-new version that was just made by Opus. Now, I haven't tried this yet, so we're going to see what happens when I press space to start. So it looks like Opus has failed to create a floor, so I am just falling through the floor until the game ends. Okay, so just based on this one demo alone, this is a fairly significant difference. GPT's Codex has created a beautiful side-scroller. It has gravity, maybe a little too much, but I could just ask it to lower it. Opus doesn't even work at all. And again, the test was just a one-shot prompt. So I'm going to get back to work prompting it again to build the new application, the trading application. We'll follow up with that. But I think that's a funny demo just to showcase that one is actually superior to the other, in this one use case at least.

Ejaaz:
Yeah, I mean, you said it pretty clearly: Codex is the best coding AI model. And I can't emphasize that enough, because OpenAI was behind Anthropic for a long time, and by a massive margin, and in some way, shape, or form, they've been able to catch up. Now, what's interesting here is that both companies have focused on each other's goals. Anthropic, which was typically the leading frontier model in coding, has now decided to focus on what OpenAI was really good at, which is overall orchestration and being a better generalized model, right?

Josh:
They're taking each other's lunch.

Ejaaz:
Yeah, exactly. OpenAI has decided to eat Anthropic's lunch and say, okay, we've got the generalized stuff sorted out. Let's try and figure out the coding-specific niche, the highly defined, professionalized functions. And it's produced the best coding model. So it's kind of a weird win-win for both labs. And what's awesome about this is that they both now have really well-rounded but also very specialized models. And the reason why this is important, and this is maybe my hot take: I don't think the coding models matter, Josh. I actually don't think the generalized models matter either. I think they're both going after something much bigger, which is creating the operating system for the future of work. They know that AI models and AI agents are going to automate a ton of different industries, and those industries are only going to pick you if you can do both generalized work and hyper-specific work really well. That is coding, orchestration, and managing your data. And now we have two amazing models, dropped within 20 minutes of each other, that do exactly that at the highest level of performance we've ever seen.

Josh:
They're pretty exceptional. So now for this next demo, I have it queued up here. What I did was ask the model itself to build me a prompt for this. I wanted it to create an AI stock portfolio war room, so I asked, "Hey, I want to create this. Write me a fully fleshed-out prompt that should solve this problem in one shot." So I loaded it up here in our Claude Code app, and I also loaded it into the Codex app. I created its own project folder, and now I'm going to hit send. So both of these things are thinking in real time. We'll check back in once their outputs are done and compare again on this second version, which is a more robust one. I mean, you'll see on the Claude screen it has this whole list of to-dos. It has an entire plan: there are nine different panels that it's going to build, it's going to do a risk analysis matrix and portfolio action bars and all this stuff. So we'll let that cook, and let's get back to what separates these, and what people have been freaking out about on the internet, while these things get going.

Ejaaz:
Could I take three minutes to show you some wild demos?

Josh:
Yeah, let's see what the internet's been demoing while we wait for ours to cook.

Ejaaz:
Okay, cool. Listen, our 2D Mario-inspired game was cool, but imagine if I told you that you could recreate the entire Pokémon game, including levels, cities, characters, and the creatures that you fight, from scratch in about an hour and 30 minutes. That's pretty impressive, and that's what we're looking at right now.

Josh:
Wow, it even has the fighting.

Ejaaz:
Yeah, yeah, yeah. And buttons and the multimodal gameplay. And obviously this looks like it's been made by a child, image-wise, but it would probably take you, what, another couple of hours to make a really high-fidelity game that you could run on your Nintendo Switch or whatever. It is just so impressive that we can do these things. Anyone can do these things with no previous background. Just upload a few images, or generate a few images, and you can recreate nostalgic childhood games that are worth billions of dollars, which is just super cool to see.

Josh:
Yeah, one of the cool things that I think is really important to note is how approachable this is. For the example we're running on my screen right now, all I did was tell it what I wanted and ask it to develop the prompt with me. So even if it feels overwhelming, like you don't really know how to code or how to prompt things, you can actually just ask the model to help you generate the prompt and explain to you how it works. And it's a really easy way to build basically anything you can imagine. It's not just games. It's productivity tools. It's CRM tracking. It's whatever you want it to be. So I think that's really interesting. But it also goes much more technical, right? I saw another crazy example with the compiler.

Ejaaz:
Okay, so for the tech nerds out there who've spent a lot of time coding, you are going to be wowed by this. For one of their flagship demos for Opus 4.6, the Anthropic team decided to task the model with building a C compiler, which is an incredibly complicated tool that is required to build some of the craziest types of apps. And they just walked away. They just kind of looked at it, monitored it, and made sure that it wasn't going awry. And in two weeks, let me emphasize that, two whole weeks, 14 days, it coded nonstop and built this compiler. Now, you might think two weeks is quite a long time; I want my thing done in an hour and a half. Well, let me harken back to history. In today's world, if you wanted to create something like this, it would take a team of around 50 or so humans a few months to build from scratch. That's today. But back in the day, it would have taken around a decade and thousands of people. So we have condensed the timeline for creating really complicated tools to a matter of hours, or weeks in this case. Now, the second thing I want to point out is that the fact these models can go untouched for two weeks is just insane. There was another stat released yesterday about OpenAI's 5.2, I think 5.2 high, where it has a 50% time horizon of about 6.6 hours. So that means if you gave it a complicated coding task of that length, it would get it completely done 50% of the time, which is just such an impressive track record when you look back a year. Back then, that time was, what was it, like 30 minutes, maybe an hour. So every iteration, we see this thing double. It's just so insane.
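Ejaaz's "every iteration, we see this thing double" can be checked against the figures he quotes. A 50% time horizon, the metric popularized by METR, is the length of task (measured by how long it takes a skilled human) that a model completes successfully about half the time. Taking the episode's numbers at face value (roughly 30 minutes to an hour a year ago, about 6.6 hours now), a base-2 logarithm gives how many doublings separate them. This is a back-of-the-envelope sketch using only those approximate, unverified figures.

```python
import math


def doublings(start_hours: float, end_hours: float) -> float:
    """Return how many times a quantity must double to go from start_hours to end_hours."""
    return math.log2(end_hours / start_hours)


# Approximate figures quoted in the episode (not independently verified):
# a 50% time horizon of ~30 minutes to ~1 hour a year ago, ~6.6 hours now.
CURRENT_HORIZON_HOURS = 6.6

for past_horizon_hours in (0.5, 1.0):
    n = doublings(past_horizon_hours, CURRENT_HORIZON_HOURS)
    print(f"{past_horizon_hours} h -> {CURRENT_HORIZON_HOURS} h: {n:.2f} doublings")
# 0.5 h -> 6.6 h: 3.72 doublings
# 1.0 h -> 6.6 h: 2.72 doublings
```

If the quoted endpoints are right, that is roughly three doublings in a year, which is consistent with "double every iteration" only if there were several model iterations over that period.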

Josh:
Yeah, it's unbelievable, and almost intimidating, how capable and competent it is, even for someone who is a novice at writing code. It's not about writing code; it's about being able to generate whatever you want. If you think of it, in a way it abstracts the code away and allows you to just speak English and get what you want, in a way that you understand, and it will help walk you through it along the way. One of the things that I love about Claude Code in particular is plan mode. If you leave a lot of things out of your prompt, it'll actually keep prompting you with additional questions to understand what you want. And one of the most fascinating things that I read about GPT-5.3 Codex in particular is that, like you mentioned in the intro, it helped build itself. And I don't think that can be overstated, because this is the first model in the history of OpenAI that has helped with the building and construction of itself. And what happens as that starts to ramp up, right? If you think of each model iteration as a flywheel, what is the constraint? There are two constraints. The first is the speed at which developers can actually build it, create the tests for it, and make sure that it's safe and ready to deploy. The second is the hardware that's required to actually train the model. What we're seeing with Codex and Opus, which I really believe was kind of Sonnet, is incremental improvements. Now, for the incremental improvements that don't require an entirely new training run, the real constraint is the actual software and what you can squeeze out of it. And when you have a model that's helping you build this software, one that can think for 6, 12, 24 hours at a time or even longer, it kind of creates this self-fulfilling loop, where the models use the new models to make the future models stronger, more powerful, and better. And I thought that was a really interesting thing to note: this is the first self-propagating model, where it ran a lot of the tests for itself and introduced new code that made itself better. And as we continue to see that, you can start to imagine that exponential progress line going pretty close to vertical, and things getting really good really, really quickly.

Ejaaz:
I think what most people listening to this might think is: well, what was different before? Well, previously, models would just work in a very analog mode. You would point one at a problem, and it would understand what the problem was and then solve it. But it lacked that awareness and wider context of what the overall vision and goal was, and of figuring out stuff for itself. You always had to kind of hand-hold it. But now, with its ability to understand what it's trying to do, look internally, and say, "Huh, I made that mistake because of this error in my code; I'm going to rewrite my code and then I'll be better at it," it kind of functions similarly to a human. Now, I actually saw a great analogy. I forget who wrote it, but it's fantastic. Imagine yourself standing on a sidewalk, right? And a Bugatti Veyron drives by you super fast at, let's say, 200 miles an hour. You'll be like, wow, that's kind of fast. And then two minutes later, another Bugatti drives by you at 300 miles an hour. You'll be like, wow, that's kind of fast. But you wouldn't really notice that 100-mile-an-hour difference, right? But if you were strapped into the car, you would notice it is significantly improved. And that's how software engineers feel right now. Now, if you're someone who doesn't code all the time, you're not necessarily going to understand these impacts, but it's really important for those of you listening to this to realize that this is massively impactful and will change the way a lot of things happen today. I mean, just take a look at this, right? This is a direct quote from someone who is building at a major tech company, Rakuten. And the quote says, "Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a 50-person organization across six repositories." Josh, do you know who else is responsible for doing that? An entire team of product managers who each get paid a quarter of a million dollars in compensation annually.

Josh:
Minimum, per year, at least.

Ejaaz:
Yeah, their jobs are automated now.

Josh:
Well, one of the earlier moments in which I realized this was pretty profound was Claude Cowork. They said they built it with just a handful of people, like four, over the course of 10 days, and it was 100% built by the current model of Claude, which was Opus 4.5 at the time. The amount of leverage from these tools is so high, but it cuts both ways. If you can design and develop a product in 10 days, then that means another company can probably do that in five. It starts to lower the competitive threshold for these companies to catch up, and it starts to raise the bar of what is possible. If you could build something that profound in 10 days, what can you build over the course of six months? Can you really build something fantastic that has a moat, that actually delivers on the total power you have by leveraging this AI? It's going to be interesting to see, because what we're finding, even with the Codex and Opus dual launch, is that these companies are right next to each other, and if one publishes something profound or something that attracts a lot of users, the others are just a few days and a few prompts away from copying it. And that's a pretty difficult thing to compete against on the software front.

Ejaaz:
Well, that's why, if we look at the stock market over the last couple of days, it's down trillions of dollars, and I'm not exaggerating. If you look at Microsoft over the last two weeks, the stock is down 20%. It's trading like a meme stock, which is just insane. And the reason for that is that a lot of investors are anticipating that these models, specifically Opus 4.6 and Codex 5.3, will just create the tools that these SaaS companies, worth billions of dollars, have spent their entire lives building, in a couple of seconds, just as you described. Now, the counterargument to this, Josh, and Jensen Huang actually went live at a conference and made this point, is: if you're an AI agent or AI model that is capable of building these tools, why would you rebuild the tool every single time you perform a function? Surely you would just access the best tool and use it. So there's a bit more nuance: AI models aren't just going to recreate your entire software stack if you are at a Fortune 500 company. That kind of doesn't make any sense. There are a bunch of tools that are hyper-optimized to do that. But what it will do is connect all of these tools and silos in a much more effective way. Maybe that requires rebuilding parts of them, or maybe it requires connecting them in different ways, but not rebuilding the entire tools. And whatever operating system that ends up becoming will be the stickiest and most valuable company ever. Now, that could be Salesforce, or it could be someone completely different, a startup that we haven't even heard of. I think that's really important to understand, but people are experimenting. And if you look at this graph right here, which may not look insane to some but is insane to me at least, 4% of daily GitHub commits are now Claude Code. That was, I think, 5% of what it is today two months ago. So the ascent has just been insane. These companies are adopting it and they are using it.

Josh:
Yeah, the number is just going to keep going up, and there's no reason why it wouldn't. It's such a testament to, one, the speed. It feels like we're strapped into that car and now we're flying. Two, to an outsider it might not look like it, but it certainly feels like that on the inside. And I think a lot of people are starting to notice this and get a little nervous about it, too. Look at this example on the screen right now: this is a prompt to GPT-5.3 Codex that basically created an entire Minecraft clone in a single prompt, and it looks awesome, it runs really fast, and it's super lightweight. And the poster says, "I also tried it on Opus 4.6, but for some reason it got stuck." But you can build anything you want very, very quickly, and very cheaply as well. What Opus 5.3, or, I'm getting them all mixed up. What GPT-5.3 Codex offered is doubled rate limits for the next couple of months. So you actually have the freedom, on their $20-a-month plan, to go and build whatever you want.

Ejaaz:
Can I maybe deliver a hot take, Josh?

Josh:
Yeah, what do you got?

Ejaaz:
I think the most exciting part about these model releases isn't the models themselves. Largely, I think the models are similar in capabilities. They're around the same on coding benchmarks, and they can roughly do the same things. They can spin up a bunch of agents and orchestrate them. The bigger picture, which I think a lot of people missed, is that both companies, Anthropic and OpenAI, are at war with each other. And they're trying to build and own the operating system for work, which isn't just a model; it's a software suite. So this week alone, OpenAI didn't just release this new model. They released the Codex app, which is a desktop Mac app, kind of like a command-line interface, that makes the coding experience way better. And they also launched an enterprise platform called Frontier, which allows Fortune 500 companies to take this magical model, give it to non-coders, and let them do magical things. All of these products together create a very sticky experience where it starts to make sense for software engineers and non-software engineers to use these products. And it becomes incredibly sticky, which results in billion-dollar contracts, right? Anthropic has done the same thing over the last two weeks. They released Claude Cowork, they released agent teams this week, and then they released this new model. They're going after the same thing, which is why it kind of makes sense that they're releasing Super Bowl ads that are shitting on each other now. It makes a lot of sense. And so the point is, if they can own this operating system, this future of work, they will basically be the most valuable company. And I think it's going to be winner-takes-most.

Josh:
I have to interrupt you here. We have some developments on our prompts that

Josh:
we've been working on, our AI stock war room. Let's go. That I'm going to have

Josh:
to share on the screen right now.

Josh:
So currently what it's doing is it's asking to do some quality assurance testing.

Josh:
So you'll see it actually used a it's taking over control of my browser and

Josh:
it's asking to make prompts on the screen. So you can see all of this that you're

Josh:
seeing right here is generated live, and it's doing an actual real-time debug

Josh:
of the product that it made.

Josh:
It's clicking around, it's resizing things, it's going through the links,

Josh:
and it's running real quality assurance testing on the actual product.

Josh:
It's really amazing to see.

Josh:
This was all just built: all these visual charts, and they're all accurate. Right now we're looking at NVIDIA. We have a chart, and I'm not going to mess with it because it's doing real-time manipulation for its quality assurance checks. It's actually clicking through, making sure the stats are accurate and making sure all the widgets work. And look, it already has these amazing graphs. It has sentiment analysis: 85% of people are bullish on NVIDIA. It has recent signals from the news. It has a risk assessment matrix that shows things like export controls and chip controls. It has revenue and earnings for every single quarter, charted, plus competitive moats. It has sector comparisons. This is unbelievable. And it generated all this from a single prompt. I just find it really funny that we can actually watch it do this in real time.

Josh:
So you'll see in this prompt, it's clicking through and taking screenshots of what it's seeing. Then it's digesting, analyzing, and understanding what it made, what it messed up, and what it still has left to finish. And it generated all of this in real time as we're recording this episode. So fascinating.

Ejaaz:
Wow, it reminds me of some of the research platforms at the companies I used to work at. They would pay, I'm not joking, millions of dollars a year to get access to platforms that would give them analysis like what you're showing on the screen right now. And you just built it from scratch.

Josh:
From scratch. And look, it's doing this. I'm not even touching my keyboard. I just searched for Apple, and now, sure enough, if I go over to the prompt, it's taking screenshots of Apple. It says, "Apple dashboard looking great, let me scroll to see the new three-column button row layout," and it's checking the button rows. It's really unbelievable. We have the investment thesis, the bull case, the bear case, catalysts and timelines. It has WWDC built in, and it has the iPhone 18 launch prospects set up for September. It's so cool. It's absolutely unbelievable. And now this is a real tool that I'll be able to use to type in whatever stock I want to look at and actually get some analysis on it.

Josh:
Now, I'll go over to Codex here, and it looks like Codex is taking its sweet time. It's still at zero out of six tasks completed, so it might take a little while for us to get a visual on that. But it's just amazing to watch this happen in real time as Claude Code with Opus 4.6, at least, does some quality assurance testing live by taking over my browser and running it for itself. I just think this is amazing.

Ejaaz:
It's magic. Something I just noticed in your Opus chat screen: when it's going through its thinking, it seems to have spun up a few different agents, or instances of itself, to pull this off. I think if you scroll up, I saw a few prompts that suggested that's what it was doing, which underscores a very important point about both of these models: they can spin up multiple versions of the same model and task each one with different things to run in parallel. What this means is you can get a really complicated product like what you're seeing on the screen right now in a matter of minutes, because it's running in parallel. So imagine having a bunch of computer science geniuses that you can duplicate immediately and run at a fraction of the cost, paying only for electricity, for inference. And now you start to see why all these NVIDIA chips are worth so much. Because you want to do cool stuff like this. This is insane.

Josh:
It's actually incredible. Okay, so now I want to test it on Tesla. I'm going to choose Tesla and see if it can actually do it in a non-controlled environment.

Ejaaz:
This UI is so cool.

Josh:
It's very pretty. What the hell? This looks great. Okay, so here we have Tesla. It has the charts, and we're going to click through them: the one-week chart, the one-month chart, the three-month chart. That looks fairly accurate. It has the price-to-earnings ratio, the 52-week high, and the 52-week low. So it looks like at one point it was trading at $488, and now it's trading at $389. The bull case for Tesla: Robotaxi and FSD licensing could unlock $500 billion in revenue by 2030.

Josh:
It has the Robotaxi service launch in Austin that it's preparing for. And let's see the sector comparison: it's comparing it to Rivian, Baidu, Toyota, and Ford. It has the competitive moat, where it says Tesla is strongest in brand power, IP and patents, and cost advantages. You can see the revenue and the estimated earnings per share. Sentiment is much worse on Tesla than it was on Apple; it's at 52% right now. And in the risk assessment, it looks like valuation, competition, and execution are all very high risk. That's probably an accurate assessment, although I'm not sure competition is really a problem. Execution is certainly going to be an issue.

Josh:
But it's just amazing to see how well it does. And it even gives a verdict. The AI verdict on Tesla is that it's a hold: "Tesla's optionality is enormous, but current valuations already price in multiple moonshots. Execution on Robotaxi will be the key catalyst." That sounds about right. And it's amazing that we built this with a single prompt, without any oversight from me, and it works. It actually works. It's really unbelievable how capable these things are.

Josh:
And now I have a dashboard where, anytime I want to make a decision, I can type in the ticker and get all this. It even has menus that work. Look at this: profit margins, P/E ratios, market cap. Wow, pretty unbelievable.

Ejaaz:
It's a reactive, real-time Bloomberg Terminal for the modern age.

Josh:
There's another feature here that looks like it lets you compare stocks. Let's see if this actually works. So if I type in, let's say, Apple's ticker and hit go, will that compare the two? Now, it looks like that doesn't work very well. Oh my god, but it has moving average lines and everything. This is pretty robust.

Ejaaz:
I know, it's like a trader's and investor's dream. Just crazy. Kind of a side note on this, but the fact that Tesla's down and everyone's kind of bearish on this company, even though they're rumored to be merging and stuff like that... The point is, there's an asymmetry between what the market is seeing and what these inventors and builders are seeing.

Ejaaz:
These AI labs have created what they define as pretty much a low form of AGI. You literally have an AI model that is building the next version of itself. That, by definition, is like a super genius, and it's only limited by energy and compute, right? And then investors are looking at this and saying, "Huh, Amazon and Google are about to spend a combined $500 billion worth of CapEx this year. Kind of bearish, that's a lot of money." So there is a real investment opportunity here in really understanding what these things can actually do, and that might lead to a lot of opportunities to invest. I don't know, but I know that I'm buying Tesla today, and a bunch of Google stock.

Josh:
Yeah, I mean, look at this Google valuation. One, this chart looks absolutely gorgeous. But two, the AI verdict is a buy. Even the AI thinks Google is a buy: "Alphabet offers the best value in mega-cap tech: dominant AI capabilities, diversified growth, and a cheap valuation, if the search moat holds."

Ejaaz:
Yeah, give me the week. Give me the week.

Josh:
Let's see the weekly chart here. Do you want some moving average lines as well? Because we could drop those in.

Ejaaz:
Please. Let's see, let's see. I'm actually super... yeah, look, see, it's had a slight dip. Markets are so reactive. Crazy.

Josh:
Yeah, and to the point about CapEx, markets are viewing it as a scary, high-risk statement. But while that's true, I also think it's a testament to the fact that scaling laws are going to work, and the largest companies in the world are betting on them continuing to work. The shared consensus among all of these large-cap companies to spend record CapEx this year is a testament to the fact that things are only going to go faster. They believe that the more money they put in, the more output they will get.

Josh:
And they're going to keep their foot on the gas. So I think any question anyone had about whether these scaling laws could continue to hold up, and whether we could stay on the path to whatever AGI looks like and beyond, was answered this week through these earnings reports. And the overwhelming answer is yes, it's true. It is likely that this is going to happen, and everyone is betting their entire company on it.

Ejaaz:
I think we have done a great job, Josh, if I may virtually pat us on the back, of showing what these models are capable of. And remember, these models have been out for less than 48 hours. In fact, I think it's been about 36 hours. So if any of you are interested in trying these out, I cannot urge you enough to go out and try them. Try to solve a problem you're facing at work, or code up a hobby or a project in your casual leisure time in a matter of seconds. It's so, so easy.

Ejaaz:
And it'll put you at an advantage in understanding how these tools work and why they're really changing the world around us: why some stocks are dumping and why some stocks are pumping. But yes, go demo it. Let us know what you actually end up building. Josh and I are trying to give you more live demos in a lot of the episodes we put out. And with every model release and feature that drops, we are going to be trying and testing these things so we can show you exactly what they can do, the benefits and disadvantages, and what's real and what's really not.

Josh:
Yeah. And I can't stress this enough: the best way to stay on top of things, the best way to feel like you're not being left behind, is just to use the tools as they come out and to understand them and what makes them different. With a single subscription to ChatGPT or to Claude, you can access tools just like this and build stuff just like this. This wasn't an incredibly difficult technical challenge. You just ask it for what you want and ask it to help you, and it will walk you through the process and build whatever you want.

Josh:
So the most important thing for anyone listening is just to train that muscle and get familiar with these tools and skills so that you're able to leverage them to your advantage, however they best fit in your life. And that's kind of what we wanted to share with you. It's simple: you download the app, you log into your account, and you're on your way. It's really not as difficult as I think a lot of people make it seem. And I mean, this beautiful dashboard is a testament to that.

Josh:
Okay, so Ejaaz, it also looks like our Codex output has finished. So here on the screen we have Opus, which we saw, and which is really a lovely dashboard, but it seems like Codex now has its own version that we could quickly compare. So we'll go to our favorite, Google. We'll type Google in, click analyze, and see how this compares. I find it funny how they've converged on the same type of design style. But yeah, oh okay, whoa, this is interesting, this is different. So it has the moving averages select. Oh, is that... Okay, yeah, so it has the charts.

Ejaaz:
Is that accurate?

Josh:
It has the P/E ratio. Yeah, that's what I was looking at. Let's go to that one-week chart and see. I have some questions about these. It looks pretty right.

Ejaaz:
Okay. That looks very wrong.

Josh:
Yeah, the one-year I'm a little confused about. Let's compare it to Claude here. We'll go to Google and analyze that. While it thinks, we can look at the rest. So it looks like it emulated pretty well. It has the verdict. It has the same stats. The risk assessment matrix is good, but you can see that some of the text is hard to read because it's black on black. But nonetheless, pretty interesting. They both succeeded.

Ejaaz:
Yeah, I mean, as we said before, these models are very equally capable. Maybe it's just the way that you prompt something, or the way some of these things work, but largely they achieve the same goal and the same quality. And listen, we're talking about minor discrepancies here. I can't wait to see what we will build with this. This is insane.

Josh:
It's amazing. Both of these were one-shot prompts; I didn't touch anything. And here we are. I do think that on Google, the one-year chart is wrong, and I think Claude got that one right. But overall, they both succeeded in the mission. Both look great, and both are just excellent models.

Ejaaz:
Amazing. Okay, well, that's it. Wherever you're listening to this, whether it's on YouTube and you're watching our lovely faces, or you're listening to us on Spotify, Apple Music, or wherever you listen, please subscribe, give us a rating, and leave us some comments. We love your feedback, and we respond to pretty much every single comment because we're trying to figure out how to make this show better and bring you the content that you deserve and want. Turn on notifications, because we are releasing more and more videos every week on the hottest topics as they come out.

Ejaaz:
We also have the sickest newsletter ever, where one of us will either write an essay or give you the top five highlights of the week. So if you don't want to watch any of these videos, you can just read and digest that, and you'll know everything you need to know in AI and frontier tech. Thank you for listening, and we will see you on the next one.

Josh:
See you in the next one. Peace.