Limitless: An AI Podcast

Let's examine the fierce competition between AI coding tools Anthropic's Claude and OpenAI's Codex. As Codex emerges with robust updates, we discuss user experiences and showcase demos comparing game development and dashboard creation.

Highlights include Codex's superior interface and innovative features like auto-review and Chronicle. We also explore the broader implications for AI integration in coding tasks.

------
🌌 LIMITLESS HQ ⬇️

NEWSLETTER: https://limitlessft.substack.com/
FOLLOW ON X: https://x.com/LimitlessFT
SPOTIFY: https://open.spotify.com/show/5oV29YUL8AzzwXkxEXlRMQ
APPLE: https://podcasts.apple.com/us/podcast/limitless-podcast/id1813210890
RSS FEED: https://limitlessft.substack.com/

------
TIMESTAMPS

0:00 Claude vs Codex
3:25 Image Generation Capabilities
4:52 Long Horizon Autonomy
8:55 Chronicles
10:19 Demo
16:49 Dashboard Creation Challenge
20:30 The AI Model Harness Explained
24:27 The Future of AI Tools
26:20 Claude Mythos
27:26 Verdicts

------
RESOURCES

Josh: https://x.com/JoshKale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Creators and Guests

Host

Ejaaz Ahamadeen

Host

Josh Kale

What is Limitless: An AI Podcast?

Exploring the frontiers of Technology and AI

Josh:
A few months ago, we told you to use Claude. Now we're telling you to switch

Josh:
back because for those of you who aren't familiar, well, over Christmas break,

Josh:
there was a major vibe shift where AI coding went from this like fun tool to

Josh:
things that developers actually use when they're shipping code.

Josh:
And even if you're not a developer, the amount of use cases and applications

Josh:
that were created around that time were really strong.

Josh:
And since then, Anthropic has gone on this generational run of shipping these

Josh:
incredible products seemingly every single day that has turned Claude Code into

Josh:
this supercharged super app that is the place that, Ejaaz,

Josh:
I know you've gone to, I've gone there too, in order to get all of our AI progress done.

Josh:
Any work that we have, we've gone to Claude Code.

Josh:
Now, OpenAI has woken up. And over the last few weeks, Codex has shipped more

Josh:
features than most companies ship in a year.

Josh:
And I bet, I guarantee that you haven't heard of some of these features that

Josh:
we're going to talk about in this episode.

Josh:
The pendulum has fully swung back, or at least I believe, because I'm totally Codex-pilled.

Josh:
And in this episode, we're going to kind of walk through the differences between

Josh:
these two and why the model that you're using today probably won't be the model you're using tomorrow.

Josh:
And I don't think we're going to convince you, but maybe we could show you why

Josh:
you might want to consider using something else here.

Ejaaz:
I just want to talk through some of the crazy stats here because the script

Ejaaz:
has genuinely flipped. A few months ago, Claude Code was all anyone could talk about.

Ejaaz:
And every software engineer was using Claude Code. Every enterprise was installing it. It was crazy.

Ejaaz:
But just over the last couple of weeks, specifically by the end of April,

Ejaaz:
ChatGPT 5.5 was released, and that was plugged into the coding AI model.

Ejaaz:
It's all one and the same.

Ejaaz:
And OpenAI went on this code red run where they focused on nothing but building

Ejaaz:
the best coding AI model and the best LLM.

Ejaaz:
And the numbers show that it's worked.

Ejaaz:
Over the last week, Codex has been downloaded over or installed over 46 million times.

Ejaaz:
Claude Code, under 500,000 times. Now, that is crazy to say, because if you look

Ejaaz:
at the historical data, Claude Code downloads and installs have absolutely dwarfed

Ejaaz:
Codex, but something changed over the last couple of weeks.

Ejaaz:
That something was OpenAI putting out just a better model. You mentioned that

Ejaaz:
you were Codex-pilled, Josh.

Ejaaz:
I think so am I. I've spent the last couple of days playing around with Codex.

Ejaaz:
This morning, we prepped a bunch of really cool demos, and it has just completely flipped the script.

Ejaaz:
But it's one thing saying it. It's another thing actually showing the direct

Ejaaz:
comparison. So we created this visual artifact to kind of

Ejaaz:
give you the scoreboard. And you can see it at the top here.

Ejaaz:
It's OpenAI Codex at 11 and Anthropic Claude at two. But let me explain why.

Ejaaz:
Okay. So number one, computer use.

Ejaaz:
Codex and Claude Code can use your computer.

Ejaaz:
It can take over your desktop and it can like move your cursor around.

Ejaaz:
Now, Claude pioneered it. They were the first ones there, but it was super slow.

Ejaaz:
It kind of runs into a bunch of obstacles and you have to kind of like handhold

Ejaaz:
it and approve it to do a bunch of different things.

Ejaaz:
Codex is not only quicker than me, it's quicker than the average person.

Ejaaz:
In fact, I can actually see the cursor move around so quickly.

Ejaaz:
And it's like using a computer, but it's a superhuman and it can run pretty much 24-7 at this point.

Ejaaz:
Long horizon autonomy. Codex can work for longer in a much more intelligent

Ejaaz:
manner versus Claude Code, which is, again, crazy to say because literally a

Ejaaz:
month ago, it was the inverse of this.

Ejaaz:
Claude right now can run for a decent number of times or amount of time,

Ejaaz:
but not as long as Codex can.

Ejaaz:
And then the last two that I want to talk about here is browser use.

Ejaaz:
So Codex can take over your browser. It can do a lot more intentional things.

Ejaaz:
It understands what it's looking at, very importantly. Previously, it could not do that.

Ejaaz:
Claude can do the same, but not as intelligently. And then finally,

Ejaaz:
ChatGPT Images 2.0 got released, what was it, like two weeks ago now? Oh, it's so good.

Ejaaz:
Yeah, it's the image generation model from OpenAI, and it is absolutely astounding.

Ejaaz:
In fact, it beat all the other predecessors, including Google's,

Ejaaz:
what is it, Nano Banana 2.0 Pro, which previously held the lead.

Ejaaz:
It beat it across every single benchmark.

Ejaaz:
Anthropic, on the other hand, doesn't even have an image gen model.

Ejaaz:
So, so far, it's crushing. Yeah.

Josh:
Yeah, I think a lot of the best is now bundled into Codex. The image gen for

Josh:
anyone who uses any sort of visual work is unbelievable. And being able to use

Josh:
that directly in your software is awesome.

Josh:
One thing that you mentioned is the long horizon autonomy. I think that needs

Josh:
a double clicking on because it's really impressive how well it works.

Josh:
Traditionally, there's been this thing called a Ralph loop that we use.

Josh:
It's actually named after the character from The Simpsons who is very persistent.

Josh:
And it's basically a planning mode where you give the AI a goal and it will

Josh:
continue to iterate towards that goal until it accomplishes it.
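
The loop Josh describes can be sketched in a few lines. This is a minimal illustration of the general idea only, not how Codex implements it; `ralph_loop`, `add_brick`, and `is_finished_car` are hypothetical names, and the "agent" is a toy function standing in for a real model call.

```python
from typing import Callable


def ralph_loop(goal_met: Callable[[str], bool],
               attempt: Callable[[str], str],
               max_iterations: int = 100) -> tuple[str, int]:
    """Keep calling attempt() on the latest draft until goal_met() passes."""
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = attempt(draft)      # one more pass by the agent
        if goal_met(draft):         # stop as soon as the goal is satisfied
            return draft, iteration
    raise RuntimeError(f"goal not reached after {max_iterations} iterations")


# Toy stand-ins for a real agent (assumptions for illustration):
# each attempt adds one brick, and the goal is a car made of five bricks.
def add_brick(draft: str) -> str:
    return draft + "#"


def is_finished_car(draft: str) -> bool:
    return len(draft) >= 5


result, iterations = ralph_loop(is_finished_car, add_brick)
print(result, iterations)  # prints "##### 5"
```

The iteration cap is a design choice in this sketch: a persistent loop without one would never stop if the goal check can never pass.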

Josh:
So like, let's say you want to build a Lego car or something and you give it the exact parameters.

Josh:
It will go and go and go until it solves that problem and gives you exactly

Josh:
what you want in a way that other AI models haven't.

Josh:
Codex did that. And this is the only native implementation that you can get

Josh:
of this long horizon thinking where it actually will go for days on end.

Josh:
I've seen screenshots of some thinking for as long as 36 hours to accomplish the goal.

Josh:
So if you have really difficult tasks, Codex is going to be really good at solving those.

Josh:
Now, continuing to scroll down, there was another feature that was just released

Josh:
this week called auto review.

Josh:
And a huge pain in the ass for people who are creating code or working on complex

Josh:
projects, whatever it may be, is you're constantly having to sit there and approve

Josh:
things because the permission system is a little finicky, right?

Josh:
You don't want to give it full access to your computer.

Josh:
You also don't want to sit there approving every time it wants to use Chrome

Josh:
or every time it wants to access your files.

Josh:
So Codex created auto review and they rolled it out last week where the agent is kind of smart.

Josh:
It knows which things are going to possibly be systemic existential threats

Josh:
and which approvals aren't.

Josh:
And it will just automatically approve all the things that aren't going to get you in a lot of trouble.
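
One way to picture that kind of triage is a simple allowlist. This is a sketch under stated assumptions: the episode does not describe the actual rules, so `LOW_RISK_ACTIONS`, `Request`, and `auto_review` are hypothetical names invented for illustration, and the real feature presumably uses a model rather than a fixed list.

```python
from dataclasses import dataclass

# Hypothetical risk tiers (an assumption, not the product's real rules).
LOW_RISK_ACTIONS = {"read_file", "run_tests", "open_browser"}


@dataclass
class Request:
    action: str   # what the agent wants to do
    target: str   # what it wants to do it to


def auto_review(request: Request) -> str:
    """Return "approve" for low-risk requests and "ask_user" for everything else."""
    if request.action in LOW_RISK_ACTIONS:
        return "approve"
    return "ask_user"


requests = [Request("read_file", "src/game.js"), Request("delete_dir", "/")]
decisions = [auto_review(r) for r in requests]
print(decisions)  # prints "['approve', 'ask_user']"
```

Defaulting to "ask_user" for anything unrecognized keeps the sketch conservative: only actions explicitly marked low risk skip the human.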

Josh:
It creates a much easier user interface where you can just kind of walk away

Josh:
from the computer for a little while and come back and things get done.

Josh:
Memory and context is pretty strong, I'd say.

Josh:
The one thing, and we haven't mentioned many Claude winners: the

Josh:
place where Claude wins currently is on their OpenClaw capability, funny

Josh:
enough, because OpenAI bought OpenClaw. But Dispatch is the mobile app feature

Josh:
for Claude in which you can actually engage with Claude Code remotely. That doesn't

Josh:
currently exist on Codex, and while the team has promised to ship that, you don't

Josh:
actually have that currently today. Claude has that. Also, in terms of the personality

Josh:
and UI, Claude is just so much better. I think we're going to get into our

Josh:
personal takes, but whenever you're using an LLM versus an actual

Josh:
tool set or a harness, Claude is pretty great, and the

Josh:
UI is very warm. So there are some

Josh:
instances in which Claude is better, but for the most part Codex is

Josh:
really just kind of crushing it, and I've really enjoyed using it. One of the

Josh:
fun things is pets. I mean, just recently they released pets, and Claude also released

Josh:
pets, but these pets are a little bit different. This is an example of angry Dario

Josh:
we're seeing on the screen, and it's fun because you have this persistent character

Josh:
that exists throughout your computer use.

Josh:
And as you're engaging with Codex, it'll just kind of chat with you in the background

Josh:
so you can see your progress, see where you're at.

Josh:
It's fun, it's playful, and it just shows that they kind of care about the user experience.

Josh:
Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz.

Josh:
And you were just telling me about Chronicle and how cool it is,

Josh:
how it kind of monitors your screen as you go. This seems like novel technology

Josh:
that we haven't seen yet.

Ejaaz:
Yeah, so one of the earliest episodes that we did here on Limitless was an interview

Ejaaz:
with the folks at OpenAI that created...

Ejaaz:
Something called, what was it called, Josh? Do you remember?

Ejaaz:
It was like agent mode or personal mode, something like that.

Josh:
Yes. It thought overnight for you, right? Yes.

Ejaaz:
It basically took all the conversations that you'd had with ChatGPT the night

Ejaaz:
before or the day before or the week before, and it created important context

Ejaaz:
around you in the form of something called memories.

Ejaaz:
This is where AI memory was birthed from OpenAI themselves, from the OpenAI

Ejaaz:
team. And what it would do is it would feed you a report in the morning that

Ejaaz:
would update you on information that it thought you would be interested to read about.

Ejaaz:
So say, for example, you were interested in the stock market,

Ejaaz:
it'll give you an update on a bunch of advancements that had happened overnight

Ejaaz:
or over the last week or whatever it might be.

Ejaaz:
Right now, fast forward to today, memory is embedded across every single AI model

Ejaaz:
and tool. The reason why is context is so important.

Ejaaz:
It's one thing a user asking for something explicitly and directly.

Ejaaz:
It's a complete other thing for an AI to actually understand what you mean,

Ejaaz:
the nuance in the sentence that you've created, and even better,

Ejaaz:
to predict what you want.

Ejaaz:
But there was still an obstacle, which was you needed to feed it the context

Ejaaz:
and say, hey, Claude, hey, ChatGPT, can you remember this?

Ejaaz:
OpenAI recently released a feature called Chronicle, where it observes what you scroll through,

Ejaaz:
what you click on, what you type, and it builds its own context and memories

Ejaaz:
around you without you needing to feed it, which actually led to a really cool

Ejaaz:
prompt that you pointed out, Josh, or that you found, which was,

Ejaaz:
what have I been doing very inefficiently on my computer, according to Chronicle,

Ejaaz:
which is this new memory feature, make some recommendations,

Ejaaz:
be direct, tell me what I need to hear. That's, that's pretty awesome.

Josh:
Yeah. So this is alpha because I don't think a lot of people recognize that

Josh:
this is a possibility because Codex and OpenAI didn't do a good job of explaining this.

Josh:
When they released Chronicle, they said it's a way for the system to review your

Josh:
code as you've gone because it's been taking sequential screenshots.

Josh:
But the reality is that it's much bigger than this.

Josh:
And I suspect they didn't market it this way because it could be a bit of a privacy issue,

Josh:
but it's essentially constantly monitoring your screen and taking screenshots

Josh:
of what's happening on your screen and interpreting it so it understands your

Josh:
habits, the way that you work, the things that you do.

Josh:
And then you can ask it, what have I been doing very inefficiently on my computer?

Josh:
According to Chronicle, make some recommendations, be direct,

Josh:
tell me what I need to hear.

Josh:
And it'll actually evaluate how you've been using your computer,

Josh:
how long you've been scrolling on Twitter, perhaps, how long you haven't been

Josh:
doing the things you're supposed to be working on, or just generally how to

Josh:
improve your workflow and give you real feedback based on your actual actions that it's seen.

Josh:
And I think this is a super powerful thing currently only available to Pro members.

Josh:
So if you pay for the $100, $200 a month subscription, you get access to this.

Josh:
But I suspect this is the early signs of a very important feature they're going

Josh:
to roll out, which is that entire computer monitoring system to improve your

Josh:
system and also probably train the models to get better at engaging with your system.

Josh:
But I found Chronicle to be one of those kind of secret features that not a

Josh:
lot of people know about, but has a lot of upside if you use it to your advantage

Josh:
and let it monitor what you're doing and improve your workflow on a day-to-day basis.

Ejaaz:
So the point is, from both of these companies, Anthropic and OpenAI,

Ejaaz:
we are getting feature releases every single week. In fact, every single day.

Ejaaz:
And it's becoming, I'm being bombarded by this.

Ejaaz:
And it's hard to keep track of all of this. So what is the number one litmus

Ejaaz:
test for both of these models and products and companies?

Ejaaz:
It's to actually use the thing. It's to build the thing.

Ejaaz:
And we have two special demos that we have prepared for you that we're about

Ejaaz:
to jump into. Now, Josh, can you guess what my first demo is about?

Josh:
The theme. First one's a game. We're gamers, man. I want to play a game.

Josh:
I want to see how well it does on a game.

Josh:
I know we did this demo in the past months ago. It left a lot to be desired.

Josh:
So I'm curious to see the current up-to-date status as it relates to Claude Code

Josh:
versus Codex. Who's winning on the one-shot game prompt?

Ejaaz:
Indeed. Okay. So I am a nostalgic kind of guy. And so I was like,

Ejaaz:
oh, back in the day, I loved Mario.

Ejaaz:
So I want you, both of these models, to create the best Mario type or inspired

Ejaaz:
game, a side scroller, but make it futuristic.

Ejaaz:
Maybe add a little bit of neon, sprinkle a bit of neon in there,

Ejaaz:
create levels. I want game design. I want there to be enemies.

Ejaaz:
I want there to be pitfalls.

Ejaaz:
And I also want there to be a scoreboard and also tell me how to do this thing.

Ejaaz:
I want, give me the whole package.

Ejaaz:
Basically, I fed this prompt or idea into ChatGPT and Claude.

Ejaaz:
And I said, can you create a detailed prompt that I can then feed into your coding models?

Ejaaz:
I then set each of the coding models to their highest settings.

Ejaaz:
So what you're about to see is the best of the best for the most

Ejaaz:
detailed prompt that they came up with, and let's see what they

Ejaaz:
did. So step number one, or example number

Ejaaz:
one, is Claude Opus 4.7. So

Ejaaz:
this is Claude Code at the highest setting with their latest model.

Ejaaz:
Um, okay, it took the prompt pretty literally. It's titled this Neon Plumber Moon

Ejaaz:
Base Run, which is obviously Mario-inspired. And it said, hey, this is a demo edition,

Ejaaz:
by the way, this is not production ready. What I like about this is it's giving

Ejaaz:
me the instructions. But how does the game actually play out? Let's see. It looks

Ejaaz:
good. Can you see me here, Josh?

Josh:
I can. Yes, I can.

Ejaaz:
And it looks like the animations are pretty good. I'm jumping around. I think

Ejaaz:
I'm like a little robot. I can see my feet pitter-pattering. Now,

Ejaaz:
I'm guessing this thing is about to kill me, so let's see if I can jump. Oh, I

Ejaaz:
can jump. There we go. That's awesome. Um, one bit... can I kill this guy? Oh yes, I

Ejaaz:
can. Now, one bit of feedback I've noticed is, uh, I can't double jump, and it told

Ejaaz:
me in the menu that I could double

Ejaaz:
jump. So that's weird. So the physics hasn't really paid off. Can I die?

Josh:
Oh, it certainly looks like you could die.

Ejaaz:
I can die. Great. Okay. So that is Claude's attempt at it. What's your feedback

Ejaaz:
on this, Josh? I think the graphics are pretty good.

Josh:
The graphics are great. For one shot, I mean, granted, this is only one single

Josh:
prompt. So for one prompt, it created great graphics.

Josh:
It had sound design that actually sounds pretty accurate to what you would expect in the game.

Josh:
It has similar principles. It's following gaming principles.

Josh:
You kind of understand what looks dangerous, what doesn't.

Josh:
You knew that those spikes were going to hurt you and they hurt you.

Josh:
The logic seems to be a little bit flawed. I think it's having problems with gravity, or at least that

Josh:
double jump functionality, because it looks like those coins that you probably

Josh:
want to collect, you can't actually reach, because you can't do the double jump.

Josh:
So in terms of logic, not so hot. In terms of visuals, aesthetics, in terms of, I

Josh:
mean, how good this game is from one shot, very impressive.

Ejaaz:
Yeah, I think it's important to understand that I started from zero. It literally asked

Ejaaz:
me to give it a folder to build in and the folder was completely empty.

Ejaaz:
So all the visual renderings, all the graphics, the animation style,

Ejaaz:
the scoring system, the way that the avatar moves and looks was created from

Ejaaz:
scratch from a bunch of characters from this AI model.

Ejaaz:
So this is Claude Code's current best attempt and it is way better than what

Ejaaz:
we tested out and honestly demoed on this show about a month ago.

Ejaaz:
But now let's see what OpenAI's ChatGPT 5.5 Codex at the highest possible setting cooked up.

Josh:
Okay, and this is using the same prompt, correct? So you just fed the model the

Josh:
same prompt? Identical? Identical, right. Oh god, I'm excited. I hope Codex did

Josh:
well, because now that I'm a fan, I'm gassing it up. It better perform here.

Ejaaz:
Okay, so this is GPT 5.5's attempt. Now, you

Ejaaz:
might notice that this isn't the entire browser. That's because

Ejaaz:
Codex has a very unique feature, which is not only

Ejaaz:
can it do all the coding in a single app for you, but it

Ejaaz:
has an in-app browser, so it can

Ejaaz:
live test the thing in the app without you needing to go to Google Chrome or

Ejaaz:
whatever. But anyway, we have the starting screen here. It has also called it Neo

Ejaaz:
Neon Plumber Moon Base Run. It looks a little more rudimentary from the start,

Ejaaz:
but I do like the background animation, Josh. We didn't get this in the previous

Ejaaz:
one, or at least not this side-scrolling thing. Well, let's...

Ejaaz:
Oh.

Josh:
Oh, this is nice.

Ejaaz:
This is nice. I think this has good logic.

Ejaaz:
Wait, but there is no music. There's no music. I can't double jump.

Ejaaz:
Might be a skill issue. Might be a prompt issue.

Josh:
Let's have a look. Did it say you can double jump?

Ejaaz:
That's a good question, actually.

Josh:
This is a fully playable game.

Ejaaz:
Yes. And I like that it's like zoomed in. There's like... Oh,

Ejaaz:
we got the boost. I can jump on the platforms. Let's see if I can kill this guy.

Josh:
Yes. Nice.

Ejaaz:
Okay. And can I jump the gap? There's a scoring system.

Josh:
You could see your hearts. Oh dude, this is way better.

Ejaaz:
Power up. Wait, oh my god, I want the power up. I'm still gonna go back. Double jump.

Josh:
You can, you could go back. Go back to the last platform.

Ejaaz:
Oh god, I died. I'm going, I'm going to the last platform. Here we go.

Josh:
It looks like they're sequentially gaining height, which is interesting.

Josh:
Oh, but okay, so if I'm comparing these two, I'm actually, I'm not feeling very

Josh:
let down. This is good. Aside from the music not existing, which we may not have

Josh:
explicitly asked for, um, it looks like the logic plays better. The actual gameplay

Josh:
is usable. This is a full... I don't know if it's glitching or if this is you glitching.

Ejaaz:
No, no, that is, it's glitching. It's glitching a bit.

Josh:
Okay, so there are still some edge case errors, yeah. But

Josh:
this is different in the sense that you have your hearts clearly projected, you

Josh:
have a score system that's clearly in place, you're able to get these power-ups,

Josh:
they work, they function. I mean, this is a very clean and functional game. So I

Josh:
would give this to Codex. I think the experience, perhaps the design of Claude

Josh:
was better. And perhaps the music,

Josh:
I mean, music was definitely better versus none, but Claude,

Josh:
in terms of just, or Codex, in terms of just coding logic and making a better

Josh:
game, I give, I give this to Codex. Do you have a take?

Ejaaz:
Yeah. So on the build side of things, I had a much more pleasant experience

Ejaaz:
using Codex as well, so I think Codex wins

Ejaaz:
on this. Um, I one-shotted it in the true sense,

Ejaaz:
where I just gave it a single prompt, and Codex didn't ask for

Ejaaz:
any permissions. It just kind of went on and did the thing. I saw

Ejaaz:
its thinking, and at points where it was unsure, it thought amongst itself

Ejaaz:
and then made the decision to progress forwards, whereas with Claude Code it would

Ejaaz:
come to me. Now, that might just be a developer or engineer's preference, right? Like,

Ejaaz:
if you're building a production ready app for like, I don't know,

Ejaaz:
a big company that you work for, you probably want to have more hands-on involvement.

Ejaaz:
Whereas if you're just building a game like we did today, where I don't really

Ejaaz:
care what it ends up looking like or what it does, then the hands-off preference

Ejaaz:
is probably something that you would use Codex for. But I think Codex wins this.

Josh:
So for our second demo, we have this handwritten piece of paper that I actually

Josh:
wrote and took a picture of.

Josh:
I didn't. It's GPT Image Gen 2.0, but it looks like it's handwritten.

Josh:
The handwriting was too nice.

Ejaaz:
Josh. That was the giveaway.

Josh:
Yeah, my handwriting is far sloppier than this. But the idea is that you can

Josh:
even write things on the back of a napkin and you could turn that into an application.

Josh:
So what we did here is we just asked for it to create a generic limitless dashboard

Josh:
application on the back of a piece of paper, fed it into the model, and this is what we got.

Josh:
So it looks like it did a pretty good job.

Josh:
I could tell this is Claude before you even tell me which model it is because

Josh:
it has the standard design principles.

Josh:
Claude design is so basic and

Josh:
it's so predictable, where, like, okay, I've seen this

Josh:
dashboard before. It looks like it was a mission success. There's a

Josh:
lot of text on this page, a lot of stuff going on, a lot

Josh:
of graphics. I give a lot of credit for kind of inferring what

Josh:
we would want to be seeing from something like this, where we have a

Josh:
proper trip budget. I don't think we asked for a trip budget, um, but okay. I think

Josh:
it looks like it did a lot of inferring, right? Like, it kind of made a

Josh:
lot of assumptions, but at the end of the day, it did take what we had on the

Josh:
napkin and it turned it into a pretty generic dashboard of sorts based on very

Josh:
limited information that we gave it.

Ejaaz:
I think the issue with this is we asked for something

Ejaaz:
completely different. It created a dashboard, um, but

Ejaaz:
we asked for it to be based around the Limitless podcast, and

Ejaaz:
it created a travel planning board. So I don't know

Ejaaz:
whether that was a prompt issue or whether we just fed

Ejaaz:
it the wrong image, but here we go, here is where we're

Ejaaz:
at. Um, now let's take a look at what OpenAI did. Okay, so here we have the same

Ejaaz:
prompt fed into GPT 5.5, and it's funny, I can instantly tell this is GPT 5.5

Ejaaz:
because it's cleaner and it's not neon and it's not trying to go for some futuristic spin.

Ejaaz:
It looks very simplistic. This is actually a website or app that I would probably

Ejaaz:
be more inclined to engage with.

Ejaaz:
It's also more visually perceptive to me, right?

Ejaaz:
Like, what do I have at the front here? It's this five-day trip that I want to go on.

Ejaaz:
It's giving me the basic information that I need to know at the start.

Ejaaz:
It has a bunch of different tabs as well.

Ejaaz:
But again, it isn't what I specified on the napkin. So I think this might be

Ejaaz:
a skill issue on our side, Josh. But otherwise, like, look at these graphics.

Ejaaz:
They're like really good. One thing I've noticed is stylistically,

Ejaaz:
although both models create very different looking things, the animation style looks the same.

Ejaaz:
Have you noticed that even with the game previously that we just demoed,

Ejaaz:
the avatar looked the same.

Ejaaz:
It was given the same sort of title and the objects interacted in the same way.

Ejaaz:
We're seeing this here, so maybe it's just a change in quality. I actually prefer GPT 5.5 on this one.

Josh:
Yeah, this is crazy. I'm just going to suspect

Josh:
there was a prompt issue there, where, yes, like, clearly we asked for something

Josh:
that we didn't actually want. But here it is. I think if you're just comparing

Josh:
them apples to apples, uh, ChatGPT and Codex is like a no-brainer, 10 times better.

Josh:
I far prefer this. If you look at the original napkin photo, this is much more

Josh:
accurate to what the design looked like on that original piece of paper.

Josh:
And then if you also just compare the general design, this is far easier to understand.

Josh:
It's just a lot less dense. It's designed better. I wouldn't even say this is really...

Josh:
a fair comparison. It seems like Codex just, like, completely crushed this, and

Josh:
it has all the functionality built in. It looks good. I am giving another win

Josh:
to Codex here. That's two for two.

Ejaaz:
Wow, look, I've got like a re-optimization, uh, toggle at the top, and it actually

Ejaaz:
updated. I wonder where it's pulling that data from.

Josh:
It's already hooked into data. Look at that. Yeah, impressive stuff.

Ejaaz:
Very, very cool. Now, one major reason why both of these models have advanced so

Ejaaz:
rapidly over the last couple of months is something known as the AI model harness.

Ejaaz:
Now, you have the AI model, which is something that you and I have interacted with quite a lot.

Ejaaz:
It's via ChatGPT or Claude itself.

Ejaaz:
But there's an added layer that you can put on top of this model,

Ejaaz:
which comes in the form of prescripted prompts that are engineered to make the

Ejaaz:
model act in a particular way.

Ejaaz:
But it's also the environment that the model works in.

Ejaaz:
It's also the policies that you set to make sure that the model acts and behaves

Ejaaz:
and sounds in a particular way.
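
The three layers Ejaaz lists (pre-scripted prompts, an environment, and policies) can be sketched as a thin wrapper around a model call. This is an illustration under stated assumptions, not any vendor's actual harness: `Harness`, `echo_model`, `allowed_tools`, and `banned_words` are hypothetical names, and the "model" is a toy function rather than a real LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    model: Callable[[str], str]                           # the underlying model call
    system_prompt: str                                    # pre-scripted prompt layer
    allowed_tools: set[str] = field(default_factory=set)  # environment layer
    banned_words: set[str] = field(default_factory=set)   # policy layer

    def run(self, user_prompt: str) -> str:
        # Prepend the scripted prompt, call the model, then apply output policy.
        full_prompt = f"{self.system_prompt}\n\nUser: {user_prompt}"
        reply = self.model(full_prompt)
        for word in self.banned_words:
            reply = reply.replace(word, "[redacted]")
        return reply

    def can_use(self, tool: str) -> bool:
        # The environment decides which tools the model may touch.
        return tool in self.allowed_tools


def echo_model(prompt: str) -> str:
    """Toy stand-in for a real model: upper-cases the last line of the prompt."""
    return prompt.splitlines()[-1].upper()


harness = Harness(
    model=echo_model,
    system_prompt="You are a careful coding assistant.",
    allowed_tools={"browser"},
    banned_words={"SECRET"},
)
reply = harness.run("print the secret token")
print(reply)  # prints "USER: PRINT THE [redacted] TOKEN"
```

Swapping `echo_model` for a different callable changes the model without touching the prompts, tools, or policies, which is the sense in which the harness is a separate layer.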

Ejaaz:
That's why we talked about Claude's

Ejaaz:
personality earlier being better than ChatGPT. It all plays into the harness.

Ejaaz:
What we figured out was it's an entirely new product category on its own.

Ejaaz:
In fact, Cursor had some news over the last couple of days where they made their

Ejaaz:
harness, Cursor SDK, available via API.

Ejaaz:
And the reason why this is such a big deal is critics criticized Cursor for

Ejaaz:
being an AI wrapper, which meant that Cursor doesn't have a model of its own.

Ejaaz:
It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPT.

Ejaaz:
And so people would say, Cursor isn't actually special. Turns out the wrapper

Ejaaz:
or the harness actually made these models way more intelligent.

Ejaaz:
In fact, if you added Cursor's harness on top of GPT 5.5 and Claude Opus 4.7

Ejaaz:
right now, you end up with a smarter, more intelligent, more efficient model

Ejaaz:
than the actual base models themselves.

Ejaaz:
Now, remember, AI Labs spent hundreds of millions of dollars to train these

Ejaaz:
models and to create the best thing and put their best foot forward.

Ejaaz:
And still you have a startup which is worth, what is it now,

Ejaaz:
$10 billion right now, potentially being acquired by xAI for $60 billion,

Ejaaz:
creating a better model on top.

Ejaaz:
So the harness and the AI model are arguably one and the same at this point.

Ejaaz:
And it's just a valuable moat to point out that these models aren't just better

Ejaaz:
at coding because of the base model itself. It's because of this thing known as a harness.

Josh:
Yeah. And the harness is the difference maker when it comes to building this super app.

Josh:
It's like every single company is trying to build the super

Josh:
app, the all-in-one application that kind of serves

Josh:
as your operating system. Anytime you need to engage with AI,

Josh:
this is the place that you could do it, and it's all-encompassing, it's

Josh:
all in one. Now, one of the best applications we've seen for this in the early

Josh:
days has been something like OpenClaw, where it's this extension of what an

Josh:
operating system could look like, starting with AI at the foundation. And

Josh:
OpenClaw did a really amazing job of that. Now, in some news this week, you can now

Josh:
use your ChatGPT account to generate tokens with OpenClaw.

Josh:
So previously you had to use the API, whether you were using Anthropic or OpenAI

Josh:
or any of the other models, and it was pretty expensive. It costs a lot of money.

Josh:
Now, thanks to Sam Altman this week announcing, you can actually use your account

Josh:
connected with it. And I think this is the beginning of a multi-step plan

Josh:
to really integrate OpenClaw directly into Codex in a way that Anthropic can't.

Josh:
Because if you'll remember, OpenAI owns OpenClaw.

Josh:
They bought Peter and granted OpenClaw will stay open source forever,

Josh:
but they have the ability to actually integrate directly into their products.

Josh:
And I suspect that's what we're going to see.

Josh:
In fact, we even got some confirmation from another post from one of the Codex

Josh:
developers who replied to a post that was saying, Codex only needs a native

Josh:
editor, an iOS app, a full browser, and OpenClaw.

Josh:
And the developer, Tebow, said all of this and more

Josh:
is coming, which Sam Altman retweeted. So we are

Josh:
indeed getting OpenClaw inside of Codex. We're getting a

Josh:
mobile iOS app so that you can access it remotely. And soon

Josh:
there's going to be no reason to really use a different app

Josh:
because it's going to be all-encompassing. Now, are there still downfalls? Yes.

Josh:
Computer use 20 faster on Codex, but yesterday I was playing around with it. I

Josh:
told it to increase the volume of my music and it took 10 minutes to do it because

Josh:
it tried to increase the slider on Spotify, even though it was maxed, without actually

Josh:
increasing my system audio. So it's still a little dumb, but it is getting better.

Josh:
And I think this leads me to this post that I really love, the vanilla maxing

Josh:
post we have to talk about,

Josh:
which starts by saying, you should 100% be vanilla maxing. Just use the tools

Josh:
as they're handed to you. That's it.

Josh:
Because a lot of people, and I've found this personally, and in fact,

Josh:
I've been caught by this personally, get caught up in using

Josh:
all these different repos and these skills and these plugins,

Josh:
when the reality is, if you just wait, the AI labs are shipping fast enough,

Josh:
they'll just integrate it into your own native application.

Josh:
So I'm vanilla maxing, Ejaaz.

Ejaaz:
I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw,

Ejaaz:
when it was hyped up, was incredibly impressive and still is incredibly impressive.

Ejaaz:
It opened up an entirely new product market and segment. That's why OpenAI acquired them.

Ejaaz:
But something's majorly changed over the last couple of months,

Ejaaz:
which is OpenClaw has kind of fallen off. No one talks about it anymore.

Ejaaz:
People who were complaining about the errors and bugs that they were facing have

Ejaaz:
kind of gone silent because they've just grown bored and they don't want to

Ejaaz:
put their energy and effort into it.

Ejaaz:
And the reason why is because although these tools are very frontier level,

Ejaaz:
they can't actually be scaled to a practical use.

Ejaaz:
You don't feel safe integrating OpenClaw into your desktop where you have personal

Ejaaz:
files. I've seen horror stories where they access credit card data and expose

Ejaaz:
that or where they deleted old wedding photos and the wife was super angry,

Ejaaz:
a bunch of stuff like that.

Ejaaz:
If you are able to get access to a tool that comes under a branded

Ejaaz:
reputation, such as ChatGPT, Codex, or Claude Cowork, where it kind of like

Ejaaz:
takes over your computer, but in a sandboxed environment.

Ejaaz:
I know that NVIDIA also released NemoClaw, which is like the enterprise-grade

Ejaaz:
secure version of OpenClaw.

Ejaaz:
You're vanilla maxing. That is the way to do it. And there's no need to rush

Ejaaz:
ahead and lose all your data as a consequence. So that's basically it for the episode.

Ejaaz:
We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.

Ejaaz:
There's a lot of numbers in there, but basically the best coding models from

Ejaaz:
both sides to see which is better.

Ejaaz:
And the truth is, there isn't a clear winner right now. I would say it's probably

Ejaaz:
Codex GPT 5.5, but the narrative switched so recently that maybe,

Ejaaz:
maybe Claude can still catch up. And the only reason why I say that,

Ejaaz:
Josh, is that

Ejaaz:
there's a model that we haven't discussed or demonstrated yet because we can't.

Ejaaz:
It's called Claude Mythos.

Ejaaz:
It was kind of pseudo-released a few weeks ago.

Ejaaz:
And on all benchmarks, it is technically better than 5.5.

Ejaaz:
But the reason why we can't demo it is we can't get access to it.

Ejaaz:
And the reason cited by Anthropic was because it's too dangerous.

Ejaaz:
It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it.

Ejaaz:
It was Pete Hegseth of the US Department of War also saying this,

Ejaaz:
right? So there's concerns around that.

Ejaaz:
OpenAI has created a Mythos-level type of model here, but has made it available

Ejaaz:
to everyone. And so the argument could be made that it's just because Anthropic

Ejaaz:
doesn't have enough compute.

Ejaaz:
So there's a lot of rumors around this, but I'm excited to get my hands on the

Ejaaz:
best models from each of these and compare them directly.

Josh:
Yeah. And the compute's actually been degrading. So I think I want to wrap this

Josh:
up on like, what do you actually currently use?

Josh:
What is the Limitless production stack? How are we using these AI models?

Josh:
And for me, at least, it's not even close. I'm Codex-pilled.

Josh:
I'm fully switched over. I am Codex superior domination. It's going to be the month of Codex.

Josh:
Maybe Anthropic will have a comeback, but that's not happening until at least

Josh:
June, July, because this month is Codex month.

Josh:
So I've been using Codex for basically everything, all of the difficult tasks that I need.

Josh:
What I have found is that GPT 5.5 as an LLM, as a language model,

Josh:
as a chatbot is a little bit inferior to Opus 4.7, which I believe to be the

Josh:
better model if you're just chatting with an AI.

Josh:
I like its personality. It's warmer, it's more precise, it normally

Josh:
gets the idea of what I want. So if I am building a complex

Josh:
project, Opus 4.7 is the orchestrator and

Josh:
Codex is the actual implementer, the executor of this code, of this plan. I've

Josh:
also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think

Josh:
this is another piece of alpha here. I actually use Opus 4.6 whenever I'm doing

Josh:
anything relating to writing or word ingestion. So one of the projects I've been doing recently

Josh:
is from Andrej Karpathy. He created this like wiki for

Josh:
your own person, where it ingests files and it kind of writes

Josh:
these summaries for you and it creates a personal knowledge wiki. I use Opus

Josh:
4.6 exclusively for that because Opus 4.7, I think, is far inferior at summarizing

Josh:
and kind of rewriting these topics that I use in my Obsidian. So that's kind

Josh:
of my stack. I use Opus for LLMs, Codex for everything else. Ejaaz, what are you

Josh:
currently optimizing for? What are you planning?

Ejaaz:
So it's two things. When I have

Ejaaz:
a, uh, my stack is actually way more diverse when

Ejaaz:
it comes to just like the research side of things, only because I'm

Ejaaz:
using the AI that's like available readily wherever I am,

Ejaaz:
right? So if I'm on X a lot and I see breaking news, I'm

Ejaaz:
just tapping Grok, because honestly its recent model, I think it's like, what

Ejaaz:
was it, 4.3 at this point, is actually pretty good, and they have multiple agents

Ejaaz:
that are kind of like running at this, right? But for the core bulk of the work,

Ejaaz:
I've started shifting towards GPT 5.5 for the research, because 5.5 researches

Ejaaz:
things for so much longer. And it has a much more in-depth discussion.

Ejaaz:
In fact, I tested it out today because I was curious about the AI power stack

Ejaaz:
and what stocks I should be investing in to get exposure to the power grid lines

Ejaaz:
that are currently constraining AI data centers, right?

Ejaaz:
And I was like, all right, I gave a detailed prompt to both Claude Opus 4.7

Ejaaz:
and 5.5, and 5.5 completely cooked 4.7.

Ejaaz:
And it gave good reasoning why, whereas 4.7 did not.

Ejaaz:
I had to ask it more questions. So all in all, I think 5.5 is my preference right

Ejaaz:
now. I still use 4.7 because of the personality.

Ejaaz:
It's like less of an AI type of voice versus GPT 5.5.

Ejaaz:
But again, I feel like OpenAI is on a generational run right now,

Ejaaz:
and they might just kind of fix this in the next couple of hours at this point.

Josh:
Yeah, it's coming. It's coming quick. And I think now is a good time to kind

Josh:
of get familiar with Codex to understand the way it works.

Josh:
And as they implement these features, you'll be able to adopt them within the hour, within the day.

Josh:
It's pretty amazing. And it's been fun to just experiment. It's been fun to try something new.

Josh:
And it's, again, competition is just better for everyone. So the end winner

Josh:
of this is the user, because for as low as $20 a month, you get access to all

Josh:
this frontier intelligence, all these capabilities.

Josh:
And it's just, it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.

Josh:
If you have not tried both of them, I encourage you to give it a try.

Josh:
Test the prompts against one another. If you have any type of work that you

Josh:
need, if you're working on a computer at all, chances are you can use AI to

Josh:
help you do your job even better.

Josh:
Or you could just use it to help you do hobbies and side projects that you've

Josh:
always wanted to do. So give it a try.

Josh:
Let us know your preference, Codex, Claude Code. Which one is it going to be?

Josh:
I think that's probably it for the episode. Thank you guys so much for watching.

Josh:
If you enjoyed it, please don't forget to share with your friends.

Josh:
Let them know which model they picked. And also don't forget to rate it five

Josh:
stars on your favorite podcast listening platform. Any final thoughts, Ejaaz, before we go?

Ejaaz:
No, that's it. Thank you guys so much for listening and we'll see you on the next one.
