Exploring the frontiers of Technology and AI
Josh:
A few months ago, we told you to use Claude. Now we're telling you to switch
Josh:
back because for those of you who aren't familiar, well, over Christmas break,
Josh:
there was a major vibe shift where AI coding went from this like fun tool to
Josh:
things that developers actually use when they're shipping code.
Josh:
And even if you're not a developer, the amount of use cases and applications
Josh:
that were created around that time were really strong.
Josh:
And since then, Anthropic has gone on this generational run of shipping these
Josh:
incredible products seemingly every single day that has turned Claude Code into
Josh:
this supercharged super app that is the place that, Ejaaz,
Josh:
I know you've gone to, I've gone there too, in order to get all of our AI progress done.
Josh:
Any work that we have, we've gone to Claude Code.
Josh:
Now, OpenAI has woken up. And over the last few weeks, Codex has shipped more
Josh:
features than most companies ship in a year.
Josh:
And I bet, I guarantee that you haven't heard of some of these features that
Josh:
we're going to talk about in this episode.
Josh:
The pendulum has fully swung back, or at least I believe, because I'm totally Codex-pilled.
Josh:
And in this episode, we're going to kind of walk through the differences between
Josh:
these two and why the model that you're using today probably won't be the model you're using tomorrow.
Josh:
And I don't think we're going to convince you, but maybe we could show you why
Josh:
you might want to consider using something else here.
Ejaaz:
I just want to talk through some of the crazy stats here because the script
Ejaaz:
has genuinely flipped. A few months ago, Claude Code was all anyone could talk about.
Ejaaz:
And every software engineer was using Claude Code. Every enterprise was installing it. It was crazy.
Ejaaz:
But just over the last couple of weeks, specifically by the end of April,
Ejaaz:
ChatGPT 5.5 was released, and that was plugged into the coding AI model.
Ejaaz:
It's all one and the same.
Ejaaz:
And OpenAI went on this code red run where they focused on nothing but building
Ejaaz:
the best coding AI model and the best LLM.
Ejaaz:
And the numbers show that it's worked.
Ejaaz:
Over the last week, Codex has been downloaded or installed over 46 million times.
Ejaaz:
Claude Code, under 500,000 times. Now, that is crazy to say, because if you look
Ejaaz:
at the historical data, Claude Code downloads and installs have absolutely dwarfed
Ejaaz:
Codex, but something changed over the last couple of weeks.
Ejaaz:
That something was OpenAI putting out just a better model. You mentioned that
Ejaaz:
you were Codex-pilled, Josh.
Ejaaz:
I think so am I. I've spent the last couple of days playing around with Codex.
Ejaaz:
This morning, we prepped a bunch of really cool demos, and it has just completely flipped the script.
Ejaaz:
But it's one thing saying it. It's another thing actually showing the direct
Ejaaz:
comparison. So we created this visual artifact to kind of
Ejaaz:
give you the scoreboard. And you can see it at the top here.
Ejaaz:
It's OpenAI Codex at 11 and Anthropic Claude at two. But let me explain why.
Ejaaz:
Okay. So number one, computer use.
Ejaaz:
Codex and Claude Code can use your computer.
Ejaaz:
It can take over your desktop and it can like move your cursor around.
Ejaaz:
Now, Claude pioneered it. They were the first ones there, but it was super slow.
Ejaaz:
It kind of runs into a bunch of obstacles and you have to kind of like handhold
Ejaaz:
it and improve it to do a bunch of different things.
Ejaaz:
Codex is not only quicker than me, it's quicker than the average person.
Ejaaz:
In fact, I can actually see the cursor move around so quickly.
Ejaaz:
And it's like using a computer, but it's a superhuman and it can run pretty much 24-7 at this point.
Ejaaz:
Long horizon autonomy. Codex can work for longer in a much more intelligent
Ejaaz:
manner versus Claude Code, which is, again, crazy to say because literally a
Ejaaz:
month ago, it was the inverse of this.
Ejaaz:
Claude right now can run for a decent number of times or amount of time,
Ejaaz:
but not as long as Codex can.
Ejaaz:
And then the last two that I want to talk about here is browser use.
Ejaaz:
So Codex can take over your browser. It can do a lot more intentional things.
Ejaaz:
It understands what it's looking at, very importantly. Previously, it could not do that.
Ejaaz:
Claude can do the same, but not as intelligently. And then finally,
Ejaaz:
ChatGPT Images 2.0 got released, what was it, like two weeks ago now? Oh, it's so good.
Ejaaz:
Yeah, it's the image generation model from OpenAI, and it is absolutely astounding.
Ejaaz:
In fact, it beat all the other predecessors, including Google's,
Ejaaz:
what is it, Nano Banana 2.0 Pro, which previously held the lead.
Ejaaz:
It beat it across every single benchmark.
Ejaaz:
Anthropic, on the other hand, doesn't even have an image gen model.
Ejaaz:
So, so far, it's crushing. Yeah.
Josh:
Yeah, I think a lot of the best is now bundled into Codex. The image gen for
Josh:
anyone who uses any sort of visual work is unbelievable. And being able to use
Josh:
that directly in your software is awesome.
Josh:
One thing that you mentioned is the long horizon autonomy. I think that needs
Josh:
a double clicking on because it's really impressive how well it works.
Josh:
Traditionally, there's been this thing called a Ralph loop that we use.
Josh:
It's actually named after the character from The Simpsons who is very persistent.
Josh:
And it's basically a planning mode where you give the AI a goal and it will
Josh:
continue to iterate towards that goal until it accomplishes it.
Josh:
So like, let's say you want to build a Lego car or something and you give it the exact parameters.
Josh:
It will go and go and go until it solves that problem and gives you exactly
Josh:
what you want in a way that other AI models haven't.
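As a rough illustration of the loop described here, the idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Codex implements it: `ralph_loop`, `add_part`, and `car_complete` are hypothetical names, and the "agent" is a stand-in function that adds one part per pass.

```python
from typing import Callable


def ralph_loop(goal_met: Callable[[str], bool],
               improve: Callable[[str], str],
               draft: str = "",
               max_iterations: int = 100) -> tuple[str, int]:
    """Keep revising `draft` until `goal_met` accepts it or the budget runs out."""
    for iteration in range(1, max_iterations + 1):
        draft = improve(draft)      # one more attempt at the task
        if goal_met(draft):         # check the result against the goal
            return draft, iteration
    raise RuntimeError("goal not reached within the iteration budget")


# Toy stand-ins for the agent and the goal check (hypothetical).
PARTS = ["chassis", "wheels", "engine", "body"]


def add_part(car: str) -> str:
    """Pretend agent step: bolt the next missing part onto the car."""
    built = car.split("+") if car else []
    return "+".join(built + [PARTS[len(built)]])


def car_complete(car: str) -> bool:
    """Goal check: the car is done once all four parts are attached."""
    return car.split("+") == PARTS


result, steps = ralph_loop(car_complete, add_part)
print(result, steps)
```

The iteration budget is the design choice worth noticing: a persistent loop needs some stopping condition other than success, or it can run forever on a goal it cannot reach.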
Josh:
Codex did that. And this is the only native implementation that you can get
Josh:
of this long horizon thinking where it actually will go for days on end.
Josh:
I've seen screenshots of some thinking for as long as 36 hours to accomplish the goal.
Josh:
So if you have really difficult tasks, Codex is going to be really good at solving those.
Josh:
Now, continuing to scroll down, there was another feature that was just released
Josh:
this week called auto review.
Josh:
And a huge pain in the ass for people who are creating code for working on complex
Josh:
projects, whatever it may be, is you're constantly having to sit there and approve
Josh:
things because the permission system is a little finicky, right?
Josh:
You don't want to give it full access to your computer.
Josh:
You also don't want to sit there approving every time it wants to use Chrome
Josh:
or every time it wants to access your files.
Josh:
So Codex created auto review and they rolled it out last week where the agent is kind of smart.
Josh:
It knows which things are going to possibly be systemic existential threats
Josh:
and which approvals aren't.
Josh:
And it will just automatically approve all the things that aren't going to get you in a lot of trouble.
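One way to picture that kind of triage is a simple risk gate. The sketch below is an assumption for illustration only: the action names, the two risk tiers, and the `review` function are hypothetical, and the real feature presumably uses a model's judgment rather than a fixed list.

```python
from dataclasses import dataclass

# Hypothetical risk tiers; a real system would not rely on a hard-coded list.
LOW_RISK = {"read_file", "run_tests", "open_browser"}
HIGH_RISK = {"delete_file", "send_payment", "push_to_main"}


@dataclass
class Request:
    action: str
    target: str


def review(request: Request) -> str:
    """Auto-approve low-risk requests; send everything else to the user."""
    if request.action in HIGH_RISK:
        return "ask_user"
    if request.action in LOW_RISK:
        return "auto_approve"
    # Anything unrecognized is treated as risky by default.
    return "ask_user"


print(review(Request("run_tests", "game/")))         # low risk
print(review(Request("delete_file", "photos/")))     # high risk
print(review(Request("format_disk", "/dev/disk0")))  # unknown action
```

Defaulting unknown actions to "ask the user" is what lets you walk away from the computer without handing over full access.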
Josh:
It creates a much easier user interface where you can just kind of walk away
Josh:
from the computer for a little while and come back and things get done.
Josh:
Memory and context is pretty strong, I'd say.
Josh:
The one thing, and we haven't mentioned many Claude winners: the
Josh:
place where Claude wins currently is on their OpenClaw capability, funny
Josh:
enough, because OpenAI bought OpenClaw. Dispatch is the mobile app feature
Josh:
for Claude in which you can actually engage with Claude Code remotely. That doesn't
Josh:
currently exist on Codex, and while the team has promised to ship it, you don't
Josh:
actually have it today. Claude does. Also, in terms of the personality
Josh:
and UI, Claude is just so much better. I think we're going to get into our
Josh:
personal takes, but whenever you're using an LLM versus an actual
Josh:
tool set or a harness, Claude is pretty great and the
Josh:
UI is very warm. So there are some
Josh:
instances in which Claude is better, but for the most part, Codex is
Josh:
really just kind of crushing it, and I've really enjoyed using it. One of the
Josh:
fun things is pets. I mean, just recently they released pets, and Claude also released
Josh:
pets, but these pets are a little bit different. This is an example of Angry Dario
Josh:
we're seeing on the screen, and it's fun because you have this persistent character
Josh:
that exists throughout your computer use.
Josh:
And as you're engaging with Codex, it'll just kind of chat with you in the background
Josh:
so you can see your progress, see where you're at.
Josh:
It's fun, it's playful, and it just shows that they kind of care about the user experience.
Josh:
Now, one feature I would guarantee most people don't know is Chronicle, Ejaaz.
Josh:
And you were just telling me about Chronicle and how cool it is,
Josh:
how it kind of monitors your screen as you go. This seems like novel technology
Josh:
that we haven't seen yet.
Ejaaz:
Yeah, so one of the earliest episodes that we did here on Limitless was an interview
Ejaaz:
with the folks at OpenAI that created...
Ejaaz:
something called, what was it called, Josh? Do you remember?
Ejaaz:
It was like agent mode or personal mode, something like that.
Josh:
Yes. It thought overnight for you, right? Yes.
Ejaaz:
It basically took all the conversations that you'd had with ChatGPT the night
Ejaaz:
before or the day before or the week before, and it created important context
Ejaaz:
around you in the form of something called memories.
Ejaaz:
This is where AI memory was birthed from OpenAI themselves, from the OpenAI
Ejaaz:
team. And what it would do is it would feed you a report in the morning that
Ejaaz:
would update you on information that it thought you would be interested to read about.
Ejaaz:
So say, for example, you were interested in the stock market,
Ejaaz:
it'll give you an update on a bunch of advancements that had happened overnight
Ejaaz:
or over the last week or whatever it might be.
Ejaaz:
Now, fast forward to today, memory is embedded across every single AI model
Ejaaz:
and tool. The reason why is context is so important.
Ejaaz:
It's one thing a user asking for something explicitly and directly.
Ejaaz:
It's a complete other thing for an AI to actually understand what you mean,
Ejaaz:
the nuance in the sentence that you've created, and even better,
Ejaaz:
to predict what you want.
Ejaaz:
But there was still an obstacle, which was you needed to feed it the context
Ejaaz:
and say, hey, Claude, hey, ChatGPT, can you remember this?
Ejaaz:
OpenAI recently released a feature called Chronicle, where it observes what you scroll through,
Ejaaz:
what you click on, what you type, and it builds its own context and memories
Ejaaz:
around you without you needing to feed it, which actually led to a really cool
Ejaaz:
prompt that you pointed out, Josh, or that you found, which was,
Ejaaz:
what have I been doing very inefficiently on my computer, according to Chronicle,
Ejaaz:
which is this new memory feature, make some recommendations,
Ejaaz:
be direct, tell me what I need to hear. That's, that's pretty awesome.
Josh:
Yeah. So this is alpha because I don't think a lot of people recognize that
Josh:
this is a possibility because Codex and OpenAI didn't do a good job of explaining this.
Josh:
When they released Chronicle, they said it's a way for the system to review your
Josh:
code as you've gone because it's been taking sequential screenshots.
Josh:
But the reality is that it's much bigger than this.
Josh:
And I suspect they didn't market it this way because it could be a bit of a privacy issue,
Josh:
but it's essentially constantly monitoring your screen and taking screenshots
Josh:
of what's happening on your screen and interpreting it so it understands your
Josh:
habits, the way that you work, the things that you do.
Josh:
And then you can ask it, what have I been doing very inefficiently on my computer?
Josh:
According to Chronicle, make some recommendations, be direct,
Josh:
tell me what I need to hear.
Josh:
And it'll actually evaluate how you've been using your computer,
Josh:
how long you've been scrolling on Twitter, perhaps, how long you haven't been
Josh:
doing the things you're supposed to be working on, or just generally how to
Josh:
improve your workflow and give you real feedback based on your actual actions that it's seen.
Josh:
And I think this is a super powerful thing currently only available to pro members.
Josh:
So if you pay for the $100, $200 a month subscription, you get access to this.
Josh:
But I suspect this is the early signs of a very important feature they're going
Josh:
to roll out, which is that entire computer monitoring system to improve your
Josh:
system and also probably train the models to get better at engaging with your system.
Josh:
But I found Chronicle to be one of those kind of secret features that not a
Josh:
lot of people know about, but has a lot of upside if you use it to your advantage
Josh:
and let it monitor what you're doing and improve your workflow on a day-to-day basis.
Ejaaz:
So the point is, from both of these companies, Anthropic and OpenAI,
Ejaaz:
we are getting feature releases every single week. In fact, every single day.
Ejaaz:
And it's becoming, I'm being bombarded by this.
Ejaaz:
And it's hard to keep track of all of this. So what is the number one litmus
Ejaaz:
test for both of these models and products and companies?
Ejaaz:
It's to actually use the thing. It's to build the thing.
Ejaaz:
And we have two special demos that we have prepared for you that we're about
Ejaaz:
to jump into. Now, Josh, can you guess what my first demo is about?
Josh:
The theme. First one's a game. We're gamers, man. I want to play a game.
Josh:
I want to see how well it does on a game.
Josh:
I know we did this demo in the past months ago. It left a lot to be desired.
Josh:
So I'm curious to see the current up-to-date status as it relates to Claude Code
Josh:
versus Codex. Who's winning on the one-shot game prompt?
Ejaaz:
Indeed. Okay. So I am a nostalgic kind of guy. And so I was like,
Ejaaz:
oh, back in the day, I loved Mario.
Ejaaz:
So I want you, both of these models, to create the best Mario type or inspired
Ejaaz:
game, a side scroller, but make it futuristic.
Ejaaz:
Maybe add a little bit of neon, sprinkle a bit of neon in there,
Ejaaz:
create levels. I want game design. I want there to be enemies.
Ejaaz:
I want there to be pitfalls.
Ejaaz:
And I also want there to be a scoreboard and also tell me how to do this thing.
Ejaaz:
I want, give me the whole package.
Ejaaz:
Basically, I fed this prompt or idea into ChatGPT and Claude.
Ejaaz:
And I said, can you create a detailed prompt that I can then feed into your coding models?
Ejaaz:
I then set each of the coding models to their highest settings.
Ejaaz:
So what you're about to see is the best of the best for the most
Ejaaz:
detailed prompt that they came up with. Let's see what they
Ejaaz:
did. So example number
Ejaaz:
one is Opus 4.7.
Ejaaz:
This is Claude Code at the highest setting with their latest model.
Ejaaz:
Okay, it took the prompt pretty literally. It's titled Neon Plumber Moon
Ejaaz:
Base Run, which is obviously Mario-inspired. And it said, hey, this is a demo edition,
Ejaaz:
by the way, this is not production ready. What I like about this is it's giving
Ejaaz:
me the instructions. But how does the game actually play out? Let's see. It looks
Ejaaz:
good. Can you see me here, Josh?
Josh:
I can, yes, I can. And it looks like...
Ejaaz:
The animations are pretty good. I'm jumping around. I think
Ejaaz:
I'm like a little robot. I can see my feet pitter-pattering. Now,
Ejaaz:
I'm guessing this thing is about to kill me, so let's see if I can jump. Oh, I
Ejaaz:
can jump. There we go. That's awesome. Can I kill this guy? Oh yes, I
Ejaaz:
can. Now, one bit of feedback I've noticed is I can't double jump, and it told
Ejaaz:
me in the menu that I could double
Ejaaz:
jump. So that's weird. The physics hasn't really paid off. Can I die?
Josh:
Oh, it certainly looks like you could die.
Ejaaz:
I can die. Great. Okay. So that is Claude's attempt at it. What's your feedback
Ejaaz:
on this, Josh? I think the graphics are pretty good.
Josh:
The graphics are great. For one shot, I mean, granted, this is only one single
Josh:
prompt. So for one prompt, it created great graphics.
Josh:
It had sound design that actually sounds pretty accurate to what you would expect in the game.
Josh:
It has similar principles. It's following gaming principles.
Josh:
You kind of understand what looks dangerous, what doesn't.
Josh:
You knew that those spikes were going to hurt you and they hurt you.
Josh:
The logic seems to be a little bit flawed. I think it's having problems with gravity, or at least that
Josh:
double jump functionality, because it looks like those coins that you probably
Josh:
want to collect, you can't actually reach because you can't do the double jump.
Josh:
So in terms of logic, not so hot. In terms of visuals and aesthetics, in terms of, I
Josh:
mean, how good this game is from one shot, very impressive.
Ejaaz:
Yeah, I think it's important to understand that I started from zero. It literally asked
Ejaaz:
me to give it a folder to build in and the folder was completely empty.
Ejaaz:
So all the visual renderings, all the graphics, the animation style,
Ejaaz:
the scoring system, the way that the avatar moves and looks was created from
Ejaaz:
scratch from a bunch of characters from this AI model.
Ejaaz:
So this is Claude Code's current best attempt and it is way better than what
Ejaaz:
we tested out and honestly demoed on this show about a month ago.
Ejaaz:
But now let's see what OpenAI's ChatGPT 5.5 Codex at the highest possible setting cooked up.
Josh:
Okay, and this is using the same prompt, correct? So you just fed the model the
Josh:
same prompt, identical, right? Oh God, I'm excited. I hope Codex did
Josh:
well, because now that I'm a fan, I'm gassing it up. It better perform here.
Ejaaz:
Okay. So this is GPT 5.5's attempt. Now, you
Ejaaz:
might notice that this isn't the entire browser. That's because
Ejaaz:
Codex has a very unique feature, which is not only
Ejaaz:
can it do all the coding in a single app for you, but it
Ejaaz:
has an in-app browser, so it can
Ejaaz:
live test the thing in the app without you needing to go to Google Chrome or
Ejaaz:
whatever. But anyway, we have the starting screen here. It has also called it
Ejaaz:
Neon Plumber Moon Base Run. It looks a little more rudimentary from the start,
Ejaaz:
but I do like the background animation, Josh. We didn't get this in the previous
Ejaaz:
one, or at least not this side-scrolling thing. Well, let's...
Ejaaz:
Oh.
Josh:
Oh, this is nice.
Ejaaz:
This is nice. I think this has good logic.
Ejaaz:
Wait, but there's no music. There's no music. I can't double jump.
Ejaaz:
Might be a skill issue. Might be a prompt issue.
Josh:
Let's have a look. Did it say you can double jump?
Ejaaz:
That's a good question, actually.
Josh:
This is a fully playable game.
Ejaaz:
Yes. And I like that it's like zoomed in. There's like... Oh,
Ejaaz:
we got the boost. I can jump on the platforms. Let's see if I can kill this guy.
Josh:
Yes. Nice.
Ejaaz:
Okay. And can I jump the gap? There's a scoring system.
Josh:
You could see your hearts. Oh dude, this is way better.
Ejaaz:
Power-up! Wait, oh my God, I want the power-up. I'm still gonna go back. Double jump?
Josh:
You can, you could go back. Go back to the last platform.
Ejaaz:
Oh God, I died. I'm going, I'm going to the last platform. Here we go.
Josh:
It looks like they're sequentially gaining height, which is interesting.
Josh:
Oh, but okay, so if I'm comparing these two, I'm actually not feeling very
Josh:
let down. This is good. Aside from the music not existing, which we may not have
Josh:
explicitly asked for, it looks like the logic plays better. The actual gameplay
Josh:
is usable. This is a full... I don't know if it's glitching or if this is you glitching.
Ejaaz:
No, no, it's glitching. It's glitching a bit.
Josh:
Okay, so there are still some edge case errors, yeah. But
Josh:
this is different in the sense that you have your hearts clearly projected, you
Josh:
have a score system that's clearly in place, you're able to get these power-ups,
Josh:
they work, they function. I mean, this is a very clean and functional game. So I
Josh:
would give this to Codex. I think the experience, perhaps the design, of Claude
Josh:
was better. And perhaps the music,
Josh:
I mean, music was definitely better versus none. But
Josh:
Codex, in terms of just coding logic and making a better
Josh:
game, I give this to Codex. Do you have a take?
Ejaaz:
Yeah. So on the build side of things, I had a much more pleasant experience
Ejaaz:
using Codex as well, so I think Codex wins
Ejaaz:
on this. I one-shotted it in the true sense,
Ejaaz:
where I just gave it a single prompt, and Codex didn't ask for
Ejaaz:
any permissions. It just kind of went on and did the thing. I saw
Ejaaz:
its thinking, and at points where it was unsure, it thought amongst itself
Ejaaz:
and then made the decision to progress forwards, whereas with Claude Code, it would
Ejaaz:
come to me. Now, that might just be a developer or engineer's preference, right? Like,
Ejaaz:
if you're building a production ready app for like, I don't know,
Ejaaz:
a big company that you work for, you probably want to have more hands-on involvement.
Ejaaz:
Whereas if you're just building a game like we did today, where I don't really
Ejaaz:
care what it ends up looking like or what it does, then the hands-off preference
Ejaaz:
is probably something that you would use Codex for. But I think Codex wins this.
Josh:
So for our second demo, we have this handwritten piece of paper that I actually
Josh:
wrote and took a picture of.
Josh:
I didn't. It's GPT Image Gen 2.0, but it looks like it's handwritten.
Josh:
The handwriting was too nice.
Ejaaz:
Josh. That was the giveaway.
Josh:
Yeah, my handwriting is far sloppier than this. But the idea is that you can
Josh:
even write things on the back of a napkin and you could turn that into an application.
Josh:
So what we did here is we just asked for it to create a generic Limitless dashboard
Josh:
application on the back of a piece of paper, fed it into the model, and this is what we got.
Josh:
So it looks like it did a pretty good job.
Josh:
I could tell this is Claude before you even tell me which model it is because
Josh:
it has the standard design principles.
Josh:
Claude design is so basic and
Josh:
it's so predictable, where it's like, okay, I've seen this
Josh:
dashboard before. It looks like it was a mission success. There's a
Josh:
lot of text on this page, a lot of stuff going on, a lot
Josh:
of graphics. I give a lot of credit for kind of inferring what
Josh:
we would want to be seeing from something like this, where we have a
Josh:
proper trip budget. I don't think we asked for a trip budget, but okay. I think
Josh:
it did a lot of inferring, right? Like, it kind of made a
Josh:
lot of assumptions, but at the end of the day, it did take what we had on the
Josh:
napkin and it turned it into a pretty generic dashboard of sorts based on very
Josh:
limited information that we gave it.
Ejaaz:
I think the issue with this is we asked for something
Ejaaz:
completely different. It created a dashboard, but
Ejaaz:
we asked for it to be based around the Limitless podcast, and
Ejaaz:
it created a travel planning board. So I don't know
Ejaaz:
whether that was a prompt issue or whether we just fed
Ejaaz:
it the wrong image, but here is where we're
Ejaaz:
at. Now let's take a look at what OpenAI did. Okay, so here we have the same
Ejaaz:
prompt fed into GPT 5.5, and it's funny, I can instantly tell this is GPT 5.5
Ejaaz:
because it's cleaner and it's not neon and it's not trying to go for some futuristic spin.
Ejaaz:
It looks very simplistic. This is actually a website or app that I would probably
Ejaaz:
be more inclined to engage with.
Ejaaz:
It's also more visually perceptive to me, right?
Ejaaz:
Like, what do I have at the front here? It's this five-day trip that I want to go on.
Ejaaz:
It's giving me the basic information that I need to know at the start.
Ejaaz:
It has a bunch of different tabs as well.
Ejaaz:
But again, it isn't what I specified on the napkin. So I think this might be
Ejaaz:
a skill issue on our side, Josh. But otherwise, like, look at these graphics.
Ejaaz:
They're like really good. One thing I've noticed is stylistically,
Ejaaz:
although both models create very different looking things, the animation style looks the same.
Ejaaz:
Have you noticed that even with the game previously that we just demoed,
Ejaaz:
the avatar looked the same.
Ejaaz:
It was given the same sort of title and the objects interacted in the same way.
Ejaaz:
We're seeing this here. So maybe it's just a change in quality. I actually prefer GPT 5.5 on this one.
Josh:
Yeah, this is crazy. I'm just going to suspect
Josh:
there was a prompt issue there where, yes, clearly we asked for something
Josh:
that we didn't actually want, but here it is. I think if you're just comparing
Josh:
them apples to apples, ChatGPT and Codex is like a no-brainer, 10 times better.
Josh:
I far prefer this. If you look at the original napkin photo, this is much more
Josh:
accurate to what the design looked like on that original piece of paper.
Josh:
And then if you also just compare the general design, this is far easier to understand.
Josh:
It's just a lot less dense. It's designed better. I wouldn't even say this is really
Josh:
a fair comparison. It seems like Codex just completely crushed this, and
Josh:
it has all the functionality built in. It looks good. I am giving another win
Josh:
to Codex here. That's two for two.
Ejaaz:
Wow, look, I've got like a re-optimization toggle at the top, and it actually
Ejaaz:
updated. I wonder where it's pulling that data from.
Josh:
It's already hooked into data. Look at that. Yeah, impressive stuff.
Ejaaz:
Very, very cool. Now, one major reason why both of these models have advanced so
Ejaaz:
rapidly over the last couple of months is something known as the AI model harness.
Ejaaz:
Now, you have the AI model, which is something that you and I have interacted with quite a lot.
Ejaaz:
It's via ChatGPT or Claude itself.
Ejaaz:
But there's an added layer that you can put on top of this model,
Ejaaz:
which comes in the form of prescripted prompts that are engineered to make the
Ejaaz:
model act in a particular way.
Ejaaz:
But it's also the environment that the model works in.
Ejaaz:
It's also the policies that you set to make sure that the model acts and behaves
Ejaaz:
and sounds in a particular way.
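To make those three layers concrete, here is a minimal sketch of a harness wrapped around a model. Everything in it is an assumption for illustration: the `Harness` class, its fields, and `toy_model` are hypothetical stand-ins, not the API of Codex, Claude Code, or Cursor.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Wraps a bare model with a scripted prompt, an environment, and policies."""
    model: Callable[[str], str]                       # the underlying model call
    system_prompt: str                                # prescripted prompt layer
    environment: dict[str, str] = field(default_factory=dict)
    policies: list[Callable[[str], bool]] = field(default_factory=list)

    def run(self, user_request: str) -> str:
        # Describe the working environment to the model alongside the request.
        env = "\n".join(f"{k}={v}" for k, v in sorted(self.environment.items()))
        prompt = f"{self.system_prompt}\n[environment]\n{env}\n[user]\n{user_request}"
        reply = self.model(prompt)
        # Every policy must accept the reply before it reaches the user.
        if not all(policy(reply) for policy in self.policies):
            return "[blocked by policy]"
        return reply


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: reports how many prompt lines it received."""
    return f"received {len(prompt.splitlines())} lines"


harness = Harness(
    model=toy_model,
    system_prompt="You are a careful coding assistant.",
    environment={"cwd": "/tmp/project"},
    policies=[lambda reply: "rm -rf" not in reply],
)
print(harness.run("Add a scoreboard"))
```

The same model behind a different `system_prompt`, `environment`, or `policies` list behaves like a different product, which is the point being made about harnesses here.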
Ejaaz:
That's why we talked about Claude's
Ejaaz:
personality earlier being better than ChatGPT. It all plays into the harness.
Ejaaz:
What we figured out was it's an entirely new product category on its own.
Ejaaz:
In fact, Cursor had some news over the last couple of days where they made their
Ejaaz:
harness, Cursor SDK, available via API.
Ejaaz:
And the reason why this is such a big deal is critics criticized Cursor for
Ejaaz:
being an AI wrapper, which meant that Cursor doesn't have a model of its own.
Ejaaz:
It would just create this harness, a set of prompts and environments around, say, Claude or ChatGPT.
Ejaaz:
And so people would say, cursor isn't actually special. Turns out the wrapper
Ejaaz:
or the harness actually made these models way more intelligent.
Ejaaz:
In fact, if you added cursor's harness on top of GPT 5.5 and Claude Opus 4.7
Ejaaz:
right now, you end up with a smarter, more intelligent, more efficient model
Ejaaz:
than the actual base models themselves.
Ejaaz:
Now, remember, AI Labs spent hundreds of millions of dollars to train these
Ejaaz:
models and to create the best thing and put their best foot forward.
Ejaaz:
And still you have a startup which is worth, what is it now,
Ejaaz:
$10 billion right now, potentially being acquired by xAI for $60 billion,
Ejaaz:
creating a better model on top.
Ejaaz:
So the harness and the AI model are arguably one and the same at this point.
Ejaaz:
And it's just a valuable moat to point out that these models aren't just better
Ejaaz:
at coding because of the base model itself. It's because of this thing known as a harness.
Josh:
Yeah. And the harness is the difference maker when it comes to building this super app.
Josh:
It's like every single company is trying to build the super
Josh:
app, the all-in-one application that kind of serves
Josh:
as your operating system. Anytime you need to engage with AI,
Josh:
this is the place that you could do it, and it's all-encompassing, it's
Josh:
all in one. Now, one of the best applications we've seen for this in the early
Josh:
days has been something like OpenClaw, where it's this extension of what an
Josh:
operating system could look like, starting with AI at the foundation. And OpenClaw
Josh:
did a really amazing job of that. Now, in some news this week, you can now
Josh:
use your ChatGPT account to generate tokens with OpenClaw.
Josh:
So previously you had to use the API, whether you were using Anthropic or OpenAI
Josh:
or any of the other models, and it was pretty expensive. It costs a lot of money.
Josh:
Now, thanks to Sam Altman this week announcing, you can actually use your account
Josh:
connected with it. And I think this is the beginning of a multi-step plan
Josh:
to really integrate OpenClaw directly into Codex in a way that Anthropic can't.
Josh:
Because if you'll remember, OpenAI owns OpenClaw.
Josh:
They bought Peter, and granted, OpenClaw will stay open source forever,
Josh:
but they have the ability to actually integrate directly into their products.
Josh:
And I suspect that's what we're going to see.
Josh:
In fact, we even got some confirmation from another post from one of the Codex
Josh:
developers who replied to a post that was saying, Codex only needs a native
Josh:
editor, an iOS app, a full browser, and OpenClaw.
Josh:
And the developer, Tebow, said all of this and more
Josh:
is coming, which Sam Altman retweeted. So we are
Josh:
indeed getting OpenClaw inside of Codex. We're getting a
Josh:
mobile iOS app so that you can access it remotely, and soon
Josh:
there's going to be no reason to really use a different app
Josh:
because it's going to be all-encompassing. Now, are there still downfalls? Yes.
Josh:
Computer use, 20 faster on Codex. But yesterday I was playing around with it. I
Josh:
told it to increase the volume of my music and it took 10 minutes to do it because
Josh:
it tried to increase the slider on Spotify, even though it was maxed, without actually
Josh:
increasing my system audio. So it's still a little dumb, but it is getting better.
Josh:
And I think this leads me to this post that I really love, the vanilla maxing
Josh:
post we have to talk about.
Josh:
Which starts by saying, you should 100% be vanilla maxing. Just use the tools
Josh:
as they're handed to you. That's it.
Josh:
Because a lot of people, and I've found this personally, and in fact,
Josh:
I've been caught by this personally, is that you get caught up in using
Josh:
all these different repos and these skills and these plugins,
Josh:
when the reality is, if you just wait, the AI labs are shipping fast enough,
Josh:
they'll just integrate it into your own native application.
Josh:
So I'm vanilla maxing, Ejaaz.
Ejaaz:
I'm totally vanilla maxing as well, dude. Like, listen, OpenClaw,
Ejaaz:
when it was hyped up, was incredibly impressive and still is incredibly impressive.
Ejaaz:
It opened up an entirely new product market and segment. That's why OpenAI acquired them.
Ejaaz:
But something's majorly changed over the last couple of months,
Ejaaz:
which is OpenClaw has kind of fallen off. No one talks about it anymore.
Ejaaz:
People who were complaining about the errors and bugs they were facing have
Ejaaz:
kind of gone silent because they've just grown bored and they don't want to
Ejaaz:
put their energy and effort into it.
Ejaaz:
And the reason why is because although these tools are very frontier level,
Ejaaz:
they can't actually be scaled to a practical use.
Ejaaz:
You don't feel safe integrating OpenClaw into your desktop where you have personal
Ejaaz:
files. I've seen horror stories where it accessed credit card data and exposed
Ejaaz:
it, or where it deleted old wedding photos and the wife was super angry,
Ejaaz:
and a bunch of stuff like that.
Ejaaz:
If you're able to get access to a tool that comes with a trusted brand
Ejaaz:
behind it, such as ChatGPT, Codex, or Claude Cowork, one that kind of
Ejaaz:
takes over your computer but in a sandboxed environment, you're vanilla maxing.
Ejaaz:
I know that NVIDIA also released NemoClaw, which is like the enterprise-grade
Ejaaz:
secure version of OpenClaw.
Ejaaz:
That is the way to do it. And there's no need to rush
Ejaaz:
ahead and lose all your data as a consequence. So that's basically it for the episode.
Ejaaz:
We wanted to give you a comprehensive guide and insight into Codex GPT 5.5 versus Claude Opus 4.7.
Ejaaz:
There's a lot of numbers in there, but basically we compared the best coding models from
Ejaaz:
both sides to see which is better.
Ejaaz:
And the truth is, there isn't a clear winner right now. I would say it's probably
Ejaaz:
Codex GPT 5.5, but the narrative switched so recently that maybe,
Ejaaz:
maybe Claude can still catch up. And the only reason why I say that,
Ejaaz:
Josh, is that
Ejaaz:
there's a model that we haven't discussed or demonstrated yet, because we can't.
Ejaaz:
It's called Claude Mythos.
Ejaaz:
It was kind of pseudo-released a few weeks ago.
Ejaaz:
And on all benchmarks, it is technically better than 5.5.
Ejaaz:
But the reason why we can't demo it is we can't get access to it.
Ejaaz:
And the reason cited by Anthropic was because it's too dangerous.
Ejaaz:
It's a cybersecurity risk. In fact, it wasn't just Anthropic saying it.
Ejaaz:
It was Pete Hegseth of the US Department of War also saying this,
Ejaaz:
right? So there's concerns around that.
Ejaaz:
OpenAI has created a Mythos-level model here, but has made it available
Ejaaz:
to everyone. And so the argument could be made that Anthropic is holding Mythos back just because it
Ejaaz:
doesn't have enough compute.
Ejaaz:
So there's a lot of rumors around this, but I'm excited to get my hands on the
Ejaaz:
best models from each of these and compare them directly.
Josh:
Yeah. And the compute's actually been degrading. So I think I want to wrap this
Josh:
up on like, what do you actually currently use?
Josh:
What is the Limitless production stack? How are we using these AI models?
Josh:
And for me, at least, it's not even close. I'm Codex-pilled.
Josh:
I'm fully switched over. I am Codex superior domination. It's going to be the month of Codex.
Josh:
Maybe Anthropic will have a comeback, but that's not happening until at least
Josh:
June, July, because this month is Codex month.
Josh:
So I've been using Codex for basically everything, all of the difficult tasks that I need.
Josh:
What I have found is that GPT 5.5 as an LLM, as a language model,
Josh:
as a chatbot is a little bit inferior to Opus 4.7, which I believe to be the
Josh:
better model if you're just chatting with an AI.
Josh:
I like its personality. It's warmer, it's more precise, and it normally
Josh:
gets the idea of what I want. So if I am building a complex
Josh:
project, Opus 4.7 is the orchestrator and
Josh:
Codex is the actual implementer, the executor of this code, of this plan. I've
Josh:
also noticed that Opus 4.7 is a bit inferior to 4.6 at a few things, and I think
Josh:
this is another piece of alpha here: I actually use Opus 4.6 whenever I'm doing
Josh:
anything relating to writing or word ingestion. So one of the projects I've been doing recently
Josh:
is from Andrej Karpathy. He created this, like, personal wiki
Josh:
where it ingests files and it kind of writes
Josh:
these summaries for you and it creates a personal knowledge wiki. I use Opus
Josh:
4.6 exclusively for that, because Opus 4.7, I think, is far inferior at summarizing
Josh:
and kind of rewriting these topics that I use in my Obsidian. So that's kind
Josh:
of my stack. I use Opus for LLMs, Codex for everything else. Ejaaz, what are you
Josh:
currently optimizing for? What are you planning?
Ejaaz:
So it's two things. My stack is actually way more diverse when
Ejaaz:
it comes to just, like, the research side of things, only because I'm
Ejaaz:
using the AI that's readily available wherever I am,
Ejaaz:
right? So if I'm on X a lot and I see breaking news, I'm
Ejaaz:
just tapping Grok, because honestly its recent model, I think it's, like, what
Ejaaz:
was it, 4.3 at this point, is actually pretty good, and they have multiple agents
Ejaaz:
that are kind of, like, running at this, right? But for the core bulk of the work,
Ejaaz:
I've started shifting towards GPT 5.5 for the research, because 5.5 researches
Ejaaz:
things for so much longer. And it has a much more in-depth discussion.
Ejaaz:
In fact, I tested it out today because I was curious about the AI power stack
Ejaaz:
and what stocks I should be investing in to get exposure to the power grid lines
Ejaaz:
that are currently constraining AI data centers, right?
Ejaaz:
And I gave a detailed prompt to both Claude Opus 4.7
Ejaaz:
and 5.5, and 5.5 completely cooked 4.7.
Ejaaz:
And it gave good reasoning for its picks, whereas 4.7 did not.
Ejaaz:
I had to ask it more questions. So all in all, I think 5.5 is my preference right
Ejaaz:
now. I still use 4.7 because of the personality.
Ejaaz:
It's like less of an AI type of voice versus GPT 5.5.
Ejaaz:
But again, I feel like OpenAI is on a generational run right now,
Ejaaz:
and they might just kind of fix this in the next couple of hours at this point.
Josh:
Yeah, it's coming. It's coming quick. And I think now is a good time to kind
Josh:
of get familiar with Codex to understand the way it works.
Josh:
And as they implement these features, you'll be able to adopt them within the hour, within the day.
Josh:
It's pretty amazing. And it's been fun to just experiment. It's been fun to try something new.
Josh:
And again, competition is just better for everyone. So the end winner
Josh:
of this is the user, because for as low as $20 a month, you get access to all
Josh:
this frontier intelligence, all these capabilities.
Josh:
And it's really been unbelievable to watch. So that is the comparison, Codex versus Opus.
Josh:
If you have not tried both of them, I encourage you to give it a try.
Josh:
Test the prompts against one another. If you have any type of work that you
Josh:
need, if you're working on a computer at all, chances are you can use AI to
Josh:
help you do your job even better.
Josh:
Or you could just use it to help you do hobbies and side projects that you've
Josh:
always wanted to do. So give it a try.
Josh:
Let us know your preference: Codex or Claude Code. Which one is it going to be?
Josh:
I think that's probably it for the episode. Thank you guys so much for watching.
Josh:
If you enjoyed it, please don't forget to share with your friends.
Josh:
Let them know which model you picked. And also don't forget to rate it five
Josh:
stars on your favorite podcast listening platform. Any final thoughts, Ejaaz, before we go?
Ejaaz:
No, that's it. Thank you guys so much for listening and we'll see you on the next one.