Exploring the frontiers of Technology and AI
Ejaaz:
The most powerful model in the world is here right now. In fact,
Ejaaz:
it's so good that it beats Claude Mythos.
Ejaaz:
OpenAI just released ChatGPT 5.5 and it crushes Claude on every single benchmark.
Ejaaz:
It's the new number one coding model. It can do 20 hour tasks that expert software
Ejaaz:
engineers sometimes can't do.
Ejaaz:
It's already discovered groundbreaking solutions in maths and frontier sciences such as genetics.
Ejaaz:
And it's cheaper than GPT 5.4. This is the result of two years worth of frontier
Ejaaz:
research released in this one single model.
Ejaaz:
In fact, it's so good that an NVIDIA engineer said, and I quote,
Ejaaz:
losing access to GPT 5.5 feels like I've had a limb amputated.
Josh:
I think a lot of people are going to compare this to Opus 4.7,
Josh:
and that's fair, but I really think the true comparison is to Mythos because
Josh:
Sam Altman recently posted something as the model was coming out that
Josh:
felt very much like a jab at Mythos.
Josh:
And we're going to get into the benchmarks comparing them, many of which will
Josh:
actually beat the Claude model. But what I find most interesting about this
Josh:
post is the second paragraph where he says, we believe in democratization.
Josh:
And he mentioned specifically, we have been tracking cybersecurity as a preparedness
Josh:
category for a long time and have built mitigations we believe in that enable
Josh:
us to make capable models broadly available.
Josh:
So this is very much a dig at Mythos, which is, as we all know,
Josh:
only privately available, gated to the companies that are granted access to it.
Josh:
ChatGPT and OpenAI are like, hey, we're going to give you the powerful cybersecurity.
Josh:
We're just going to bake in the precautions into the model so that everyone could have it.
Josh:
And it ends by saying this really sweet thing. It's like, we love you and
Josh:
we want you to win. We believe in everyone having access to this intelligence.
Josh:
And I really respect that. And I think it's an awesome way to set the precedent
Josh:
for what the next generation of these models is going to look like.
Josh:
But before we go any further, let's talk about the model itself. It's out right now.
Josh:
If you have a ChatGPT membership, you can go and use it, go and play with it.
Josh:
Ejaaz, what's the TL;DR? What are the high-level things that everyone should know?
Josh:
What's most new and noteworthy about GPT 5.5?
Ejaaz:
Okay, so inspired by your Mythos comparison, the first question that pops into
Ejaaz:
my head is I use Claude Opus 4.7 every single day. So I'm like,
Ejaaz:
is it better than this? Like, should I be switching back to ChatGPT right now?
Ejaaz:
The answer might be yes. So if we look at the benchmark score right here,
Ejaaz:
GPT 5.5 on the left over here absolutely crushes all the standard benchmarks
Ejaaz:
that these frontier models are measured against.
Ejaaz:
And if you look on the right over here, Claude Opus 4.7, it either doesn't even
Ejaaz:
measure in a particular category, or it's completely beaten by GPT 5.5.
Ejaaz:
In fact, the only stat that GPT 5.5 doesn't beat Opus 4.7 in is something called
Ejaaz:
Software Engineering Benchmark Verified Pro or something like that.
Ejaaz:
It's like the pro software coding situation.
Ejaaz:
But there's a footnote at the bottom of this blog where OpenAI states,
Ejaaz:
Anthropic has publicly said that they might have gamed that particular benchmark
Ejaaz:
and they need to be re-evaluated.
Ejaaz:
So we might have a complete clean sweep for 5.5 as we see today.
Ejaaz:
So it's an incredibly powerful model.
Ejaaz:
But a question that popped into my head is, does it actually beat Mythos?
Ejaaz:
And we have a direct comparison right here.
Josh:
Yeah, so it shows that it does across some benchmarks. Now, again,
Josh:
these benchmarks are pretty fuzzy.
Josh:
We don't know which ones are gamed to do what. But there is a world in which
Josh:
GPT 5.5 will outperform Mythos on some things, which ones we're not entirely sure.
Josh:
I think as we kind of figure out ways to describe GPT 5.5, it seems as if it's
Josh:
their first attempt at making a model built for autonomy instead of answers.
Josh:
I think a lot of the benchmarks that they're working on are in agentic coding,
Josh:
things like it handles tasks that are 20 hours long. We'll get into that.
Josh:
It's doing 85% of OpenAI's internal work already.
Josh:
And it also helped rewrite the infrastructure that
Josh:
built it. There was this amazing quote in the blog post. It said, OpenAI
Josh:
says 5.5 itself helped optimize the stack
Josh:
that serves it. Codex analyzed weeks of production traffic and
Josh:
wrote custom heuristics for load balancing that boosted token generation speed
Josh:
by over 20%. So they're using the model to actually build the model and make it
Josh:
maximally efficient based on the data that it's collected from users like us
Josh:
who are interacting with the model on a daily basis. So it's very smart, it's
Josh:
very clever. It's not just there to give you answers.
Josh:
It's there to think deeply and actually solve problems for you in a way that
Josh:
I think Mythos and a lot of these other frontier models are kind of pivoting towards now.
Ejaaz:
The great thing about this model release is it reveals a few things that OpenAI
Ejaaz:
has as an advantage against, say, a frontier lab like Anthropic.
Ejaaz:
It's clear looking at these benchmarks compared to Mythos, which,
Ejaaz:
by the way, the entire world is spiraling because of this model,
Ejaaz:
because it's going to have the cybersecurity ability to take over any kind of government system.
Ejaaz:
This model is pretty close, and Sam is going to be releasing this publicly,
Ejaaz:
or OpenAI is going to be releasing it publicly for everyone to use.
Ejaaz:
So a question that pops to my head is, does this mean that it's a matter of
Ejaaz:
compute, and OpenAI just simply has more of it?
Ejaaz:
Certainly, if you compare Sam Altman's ability to acquire compute and spend
Ejaaz:
all these trillions of dollars to acquire it versus Anthropic,
Ejaaz:
Anthropic has been extremely conservative, and now they're struggling.
Ejaaz:
They recently signed a $5 billion deal with Amazon, which we'll get to later
Ejaaz:
on. But the point is, this is a tale of two stories.
Ejaaz:
Either OpenAI has enough compute and they're about to leapfrog Claude because of that.
Ejaaz:
And they're proving that through this model that is a very good answer to Mythos.
Ejaaz:
Or, and this is the alternative side, Anthropic's Mythos model is just plainly better than 5.5.
Ejaaz:
And these benchmarks aren't actually verified, which is technically kind of true
Ejaaz:
because I don't know how official these things are.
Ejaaz:
These are just through tests that a small set of users have done.
Ejaaz:
So it's a game of both. I'm sure Anthropic is watching this and thinking,
Ejaaz:
hmm, maybe we should roll out Mythos, but they don't have the compute.
Josh:
Yeah, they don't have the inference. In fact, speaking of the inference,
Josh:
Sam actually made a post saying,
Josh:
excellent work by the inference team to serve this model so efficiently. He
Josh:
wanted to really highlight the fact that, to a significant degree, they've
Josh:
become an AI inference company now. And I think that's a
Josh:
really big difference from what was previously stated. Anthropic has a really
Josh:
tough time serving compute, and we see that. Even if they had Mythos available
Josh:
in a way that was safe, they can't serve it. OpenAI can, and we see it reflected
Josh:
in pricing. I mean, we have some pricing for this model, right? And it seems
Josh:
as if it's roughly at par with 4.7, if not slightly better?
Ejaaz:
Slightly. It's slightly more expensive, but not by much.
Ejaaz:
So for every million input tokens, it's the same for Anthropic's Opus 4.7 and GPT 5.5.
Ejaaz:
It's $5 in, but the output is $30 for 5.5 per million tokens and $25 per million tokens for 4.7.
Ejaaz:
So it's a little more expensive, but here's where you actually have more of
Ejaaz:
a bargain using the more expensive model 5.5. It is cheaper than GPT 5.4,
Ejaaz:
and it uses tokens way more efficiently to think.
Ejaaz:
So what does that mean if you are an enterprise that wants to plug in this AI
Ejaaz:
model and not worry about it and just have it power your entire profit engine?
Ejaaz:
Well, you end up using fewer tokens, so you hit your rate limits at a much slower
Ejaaz:
rate, which means that you end up getting more bang for your buck as long as
Ejaaz:
you use the model like 24-7 or you use it in an effective
Ejaaz:
way. If you are just kind of out there using 5.5 to, like, ask questions that you
Ejaaz:
should maybe be asking Google, this is probably not the model for you. But otherwise, it's a super powerful one.
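Note: the pricing trade-off described here can be sketched with a little arithmetic. The per-million-token prices below are the ones quoted in the episode; the workload sizes and the 20% output-token saving are purely hypothetical assumptions for illustration, not measured figures.

```python
# Prices per million tokens in USD, as quoted in the episode.
PRICES = {
    "GPT 5.5": {"input": 5.00, "output": 30.00},
    "Opus 4.7": {"input": 5.00, "output": 25.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the API cost in USD for one job on the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def break_even_output_ratio() -> float:
    """Fraction of Opus 4.7's output tokens at which GPT 5.5's output spend is equal."""
    return PRICES["Opus 4.7"]["output"] / PRICES["GPT 5.5"]["output"]


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 1M output tokens on Opus 4.7.
    opus = job_cost("Opus 4.7", 2_000_000, 1_000_000)
    # Assumption: GPT 5.5 needs 20% fewer output tokens for the same job.
    gpt = job_cost("GPT 5.5", 2_000_000, 800_000)
    print(f"Opus 4.7: ${opus:.2f}, GPT 5.5: ${gpt:.2f}")
    print(f"Break-even output ratio: {break_even_output_ratio():.3f}")
```

On these quoted prices, GPT 5.5 only comes out cheaper if it uses fewer than roughly five-sixths of the output tokens the other model would need for the same job.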
Josh:
Yeah. And if these prices don't mean anything to you, that's fine.
Josh:
As long as you have a $20 a month subscription.
Josh:
In fact, this is going to be available to free users fairly soon, I believe.
Josh:
But anyone who is a subscriber has access to this. You don't need to use the
Josh:
API. There's nothing fancy.
Josh:
You open up your app on your phone, you go to the web browser,
Josh:
it's there, it's available, ready to go.
Josh:
Now, there's a few interesting things that you can do with this model that haven't
Josh:
previously been possible.
Josh:
And although we don't quite have access to it just yet, we're recording this
Josh:
right as the model got launched.
Josh:
We do have a blog post from OpenAI themselves, who are showcasing a few demos.
Josh:
So again, take these with a grain of salt. These are straight from OpenAI, but
Josh:
they are seemingly pretty impressive and pretty noteworthy as to what they're
Josh:
capable of doing, starting with this space mission application, which is pretty
Josh:
cool and very reminiscent of the moon mission that we just had.
Ejaaz:
So if you guys don't know, Josh has a secret. He has many secrets on this
Ejaaz:
show. One is he's a massive space fan, and when he's not hanging out with me, he's
Ejaaz:
doing space simulations on whatever he can, right? Well, okay, maybe
Ejaaz:
part of that is a bit of a lie.
Ejaaz:
But with this new app that we're seeing in front of us right now,
Ejaaz:
this was completely vibe coded using 5.5.
Ejaaz:
And it's used to simulate a specific space mission.
Ejaaz:
Now, if this looks very familiar, it's because we just had a space mission.
Ejaaz:
We went back to the moon for the first time in 53 years, pretty big deal.
Ejaaz:
And we can see a pretty accurate simulation going on right here.
Ejaaz:
So as you can see, there's various different toggles, the physics of the entire
Ejaaz:
thing is very important.
Ejaaz:
And that's another point I want to make about this model. It is being
Ejaaz:
used for frontier research, not just in AI, but in
Ejaaz:
mathematics and in genetics. It made frontier progress on both of these fronts.
Ejaaz:
And so what we're showing here is that this is a model that goes way beyond just
Ejaaz:
text and telling you what could be. It actually implements this into a lot of
Ejaaz:
different things and understands the world around it, which is extremely powerful.
Ejaaz:
But we have another one here. We have an earthquake tracker.
Josh:
For anyone who wants to make websites, it's so good at making websites.
Josh:
And this appears to be one of the strong suits. In this case,
Josh:
there's a few things to highlight on this earthquake tracker.
Josh:
One of them being that it's one, just like a pretty elegantly designed website.
Josh:
But two, all of the graphics are interactive. You'll notice that they update
Josh:
dynamically as you hover over them, as you click. It looks very clean.
Josh:
I assume that it is pulling up-to-date information from an API somewhere that it set up.
Josh:
It is just truly competent and capable of doing these kinds of longer-tail tasks
Josh:
that are a bit more complicated than a static landing page,
Josh:
but have dynamic data and the richness that
Josh:
you would expect from a high-end, high-quality, polished website,
Josh:
except just built with an AI model by someone who doesn't
Josh:
need to know anything about coding at all. And then,
Josh:
for the gamers, there's also another great example of a dungeon game,
Josh:
which they're describing as a playable 3D dungeon arena prototype
Josh:
built with Codex and GPT models. Now,
Josh:
I think this is something novel to this setup, where Codex handles
Josh:
the game architecture, the combat systems, and the enemy encounters, and then the character
Josh:
models, the character textures, and the animations were created with third-party
Josh:
asset generation tools using something like ImageGen 2.0. So this is also one
Josh:
of the earlier signs where you can actually merge a lot of these tools together
Josh:
to build something dynamic in a way that you couldn't have done before.
Ejaaz:
Yeah, actually, the quality of this game looks like something out of League
Ejaaz:
of Legends or something like that. At least that's what it reminds me of. Like,
Ejaaz:
these games are getting way more high-def than I expected. I know it's,
Ejaaz:
like, pretty basic, and anyone that's watching this can kind of
Ejaaz:
pick at it with a finer eye, but it's cool. But for those of you who prefer, like, the
Ejaaz:
more traditional side of games,
Ejaaz:
this might be something that you can kind of vibe code in a
Ejaaz:
couple of minutes. Now, it may look basic, but theoretically
Ejaaz:
this is like a 3D spatially aware game, and that's not something you could achieve,
Ejaaz:
at least very easily, with previous models. What I love about this as well is
Ejaaz:
they've also included the prompt for all of these things.
Ejaaz:
So this is something that you can try right now. Like, look at this. And the prompt
Ejaaz:
is no more than like, what's it?
Ejaaz:
One, two, three, four, like 12 lines. 12 lines, dude.
Ejaaz:
And you can have like a fully functioning game. You can probably then add an
Ejaaz:
extra step or extra prompt saying, hey, can you deploy this to Vercel? And
Ejaaz:
send that to your friends. Now you have a game.
Ejaaz:
You're a game creator. You're a game developer.
Ejaaz:
So the applications for this model cannot be overstated.
Ejaaz:
I'm going to be very honest. I thought this model was going to be just an iterative
Ejaaz:
upgrade. I didn't think it would get anywhere near Claude Mythos.
Ejaaz:
Two stories have now revealed themselves, which is, one, it's the answer to Claude Mythos.
Ejaaz:
And two, it's really damn good. I am now convinced that compute is everything,
Ejaaz:
but not in the way that I thought it would be useful. I thought it would be
Ejaaz:
largely for pre-training.
Ejaaz:
But per Sam's tweet earlier on, and also in a recent interview with Greg Brockman, the president of OpenAI,
Ejaaz:
they're going all in on inference, test-time compute,
Ejaaz:
which just means that if you have more compute and if you have a good enough
Ejaaz:
model, it can do the thing.
Ejaaz:
This thing, like I said, built itself. It's a self-improving model. Very, very impressive.
Josh:
It's good for solving hard problems. It's good for thinking for a long time.
Josh:
In fact, they marketed it as a model that can now think for 20 hours coherently.
Josh:
That's almost a full day that it can work
Josh:
on a problem. And what you're noticing from this prompt that's on screen is
Josh:
that it doesn't take that much to get it going. You don't need to
Josh:
kind of spoon-feed it all the way through anymore. It can make
Josh:
decisions on its own. It can infer conclusions about what
Josh:
you want just based on the knowledge architecture that it
Josh:
currently has. It's amazingly impressive. In fact, one of the people who got access to
Josh:
it early just posted on X that he's posting
Josh:
live as his prompt is seven hours
Josh:
into its task. It has been running for over seven hours.
Josh:
He said this has literally never happened before. The models would maybe run
Josh:
for 30 minutes or so, or, if you
Josh:
really shouted at them, two to three hours. But he's on seven-plus hours. I
Josh:
think this is going to be fun for people with complicated things. If you really
Josh:
want to make a triple-A-feeling video game, or a simulator, or a really complex
Josh:
website, this is the model to try out, and to use with Codex to see how all
Josh:
these things kind of piece together. I mean,
Josh:
I didn't have my hopes very high based on the Opus 4.6 to 4.7 incremental improvement.
Josh:
This seems like a very solid improvement over 5.4.
Ejaaz:
Absolutely. And listen, if you are listening to this and you're like, listen, I'm not a gamer.
Ejaaz:
I can't waste my time with that. I focus on more serious things.
Ejaaz:
Well, for you serious people, if you're a manager at a top company or whatever
Ejaaz:
that might be, this isn't just a toy or a model used for coders.
Ejaaz:
A lot of the examples that we just gave are around coding.
Ejaaz:
You can use this for just admin stuff or managerial work, like the capability
Ejaaz:
of this model to think more strategically and long-term and understand the context
Ejaaz:
of the tasks that you're working towards.
Ejaaz:
Like we said earlier, for coding specifically, it can work on 20-hour-long expert tasks.
Ejaaz:
That also applies for administrative stuff or things that are more generalized,
Ejaaz:
white-collar worker work.
Ejaaz:
And so in this example, Noam Brown says, I'm a manager at OpenAI,
Ejaaz:
but I'm using this model to basically manage my entire team and make sure we're
Ejaaz:
focused on the right things.
Ejaaz:
And guess what? The output of this team and this product has been pretty amazing.
Ejaaz:
So all around really excellent work by the entire team and the inference team
Ejaaz:
specifically, as Sam Altman says here.
Ejaaz:
And yeah, I'm looking forward to using this thing. I don't have access to it right now.
Ejaaz:
I've refreshed my account probably like five times at this point and it hasn't
Ejaaz:
appeared. So maybe it's like a slow rollout.
Ejaaz:
But if you're listening to this and you've tried it out, let us know what you're
Ejaaz:
using it for. Let us know what amazes you. I really want to hear more.
Josh:
Yeah, OpenAI has had a pretty incredible week. And this comes on the back of their
Josh:
new ImageGen model that they just released, which was also unbelievable.
Josh:
If you haven't seen that episode, we just recorded it yesterday.
Josh:
So I would advise you to go see it because, oh my God, it is amazing.
Josh:
We also recorded an episode on Apple's new CEO this week and what that means
Josh:
for the company, as well as the hardware race and how this, I mean,
Josh:
this model, Opus, no, not Opus, this is GPT.
Josh:
GPT 5.5 is very much part of the AGI class of models that is built on Blackwell
Josh:
chips, and we've recorded an entire episode all about that.
Josh:
Very interesting, very fascinating. Also interesting and fascinating because
Josh:
as always, this is the weekly roundup. We have a few other topics to talk about.
Josh:
We have some news out of SpaceX, which is a pseudo acquisition.
Josh:
Now, they haven't quite acquired Cursor, the company in question,
Josh:
but they have at least partnered with them, with the option to either buy Cursor for
Josh:
$60 billion or pay $10 billion for the right to actually work together.
Josh:
This seems like a big deal.
Josh:
This seems like, I mean, XAI, we could call it SpaceX, but SpaceX AI is taking
Josh:
AI very seriously. They're currently behind.
Josh:
They clearly don't want to be behind. This is a huge step and a huge kind of
Josh:
show of support for Cursor, with this minimum of $10 billion going into accelerating
Josh:
their progress and trying to get themselves into this game.
Ejaaz:
This is actually a genius deal, and there are a few reasons why that is so. So let me explain.
Ejaaz:
If you're SpaceX AI, which by the way is a ridiculous name now,
Ejaaz:
like we'll just call them XAI, you are currently harboring
Ejaaz:
one to 1.5 million frontier GPUs, mainly NVIDIA, in a warehouse. There's one issue.
Ejaaz:
You're not really utilizing all of it because XAI has had a bit of a slow start
Ejaaz:
to training their models. What's a genius idea?
Ejaaz:
If I rent those out to another company to train their own model,
Ejaaz:
then we can make money from that. Okay, so that's win number one for SpaceX.
Ejaaz:
But then they've thought of another thing, which is, huh, Grok
Ejaaz:
isn't really good at coding, and we are
Ejaaz:
losing the race every single day we don't update our model
Ejaaz:
at coding, because Anthropic and ChatGPT
Ejaaz:
5.5 are completely running away with it. So
Ejaaz:
how do they leapfrog and get ahead? They should acquire the company that is
Ejaaz:
using their own GPUs to train a frontier coding model. So then the question becomes,
Ejaaz:
well, who the hell is Cursor? What's the moat that they have? Like, why do
Ejaaz:
they have a good shot of training a better coding model than Anthropic and GPT 5.5?
Ejaaz:
Aren't those two companies way ahead? Well, the answer is not quite so.
Ejaaz:
Cursor, for the longest time, was the number one platform and tool for people
Ejaaz:
to use to do their vibe coding. Why?
Ejaaz:
Not only did they have access to frontier coding models from Claude and ChatGPT,
Ejaaz:
they also had something called an agent harness.
Ejaaz:
Now, you'll notice GPT 5.5 is really good at coding because of something
Ejaaz:
called agentic coding. That is something that Cursor pretty much pioneered.
Ejaaz:
It's basically the harness, the prompts, the environment that they mold the model,
Ejaaz:
or rather that they mold around the model that makes it so good and intuitive
Ejaaz:
and remembers the context across every single project, like menial things,
Ejaaz:
like understanding your GitHub branches and working on separate flows at the same time.
Ejaaz:
A lot of the top software engineers in the world right now use tools like Cursor
Ejaaz:
and agentic coding to be able to pull this off.
Ejaaz:
So Elon Musk thought, hmm, if I give you the GPUs to train a better coding model,
Ejaaz:
which gives you a better product, I should have the option to acquire you.
Ejaaz:
In acquiring you, I can integrate you with Grok and Grok somehow becomes the
Ejaaz:
number one coding model over the next year or so, depending on whether this deal goes through.
Ejaaz:
And if the deal falls through and they create a really bad model,
Ejaaz:
well, you pay me $10 billion for the service.
Ejaaz:
Well, I pay you. Not a bad deal. Not a bad deal.
Josh:
Yeah, it seems like they're going to be continuing to
Josh:
work with other companies to accelerate in places where
Josh:
they're currently weak, because, I mean, they're so strong at
Josh:
building out the hardware and creating these huge data centers. They need
Josh:
someone who could take advantage of all those GPUs. Hopefully this will
Josh:
help serve that cause. And that's not the only SpaceX news
Josh:
this week. The other is that they have officially filed an S-1, which,
Josh:
for those who are not familiar, means they're going public. It's officially official:
Josh:
100%, they will be going public this year. If there
Josh:
were any doubts, please let them be relinquished. Here we
Josh:
have it: SpaceX will be going public. The most interesting thing from
Josh:
this was, I think, the share structure and
Josh:
how they're going to be organizing this for daddy Elon, who's going to be getting
Josh:
quite a big payday if he does well. So we have on screen here just a series of
Josh:
some of the financials. I mean, we know Starlink as a business has been doing
Josh:
unbelievably. They have about $25 billion in cash, $92 billion in assets, $50
Josh:
billion in liabilities. That's...
Ejaaz:
Quite a lot of liabilities on this, my God.
Josh:
They've got a lot of debt, man. I don't know. We'll see once they finally
Josh:
publish everything. I'm very excited for the first earnings report, where you
Josh:
really get a true peek behind the scenes of what's going on there. But it looks
Josh:
like it's going to be going public at a $1.75 trillion valuation. Now, in terms of pay structure,
Josh:
Elon is poised to get 60 million shares,
Josh:
which come in 11 tranches vesting in
Josh:
$500 billion market cap increments, from $1.1 trillion to $6.6 trillion.
Josh:
So for those unfamiliar with the current ceiling, I think it's
Josh:
NVIDIA. NVIDIA is, what, five trillion? Under five trillion, close to five trillion.
Josh:
It's like 4.3. Yeah, okay, so not even close. They're like 20% away from five trillion.
Josh:
SpaceX needs to be, what is that? Like 20 something percent more valuable than
Josh:
the most valuable company in the world.
Josh:
But if they do, Elon gets 60 million shares.
Josh:
Now I haven't done the math on exactly how much that is.
Josh:
But if we make some assumptions here, the total value at Vest looks like it
Josh:
could be about a quarter of a trillion dollars.
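Note: the rough math here can be sketched as follows. Every figure is taken as quoted in the episode, not from the filing itself, and reading "11 tranches" as the 11 steps of $500 billion between the $1.1 trillion and $6.6 trillion levels is an assumption.

```python
# All market-cap figures are in billions of USD, as quoted in the episode.
FIRST_MILESTONE_BN = 1_100
LAST_MILESTONE_BN = 6_600
STEP_BN = 500
TOTAL_SHARES = 60_000_000
QUOTED_VALUE_AT_VEST_USD = 250_000_000_000  # "about a quarter of a trillion"
NVIDIA_MARKET_CAP_BN = 4_300  # the "like 4.3" trillion mentioned on air

# Market-cap levels from the first to the last milestone, inclusive.
milestones = list(range(FIRST_MILESTONE_BN, LAST_MILESTONE_BN + 1, STEP_BN))
increments = len(milestones) - 1

# Per-share value implied by the quoted total, if all shares vest.
implied_price_usd = QUOTED_VALUE_AT_VEST_USD / TOTAL_SHARES

# How far the top milestone sits above the quoted NVIDIA market cap.
premium_over_nvidia = LAST_MILESTONE_BN / NVIDIA_MARKET_CAP_BN - 1

if __name__ == "__main__":
    print(f"{len(milestones)} market-cap levels, {increments} increments")
    print(f"Implied value per share: ${implied_price_usd:,.2f}")
    print(f"Top milestone vs. NVIDIA: {premium_over_nvidia:.0%} higher")
```

On these quoted numbers, the top milestone sits roughly 53% above a $4.3 trillion market cap, and the quarter-trillion estimate implies a bit over $4,000 per share.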
Josh:
So pretty good payday for Elon. I think the most important thing is that he's
Josh:
getting a lot of control over this.
Josh:
It seems as if he's going to have 40-something percent control of
Josh:
the company, which is really ultimately what was most important to
Josh:
him as they went public. So, really exciting news. I am hopeful that it happens
Josh:
this June, which we can expect, and it's without a shadow of a doubt going to
Josh:
be the largest IPO in history. I think everyone's going to be talking about it.
Josh:
There is a new vehicle in which some people are investing. We're actually
Josh:
going to have the founder on the show soon, so keep an eye out for that one.
Josh:
And yeah, the SpaceX news is very exciting.
Ejaaz:
Now, in the world of AI hardware, many people think that NVIDIA has run away with the win.
Ejaaz:
And you could argue that with a $4.3 trillion market cap, not many people are competing,
Ejaaz:
except that there is one company, Google.
Ejaaz:
Now, you might be thinking, Google does all my search engines and stuff.
Ejaaz:
Well, Google is the only vertically integrated Mag 7 company that is involved
Ejaaz:
or has a frontier capability at every single layer of the AI stack.
Ejaaz:
Now, right at the bottom are these things called Google TPUs,
Ejaaz:
Tensor Processing Units.
Ejaaz:
And they're their version of the GPU. In fact, fun fact,
Ejaaz:
Google's Gemini models have never been trained on an NVIDIA GPU.
Ejaaz:
It's all been their own internal warehoused infrastructure. And they've been
Ejaaz:
working on this thing for 10 years.
Ejaaz:
Now, just today, or rather this week, they released their latest generation
Ejaaz:
of TPUs, the TPU-8T and the TPU-8i.
Ejaaz:
Now, the TPU-8T, T stands for training or pre-training.
Ejaaz:
It is highly optimized for the pre-training part of an AI model.
Ejaaz:
So this is like the bulk, arguably the more expensive part of training a model.
Ejaaz:
It's like teaching it like, hey, these are words.
Ejaaz:
These are the general fundamental set of facts that you need to know before
Ejaaz:
we can kind of like put you out into the world and present you to our users.
Ejaaz:
The TPU-8i is specialized or hyper-specialized in inference specifically.
Ejaaz:
Now, the important part about inference is it's being used for so many different things.
Ejaaz:
Number one, it's to answer all your different prompts. Whenever you write a
Ejaaz:
prompt and you submit it to an AI model, it is known as inference.
Ejaaz:
It's getting inference. It needs to query the model and make sure it does the
Ejaaz:
right types of thinking and gives you the right answer.
Ejaaz:
But the other part of inference is post-training, where a lot of people train
Ejaaz:
the model, and then they do more training after the fact by using it to help
Ejaaz:
the model reason and think of other alternative facts before it presents you the actual answer.
Ejaaz:
And that's what that second TPU is. Now, Google's TPUs have been used extensively.
Ejaaz:
In fact, their largest customer is a little-known AI lab known as Anthropic,
Ejaaz:
which currently runs 1.5 million TPUs. So the argument can be made that TPUs
Ejaaz:
are largely responsible for Claude's and Opus' success.
Ejaaz:
So very impressive all around, but there's some other facts about this, right?
Josh:
Yeah, well, I love the dual architecture training setup that they have here being hyper specific.
Josh:
I mean, the 8T chip in particular, it's built to reduce frontier model development
Josh:
cycles, they said, from months to weeks.
Josh:
And then we have the 8i, which is the reasoning engine, which is specifically
Josh:
designed for agentic use, to deliver tokens really quickly, as fast as possible.
Josh:
And as we know, Anthropic is working closely with them. And also,
Josh:
I mean, Google is making these for themselves.
Josh:
So I think whoever is working with Google, whoever's kind of focused on these
Josh:
accelerators, is probably in for a nice little windfall as it relates to increased
Josh:
velocity of the training and also increased ability to distribute these models.
Josh:
As we know, Anthropic is having a very difficult time with this.
Josh:
Now, NVIDIA and Jensen are probably feeling a little shook.
Josh:
They've got to be feeling a little bit of pressure here. And it seems as if
Josh:
that's why they're pushing to be open source, because if you
Josh:
are in a closed-source world where everyone is
Josh:
making closed-source models on their own architecture, then the
Josh:
NVIDIA edge very quickly disappears. And, I mean, I'm looking at these chips in
Josh:
hand. They look beautiful. They're taped out, ready to be manufactured.
Josh:
And I think you could start getting kind of excited about this new world of
Josh:
accelerated hardware. And we're seeing this happen again and again, because Amazon
Josh:
just made another big investment in who else other than Anthropic.
Josh:
And the deal, I think, is like this has to be close to a record deal.
Josh:
They're owning a tremendous amount of this company now.
Ejaaz:
Yep. So the news here is Amazon announced they're investing $5 billion into Anthropic.
Ejaaz:
They've just raised $5 billion. Congrats.
Ejaaz:
And so the reason why this is important is, well, there's a few reasons.
Ejaaz:
Number one, Anthropic knows that they don't have enough compute.
Ejaaz:
The argument could be made that's why Claude Mythos hasn't been rolled out.
Ejaaz:
Well, hey presto, now you have $5 billion more worth of compute.
Ejaaz:
Now, for those of you who didn't know, Amazon is a primary investor already in Anthropic.
Ejaaz:
Before this announcement, they owned around 17% of Anthropic.
Ejaaz:
After this announcement, it's closer to 20%. So we're talking about one company
Ejaaz:
that's publicly tradable right now that owns a fifth.
Ejaaz:
Is my math right? Yeah, a fifth of the world's leading
Ejaaz:
AI lab, which is pretty crazy. Now, if we look into the stats of this,
Ejaaz:
this is a five gigawatt deal, which is more than any single data center that's currently live.
Ejaaz:
It's actually a multiple of five. I think SpaceX AI's Colossus 2 is the largest
Ejaaz:
right now with their 1 million GPUs.
Ejaaz:
So it's going to be 5X larger than the average data center that we're seeing
Ejaaz:
right now for AI specifically.
Ejaaz:
And they're aiming to get one gigawatt online by the end of the year.
Ejaaz:
Now, the reason why this is so good for both teams is Anthropic already
Ejaaz:
has a close relationship with AWS and Amazon's cloud computing department.
Ejaaz:
So spinning up more compute clusters is gonna be so easy for them.
Ejaaz:
They have a working relationship.
Ejaaz:
They're used to training Claude models on this, so it shouldn't be too hard to
Ejaaz:
ramp this up. If you're Amazon,
Ejaaz:
Welcome back. That $5 billion is going to come right back to you.
Ejaaz:
So I don't know what kind of, like, circular economy this is, but it's back and
Ejaaz:
it's very impressive for them.
Josh:
Is it ironic that today Amazon hit an all-time high? No, maybe, maybe not.
Ejaaz:
I'm holding stock. I got the stock.
Josh:
Clearly, clearly they're doing something right. Amazon is a phenomenal company.
Josh:
They're the largest shareholder in Anthropic.
Josh:
It's hard not to be bullish on them. It's hard not to be bullish on the accelerated
Josh:
computing stack. And I think that's probably what Jensen is getting nervous
Josh:
about. That's why NVIDIA is pushing open source.
Josh:
And the good news is he has some help. He has some assistance
Josh:
from the folks overseas in China who have been pumping out unbelievable models
Josh:
all week long as it relates to Kimi and Qwen, our Chinese favorites.
Josh:
We have Kimi K2.6 and Qwen 3.6. There's a lot of digits and numbers.
Josh:
All you need to know is that the best open source models in the world didn't exist last week.
Josh:
They now exist this week and they are better at pretty much everything,
Josh:
but exceptional at coding.
Josh:
In fact, word on the street is that some of these models are as good as GPT
Josh:
5.4 was and only a few points off of Claude.
Josh:
I mean, these are pretty amazing open source models that, again,
Josh:
are free to run locally on your machine if you have the machine capability of doing so.
Josh:
That's a big game changer.
Ejaaz:
Okay, so typically the story we tell with these open source models is,
Ejaaz:
wow, aren't they so amazing?
Ejaaz:
Yeah, they're the good younger brother. They're not as good as the frontier AI labs.
Ejaaz:
That completely changed this week. So Kimi K2.6 is the latest model from a
Ejaaz:
Chinese lab called Moonshot Labs. I believe it's Moonshot or Moonshot AI.
Ejaaz:
And they released their model, which ends up being as good at coding
Ejaaz:
as Opus 4.7, and it's 100% open source,
Ejaaz:
like you mentioned, Josh, which means that maybe you could run
Ejaaz:
this on a local device. Now, the answer that you would typically get back from
Ejaaz:
this is, hey, listen, it's too
Ejaaz:
large to run on my laptop. And that is true, but with the latest Qwen model, which
Ejaaz:
is a 3.6 version, you can run it as an 18-gigabyte model, slightly quantized,
Ejaaz:
on your laptop today. So the point that I want to make about these models isn't
Ejaaz:
exactly the specifics, but that across all benchmarks,
Ejaaz:
they're not as good as the frontier AI labs, but they're only a few points off.
Ejaaz:
That difference and gap has closed massively over the last couple of months,
Ejaaz:
which tells me two things.
Ejaaz:
Number one, China has figured out some kind of groundbreaking way to train their
Ejaaz:
models that they haven't told the West about, and they're going to keep it closely
Ejaaz:
guarded and eventually closed-source their model releases going forward.
Ejaaz:
And number two, they've figured out a new way to use inference to their benefit.
Ejaaz:
Like, one thing I'm going to highlight here is this new Kimi K2.6 model can
Ejaaz:
code continuously for 12 hours straight using 300 agents.
Ejaaz:
So the unlock here isn't one model itself. It's spinning up 300 versions of
Ejaaz:
itself and getting it to attack the problem.
Ejaaz:
That's something Sam realized and what he's implementing in 5.5.
Ejaaz:
That's something Opus 4.7 realized and is doing probably similarly with Mythos.
Ejaaz:
So I have this question here, which is, like, how do they manage to do this?
Ejaaz:
Well, I think every three months, when there's a new open model that gets released,
Ejaaz:
they're making these jumps because they're using these models to train themselves.
Ejaaz:
We proved that with Kimi K2.5. There's too many two-point-whatevers. And
Ejaaz:
the same thing is happening with Qwen. It's just all-around pretty amazing stuff.
Josh:
Yeah, Qwen is crushing. Okay, so before we go, we have two quick things to hit, the
Josh:
first being one that we missed last week, which we need to touch on quickly: Anthropic
Josh:
has a design tool now. If you are a designer, if you are interested in building
Josh:
web pages, videos, graphics, slideshows, pitch decks, any type of visual asset,
Josh:
Claude now has an entire design suite built just
Josh:
for this purpose. It's called Claude Design. It exists separately.
Josh:
You can access it through the desktop app or in your browser, and
Josh:
it basically allows you to build visual assets in a way that
Josh:
you couldn't previously. Previously, with Claude, you had artifacts. With an
Josh:
artifact, you could generate something dynamic; it could kind of
Josh:
build you a web page. This takes it to a whole new level. You could generate wireframes
Josh:
if you want to use fewer tokens, or you could flesh it out and create properly
Josh:
built prototypes that are actually clickable. It's amazing. The video we're
Josh:
seeing on screen highlights a few of them. Unfortunately, there was a big loser
Josh:
in this, because this sounds like a lot of what that little design company named Figma does.
Ejaaz:
Yeah, the little company.
Josh:
The stock market did not love that, did it? Nope.
Ejaaz:
Nope. It is down almost 20% on the week. I actually tracked the stock price
Ejaaz:
after the announcement was made, so, like, it wasn't even readily available; it
Ejaaz:
was literally just the tweet. 20 minutes after it was tweeted, the stock was down
Ejaaz:
six percent. So the point being,
Ejaaz:
whether this is market speculation or not, like, listen,
Ejaaz:
Claude Design isn't as good as Figma.
Ejaaz:
They're working with a few of these different partners, such as Canva.
Ejaaz:
Two weeks ago, one of Anthropic's foremost execs left the board of Figma.
Ejaaz:
And the rumor was that it was because they were building a competitor. So it's pretty clear.
Ejaaz:
Anthropic is going after every single sector, whether you're a designer,
Ejaaz:
a software engineer, a mathematician, a research scientist, doesn't matter.
Ejaaz:
They're going after everything because the model is applicable to everything.
Ejaaz:
And I don't know what this means for certain moats that companies like Figma
Ejaaz:
hold, but it's certainly going to affect the stock price.
Josh:
Can you do me a favor and click the max button real quick for me just to show the chart?
Ejaaz:
Oh!
Josh:
Yeah, minus 86% since IPO for those who are not watching on screen.
Josh:
It's been a pretty bad, rough run for Figma.
Ejaaz:
We have to start naming Anthropic the stock killer, Josh. This is like every
Ejaaz:
single tweet is tanking a stock.
Josh:
No, it's tough. It's brutal. We had one last thing that you wanted to mention.
Josh:
I know. We got to end on this strong. What do we have?
Ejaaz:
How good is your accent or impersonation of your president, of our president, Josh?
Josh:
Pretty horrible.
Ejaaz:
Not good. Okay, well, then we're not going to attempt it.
Josh:
I'd love to hear your British take on it, though, if you're feeling ambitious.
Ejaaz:
Okay, so my British take on this is, this is, albeit hilarious and somewhat
Ejaaz:
terrifying, that the President of the United States is saying this.
Ejaaz:
He commented, okay, on the government's relationship with Anthropic.
Ejaaz:
Now, if you're wondering why on earth he's commenting on it,
Ejaaz:
they're going to be releasing this Claude Mythos model.
Ejaaz:
It might be a security risk. It's probably good for the government to have access
Ejaaz:
to this thing and prepare accordingly.
Ejaaz:
The government has been having very important conversations with bankers and
Ejaaz:
governments all around the world, just to try and figure out, you know,
Ejaaz:
how best to prepare for this.
Ejaaz:
And after having an in-depth discussion with Dario Amodei, which by the way,
Ejaaz:
he blacklisted that CEO and Anthropic entirely from the government using it.
Ejaaz:
He's now rekindling it and saying, maybe there's a deal on the line.
Ejaaz:
He goes, and I quote, I'm not going to do the accent.
Ejaaz:
"We'll get along with Anthropic just fine," Trump said on CNBC.
Josh:
We'll get along with Anthropic just fine. I think they can be of great use to us.
Josh:
They're high IQ people. Very good. Very good. They tend to be on the left,
Josh:
radical left, but we get along with them.
Josh:
I don't know. That's all I got. But that is what he said.
Ejaaz:
Were you practicing that? That was actually pretty good. I was practicing in my head.
Josh:
I was rehearsing.
Ejaaz:
Damn. I closed my eyes whilst you were doing that, whilst I was laughing. Did it feel right?
Josh:
It sounded like him? Good.
Ejaaz:
It channeled his spirit. It was there. It was a good effort.
Ejaaz:
But I believe that's it. That is the end of the roundup.
Ejaaz:
Josh and I are recording this. FYI, it's 4 p.m. over here. Typically, we're morning birds.
Ejaaz:
We deliver this in the morning, but we waited for the announcement of Spud GPT
Ejaaz:
5.5 just for you guys, and we're going to be bringing you the cutting-edge news every single week.
Ejaaz:
As Josh mentioned, we had three other amazing episodes that we filmed earlier
Ejaaz:
this week. Definitely go check them out. They're each 20 minutes long.
Ejaaz:
It's your commute to work.
Ejaaz:
It's your gym session if you're not that active.
Ejaaz:
Definitely go check it out and let us know what you think. But yeah,
Ejaaz:
Josh, any final thoughts?
Josh:
Call me crazy, but I like the afternoon recordings. I got good energy.
Josh:
I'm like woken up. I'm 100% right now. I'm rocking and rolling.
Josh:
I'm feeling good. So I don't know. Maybe we'll have to lean into this a little
Josh:
bit more, but that's everything.
Josh:
If you've made it this far, if you're still listening to this and you've heard
Josh:
our other episodes, you're caught up. You're done for the week.
Josh:
You can go touch grass. Enjoy your weekend.
Josh:
There will be a lot more to talk about next weekend. But for now,
Josh:
you have fully synchronized with all of the chaos happening on the frontier of AI and technology.
Josh:
Thank you so much for watching. As always, we very much appreciate it.
Josh:
If you enjoyed this episode or any of our previous episodes from this week,
Josh:
don't forget to share them with a friend who might also enjoy them.
Josh:
We have a newsletter on Substack that goes live twice a week.
Josh:
Just went live yesterday, going live again tomorrow.
Josh:
The Friday issue is a recap of everything that happens this week,
Josh:
which is always fun and exciting.
Josh:
In fact, I'm gonna go write that as soon as we finish this episode.
Josh:
So thank you all for watching.
Josh:
As always, don't forget to subscribe, like, comment, all the good things,
Josh:
and we will see you guys next week.