Limitless Podcast

Kimi K2 is a groundbreaking open-source AI model from China with 1 trillion parameters. We discuss its competitive advantages, including low operational costs and superior coding capabilities through a "mixture of experts" approach. 

Josh highlights the implications for AI competition as Kimi K2 emerges in the market alongside OpenAI’s plans for an open-source model.

We also explore Kimi K2’s two versions—Base and Instruct—its impact on the AI landscape, and the challenges faced by OpenAI's ChatGPT, xAI's Grok, and Anthropic's Claude. Tune in for key insights on how Kimi K2 could reshape AI development!

------
💫 LIMITLESS | SUBSCRIBE & FOLLOW
https://limitless.bankless.com/
https://x.com/LimitlessFT

-----
TIMESTAMPS

0:00 Intro
0:58 The Rise of Kimi K2
2:49 Efficiency and Cost Benefits
3:53 Training Breakthroughs Explained
5:37 Innovations in AI Training
6:30 The Impact of Open Source
8:05 Competitive Landscape of AI
9:41 Context Window Capabilities
12:55 The Surge of Kimi K2
15:36 Market Adoption Insights
19:57 Versions of Kimi K2
24:21 Privacy and Local AI
26:30 The AI Talent Landscape
31:04 China's AI Competitive Edge
32:40 Open Source vs. Closed Source
40:19 Closing Thoughts and Future Prospects
42:49 Get Involved

-----
RESOURCES

Josh: https://x.com/Josh_Kale

Ejaaz: https://x.com/cryptopunk7213

------
Not financial or tax advice. See our investment disclosures here:
https://www.bankless.com/disclosures

Creators and Guests

Host
Josh Kale

What is Limitless Podcast?

Exploring the frontiers of Technology and AI

Ejaaz:
A bunch of AI researchers from China just released a brand new AI model called Kimi K2, which is not only as good as any other top model like Claude, but is also 100% open source, which means it's free to take, customize, and build into your own brand new AI model. This thing is amazing at coding, it beats any other model at creative writing, and it also has a pretty insane voice mode. Oh, and I should probably mention that it is one trillion parameters in size, which makes it one of the largest models ever created. Josh, we were winding down on a Friday night when the news broke that this team had released this model. An absolutely crazy bomb, especially with OpenAI rumored to release their own open source model this week. You've been jumping into this. What's your take?

Josh:
Yeah. So last week we crowned Grok 4 as the new leading private, closed source model. This week we have to give the crown to Kimi K2. We've got another crown going to the open source team, and they are winning. I mean, this is better than DeepSeek and DeepSeek R2; this is basically DeepSeek R3, I would imagine. And if you remember back a couple of months, DeepSeek really flipped the world on its head because of how efficient it was and the algorithmic upgrades it made. I think what we see with Kimi K2 is a lot of the same thing: these novel breakthroughs come as a downstream effect of their needing to be resourceful. China doesn't have the mega GPU clusters we have, they don't have all the cutting edge hardware, but they do have the software prowess to find these efficiencies. I think that's what makes this model so special. And that's what we're going to get into here: specifically what they did to make this model so special.

Ejaaz:
Yeah, I mean, look at these stats here, Josh: 1 trillion parameters in total, with 32 billion active in a mixture-of-experts design. So what this means is, although it's really large in size, and typically these AI models become pretty inefficient when they're that large, it uses this technique called mixture of experts, which means that whenever someone queries the model, it only activates the parameters that are relevant for the query itself. So it's smarter, it's much more efficient, and it doesn't consume as much energy as it otherwise would if you wanted to run it locally at home or wherever that might be. It's also super cheap. I think I saw somewhere that this was 20% the cost of Claude, Josh, which, we love that. Insane. For all the nerds who want to run really long tasks, or just set-and-forget the AI to run on, like, your coding workload or whatever that might mean, you can now do it at a much more affordable rate, at one fifth the cost of some of the top models out there. And it is as good as those models. So just insane kinds of things, Josh. I know there's a bunch of things you wanted to point out here on benchmarks. What do you want to get into?

Josh:
Yeah, it's really amazing. So they took 15 and a half trillion tokens and condensed those down into a one trillion parameter model. And then what's amazing is, when you use this model, like Ejaaz said, it uses a thing called mixture of experts. So it has, I believe, 384 experts, and each expert is good at a specific thing. So let's say you want to do a math problem. It will take a 32 billion parameter subset of the one trillion total parameters, and it will choose eight of these different experts for that specific thing. So in the case of math, it'll find an expert that has the calculator tool. It'll find an expert that has a fact-checking tool, or a proof tool, to make sure the math is accurate. It'll have a whole series of tools to help itself. And that's how it works so efficiently: instead of using a trillion parameters at once, it uses just 32 billion, and it uses the eight best specialists out of the 384 that it has available to it. It's really impressive.
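
To make the routing idea concrete, here is a minimal sketch of top-k mixture-of-experts routing in Python. The expert count (384) and top-k (8) come from the discussion; everything else (the toy sizes, random parameters, and the simple softmax router) is an illustrative assumption, not Kimi K2's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 384   # total experts, per the episode
TOP_K = 8           # experts activated per token
HIDDEN = 64         # toy hidden size; the real model is vastly larger

# Toy parameters: one router matrix plus a tiny feed-forward "expert" each.
router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS))
experts_w = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.01

def moe_layer(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ router_w                 # affinity of this token to each expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the 8 best-matching experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()              # softmax over the chosen experts only
    # Only these 8 experts run; the other 376 stay idle. That is the
    # efficiency win: ~32B of 1T parameters active per token in the real model.
    return sum(w * (x @ experts_w[i]) for i, w in zip(top, weights))

token = rng.normal(size=HIDDEN)
print(moe_layer(token).shape)  # (64,)
```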

Josh:
And what we see here is the benchmarks that we're showing on screen. And the benchmarks are really good. It's up there in line with just about any other top model, with the exception that this is open source.

And there was another breakthrough, which was the actual way they handled the training. This is the loss curve. What you're looking at on screen, for the people who are listening, is this really pretty smooth curve that starts at the top and trends down in a very predictable, smooth way. Most curves don't look like this, and if they do, it's because the company has spent tons and tons of money on error correction to make the curve that smooth. So basically what you're seeing is the training run of the model. A lot of times what happens is you get these very sharp spikes, and the run starts to drift away from the normal training trajectory, and it takes a lot of compute to recalibrate and push it back the right way. What they've managed to do is make it very smooth, and they've done it by finding these efficiencies.

So there's this analogy I was thinking of right before we hit the record button. It's as if you were teaching a chef how to cook, right? So we have Chef Ejaaz here. I am teaching him how to cook; I am an expert chef. And instead of telling him every ingredient and every step for every single dish, what I tell him is: hey, if you're making this amazing dinner recipe, all that really matters is this amount of salt applied at this time and this amount of heat applied for this length of time; the other stuff doesn't matter as much, so just put in whatever you think is appropriate and you'll get the same answer. And that's what we see with this model: an increased amount of efficiency by being direct, by being intentional about the data they used to train it, the data they fetch in order to give you high quality answers.

And it's a really novel breakthrough. They call it the MuonClip optimizer, which, I mean, it's a Chinese company, maybe it means something special there, but it is a new type of optimizer. And what you're seeing in this curve is that it's working really well and working really efficiently. And that's part of the benefit of it being open source: now we have this novel breakthrough, and we can take it and use it for even more breakthroughs, even more open source models. That's been really cool to see.
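
For the curious: MuonClip is Moonshot's name for pairing the Muon optimizer with a "QK-clip" step that tames exploding attention logits, a common source of the loss spikes Josh describes. Below is a rough Python sketch of the clipping idea only; the threshold value and the bare-bones single-head attention are simplifying assumptions, not Moonshot's implementation.

```python
import numpy as np

TAU = 100.0  # assumed cap on the max attention logit before clipping kicks in

def qk_clip(W_q, W_k, x):
    """Rescale query/key projections if attention logits grow too large."""
    q, k = x @ W_q, x @ W_k
    logits = q @ k.T / np.sqrt(q.shape[-1])   # bare-bones single-head attention
    s_max = logits.max()
    if s_max > TAU:
        # Split the correction across both projections so the product q.k^T
        # shrinks by exactly TAU / s_max, pulling the spike back under the cap.
        scale = np.sqrt(TAU / s_max)
        W_q = W_q * scale
        W_k = W_k * scale
    return W_q, W_k

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))
W_q = rng.normal(size=(32, 32)) * 2.0   # deliberately large so clipping fires
W_k = rng.normal(size=(32, 32)) * 2.0
W_q, W_k = qk_clip(W_q, W_k, x)
```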

Ejaaz:
I mean, this is just, time and again, from China. So amazing from their research team. Just to pick up your comment on DeepSeek: at the end of last year, we were utterly convinced that the only way to create a breakthrough model was to spend billions of dollars on compute clusters, and that it was therefore a pay-to-play game. And then DeepSeek, a team out of China, released their model and completely open-sourced it as well. And it was as good as OpenAI's frontier model, which was the top model at the time. And the revelation there was: oh, you don't actually just need to chuck a bunch of compute at this. There are different techniques and methods; if you get creative about how you design your model and how you run the training run, which is basically what you need to do to make your model smart, you can do it in ways that are more efficient, consume less energy, and therefore less money, but are as smart, if not smarter, than the frontier models the American AI companies are making. And this is just a repeat of that, Josh. I mean, look at this curve, for those who are watching this episode on video. It is just so clean. Yeah, it's beautiful. The craziest part about this is that when DeepSeek was released, they pioneered reasoning and reinforcement learning, two techniques that made the model super smart with less energy and less compute spend. With this model, they didn't even implement that technique at all. So theoretically, this model can get so much smarter than it already is; they just leveraged a new method to make it as smart as it is right now. Such fascinating progress in research from China, and it just keeps on coming. It's so impressive.

Josh:
Yeah, this was the exciting part to me: we're seeing algorithmic, exponential improvements in so many different categories. This was considered a breakthrough by all means, and it wasn't even the same type of breakthrough that DeepSeek had. So we get this compounding effect, where we have this new training breakthrough, and then we have DeepSeek with the reinforcement learning, and that hasn't even been applied to this new model yet. So we get the exponential growth on one end, the exponential growth on the reasoning end, those come together, and then you get the exponential growth on the hardware stack, where the GPUs are getting much faster. There are all these different subsets of AI compounding on each other, growing and accelerating quicker and quicker, and what you get is this unbelievable rate of progress. That's what we're seeing. So reasoning isn't even here yet, and we're going to see it soon, because it is open source, so people can apply their own reasoning on top of it. I'm sure the Moonshot team is going to do their own reasoning version of this model, and I'm sure we're going to get even more impressive results soon. I see you have a post up here about the testing and overall performance. Can you please share?

Ejaaz:
Yeah, so this is a tweet that summarizes really well how this model performs relative to other frontier models. The popular comparison for Kimi K2 is against Claude. So Claude has a bunch of models out: Claude 3.5 is its earlier model, and Claude 4 is its latest. And the general take is that this model is just better than those models, which is an insane thing to say, because for so long, Josh, we've said Claude was the best coding model. And indeed it was. And then, within the span of, what is it, five days? Grok 4 released and completely blew Claude 4 out of the water in terms of coding. Now Kimi K2, an open source model out of China, which doesn't even have access to the research and proprietary knowledge that a lot of American AI companies have, beats it as well, right? So it beats Claude at its own game, but it's also cheaper. It's 20% the cost of Claude 3.5, which is just an insane thing to say. Which means that if you're a developer out there who wants to try your hand at vibe coding a bunch of things, or at seriously coding something quite novel but you don't have the hands on deck to do it, you can now spin up a Kimi K2 AI agent, actually multiple of them, for a very cost-efficient, reasonable, you know, salary. You don't have to pay hundreds of thousands of dollars, or hundreds of millions of dollars, which is what Meta is doing to buy a bunch of these software engineers. You can spend the equivalent of maybe a Netflix subscription, or $500 to $1,000 a month, and spin up your own app. So super, super cool.

Josh:
And one added perk is that if you have a lot of GPUs sitting around, you can actually run this model for free. That per-token price is the cost if you query it from their servers. But I'm sure there are going to be companies with access to excess GPUs; they can just download the model, because it's open source, open weights, and run it on their own. And that brings the cost of compute down to the cost per kilowatt-hour of the energy required to run the GPUs. So because it's open source, you really start to see these costs decline, but the quality doesn't. And every time we see this, we see a huge productivity unlock in coding output and in the amount of queries used. It's like, this is freaking awesome.
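
As a sketch of what "download the weights and run it yourself" looks like in practice, here's a minimal Hugging Face transformers snippet. The repository id below is an assumption based on the launch announcements; check the official model card for the exact id, the recommended inference engine, and the (substantial) multi-GPU hardware requirements, since even with only ~32B parameters active, all 1T must be held in memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-K2-Instruct"  # assumed repo id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # shard the weights across whatever GPUs you have
    torch_dtype="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Write a haiku about open weights.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```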

Ejaaz:
Yeah, Josh, I saw something else come up as well. Do you remember when Claude first released their frontier model, I think it was 3.5 or maybe it was 4? One of their bragging rights was that it had a one million token context window.

Josh:
Oh yes, which was huge.

Ejaaz:
Yeah, which, for listeners of the show, is huge. It's like several novels' worth of words or characters that you could just bung into one single prompt. And the reason that was such an amazing thing was that, for a while, people struggled to communicate with these AIs because they couldn't set the context. There wasn't enough bandwidth within the chat window for them to say, you know, and don't forget this, and then there was this, and then this detail and that detail. There just wasn't enough space, and models weren't performing well enough to consume all of that in one go. And then Claude came out and was like: hey, we have a one million token context window. Don't worry about it; chuck in all the research papers you want, chuck in your essay, chuck in reference books, and we've got you. I saw this tweet that was deleted. I think you sent it to me.

Josh:
We got the screenshots. We always come with receipts.

Ejaaz:
Yeah. I wonder why they deleted it, but good catch from you. Let's get into this.

Josh:
So my take on it: it was first posted, I think, earlier today, like an hour ago, and then deleted pretty shortly afterwards. And this is from a woman named Crystal. Crystal works with the Moonshot team; she is part of the team that released Kimi K2. And in this post it says: Kimi isn't just another AI; it went viral in China as the first to support a 2 million token context window. And then she goes on to say: we're an AI lab with just 200 people, which is minuscule compared to a lot of the other labs they're competing with. So it was an acknowledgement that they had a 2 million token context window. And for those who need a quick refresher on context windows: imagine you have a gigantic textbook, you've read it once, you close it, and you kind of have a fuzzy memory of all the pages. The context window lets you lay all of those pages out in clear view and directly reference every single one. So when you have two million tokens, which is roughly two million words of context, we're talking about a whole stack of books and textbooks and knowledge, and you can dump a lot of information in for the AI to readily access. If they really have released a two million token open source model, that's a huge deal. I mean, even Grok 4 recently, what did we say it was? A 256,000 token context window, something like that. So Grok 4 is one eighth of what they supposedly have accessible right now, which is a really, really big deal. So I'm hoping it was deleted because they just don't want to share that, not because it's not true. I would like to believe it's true, because, man, that would be pretty epic.
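
A quick way to build intuition for context windows is to count tokens directly. The sketch below uses OpenAI's tiktoken tokenizer as a stand-in (Kimi K2's own tokenizer will differ) and assumed round numbers for book length, so treat the output as a ballpark only.

```python
import tiktoken

# OpenAI's cl100k_base tokenizer as a rough stand-in for Kimi's tokenizer.
enc = tiktoken.get_encoding("cl100k_base")
sample = "The context window is how much text the model can see at once."
print(len(enc.encode(sample)), "tokens in the sample sentence")

AVG_NOVEL_WORDS = 90_000     # assumed length of a typical novel
TOKENS_PER_WORD = 4 / 3      # rough rule of thumb for English text
novels = 2_000_000 / (AVG_NOVEL_WORDS * TOKENS_PER_WORD)
print(f"~{novels:.0f} novels fit in a 2M-token window")
```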

Ejaaz:
And the people are loving it, Josh. Check out this graph from OpenRouter, which shows the split of usage between everyone on their platform querying different models. For context, OpenRouter is a website you can go to and type up a prompt, just like you do in ChatGPT, and you can decide which model your prompt goes to, or you can let OpenRouter decide for you. It divvies up your queries: if you have a coding query, it will probably send it to Claude, or now Kimi K2 or Grok 4, but if you have something that's more creative writing, or something like a case study, it might send it to OpenAI's o3 model, right? So it kind of decides for you. OpenRouter released this graphic, which shows that Kimi K2 surpassed xAI in token market share just a few days after launching. Which basically means that xAI spent, you know, billions of dollars training up their Grok 4 model, which beat out the competition just last week; then Kimi K2 gets released, completely open source, and everyone starts using it more than Grok 4, which is just an insane thing to say, and it shows how rapidly these AI models compete with and surpass each other. I think part of the reason for this, Josh, is that it's open source, right? Which means that not only are retail users like you and me using it for our daily queries, you know, create this recipe for me or whatever, but researchers and builders all over the world, who have so far faced the obstacle of needing pots of money to start their own AI company, now have access to a frontier, world-renowned model and can create whatever application, website, or product they want to make. So I think that's part of the usage there as well. Do you have any takes on this?
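
Since OpenRouter exposes an OpenAI-compatible API, trying Kimi K2 from code is a small change to an existing client. A minimal sketch; the model slug and API key are placeholders and assumptions, so check openrouter.ai for the exact identifier and pricing.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",   # placeholder
)

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2",      # assumed slug; verify on openrouter.ai
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Reverse a string in one line of Python."},
    ],
)
print(resp.choices[0].message.content)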

Josh:
Yeah, and it's downstream of cost, right? We always see this: when a model is cheaper and mostly equivalent, the money will flow to the cheaper model. It'll always get more queries. I think it's important to note the different use cases of these models, though; they're not directly competing head to head on the same benchmarks. When we talk about Claude, it's generally known as the coding model, and I don't think OpenAI's o3 is really competing directly with Claude, because it's more of a general intelligence versus a coding-specific intelligence. K2 is probably closer to a Claude, I would assume, where it's really good at coding because it uses this mixture of experts. And I think that helps it find the tools. It uses this cool new novel thing called multiple tool use, where each one of these experts can use a tool simultaneously, and they can use these tools and work together to get better answers. So in the case of coding, this is a home run: very cheap cost per token, very high quality outputs.
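
Kimi K2's launch materials emphasize agentic tool use, and tool calling over an OpenAI-compatible endpoint is the usual way to exercise it. Here's a hedged sketch, reusing the client from the snippet above; the tool names and schemas are invented for illustration, and whether the model batches several calls into one turn is up to the model.

```python
# Two hypothetical tools; real tool definitions follow this JSON Schema shape.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_python",                       # hypothetical tool
            "description": "Execute a Python snippet and return stdout.",
            "parameters": {
                "type": "object",
                "properties": {"code": {"type": "string"}},
                "required": ["code"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_fact",                        # hypothetical tool
            "description": "Verify a short factual or arithmetic claim.",
            "parameters": {
                "type": "object",
                "properties": {"claim": {"type": "string"}},
                "required": ["claim"],
            },
        },
    },
]

# resp = client.chat.completions.create(model="moonshotai/kimi-k2",
#                                       messages=messages, tools=tools)
# The model may return several tool_calls in one assistant turn; you run
# each tool, append the results as "tool" messages, and call the API again.
```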

Ejaaz:
I actually think it can compete with OpenAI's o3, Josh. Check this out. So Rowan Cheung put this out yesterday, and he basically goes: I think we're at the tipping point for AI-generated writing. It's been notoriously bad, but China's Kimi K2, an open-weight model, is now topping creative writing benchmarks. So just to put that into context, that's like having the, I don't know, smartest software engineer at the top engineering company working on AI models also being the best poet, or writing the best creative script and directing the next best movie, or creating a Harry Potter novel series. This model can basically do both. And what it's pointing out here is that compared to o3, it tops it. Look at this. Completely beats it.

Josh:
Okay, so I take that back. Maybe it is just better at everything. Yeah, those are some pretty impressive results.

Ejaaz:
I think what's worth pointing out here, and I don't know whether any of the American AI models do this, Josh, is that mixture of experts seems to be clearly a win. The ability to create an incredibly smart model doesn't come without this large storage load, right? One trillion parameters. But then you combine it with the ability to say: hey, you don't need to query the entire thing. We've got you. We have a smart router, which pulls in the best experts, as you described earlier, for whatever query you have. So a creative writing task and a coding task get sent to two different departments of the model. That's a really huge win. Do any other American models use this?

Josh:
Well, the first thing that came to mind when you said that is Grok 4, which doesn't exactly use this, but uses a similar thing: instead of a mixture of experts, it uses a mixture of agents. So Grok 4 Heavy uses a bunch of distributed agents that are basically clones of the large model. But that takes a tremendous amount of compute, and that is the $300 a month plan.

Ejaaz:
That's replicating Grok 4, though, right? So that's taking the model and copy-pasting it. Let's say Grok 4 was one trillion parameters, just for ease of comparison. If there were four agents, that's four trillion parameters, right? So it's still pretty costly and inefficient.

Josh:
Is that what you're saying? No, it's actually the opposite direction with K2. And again, this is kind of similar to tracking the sentiment between the United States and China, where the United States will throw compute at it, and China will throw clever resourcefulness at it. So Grok, when they use their mixture of agents, it actually just costs a lot more money, whereas K2, when they use their mixture of experts, it costs a lot less. Instead of using 4 trillion parameters, in this case it uses just 32 billion, and it reuses that 32 billion over and over. It's a really elegant solution that seems to be yielding pretty comparable results. So as we see these efficiency upgrades, I'm sure they will eventually trickle down into the United States' models, and when they do, that's going to be a huge unlock, in terms of cost per token and in terms of the smaller distilled models we'll be able to run on our own computers. But yeah, I don't know of anyone else using it at this scale; it might be novel to K2 right now.
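
To put rough numbers on that contrast, here is back-of-envelope arithmetic using the hypothetical figures from the conversation itself (1T total parameters, a four-agent fan-out, 32B active experts). These are illustrative assumptions for comparison, not measured costs.

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1T, hypothetical size for both models
AGENTS = 4                         # hypothetical mixture-of-agents fan-out
ACTIVE_PARAMS = 32_000_000_000     # 32B active per token in the MoE

agents_work = AGENTS * TOTAL_PARAMS   # parameters engaged per query
moe_work = ACTIVE_PARAMS              # parameters engaged per token
print(f"mixture-of-agents: {agents_work / 1e12:.0f}T parameters engaged")
print(f"mixture-of-experts: {moe_work / 1e9:.0f}B parameters engaged")
print(f"ratio: ~{agents_work / moe_work:.0f}x")
```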

Ejaaz:
And I think this is the method that probably scales the best, Josh.

Josh:
Yeah, it makes sense. Efficiency...

Ejaaz:
...always wins in the end, right? And to see this kind of innovation come pretty early in a technology's life cycle is just super impressive. Another thing I saw is that there are two different versions of this model, I believe. There's something called Kimi K2 Base, which is basically the model for researchers who want full control for fine-tuning and custom solutions, right? So imagine this model as the entire parameter set: you have access to one trillion parameters, all the weights and everything. And if you're a nerd who wants to nerd out, you can go crazy, you know, if you have your own GPU cluster at home, or if you happen to have a convenient warehouse full of servers that you weirdly have access to. If you think about the early gaming days of Counter-Strike, where you could mod it, you can basically mod this model to your heart's desire. And then there's a second version called K2 Instruct, which is for drop-in, general-purpose chat and AI agent experiences. So this is kind of at the consumer level: if you're experimenting with these things, or if you want to run an experiment at home on a specific use case, you can take that away and do it yourself. That's how I understand it, Josh. Do you have any takes on this?

Josh:
That makes sense. And I think that second version you're describing is what's actually available publicly on their website, right? So if you go to Kimi.com, it has a text box; it looks just like the ChatGPT you're used to. And that's where you can run that second model, the drop-in, general-purpose chat. And then, for the hardcore researchers, there's a GitHub repo, and the repo has all the weights and all the code, and you can download it, dive in, and use the full thing. I was playing around with the Kimi tool, and it's really cool. It's fast. I mean, it's lightning fast. When you go from a reasoning model to an inference model like Kimi, you get responses like this. When I'm using Grok 4 or o3, I'm sometimes sitting there a couple of minutes waiting for an answer. With this, you type it in and it just types back right away, no waiting. It's kind of refreshing to see, but it's also a testament to how impressive it is: I'm getting great answers, and it's just spitting them right out. So what happens when they add the reasoning layer on top? Well, it's probably going to get pretty freaking good.

Ejaaz:
So the trend we're seeing, and we saw this last week with Grok 4, is that typically we expect to wait a while when we send a prompt to a breakthrough model, because it's thinking; it's trying to replicate what we have in our brains up here. And now it's just getting much quicker and much smarter and much cheaper. So, long story short, these things are incredibly powerful. I kind of think about it as how we went from massive desktop computers to sleek cell phones, Josh, and then we're eventually going to have chips in our brains. AI is just fast-tracking that entire life cycle within a couple of years, which is just insane.

Josh:
And these efficiency improvements are really exciting, because you can see how quickly they're shrinking, eventually allowing those incredible models to just run on our phones. So there's totally a world a year from now in which a Grok 4, o3, or Kimi K2-capable model is small enough that it could run on a mobile device, or run locally on a laptop, or while you're offline, and you have this portable intelligence that's available everywhere, anytime, even when you're not connected to the world. And that seems really cool. We were talking a few episodes ago about Apple's local, free AI inference running on an iPhone, but how the base models still kind of suck: they don't really do anything super interesting; they're basically good enough to do what you would expect Siri to do but she can't. And as we get more and more breakthroughs like this, which let you run much larger parameter counts on a much smaller device, it's going to start really superpowering these mobile devices. And I can't help but think about the OpenAI hardware device. I'm like, wow, it would be super cool if you had, like, o3 running locally in the middle of the jungle somewhere with no service, and you still had access to all of its capabilities. That's probably coming downstream of breakthroughs like this, where we get really big efficiency unlocks.

Ejaaz:
I mean, it's not just efficiency, though, right? It's the fact that if you can run it locally on your device, it can have access to all your private data without exposing any of that to the model providers themselves. So one of the major concerns with not just AI models, but also with mobile phones, is privacy. I don't want to share all my private health, financial, and social media data, because then you're just going to have everything on me, and you're going to use me as a product, right? And that's kind of been the status quo for the last decade in tech. And with AI, it's a supercharged version of that. The information gets more personal: it's not just your likes, it's, you know, where Josh shops every day, and who he's dating, and all these kinds of things, right? And that becomes quite personal and intrusive very quickly. So the question becomes: how can we have the magic of an AI model without it being so intrusive? And the answer is open source, locally run AI, or privately run AI. And Kimi K2 is a frontier model that can technically run on your local device if you set up the right hardware for it. And the way we're trending, you can basically end up having that on your device, which is just a huge unlock. And if you can imagine how you use OpenAI's o3 right now, Josh, right? I know you use it as much as I do. The reason you and I use it so much isn't just because it's so smart; it's because it remembers everything about us. But I hate that Sam knows, or has access to, all that data. I hate that if he chooses to switch on personalized ads, which is currently how most of these tech companies make their money, he can, and I can do nothing about it, because I don't want to use any other model apart from that one. But if there were a locally run model that had access to all the memory and context, I'd use that instead.

Josh:
And this is suspicious. I mean, this is a different conversation entirely, but isn't it interesting how other companies haven't really leaned into memory, when it's seemingly the most important moat there is? Grok 4 doesn't have good memory rolled out. Gemini doesn't really have memory. Claude doesn't have memory the way OpenAI does. Yet it's the single biggest reason why we both keep going back to ChatGPT and OpenAI. So that's just been an interesting thing. I mean, Kimi is open source; I wouldn't expect them to lean too much into it. But for these closed source models, it's another interesting observation: the most important thing doesn't seem to be prioritized by other companies just yet.

Ejaaz:
Why do you think that is? So my theory, at least from xAI or Grok 4's perspective, is that Elon's thinking: okay, I'm not going to be able to build a better chatbot or chat messenger than OpenAI has; there aren't too many features I can use to set Grok 4 apart that o3 doesn't already do, right? But where I can beat o3 is at the app layer. I can create a better app experience than they have, because they haven't really created one that's sticky enough for users to keep coming back to. And I can use that data set to then unlock memory and context at that point, right? So I saw today that they, they being xAI, released a new feature for Grok 4 called, I think it's Companions, Josh. And it's basically these animated, avatar-like characters; they basically look like they're from an anime show. And you know how you can use voice mode in OpenAI and talk to this realistic, human-sounding AI? You now have a face and a character on Grok 4, and it's really entertaining, Josh. I find myself engaged with this thing, because I'm not just typing words; it's not just this binary to-and-fro with a chat messenger. It's this human, this cute, attractive human, that I'm now speaking to. And I think that's the strategy a lot of these AI companies, if I had to guess, are taking to seed their user base before they unlock memory. I don't know whether you have a take on that.

Josh:
Yeah, I have a fun little demo. I actually played around with it this morning, and it was totally unhinged, no filter, very vulgar, but kind of fun. It's like a fun little party trick. And yeah, I mean, it was a surprise to me this morning when I saw it rolled out. I was like, huh, that doesn't really seem like it makes sense. But I think they're just having fun with it.

Ejaaz:
Can we talk about the team for a second? We've mentioned how they've all come from China, and how China is really advancing open source AI models and has completely beaten out the competition in America, Meta's Llama being the obvious one. We've got Qwen from Alibaba, we've got DeepSeek R1, and now we have Kimi K2. The team is basically the AI Avengers of China, Josh. These three co-founders all have deep AI and ML backgrounds and hail from the top American universities, such as Carnegie Mellon. One of them has a PhD from Carnegie Mellon in machine learning, which, for those of you who don't know, is basically a God-tier degree for AI; it means you're desirable and hireable by every AI company after you graduate. But it's not just that. They also have credibility and degrees from the top universities in China, especially this one university called Tsinghua, which seems to be the top of the field. I looked them up on global rankings of AI universities, and they often come in third or fourth in the top 10. So pretty impressive there. But what I found really interesting, Josh, was that one of the co-founders is an expert in training AI models on low-cost, optimized hardware. And the reason I mention this is that it's no secret that if you want a top frontier AI model, you need to train it on NVIDIA's GPUs, on NVIDIA's hardware. NVIDIA's market cap, I think, at the end of last week, surpassed $4 trillion. That's $4 trillion with a T. That is more than the current GDP of the entire British economy.

Josh:
Where I hail from. And the largest in the world.

Ejaaz:
And there's never been...

Josh:
...a bigger company.

Ejaaz:
There's never been a bigger company. It's just insane to wrap your head around, and it's not without reason: they have a grasp, basically a monopoly, on the hardware needed to train top models. Now Kimi K2 comes along and casually drops a one trillion parameter model, one of the largest models ever released, and it's trained on hardware that isn't NVIDIA's. And Jensen Huang, I need to find this clip, Josh, but Jensen Huang was on stage, I think at a private conference, maybe yesterday, and he was quoted as saying that 50% of the top AI researchers are Chinese and are from China. And what he was implicitly getting at is that they're a real threat now. I think for the last decade we've kind of been like, ah, yeah, China's just going to copy-paste everything that comes out of America's tech sector. And when it comes to AI, we've maintained that same mindset up until now, when they're really just competing with us. And if they have the hardware, they have the ability to research new techniques to train these models, like DeepSeek's reinforcement learning and reasoning, and then Kimi K2's efficient training run, which you showed earlier. They've come to play, Josh. And I think it's worth highlighting that China has a very strong grasp on the top AI researchers in the world and on the models coming out of it.

Josh:
Where are their $100 million offers? I haven't seen any of those coming through. None, dude. The most impressive thing is that they do it without the resources we have. Imagine if they did have access to clusters of these H100s that NVIDIA is making. I mean, would they crush us? And we kind of have this timeline where we're running up against the edge of the energy we have available to train these massive models, whereas China does not have that constraint; they have significantly more energy to power these. So in the inevitable event that they do get the chips and are able to train at the scale we are, I'm not sure we'll be able to continue our rate of acceleration, in terms of hardware manufacturing and large training runs, as fast as they will. And they've already done the hard work on the software efficiency side. They've cranked out every single efficiency, because they're doing it on constrained hardware. So it's going to create this really interesting effect, where they're coming at it from the ingenuity, software approach, and we're coming at it from the brute-force, throw-a-lot-of-compute-at-it approach, and we'll see where both sides end up. But it's clear that China is still behind, because they are the ones open sourcing their models, and we know at this point that if you're open sourcing your model, you're doing it because you're behind.

Ejaaz:
Yeah. I mean, one thing that did surprise me, Josh, was that they released a one trillion parameter open source model. I didn't expect them to catch up that quickly. Like, one trillion is a lot. Another thing I was thinking about is that China has dominated hardware for so long now that it wouldn't really surprise me if, I don't know, a couple of years from now, they're producing better models at specific things, basically because they have better hardware than America, than the West. But where I think the West will continue to dominate is at the application layer. And if I were a betting man, I would say most of the money is eventually going to be made on the application side of things. I think Grok 4 is starting to show that, with all these different kinds of novel features they're releasing. I don't know if you've seen some of the games being produced with Grok 4, Josh, but it is ultimately insane, and I haven't seen any similar examples come out of Asia from any of their AI models, even when they have access to American models. So I still think America dominates at the app layer. But Josh, I just came across this tweet, which you reminded me of earlier. Tell me about OpenAI's open source model strategy, because I've got this tweet pulled up from Sam Altman, which is kind of hilarious.

Josh:
Yeah. All right. So this week, if you remember from our episode last week, we were excited talking about OpenAI's new open source model. OpenAI, open source model, it all checks out. This was going to be the big week: they release their new flagship open source model. Well, conveniently, I think the same day K2 launched, later in the day, or perhaps the very next morning, Sam Altman posted a tweet. He says: hey, we plan to launch our open weights model next week. We are delaying it. We need time to run additional safety tests and review high-risk areas. We are not yet sure how long it will take us. While we trust the community will build great things with this model, once weights are out, they can't be pulled back. This is new for us and we want to get it right. Sorry to be the bearer of bad news. We are working super hard. So there are a few points of speculation. The first, obviously, being: did you just get your ass handed to you, and now you're going back to reevaluate before you push out the model? So that's one possible thing: they saw K2 and went, oh boy, this is pretty sweet; this is our first open source model, and we probably don't want it to land below theirs. And there's a second point of speculation, which, Ejaaz, you mentioned to me a little earlier today: maybe something went wrong with the training run, and it's not quite that they're getting beat up by a Chinese company, it's that they actually made a mistake of their own accord. Can you explain to me specifically what that might be, or at least what the speculation is?

Ejaaz:
Well, I'll keep it short. I think it was a little racist under the hood. I can't find the tweet, but basically one of these AI researchers slash product builders on X got access to the model, supposedly, according to him, and tested it out in the background. And he said: yeah, it's not really an intelligence thing; it's just worse than what you'd expect from an alignment and consumer-facing standpoint. It was ill-mannered; it was saying some pretty wild shit, kind of the stuff you'd expect coming out of 4chan. And so Sam Altman decided to delay while they figured out why it was acting out.

Josh:
Got it. Okay, so we'll leave that speculation where it is. There's a funny post that I'll actually share with you, if you want to throw it up, which was from Elon. We'll abbreviate, but Elon was basically saying it's hard to avoid both the libtard and the MechaHitler-like approaches, because they're on such polar opposite ends of the spectrum. And he said he spent several hours trying to solve this problem with the system prompt, but there's too much garbage coming in at the foundation model level. So basically, what happens with these models is you train them on all the human knowledge that exists, right? Everything we've believed, all the ideas we've shared, it's been fed into these models. And you can try to adjust how they interpret that data through the system prompt, which is basically an instruction that every single query gets passed through, but at some point the model is still reliant on this swath of human data, and it's just too overbearing. That's kind of what Elon shared. And the difference between OpenAI and Grok is that Grok will just ship the crazy update. That's what they did, and they caught a lot of backlash for it. But what I find interesting, and what I'm sure OpenAI will probably follow, is this last paragraph, where he says: our V7 foundation model should be much better, and we're being far more selective about training data, rather than just training on the entire internet. So what they're planning to do to solve this problem, which is what I assume OpenAI probably ran into when their training run went off the rails and the model started saying bad things about lots of people, is to rebuild the foundation model with new sets of data. And in the case of Grok, I know one of the intentions for V7 is actually to generate its own dataset of synthetic data from their models. And I'm assuming OpenAI will probably have to do this too if they want to recalibrate. A lot of times people talk about the temperature, which is the amount of randomness, of variance, the model uses when it responds. And I don't know, I think we're going to start seeing interesting approaches here, because as these models get smarter, you really don't want them to have these evil traits as the default. And it's very hard to get around that when you train them on the data they've been trained on so far.
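
Temperature has a precise meaning in sampling: it scales the logits before the softmax, so low values make the model conservative and high values make it erratic. A minimal sketch of that mechanic:

```python
import numpy as np

def sample(logits, temperature=1.0, rng=np.random.default_rng()):
    """Sample a token id; low temperature -> safer, high -> wilder."""
    z = np.asarray(logits) / max(temperature, 1e-6)  # scale logits
    p = np.exp(z - z.max())
    p /= p.sum()                                     # softmax
    return rng.choice(len(p), p=p)

logits = [2.0, 1.0, 0.2]
print(sample(logits, temperature=0.2))  # almost always token 0
print(sample(logits, temperature=2.0))  # much more random
```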

Ejaaz:
It just goes to show how cumbersome it is to train these models, Josh. It's such a hard thing.

Josh:
Yeah. Yeah.

Ejaaz:
It's not something where you can just jump into the code and tweak a few things. Most of the time you don't know what's wrong with the model, or where it went wrong. I mean, we've talked about this on a previous episode, but essentially, you build out this model, you spend hundreds of millions of dollars, and then you feed it a query. So you put something in, and you wait to see what it spits out. You don't really know what it's going to spit out; you can't predict it; it's completely probabilistic. And so if you release a model and it starts being a little racist, or, you know, kind of crazy, you have to go back to the drawing board and analyze many different sectors of the model: was it the data that was poisoned, or the way we trained it, or maybe a particular model weight we tweaked too much, or whatever that might be. So I think over time it's going to get a lot easier, once we understand how these models actually work. But my God, it must be so expensive to just continually rerun and retrain these models.

Josh:
Yeah, when you think about a coherent cluster of 200,000 GPUs, the amount of energy, the amount of resources just to retrain a mistake, is huge. So the deeper we get into it, the more it makes sense to pay so much money for talent to avoid these mistakes. If you pay a hundred million dollars for one employee who gives you the strategic advantage of avoiding another training run that would cost more than $100 million, you're already in profit. So you start to see the scale, the complexity, the difficulties. I do not envy the challenges some of these engineers have to face. Although I do envy the... I envy the salary.

Ejaaz:
I envy the salary, Josh.

Josh:
I envy the salary, and I envy the adventure. How cool must it be, trying to build superintelligence for the world, as a human, for the first time in the history of everything? It's got to be pretty fun. So this is where we're at now with the open source and closed source models. K2 is pretty epic. I think that's a home run; I think we've crowned a new model today. Do you have any closing thoughts, anything you want to add before we wrap up here? This is pretty amazing.

Ejaaz:
I think I'm most excited for the episode we'll probably release a week from now, Josh, when we've seen what people have built with this open source model. That's the best part about this. By the way, just to remind the listener: anyone can take this model right now. You, if you're listening to this, can take this model right now, run it locally at home, and tweak it to your preference. Now, yes, you kind of need to know how to tweak model weights and such, but I think we're going to see some really cool applications get released over the next week, and I'm excited to play around with them personally.

Josh:
Yeah, if you're listening to this and you can run this model, let us know, because that means you have quite a solid rig at home. I'm not sure the average person is going to be able to run it, but that's the beauty of open weights: anybody with the capability of running it can do so. They can tweak it how they like, and now they have access to the new best open source model in the world, which, just a couple of months ago, would have been the best model in the world, period. So it's moving really quickly, and it's really accessible. And as the weeks go by, hopefully we'll get OpenAI's open source model in the next few weeks, and we'll be able to cover that. But until then, just lots of stuff going on. This was another great episode, so thank you, everyone, for tuning in again and for rocking with us. We actually planned on making this like 20 minutes, but we just kept trailing off into more interesting things. There's a lot of interesting stuff to talk about; you could really take this in a lot of places. So hopefully this was interesting. Go check out Kimi K2. It's really, really impressive. It's really fast, it's really cheap, and if you're a developer, give it a try. And yeah, that's been another episode. We'll be back again later this week with another topic, and we'll just keep on chugging along as the frontier of AI models continues to head west.

Ejaaz:
Also, we'd love to hear from you guys. If you have any suggestions for things you want us to talk more about, or maybe there's some weird model or feature you just don't understand and we can do a job of explaining it, just message us. Our DMs are open, or reply to any of our tweets, and we'll be happy to oblige.

Josh:
Yeah, let us know. If there's anything cool we're missing, send it our way and we'll cover it. That would be great. But yeah, we're all going on this journey together; we're learning as we go. So hopefully today was interesting. And if you did enjoy it, please share it with friends, like, comment, subscribe, all the great things. And we will see you on the next episode.

Ejaaz:
Thanks for watching. See you guys. See you.