TBPN

  • (01:18) - AI Model Whiteboard Breakdown
  • (26:23) - Mark Chen, Chief Research Officer at OpenAI, discusses the recent launch of GPT-5, emphasizing its enhanced reasoning capabilities and seamless integration of various AI models to improve user experience. He highlights the model's ability to perform complex tasks more efficiently, reducing the need for users to choose between different model versions. Chen also touches on the importance of personalization and memory in AI, aiming to make interactions more intuitive and tailored to individual users.
  • (57:52) - Greg Brockman, co-founder and president of OpenAI, discusses the evolution of the GPT series, highlighting the progression from GPT-1's foundational capabilities to GPT-5's transformative impact on software engineering. He reflects on the challenges and breakthroughs in developing these models, emphasizing the importance of scaling and infrastructure in achieving advanced AI functionalities. Brockman also touches on the broader implications of AI, including its role in enhancing human productivity and the necessity for responsible development to maximize societal benefits.
  • (01:31:55) - Sarah Friar, OpenAI's Chief Financial Officer since June 2024, previously served as CEO of Nextdoor and CFO at Square. She discusses the rapid growth of ChatGPT, now with 700 million weekly active users, the expansion of enterprise adoption to 5 million paying business users, and the importance of substantial investments in compute infrastructure to support future AI developments.
  • (01:52:13) - Dedy Kredo, Co-Founder and Chief Product Officer of Qodo (formerly CodiumAI), discusses the integration of GPT-5 into their platform to enhance code review processes. He highlights the model's improved capabilities in generating high-quality code reviews, identifying bugs before production, and ensuring enterprise code aligns with best practices. Kredo emphasizes the importance of AI agents in automating code review tasks while maintaining human oversight to verify code quality and adherence to standards.
  • (01:59:11) - Zach Lloyd, founder and CEO of Warp, discusses the significant advancements in AI models, emphasizing their enhanced capabilities and cost-effectiveness, which are particularly beneficial for individual developers and small teams. He highlights the importance of competition among model providers to drive down prices and improve quality, expressing hope for a future where multiple competitive models coexist, similar to cloud service providers. Additionally, Lloyd addresses the challenges of model deprecation, noting that for application-level stacks like Warp, transitioning to the latest models is straightforward and advantageous.
  • (02:11:15) - Riley Tomasek is a serial entrepreneur and the Founder & CEO of Charlie Labs, home of an AI-driven "autonomous TypeScript engineer" designed to accelerate code reviews and merge processes. Previously, he co-founded Flight (acquired by Figma) and launched Dexa, an AI platform that transforms podcast discovery. Riley holds a B.Sc. in Mathematics & Computer Science from the University of British Columbia and has a track record of building developer-friendly tools and interfaces.
  • (02:18:31) - Guillermo Rauch, founder and CEO of Vercel, discusses the transformative impact of AI on software development, emphasizing the shift towards "vibe coding," where natural language prompts generate code and user interfaces, making software creation more accessible. He highlights the role of AI agents in automating tasks, enabling developers to focus on higher-level management and creative processes. Rauch also explores the future of developer tools, noting the importance of integrating AI capabilities to enhance productivity and streamline workflows.
  • (02:34:28) - Eno Reyes, co-founder and CTO of Factory, discusses how their platform integrates AI agents into every stage of the software development lifecycle, including coding, code review, maintenance, incident response, and documentation. He highlights the platform's focus on large enterprises with over 1,000 engineers, addressing challenges like migrating numerous codebases to new frameworks and modernizing legacy systems. Reyes emphasizes that while AI tools can accelerate individual developers, significant productivity gains require workflow changes that incorporate agents throughout the development process.
  • (02:40:20) - Guy Gur-Ari, co-founder and Chief Scientist at Augment Code, discusses the company's AI coding assistant designed for large teams with extensive codebases, emphasizing its capabilities in question answering, development, refactoring, and migrations. He highlights the thoughtful nature of GPT-5, noting its propensity for tool calls and clarifying questions before code modifications, making it particularly effective for complex tasks. Gur-Ari also mentions Augment's focus on developing proprietary integrations and tools, aiming to enhance the agent's performance without relying solely on external model vendors.
  • (02:48:20) - Harjot Gill, CEO of CodeRabbit, discusses the significant improvements observed with GPT-5 in their AI-driven code review platform, noting a near doubling in performance compared to previous models. He emphasizes that these enhancements will be available to customers at no additional cost, reflecting the rapid evolution of AI capabilities. Gill also highlights the company's focus on monitoring real-world performance metrics, such as user conversion rates and potential issues like hallucinations, to ensure the model's effectiveness and reliability.
  • (02:52:34) - Timeline
  • (02:57:01) - Max Schwarzer, a leading researcher at OpenAI, discusses the recent launch of GPT-5, highlighting its significant advancements in coding capabilities and its potential to revolutionize user interactions by enabling the creation of personalized applications without prior coding knowledge. He emphasizes the importance of refining the post-training process to enhance the model's accuracy and reliability, particularly in reducing hallucinations and improving user engagement. Schwarzer also touches on the future trajectory of AI development, expressing optimism about the integration of reinforcement learning to extend AI's applicability beyond textual domains into real-world interactions.
  • (03:13:26) - Scott Wu, co-founder and CEO of Cognition, discusses the significant advancements in AI coding models, noting that OpenAI has caught up to Anthropic, leading to a competitive landscape. He emphasizes the importance of integrating AI agents like Devin into software engineering workflows to enhance capabilities and efficiency. Wu also highlights the evolving role of engineers, suggesting a shift from "bricklayers" to "architects" as AI tools handle more complex tasks.
  • (03:23:21) - Claire Vo, founder of ChatPRD and former Chief Product Officer at LaunchDarkly, discusses the developer-centric design of GPT-5, noting its enhanced coding capabilities but expressing concerns about its verbosity and tendency to produce lengthy outputs. She emphasizes the importance of validating new models with users, especially in business contexts where concise communication is crucial. Vo also highlights the need for AI models tailored to specific roles, such as strategists, to better serve diverse professional needs.
  • (03:33:34) - Brad Lightcap, OpenAI's Chief Operating Officer, discusses his multifaceted role, which includes responsibilities ranging from project management to sales, and emphasizes the significant improvements observed with the launch of GPT-5. He highlights the diverse applications of OpenAI's models across various industries, such as pharmaceuticals, customer support, and everyday productivity tools, underscoring the transformative impact of AI on organizational efficiency. Additionally, Lightcap addresses the importance of AI adoption strategies within enterprises, suggesting that providing employees with advanced tools can accelerate workflows and enhance individual productivity.
  • (03:50:29) - Timeline
  • (03:54:01) - Ben Hylak, an AI developer, discusses his early access to GPT-5, highlighting its enhanced one-shot capabilities and improved reasoning, particularly in complex tasks like resolving code dependencies. He notes that while GPT-5 meets his expectations, the current product infrastructure may not fully harness its potential, suggesting a need for better tools to leverage its advancements. Hylak also mentions the release of GPT-5 Nano, emphasizing its cost-effectiveness and performance, and expresses interest in upcoming AI developments, including Google's anticipated Gemini 3 and advancements in world models.

TBPN.com is made possible by: 
Ramp - https://ramp.com
Figma - https://figma.com
Vanta - https://vanta.com
Linear - https://linear.app
Eight Sleep - https://eightsleep.com/tbpn
Wander - https://wander.com/tbpn
Public - https://public.com
AdQuick - https://adquick.com
Bezel - https://getbezel.com 
Numeral - https://www.numeralhq.com
Polymarket - https://polymarket.com
Attio - https://attio.com/tbpn
Fin - https://fin.ai/tbpn
Graphite - https://graphite.dev
Restream - https://restream.io
Profound - https://tryprofound.com

Follow TBPN: 
https://TBPN.com
https://x.com/tbpn
https://open.spotify.com/show/2L6WMqY3GUPCGBD0dX6p00?si=674252d53acf4231
https://podcasts.apple.com/us/podcast/technology-brothers/id1772360235
https://www.youtube.com/@TBPNLive

What is TBPN?

Technology's daily show (formerly the Technology Brothers Podcast). Streaming live on X and YouTube from 11 - 2 PM PST Monday - Friday. Available on X, Apple, Spotify, and YouTube.

Speaker 1:

You're watching TVPN. Your background looks way different because you have a whiteboard behind you because we're breaking down the x's and o's of the g p t five launch today. G p

Speaker 2:

t five

Speaker 1:

launch from OpenAI. Really quickly, there is some other news. Firefly Aerospace stock opened at $70 in Nasdaq debut. This is the company that landed on the moon. Very cool.

Speaker 2:

Very cool.

Speaker 1:

There there are a few other stories going on, but we're gonna skip most of them because we're gonna be focusing on ChatGPT today on GPT five. We have a bunch of a bunch of guests coming on. We have a stacked lineup. We'll pull that up, but we'll break down the x's and o's of the matchup. So, of course, OpenAI here's our here's our lineup.

Speaker 1:

We have something like 15 guests today, a ton of folks from OpenAI, a ton of people that build on top of OpenAI and can comment on what's going on with ChatGPT. But of course, this battle is between OpenAI and the timeline. It's the it it's they gotta get the vibes right. It's war. It's war.

Speaker 1:

It's it's the timeline's in turmoil over whether or not this is a good model, what it means for the industry, what it means for AGI timelines. Everyone's got their take. Everyone's posting memes. There's been a ton of funny ones already. We'll take you through them, of course, but let's break down the offense today.

Speaker 1:

We have Sam Altman, founder CEO. He briefly got cut from the team in November 2023, but he's back leading the team for the 2024, 2025 seasons. He seems healthy. He's doing great today. He went on at 10AM to break down the launch of GPT five.

Speaker 1:

He has a couple of key plays in his playbook, in his arsenal. He's got a solid ground game. Lots of quick posts hitting the timeline probably in lowercase. Then he might air it out with a couple thousand word essay. We've seen him do this before.

Speaker 1:

It's a bit of a hail Mary. Maybe AG has a thousand couple thousand days away. Maybe we're in the soft singularity, but he's very strong there with the long post when he needs to be. It's up his sleeve if he needs it. Then he can also pull out the vague posting.

Speaker 1:

He was doing this last night, posted a picture of the death star. No one knows what it means. Maybe it was taking a shot at the doomers who are on the defense today. So he's also known for driving supercars. That lets him get to the office faster.

Speaker 1:

He's saving time and money. You can save time and money by going to ramp.com. Easy to use corporate cards, bill pay, and accounting in a whole lot more all in one place. And so he is, he also gave apparently, this is a rumor. He gave every OpenAI employee who's been with the company for more than two years, $1,500,000.

Speaker 1:

A lot of people say, 1,500,000.0, that's not enough for a big house in San Francisco, but it is enough for a supercar. So that's probably why he picked that number, and that's why that's what the OpenAI team will be doing with that money. They'll be buying Aston Martin Valkyries, Pagani Huayras, McLaren Sabres for Ferrari Daytona s p threes. They can get a Koenigsegg, Gemara. They could get a Singer, DLS, or Bugatti Veyron.

Speaker 1:

It would have to be used. They could also get the Bentley Bacalar. There's only

Speaker 2:

Bacalar.

Speaker 1:

There's only 12 of those ever made. It's an open top two seater Roadster. Coach built. So that's gonna run you 1,500,000.0, but that's perfect. You just got the 1,500,000.0 bonus.

Speaker 1:

So put it to work. Spend it all in one place on a car. This is financial advice. Yes. Exactly.

Speaker 1:

Then you got Greg Greg Brockman. He's joining at noon. He's he's extremely well rested. He's actually coming off a sabbatical right now. That's very exciting.

Speaker 1:

He should be injury free for the rest of the season. He cut his teeth at MIT, and then he got drafted by Stripe in 2010. Microsoft tried to do a trade deal during the 2023 chaotic trade deal trade window that opened up post Sam Altman ouster, but he stuck with the OpenAI team, and now he's president of the company. Then you got Mark Chen. He's coming on at 11:30 today.

Speaker 1:

He's the chief research officer. The rumors that he turned out a maxed out contract to head the Meta Llamas, but he's sticking with the OpenAI team. He was an MIT undergrad. He also worked at Jane Street before joining OpenAI in 2018. Then we got Sarah Fryer coming on the show at 12:30.

Speaker 1:

She's the CFO of a p of of OpenAI. It's her job to find bank accounts big enough to find to fill all the cash they're raising. It's it's a tough job. You gotta find. Okay.

Speaker 1:

This bank account, will it hold 10 figures? Will it hold 11 figures? Will it hold 12 figures?

Speaker 2:

There's a

Speaker 1:

lot of cash in this one. Exactly. Exactly. She's also gonna be defining the non GAAP metrics that will be catnip for Ben Thompson in just a few years. We're excited to talk to her about how she's measuring the success and the health of their business.

Speaker 1:

Obviously, it's not just revenue. It's not just top line, bottom line. We're gonna wanna know about queries. We're gonna be wanna know about DAUs, all those non GAAP metrics. That's where people are gonna be tracking when I p when IPO day comes, hopefully soon.

Speaker 1:

And then we also have Brad Lightcap. He's joining at 02:35. He entered the league as an investment banker. Let's give it up for the investment bankers. They don't get enough credit around here, but we love the investment bankers.

Speaker 1:

Then he got drafted by Y Combinator before joining OpenAI as CFO in 2018. Now he's the chief operating officer. And then we have Max Wasser. He's in charge of post training, fine tuning these models, getting them into the fight fighting performance to put on a display of authority on GPT five launch day. Now let's flip it over to the defense.

Speaker 1:

They're going up against the timeline. They're going up against the vibe checks. We got the doomers. The Doomers, they're led by Elie Azer Yudakowsky. Admittedly, everyone knows this.

Speaker 1:

No one debates this. The Doomers have had a terrible season. But you'd expect to see at least a few Hail Marys about GPT five creating bioweapons thrown up on the timeline today. Probably won't be bangers, probably won't get a thousand likes, but you'll be seeing them here and there mostly in the replies. We've also seen some doomers talking about, GPT five being available to every government employee.

Speaker 1:

And Elijah had some harsh words about that. Don't give the keys to Sam Altman. Don't give the keys to the government to OpenAI. He was upset about that. But in general, the doomers not putting much of a bite up today.

Speaker 1:

Then you got Claude. Interesting. Claude was caught playing for the wrong team earlier this week. Anthropic, they're on defense today, but we saw them take out OpenAI's key pinch hitter Claude. The Claude code API was playing for the OpenAI team, but they shut that down and Claude is no longer pinch hitting for OpenAI.

Speaker 1:

Then you got the Elon stands. The ground game's gonna be there. It's gonna be it's gonna be strong. The Elon stands are gonna be tracking the benchmarks relentlessly. We know XAI loves to bench max, and all the Elon stands are gonna be calling out GPT five for any any misaligned benchmarks if they fail.

Speaker 1:

Humanity's last exam, it's over. It's over. They'll also toss-up the occasional unhinged conspiracy theory. Moving on. Gemini.

Speaker 1:

The betting lines have shifted big time. People thought Gemini was out of the game. They're so back. They're up. Polymarket has Gemini at what?

Speaker 1:

75% chance of being the best model towards the end of the month. This is of course based on the LM Arena, more vibes based benchmark. But, Gemini will probably be quiet today. They usually don't try and front run press releases. They usually try and sit back, let the model speak for themselves, let the API credits work their way through the latest YC demo day batch and get the product into the hands of people.

Speaker 1:

And so expect to see a big glossy conference in a couple weeks. Demoing Gemini three should be a good rebuttal from the Geminis. Then you get the Meta Llamas. Zuck's been on a poaching spree. He's rebuilding the team during the off season.

Speaker 1:

Now he has a stacked roster, and he's ready to go duke it out. But no one knows exactly what's gonna be in the playbook. Is he gonna go consumer? Is he gonna go API? Is he gonna turn into a hyperscaler?

Speaker 1:

We don't know, but we know they got a stacked team. They got Alex Wang. They got Nat Friedman. They got Daniel Gross. They got tons and tons of other researchers.

Speaker 1:

They've been raiding every other team, completely reset the salary cap for the league. And it's been it's been an absolute clinic in terms of recruiting over there at at Lama. Then you got the final benchmark, arc AGI. This benchmark stands. GPT five couldn't get past this defense.

Speaker 1:

And arc AGI, you know, sitting there right in the end zone, just swatting him down, swatting him down all day. You think you think you you think we're superintelligence around the corner? RKGI denied. Denied. Tyler, give us the update on RKGI.

Speaker 1:

Where does everything stand? How GPT five do? Does it matter? Should we care about Arc AGI? We love the team behind them, but is it an important benchmark?

Speaker 1:

Should we be tracking it today?

Speaker 2:

Okay. So so there's

Speaker 3:

Arc AGI v one and v two. Right? Okay. On both

Speaker 1:

And v three.

Speaker 3:

V three. I see don't know if

Speaker 1:

No one's been No one's even tested v v three.

Speaker 3:

No one's even really close there.

Speaker 1:

But how we doing on v one?

Speaker 3:

V one. GPT five is at 65.7. Unfortunately, that's gonna be 1% just short of Grok 466.7. Okay. Arc AGI two

Speaker 1:

The Elon stans are gonna be going wild with that.

Speaker 3:

Arc AGI two Okay. 9.9%.

Speaker 1:

9.9%.

Speaker 3:

Rock 416%.

Speaker 1:

16%.

Speaker 3:

Absolute. Of brutal you know, arc AGI mugging.

Speaker 1:

Rough showing.

Speaker 3:

Rough people Yeah. Have have accused Grok four of being slightly bench max. Yes. You know, this is, you know They might have a team sport. But

Speaker 1:

What's the what are the pros and cons? We know the cons of benchmarking of benchmarking. You're overfitting on something that might not actually drive consumer value. It might not actually solve real world problems. It might not increase DAUs or revenue or ARR or anything that really matters.

Speaker 1:

It might not even get us closer to superintelligence. Give me the counter argument. Why is bench maxing good?

Speaker 3:

The bull case for bench

Speaker 1:

The bull case for benchmarking break Bench maxing, break it down for me.

Speaker 3:

Yeah. So I I think the idea is basically this is almost like a non AGI pilled kind of take. Right? Okay. So if you don't have a a super general intelligence

Speaker 1:

Yep.

Speaker 3:

Your ability to bench max basically proves your ability to solve some like kind of specific task. So so there's this thing about the the gas station Yeah. It's called getting spiky.

Speaker 1:

Getting spiky. Adding more spikes to the spiky intelligence.

Speaker 3:

Yeah. I think it was Rune who had this this tweet about Yep. That the gas station benchmark. Yep. Right.

Speaker 3:

I I don't care if he said something like, I I don't care about AI solving gas stations if it has the gas station benchmark. Something like that.

Speaker 1:

Yeah.

Speaker 3:

But the idea is like, if if you if the if making the gas station benchmark benchmark

Speaker 1:

Rune said, my bar for AGI is an AI that can learn to run a gas station for a year without a team of scientists collecting the gas station dataset in in capital letters.

Speaker 3:

And then my take is basically, I don't care how they got to the like I don't care how they made it run the gas station. I care how fast

Speaker 1:

That it runs it. If we can run the gas station with So AI

Speaker 3:

if you have a team who's you know you're bench maxing team that just proves that, like, if you have some task that's, like, really important that you wanna get done Yep. They can just figure it out. Yeah. So it's like RL for business. This is like the same thing, RL for law.

Speaker 3:

Yeah. All these, like

Speaker 1:

This is what

Speaker 3:

specific verticals,

Speaker 1:

if you can benchmark, you're doing well. Thinking machine. Exactly. RL for businesses. Come into your organization, understand the most the most valuable business processes out there that could potentially be RL ed against that could be turned into a benchmark.

Speaker 1:

Yeah. And then and then, you know, bench hacked because I don't care if you're hacking, you know, if I have translate this type of document to this type of document for my business. If you can do it with a 100% accuracy, I don't care that you bench hacked

Speaker 3:

it. Yeah. Exactly. Like like, benchmarks right now are not, like, economically valuable. Like, if you're if you're really that much better at MMLU Yes.

Speaker 3:

It's like, is are you producing that much value? Yes. Probably not. But if you have if you make some new benchmark that's, you know, your tax benchmark, think Anthropic just released that fairly recently.

Speaker 1:

Oh, sure. Sure. Sure.

Speaker 3:

That's like, I don't care if you bench max on that.

Speaker 1:

As long does way better. If it does the tax Because then it's gonna

Speaker 3:

it's gonna do the task.

Speaker 1:

Yeah. Yeah. Yeah. Yeah. That makes sense.

Speaker 1:

What about the what does it say that it feels like OpenAI seems capable of bench hacking? It seems like they've opted not to. Is that because bench hacking has a risk of giving you negative aura? Because if you're accused and found guilty of bench hacking, you could it often reveals that you're not building this one beautiful, you know, super intelligence to rule them all.

Speaker 3:

Yeah. I think it's also like maybe we're just looking at the wrong benchmarks. Mhmm. Like maybe there there's a bunch of like interesting benchmarks about like there's this one I really like, it's the Minecraft benchmark Yeah. Where you have to like build you like give it some castle and how how good it looks.

Speaker 3:

Or there's the one you always see about the unicorn. Yeah. That's Wait. What's that? And it's So you use this, like, math package Okay.

Speaker 3:

That does, like, rats and stuff, but you ask it to to draw a unicorn.

Speaker 1:

Oh, I've seen that.

Speaker 3:

Yeah. Those are really good because

Speaker 1:

that kind

Speaker 3:

of shows the creativity, stuff like that.

Speaker 1:

Walk us through TBPN bench Yeah. We will be benchmarking the the AIs against going forward. Have you heard about this?

Speaker 2:

Reps of 02/25?

Speaker 1:

That would be close, but it's difficult because the humanoids kinda change that and you can just use normal actuator. This is this is truly for a large language model. You feed in our dataset. We have a public dataset, a private dataset presumably at some point. But walk us through TBP and bench.

Speaker 3:

Yeah. So so I'm yet to try this on GPT5. I don't think it's out yet. Okay. Like for public use at least I don't have it.

Speaker 3:

But I can I can tell some of the questions? Right? So so the first one, I have this picture of a horse. You have to guess the breed.

Speaker 1:

Yep.

Speaker 3:

So let me see. I think why I don't wanna say it in case is listening, it is may or may not be a Caspian horse.

Speaker 1:

Okay. And it's failing right now.

Speaker 3:

O three is failing.

Speaker 1:

O three is failing.

Speaker 3:

Morrow is failing. Haven't tried every one.

Speaker 1:

We've tried yeah. Gotta try Grok and and Gemini.

Speaker 3:

Bringing it all out.

Speaker 1:

Yeah. Force identification. This this seems extremely hackable. But at the very least if we get one scientist to be to go off and collect the horse data set and then then and then bench hack it, I think we will have done our job.

Speaker 3:

Yeah. That's the first question. Yes. The second one is it's I have two pictures of the before and after of this guy and it's which peptide did you take to achieve this body transformation?

Speaker 1:

Yep. Yep. Yep.

Speaker 3:

So it fails there.

Speaker 1:

It fails there. So you have a data set of of what peptide does what to the human body? Where'd you find that?

Speaker 3:

Well, you know, Wikipedia has a lot of

Speaker 1:

this stuff. Okay. Okay. You'd think they'd be able to you'd be able to cheat this around with O three. Just reason, who is this person?

Speaker 1:

Go look up what they've said they've taken. Yeah. And then boom, you have the Well, at first

Speaker 3:

with O three, when I was prompting it I would like save the photo. Yep. But then I have the metadata or the filename

Speaker 1:

would be

Speaker 3:

like Caspian horse or something.

Speaker 1:

Yeah yeah yeah. Okay. And then the third one?

Speaker 3:

The third one I pass in an audio file of a car revving has to pick which one.

Speaker 1:

It has to pick it has to identify the car.

Speaker 3:

The car.

Speaker 1:

Yeah. From the engine note.

Speaker 3:

From the engine.

Speaker 1:

And it's not doing it currently.

Speaker 3:

It's no.

Speaker 4:

Okay. Wrong.

Speaker 1:

This is this is a good benchmark.

Speaker 2:

Real last exam.

Speaker 1:

Yes. Yeah. Exactly.

Speaker 3:

So I think those are pretty solid. I have some more obviously I don't wanna make them public in case anyone's gonna try to you know, benchmark this.

Speaker 1:

Of course. Of course. We'll

Speaker 2:

see hopefully It's funny because Yeah. I was I was mentioning the other day this this app that my dad had of like tracking the like, you just set your phone up and it just automatically detects which birds are in your backyard.

Speaker 1:

Yeah. Yeah. Yeah. I mean, this has to be extremely solvable. It's just something that it reveals the lack of like general general intelligence when when you have to go and and collect the horse data set which should just be out there or the engine note data set which should just be out there.

Speaker 1:

But but but clearly, we are in the age of going RL on the on the individual problem and we are looking at like the power law of capabilities. Knowledge retrieval is clearly a you know $12,000,000,000 a year market that consumers will pay for. That will probably grow significantly. And and then health and therapy and shopping and all the other features that PGC Mo laid out in her post. This is kind of like, you know, what will be RL ed against because those are key pockets of value in the in the consumer economy.

Speaker 1:

And the same thing will happen in the business economy. But in the b two b context, you'll probably see an individual startup building on top of an API. But even then, most of the most of the model platforms offer kind of RL as a service, fine tunes as a service, something where if you're starting to spend tens of millions of dollars, they will do some customization on top of the model. So that could be the regime for the next few years as we go into this like, you know, instead of like this centralizing AI force, there's only one company. There's actually like a Cambrian explosion of a ton of companies doing a bunch of different things.

Speaker 1:

So anyway, let's go to Signal's post. Signal's not happy with the launch. He says, okay, I've seen enough. This launch felt like attending a funeral hosted by minimalists. They're unveiling tech that should feel magical, real breakthroughs, but the whole vibe was grayscale grief.

Speaker 1:

The set design looked like if mood disorder got a Bauhaus grant. I don't know what a Bauhaus grant is exactly. Even the storytelling art chart styles, the eulogy, tributes, then closing on someone's health battles, what exactly are we are we as the audience mourning? It feels like where they're trying to get you to pre install a therapist. Potentially great products.

Speaker 1:

Sure. But the emotional tone was so damn DOA. Incredibly strange all around. I I think, a, like like, it's weird because we're in this world, and this is a question that I wanna noodle on all day is is will this be the last launch of a g p of a number GPT model? Because Yeah.

Speaker 1:

Like, you don't hear about new new versions of Google going out. You just it just got better and better and better. Same thing with Amazon when they were optimizing for, hey, it's faster. We have more on our catalog.

Speaker 2:

I take away was is that the product matters more than the model Yes. Now and probably will for potentially very long

Speaker 1:

time. And when we were watching the stream, I was cheering because they gave the feature of you can now talk to the model and get it to trigger a deep reasoning workflow, or get it to give you a quick answer in natural language. And so it's it's abstracting even further, even more of the UI into the actual text interface. And so I think in terms of like surprise and delight, and and I don't know. It's like you you everyone kind of rips the Apple thing, but Apple does a great job of being euphoric and and and happy with somewhat minor product changes.

Speaker 1:

And like, maybe that's more of where they'll go is just, hey, there's these new features and here's how these things And Apple will spend ten minutes on stage talking about like shifting an icon around and stuff. And it's like

Speaker 2:

interesting they just sort of casually mentioned that they're they're deprecating the old models.

Speaker 1:

I think it's great.

Speaker 2:

Which makes sense.

Speaker 1:

I think it's great. I don't want the model picker anymore. But you lot

Speaker 2:

of people are gonna be upset about that.

Speaker 3:

I think they're getting rid of four point five.

Speaker 1:

Oh, oh, really? Yeah. And you're and you're and you're a four point five fan?

Speaker 3:

I love four point five.

Speaker 1:

But but I I would imagine that the future is if I ask it to think really hard about the pros and the writing style, it would then do a pass with 4.5. And But it would only trigger that when it needs to. It's not gonna give you that because if I'm just asking for, hey, regurgitate a bunch of facts or write some code or put together a table of data. Like, it's not going to need to pull 4.5 off the shelf, just like it's not always going to pull Python off the shelf. It's not always gonna pull web browsing off the shelf.

Speaker 1:

And so, I I'm I'm not I'm not sure that I necessarily want 4.5 there as a selection criteria. I would like all this to be tucked behind a UI and have something that's actually cleaner and less frustrating to use. I think it'll lead to higher retention.

Speaker 3:

Yeah. For the average like normie. Yeah. That no one knows what 4.5 is like.

Speaker 1:

That's true. That's true. Anyway Chris Pikes is OpenAI and Anthropic are duking it out meanwhile consumer surplus is growing. We also have very good news. We also have the details from Mike Newpe over at ARKGI.

Speaker 1:

Full GPT five is along the v one Pareto frontier. That's cost versus performance. OpenAI said they focused on other goals like UX and reliability. Our testing supports this. Mini GPT five is super impressive accuracy for cost.

Speaker 1:

In fact, based on cost efficiency, Mini could have entered Ark Prize twenty twenty four and likely won first place. We are still verifying GPT OSS or as Rune says GP GP toss. Results soon. Nano GPT appears overfit. Performance is commodity.

Speaker 1:

And Francois Cholet is also chiming in with the Yes. Top

Speaker 2:

Production team needs the deck.

Speaker 1:

Oh, we don't have a deck today. We're just through just just riffing through the timeline the timeline tab and just pulling up some random posts. So so you're free to pull those up but also we can just read through them. Ashley Vance is saying, but but model switching was my job. Model switching is out and we are into the future.

Speaker 1:

Just talk to the model. Just talk to the model and ask it what you needed to do and it will switch for you. It'll pull the right tool for the job. Anyway, the other question that I have for the OpenAI folks today is on the nature of secrets. So in Zero to One, Teal has this concept that discovering a secret is a key to building a startup.

Speaker 1:

And it's a key insight. And I I was joking with, you know, the super intelligence or or GPT five. Could my first prompt be teach me exactly how to build GPT five? And then I go to Meta and I say, I I know how to do it. I have the prompt.

Speaker 1:

I have the I have the result. And of course the answer is no. Of course, OpenAI would never leak the most frontier capabilities into the model. But can you build a super intelligence can you call it super intelligence if it doesn't if it can't tell you how to build super intelligence?

Speaker 2:

One read on what the secret might be is that the app was the most important thing all along. And if you if you create this narrative that that super intelligence is, you know, weeks or months away and you get a bunch of people that go and try to compete on raw intelligence Yeah. Meanwhile, you build a consumer app business with billions of users. Yeah. It's like a seems like a pretty good strategy.

Speaker 2:

I guess, one quick thought is is how do you how do you rate Sam's vague posting from yesterday with the Death Star in the context of this new in the context of of of the release today?

Speaker 1:

It's a great question. There's a bunch of reads on it. One is just that like the Death Star is to some degree like Stargate and you have to Oh wait, he If this is the apocalypse, I figured at least tune in live. You have to like the the impact of this of GPT five is not one crazy super intelligent model that does everything. It's just a more user friendly higher retention lower churn consumer model that weaves its way into all aspects of daily life and improves performance and efficiency all over the place.

Speaker 1:

And so, you have to build this massive cluster to serve all of that. I don't know. What's your read on it?

Speaker 2:

I don't know. I think it just was I think it was dramatic. It was provocative. Is provocative. People didn't like it.

Speaker 2:

It

Speaker 1:

is provocative because there are many other like super mega structures that are in in sci fi history that are positive.

Speaker 2:

Or positive. Yeah.

Speaker 1:

And this is this is like

Speaker 2:

But it gets the people going.

Speaker 1:

Yeah. I don't know. Is it a metaphor for someone else that's going to attack? Is he I mean mean the the image is from the viewpoint of someone looking at the Death Star. Is he saying, he is seeing a Death Star being built on the horizon?

Speaker 1:

Is that something else? Is that another company? Another organization? Is that the government? Is that the is that is that legal?

Speaker 2:

Here's a read from Bubble Boy. Says, I am an expert on bubbles so it brings me no joy to say that the AI bubble is popping this time next year.

Speaker 1:

Mhmm.

Speaker 2:

He is updating his timelines. When you promise infinite scaling and don't produce it, the calculus changes. I don't think it will be bad for most companies but those who built their entire business model around making the best LLMs are unfortunately gonna struggle as models become more of a commodity. Again, OpenAI is a my read on is a consumer app business. Yep.

Speaker 2:

Right? They still have a big enterprise business, but but by, you know, their recent valuations are are predicated on their this incredible consumer business that they built.

Speaker 1:

Yep.

Speaker 2:

Bubble Boy says the end user doesn't care much if Claude is 5% better than GPT five. They care about cost, speed and utility especially at scale things will be going. The obvious play now is shorting Nvidia and dumping okay. Territory here, bubble But interesting, again, kind of goes back to what I was saying earlier in that if you were raising billions to make a lab, and I think the potentially anthrop, you know, we'll see what happens in the coding market, but there's some clear winners emerging. And then on the consumer side, you know, expecting a power law outcome and and it's hard to see anyone unseating Chatuchipiti there.

Speaker 1:

Completely agree. I wanna dig in more, but we have our first guest. Let's welcome him to the stream. What a day. Mark, how are doing?

Speaker 5:

Hey. Pretty good. Nice to see

Speaker 6:

you guys again.

Speaker 1:

What's happening? On the launch. Take us through it. Are you were you actually live or are you wearing the same thing and you recorded it yesterday?

Speaker 7:

I'm actually live.

Speaker 1:

Okay. Do all

Speaker 7:

these live. I don't know why, but we do.

Speaker 2:

Chris Gone.

Speaker 1:

I mean, we're big fan. We're big fans of live. I mean, it just allows you to be the most reactive to the most new information. Give us give us the core thesis that you were trying to get across. I think that there are a few narratives out there.

Speaker 1:

We we've been enjoying the one that's, you know, this is a dominant consumer product. They just made it a better consumer product, and people are going to use the product and get more value at it. I saw a bunch of things in the presentation where I was like, that's gonna make my daily usage of ChatGPT better. At the same time, we're in this we're in this world of, you know, the models that the the numbers matter and the scale matters and this and that and this and that. And and it's a it's a fine line, and it's a dance, and it's and we're in a transition phase away from benchmarks and away from talking about the size of the bubbles.

Speaker 1:

But what was your core thesis? Like, what did you wanna get across to the listener? Yeah.

Speaker 7:

I mean, fundamentally, I think from a research perspective, we've been working on reasoning models for several years now. And I think until now, you've had this really clunky interface. You have to pick, you know, g b d four o or you have to pick it with three. And for the longest time, we've known that o three gives you better answers across the board. It's just too slow.

Speaker 7:

Right? I mean, you often don't wanna just sit there and wait for the model to reason it out. So we've done a lot of work to push the speed, the performance of our reasoning models such that these can come together and work in a very seamless way. And so I think, you know, above everything, we're trying to move the world into this agentic reasoning world. We believe that's the future.

Speaker 7:

And on top of that, you know, you pointed something out, which I really resonate with. Post training is a huge part of this release. We really wanted to highlight Max Schorzer and his team who did a phenomenal job, and they've made the model just really that much more useful for consumers, for businesses. It's a monster at coding. So yeah.

Speaker 1:

On the on the speed of reasoning, you're obviously the chief research officer. Are are you more optimistic about getting speed ups there from, I don't know, algorithmic design, software optimizations, or new hardware just let Moore's law carry on or find new ASICs or we saw Cerberus posting yesterday about the incredible speed that they're getting 3,000 tokens a second on GPT OSS. And I'm wondering what levers obviously, we pull all of them. But but but what what what path of the tech tree are we should we be, like, most focused around, most tracking, and and most excited about?

Speaker 7:

Yeah. I mean, as a person who represents research, I control the things that I can't control.

Speaker 1:

Think a

Speaker 7:

lot of that focus is on algorithms. Right? Simple algorithms that are scalable, that we can pump a lot of compute into.

Speaker 1:

Mhmm.

Speaker 7:

We also do care about the hardware improvements that are stacking up. With the open source release, you see thousands of people, right, really kind of serving these models, creating really great inference stacks, and those are really great lessons for us to pull from. You know, how what's the ceiling of the speed in which we can serve these models?

Speaker 1:

What what can you tell us about the actual, like like user experience of speed? I was I I'm I've just like last week I finally got to a place where like for a lot of tasks I'm I'm firing off a four o query and an o three pro query.

Speaker 2:

I just have my have two tabs.

Speaker 1:

Yeah. I've two

Speaker 2:

The o three tab.

Speaker 1:

Exactly. Top. And and I'm wondering what user experience patterns you think can help people balance between those. Is this like just something that we're like different patterns that we're gonna learn over time or different or or are there gonna be certain problems of user experience that are purely solved just by better product design, better speed, and we don't even need to learn these. Because I remember, like, you know, when you prompted an image generator, you used to have to say like, don't know six fingers, five fingers, please, or like, don't make mistakes.

Speaker 1:

And now, you know, the models kind of have that baked in. But but but how how are you thinking about the user experience of getting the user the results in the right amount of time?

Speaker 7:

Yeah. I mean, this is one facet of why we believe so much in reasoning. It's just because all of the scaffolding you used to have to give the model, all these small hints, they go away. Right? Like, the model can examine its own outputs.

Speaker 7:

It can review them. It can be like, hey. Look. Like, I'm just counting the fingers here. Why are there seven?

Speaker 7:

And and it can kinda fix that. Right? It it does a lot of iterative generation. It does a lot of fixing things on the fly. And so we think one of the benefits of bringing reasoning to the world is really to kind of remove the need for scaffolding.

Speaker 7:

And with g p d five, right, we know how clunky that that experience is with switching between four o and o three. Actually, I mean, there's so many stories. I was just talking to someone yesterday. Right? They're like, hey.

Speaker 7:

Well, you know, I've used four o my whole life. Right? It's the frontier model. And I'm like, hey. Well, have you tried o three?

Speaker 7:

And they're like, why would I try o three? You know, three is less than four. And so you'll Yeah. Need to

Speaker 2:

get out

Speaker 7:

of that world. You know, g p d five, I think it's a one stop shop reasoning and non reasoning. And we've really tried to make it kind of just Pareto optimal.

Speaker 1:

Yeah. Yeah. It's absolutely crazy to just take a bunch of letters and smash them together and expect people to pick up on that as a name or a brand. ChatGPT, TBPN, we're both kind of in the same insane gambit. But fortunately, it's it's worked out, and I think people have have gotten over the hump.

Speaker 1:

But I

Speaker 5:

see it

Speaker 7:

most off the tongue, TV.

Speaker 1:

Yeah. Sort of. Except our friend David Senra keeps flipping the letters. A lot of people do that. But at a certain point, yeah, you do break through and ChatGPT has, but but keeping the model numbers simpler makes a ton of sense.

Speaker 1:

Talk to me about the pace of play for research to actual product. Like, a lot of Yeah. On

Speaker 2:

that note, the line between your personal philosophy on the line between research orgs and engineering product orgs.

Speaker 7:

Yeah. I mean, so our research operates on a variety of different time scales. Right? We have teams that they scope out a bunch of ideas, and then they start to kinda narrow in on the promising ideas as they get closer to a run. And then you kinda see a winnowing of ideas as you get closer to launching a a flagship model.

Speaker 7:

Right? And there's always this kind of, like, explore more exploratory to more kind of concrete and execution focused pipeline. And we're pulling on ideas across the board here. Right? There's a lot of work in architecture optimization.

Speaker 7:

Seb was on stream. He pointed out improvements in synthetic data. So there's really a lot of work that goes into creating one of these models. And, you know, it's hard to say, like, oh, this model was about this breakthrough just because right now we have this machine that's producing breakthroughs on all of these axes and even across several paradigms. Right?

Speaker 7:

So it's all that coming together that produces the experience that you guys feel.

Speaker 1:

Yeah. Can you talk to me about the legacy or future of 4.5? I remember I was talking to you and I was like, I haven't been using it a lot and you looked at me like I was crazy. You were like, oh, it's so good. And I was talking to Tyler and he was like, our our intern here and he was saying like, yeah, the people who really like understand how good it is use it.

Speaker 1:

But but I was I was wondering is there a world where that is a tool in the tool chest for GPT five in the same way that Python is or web browser is. And if if it detects that I want something with more more emotional pros or more thoughtful writing, it can do a whole bunch of research, collect a bunch of raw raw text, and then kind of do a 4.5 pass that I believe is more expensive maybe and maybe doesn't make sense for every single query, but could be a feature in the loop or a tool that is pulled into the overall product experience.

Speaker 7:

Yeah. Absolutely. Speaking of 4.5, it's also a very smart model. Right?

Speaker 1:

Yeah.

Speaker 7:

And one of our bars in creating g p d five was to make sure that on a lot of the axes we cared about

Speaker 1:

Mhmm.

Speaker 7:

That it was able to outshine 4.5. And I I think even in some of the soft ones, like creative writing, I I think that was the case. And and that's what makes us so confident with with the name. I think we're able to really rely on all of the architecture advancements, all the kind of post training advancements, all the synthetic data advancements to create a model that's better than 4.5, but much faster and much cheaper.

Speaker 1:

Yeah. It feels kind of like we're I remember wasn't the second iPhone called the iPhone three g and the number literally corresponded to a specific technology. And now, when you get the iPhone 14, it doesn't mean it's 14 megahertz or gigahertz or inches big. Like it it doesn't like the number is abstract and it speaks to a bucket of features and it feels like there's, I mean, this was the first day of kind of re educating folks on what the nomenclature means going forward. Have you talked about an annual release schedule or like or or because there's the iPhone cadence and then there's the Google cadence which was like Google search just got better every year for two decades.

Speaker 1:

I it it it feels like at a certain point you wanna just be shipping as fast as possible. How do you think about the culture of shipping updates that you know, you find something that feels like, that could make the customer more delighted or the user more delighted and we don't need to do a big training run for it. So let's get that out today and let's tell people about it. Like how are you thinking about fast iteration versus splashy announcements?

Speaker 7:

Right. So on the product research side, I think it makes a lot of sense to think about, you know, what's the cadence of release and, you know, what are the feature sets that we wanna build? And I actually think there's enough great research happening there that we don't have to worry about, oh, you know, is there gonna be a drought or a long stretch without enough features to launch? But one thing that's important for us is to be able to provide the people doing the exploratory work some buffer from that. Right?

Speaker 7:

Mhmm. It's hard to do really great exploratory research in an environment where you feel pressured to do release after release after release. And so we let

Speaker 8:

that be a little bit

Speaker 7:

of a lazier pipeline, not meaning that the work itself is lazy, but we give it space really to mature and to flourish. And, you know, once it's ready, we can trip things across across that fence. So that's kind of philosophically how we organize. We have a product research org, still very much entrenched in the research, and they care about the release cadence. Mhmm.

Speaker 7:

And they're able to draw from all of the research that's happening, you know, algorithmically and in scaling and in RL.

Speaker 1:

Yeah. Talk to me about tool use and how that's growing. I was I was kind of noodling on this idea that, you know, the I was I was thinking about the IMO and how it it it at least from the reporting, it sounded like OpenAI's model didn't use tools for that and that's an incredible achievement. But it's kind of like artificial like I don't I don't care if the model doesn't use tools. I I use everything possible and even if even if an LLM can can memorize every fact, I'm fine with an LLM looking stuff up in a traditional database.

Speaker 1:

Spinning up a spreadsheet. Like, use whatever tool you want. Just get me the correct answer. But do we have is it important to give surface to the user the variety of tools that are in the GPT five tool chest? I noticed something magical happened when I was using GPT I was using o three Pro.

Speaker 1:

I sent an image in and I asked to estimate the height of a desk and it wrote like a thousand lines of of Python image per interpreter and was like, you know, interpreting pixels. And I was like, I didn't even think to trigger Python. It did. Yeah. Yeah.

Speaker 1:

No. It was right. It was crazy. But the the really funny thing was that it was just a standard sized desk. It was just like, it could have just googled like how how tall is an average desk or something or just memorized it.

Speaker 1:

It probably was just already in the weights that it knows that a desk is like 36 inches tall. But it it did a ton of work and it still got it right. It fact checked it a bunch of different ways. But but but I've noticed that now I can I can pull different things? Make a table.

Speaker 1:

Don't make a table. Write some Python for this. Don't write some Python. And it kind of gives me the feel of like a super user to some extent. But I'm wondering how you're thinking about what is further down.

Speaker 1:

You like, you've given ChatGPT a computer, as Ben Thompson said. You've you've given kind of the core tools, the Python, REPL, the the web browser. What how are you thinking about kind of the long tail of tools that you want to bring to bear and how does that interface? I know that there's API integrations and all sorts of different surface area there, but give me some context on that.

Speaker 7:

Yeah. I mean, our reasoning models are pretty cute. Right? I mean,

Speaker 9:

I think they you know, when you look

Speaker 7:

at their behavior, right, they they know the height of the desk, but they'll still go verify it five different ways. It's all consistent, give you that median answer, and I think that's really what makes these models so powerful. And when you think about tool use generically, right, like, we want the models to use that reasoning ability to just be able to, like, zero shot a new tool. Right? It you should be able to kind of minimally get instructions about how the tool works and just be able to know how to use it.

Speaker 7:

Right? And humans do this all the time. You get a new tool, you start experimenting with it, and then you don't need too much scaffolding and you just go and go and use it and understand it. So we want our reasoning models to use their reasoning to be able to use a broad selection of tools. And, of course, there are a couple that you really do care about.

Speaker 7:

You know, in in coding, it's very important for you to be able to execute code. It's really important in personalization for you to be able to get context from your calendars and from from basically, the digital world. So I think there's a range of tools we have familiarity with. But beyond that,

Speaker 8:

we want the model to

Speaker 7:

be smart enough to just generalize and use tool zero shot.

Speaker 1:

Yeah. Talk to me more about personalization. I feel like there's a world where I feel like I'm maybe under utilizing ChatGPT as an app because I don't have it wired up to a non relational database where it can just stuff data from, you know, it already has memory and it's doing kind of roll ups and there's some sort of saving of context. But I was when we were talking to Kevin Wheel, I was I was kind of like, well, I don't really have like a GitHub repo that's active that I wanna like dump code in regularly for like my one off tasks, but for that image generation like, you know, understanding the height of the desk, it's like, well, if I'm doing that a lot, maybe I wanna have a tool built that lives in the world that my chat interface can can kind of interact with on an ongoing basis and contribute to and modify and and kind of wind up instantiating a piece of software that's like even more long lived and then every successive query is even faster. So, yeah, how do you think about about different ways to increase personalization?

Speaker 7:

Yeah. I mean, I think memory is huge. So we have we have teams surrounding memory and also personality. And when you look at memory, right, I think it's just we have so much context built up about ourselves that the model doesn't have. And our memory team's been really hard at work.

Speaker 7:

You know, there's a surface level of just gathering facts about you, but there's also stuff about just kind of thinking very deeply about who you are, what your motivations are. And even you could think about, you know, you're you're trying to do some code based tasks. Right? You're a developer. Shouldn't the model just be trying code out, you know, and and just kind of leveraging all that memory of kind of its thoughts about what you wanna do to just help you kind of be doing work all the time.

Speaker 7:

So, yeah, we do think memory is a huge part of making the model more personalized to you, And it should just makes make use of all that passive signal about you that it that it observes or all of that interaction and and just help you accomplish your goals.

Speaker 2:

Got it. Do you think it'll take for AI to start making novel discoveries? That's been a critique over the last year as everybody's so excite everybody's using these products every day in in their work and life and yet, it still feels like we're missing that. Dwarkash has talked about, you know, potentially that being around continual learning, but I'm curious what you think.

Speaker 7:

So one thing to underscore is I think the models are already phenomenally creative in certain ways. So when I looked at our performance on on contests. Right? You know, I've I've done these contests before. Sometimes you have this mental classification of these problems require more creativity or these ones require less.

Speaker 7:

And one of the big surprises for me was that the model can get some of the ones which I intuitively think require more creativity. And, you know, it often does come up with these solutions that I consider quite ad hoc and really don't pattern match to anything I've seen before. When you look at, you know, advancing science or mathematics or fields like this, one thing that construct in which humans work sometimes is there are kind of three builders. In mathematics, for for instance, there are mathematicians whose role are to kind of build out this theory and and almost to kind of create, you know, Olympiad style subproblems, which often other mathematicians who are very good at that kind of style of work can do. And I do think kind of the model will increasingly contribute on that side first.

Speaker 7:

Right? If there's some mechanical, like, hey. You know, I I really don't know how to simplify this expression. I really don't know how to, like, get get this result. It can really do that quickly for you.

Speaker 7:

We're trying to increase the envelope such that the model's getting towards that theory building side and, you know, being able to create creative hypotheses. And all of these components are very useful for what I consider the ultimate goal, which is being able to automate some of our own work and our own research.

Speaker 1:

How are you thinking about, like, the the layers of mix of of mixing? Like, I remember g p t four, I don't know if this was ever confirmed, but mixture of experts model, this is kind of, like, widely understood in the industry. Now are we in the era of like a mixture of models that have mixture of experts? Like how many mixtures are going on? How does GPT-five actually work?

Speaker 1:

Is there a taxonomy or architecture diagram that you can kind of, like, walk through to explain what GPT five is? Because it feels so much different than GPT three.

Speaker 7:

Mhmm. Yeah. I mean, one of our probably the pinnacle of our research roadmap or our path to AGI, when you look at the levels of AGI, the top level is what we describe as organizational AI. And what this means is, you know, collections of agents working together often like we might in a company towards a shared goal. Right?

Speaker 7:

And you would imagine that these agents probably subspecialize in ways maybe similar to what humans do, maybe in their own more efficient ways. And I think, you know, effectively work together to accomplish some goal. So we very much care about exploring this vision, seeing that's much more effective than one single big brain working on a problem. And I think there are reasons to think why it could be so. And and, yeah, I I think that that is one of the things that we're after.

Speaker 1:

Yeah. On that note of specialization, how are businesses working with GPT five or how do you expect them to work with GPT five in terms of coming to OpenAI and asking for special capabilities or fine tuning or, you know, any sort of RL on this particular problem in my world. I have this specific data set. It's not public, but I I want a hyper I want you to bench max on it. I want you to I want you to get a 100% on on, you know, the gas station bench or whatever.

Speaker 1:

You know, if I'm if I have a certain business and and I'm I'm willing to invest in sort of some some overfit RL because it will create immense economic value for my business or it'll solve some fundamental problem, How can how how are businesses going to be using GPT five over the next few years?

Speaker 7:

No. That's a great question. So I I think that, this is a chance to kinda highlight one of the the results that we've accomplished over the last couple weeks, which is our ACCODA results. So this is a relatively unknown programming contest, but it involves really the pinnacle of the the best coding contests contestants in the world. And what they do is, you know, they're put in a room, and they have to solve an optimization problem.

Speaker 7:

This is something that's actually very real world relevant. So you can imagine an optimization problem as something like, you know, what Uber might have. You have, let's say, riders and you have drivers, and you wanna kinda create a system where you match them as as quickly as possible, you know, with, you know, the least amount of cost for

Speaker 1:

Mhmm.

Speaker 7:

For instance. And and so we've really created a system that can solve optimization problems at the level of the best in the world. Right? And these truly are the kind of the best heuristic solvers in the in the in the world. And so we have a organization led by Alexander Madri.

Speaker 7:

He it's called strategic deployment. And what they do is for a select handful of customers who really have that, you know, beefy problem that that they need to solve to just go and provide that value. Right? And I think there's a lot we can do there. I think there's a lot of very, very valuable optimization problems in the real world.

Speaker 7:

And we're really excited to partner with with people because I think this creates a template for directly having AI provide economic value and and really catapulting certain industries forward.

Speaker 1:

On the on the research side, what what unique advantages do you think you and your team have given your position in the market with the incredible user adoption and the incredible usage from those users. It's not just DAUs, but it's actually the number of queries. Semi analysis estimated it, like, 71% of all queries going through ChatGPT. What advantages does that confer from a research perspective?

Speaker 7:

Yeah. I mean, a lot. Right? And I think, you know, it allows us to kind of deeply understand use cases. It allows us to understand the frontier of where humans are, you know, kind of finding value, where they're not finding value, which areas that we need to improve the models on.

Speaker 7:

It gives us a lot of signal into, you know, how users are deriving value, when they derive value. Yeah. And

Speaker 1:

What is that signal? Like, I I see the thumbs up, thumbs down button. Yeah. I I I'm sorry. I don't push it very often.

Speaker 1:

I'm not doing my job. I apparently but I know that you can figure out whether or not I'm satisfied. Just stop booing me, Jordy.

Speaker 2:

That's the research.

Speaker 1:

See you. Okay. Mark, I promise you for the next 100 chat GBT responses, I will I will be honest with my thumbs up, thumbs down. I love it. I love Yeah.

Speaker 9:

You do extra treatment? We have tons of people luckily who do.

Speaker 1:

Oh, that's great. Okay. So you do get a lot of thumbs up, thumbs down. And I'm sure I have done it occasionally. But I but I also imagine that there's a ton of other signal in there.

Speaker 1:

You know, with the TikTok algorithm or, you know, any social algorithm, it's very easy. Time on-site. But Mhmm. With ChatGPT, obviously, it's exciting when we hear, okay, thirty minutes a day or some rumored number of of minutes. It it feels correlated with usage.

Speaker 1:

It feels correlated with value that's being delivered. You can obviously look at churn metrics and all that stuff. But what other what other pockets of signal are you finding? Are you finding people just I mean, I I remember the story about Google where they were trying to figure out how to handle like misspellings and create the the definitive database. Do you know the story?

Speaker 1:

The the Where they were trying to develop the definitive database of how to spell things, and they were like taking a bunch of shots at it, and they figured out that the the best most rich source of data was just if you type in financial into Google and you misspell it, oftentimes, then you will just correct it yourself and the second query you send will be will be spelled correctly so that you can just look at two similar queries. What's the second one? That's the correct that that's the correct spelling. Yeah. What other pockets of signal are you finding that are translating into the research environment?

Speaker 1:

What are you excited to go deeper on?

Speaker 7:

Yeah. So I'd love to first talk about the DAU signal because Okay. I think, you know, that's something that a lot of companies track, but we find actually a lot of danger in tracking it too closely. And one of the recent blog posts we pushed out was one on sycophancy. Right?

Speaker 1:

If you

Speaker 7:

just, know, hey, we're gonna boost responses where users say thumbs up. Yeah. You know, it creates a a condition for a model to

Speaker 2:

I just wanna say, Mark, I love everything you're doing on this front.

Speaker 1:

Yeah. Entire interview has just been fantastic. You are the best. We'd love to have you back on the show tomorrow. You're Yeah.

Speaker 1:

But clearly problems with that.

Speaker 7:

Yeah. Yeah. Clear clear problems. Right? The the model just starts kind of sucking up to you.

Speaker 7:

Totally. Saying like, hey. You know, you're right. And even in complicated situations where I think objectively, you know, collectively, we'd be like, hey. This person's in the wrong.

Speaker 7:

The model starts saying, hey. You know, you're right. You know, the other person's gaslighting you. You know, this other person's kind of and and that's deal with people deal with this

Speaker 2:

in in the real world. They'll go to a friend. They'll tell them about a situation, and the friend will give them advice. But maybe it's not the entire it's not the fullness of the situation. Right?

Speaker 2:

Maybe they left out some key facts, and the friend is like, oh, yeah. That other person

Speaker 1:

Is wrong.

Speaker 9:

Definitely isn't the wrong,

Speaker 2:

and they, like, skipped over some important details.

Speaker 7:

And Yeah. No. No. Exactly. Exactly.

Speaker 7:

And we don't want our models to fall into this trap where it's just trying to get you to like, you like what it says. Yeah. And and so, you know, we wrote back a lot of changes that produce that kind of behavior. And, really, the way I think about daily active users today is we need to be opinionated about the features that we build into the future. I think we have we have a lot of ideas here, but we have to let that drive.

Speaker 7:

You know, build for the future, build for the things that people you think they'll want and maybe don't want necessarily know they want necessarily today. And then use the AUS kind of this byproduct. Right? A way to track that you're on the right right right track here. So yeah.

Speaker 7:

I mean, we we wanna be careful here. We don't wanna fall into these traps of, like, you know, three, four years from now that this turns into kinda engagement bait or something

Speaker 1:

like that.

Speaker 2:

Yeah. Was it how how much time have has the research team been focused on efficiency specifically? It felt like summer was a a good window before kids come back to school and start maxing out queries. Good time to increase efficiency and and I know the cost of GPT five

Speaker 1:

have Every time there's a new model, I'm like, this is the best it could ever be. It's good enough. Bake it on an ASIC. I just want it for free and I want it like in milliseconds. But but that's just me being, you know, grumpy, guess.

Speaker 7:

We we we've done a lot of work. We we've been building out our teams. We focus a lot on scaling. I think Grant's gonna come come on a little bit later. Yeah.

Speaker 7:

Yeah. He's been spearheading a lot of that work. So yeah. No. Honestly, it's become a bigger and bigger focus for us, especially in the last couple of months.

Speaker 1:

On on the I mean, is somewhat related to the sick of NC thing, but I'm interested to know, like, what do you think is driving, like, the GPT tone? You know how, like, the em dash is a thing and then the the it's not a newspaper, it's a way of life. And it's like the the there's these like little like flourishes like that that come through in in kind of a tell that it was written. And in a lot of ways I love it because when I get a deep research report, I like that it's using the same Wikipedia style tone. Like I want consistency there.

Speaker 1:

I don't want it to be like, this today it looks like it's a Vice News article and today, tomorrow it looks like it's written by someone at Buzzfeed. I like that it's consistent in many ways. But but why is that happening? Do do you think that bigger models like 4.5 kind of were able to solve that? Or do do those kind of like local minima, like, I don't know, like wells happen even in bigger models?

Speaker 1:

Is there anything from a research perspective that can that can stop GPT having its own voice, or is it fine that it has its own voice?

Speaker 7:

Yeah. That's a really great question. And I think, you know, as you scale up models, as the models become more intelligent, they kind of have a just deeper in the understanding of tone. Right? And so you expect that to improve just naturally as you make the models more powerful, bigger, better reasoners.

Speaker 7:

But one thing that I think gets lost a lot is each individual company has a lot of impact in terms of how they shape the default tone. Mhmm. And, you know, we publish a document called the spec. It kinda lays out how we expect the model to sound in certain cases, lays out a lot of examples for that. And I think we use the spec in many ways.

Speaker 7:

Right? We have people come in and see, hey. Is was this thing generated in accordance with what we would hope to generate from from our spec? And this is a living document. Right?

Speaker 7:

It evolves over time. And so I think, you know, each company kinda has a very opinionated take on what they think the model should sound like. And it's not an accident that the model sound a certain way. I I don't think just naturally, every company is gonna train the same kind of voice into their model.

Speaker 2:

Totally.

Speaker 1:

Well, thank you so much for hopping on. Congratulations on the big launch. We'd love to have you back soon to talk more. We could go in a million different directions, but we'll let you get back to it. Know it's a big day.

Speaker 1:

So have a great rest of your day, Mark. Thank you so much. Congrats on the launch.

Speaker 7:

It was a great conversation.

Speaker 1:

Talk to you soon.

Speaker 2:

Cheers, Mark.

Speaker 1:

And we will tell you about Restream one livestream 30 plus destinations, multi stream and reach your audience wherever they are. This stream is made possible by Restream. OpenAI just did a livestream. With Restream. If you're trying to do a if you're trying to do a stream, you gotta get on Restream.

Speaker 1:

It's everywhere. And we will bring in our next guest, Greg Brockman, the president of OpenAI and the we'll bring him in. Greg, how are you doing?

Speaker 10:

Doing great.

Speaker 1:

Thank you for Welcome, Greg. How are you feeling? How's the company feeling? It's been such a wild journey. Just take me through a little bit of the the, like, the the vibes in the company and and how you got here to today.

Speaker 10:

Well, I'm excited. The whole company is excited. And, honestly, I'm just so proud of the team. Like, it's just been amazing to watch people come together not just for this launch. And, you know, the funny thing is behind the scenes that people are always putting on the last minute adjustments and polish and scaling up the capacity, and there's always something that goes wrong before launch day.

Speaker 10:

And so there's a lot of people who, you know, worked late into the night or really crunched to bring this release to the world. And, you know, it's a little bit like the duck that's, you know, know Paddling.

Speaker 1:

Running its

Speaker 10:

place under the water.

Speaker 1:

Yeah. Oh, yeah.

Speaker 10:

But that also describes the whole OpenAI history, right? Sure. Is that I think that we have put in many years' worth of investment to the techniques used to produce this model, and really, it's across just every function within OpenAI that has come together to make this a reality.

Speaker 1:

Yeah. I mean, you've been there for every GPT release. How do you think about summing up each iteration in kind of like one one line? Because GPT one, GPT two, GPT three, these feel like like similar architectures, at least at least history has kind of compressed them into similar architectures. But how do you think about the progression of just the big numbered releases?

Speaker 10:

Yeah. It's interesting because in some ways, it's a punctuated equilibrium, but on the inside, it looks very smooth. Right? Even before the GPT series formally began, the first result that really sort of set this path to be something that we were heading down and that was clear that we were going to pursue it was the unsupervised sentiment neuron, which was an LSTM in, like, 02/2017, so a different architecture from today's transformers. And it was the first time that you could train a model to predict the next element.

Speaker 10:

So we predicted the next character on Amazon reviews, and we were able to get semantics out. Right? Because you expect, okay, yeah, it's gonna learn where the commas go, what maybe what nouns and verbs are, but the idea there was to learn a state of the art sentiment analysis classifier. That was mind blowing. And so I remember seeing that result in 2017.

Speaker 10:

It's like, we have to scale this up. We have to see where it goes. And so GPT-one was, I think a good sign sign of life of you train on on sort of all the public data you can get, and you use transformer, and that you were able to get state of the art on various downstream benchmarks. Right? So you have a model, it clearly learns some representation, something useful about the data that it was shown, and it's applicable.

Speaker 10:

You can use it for various tasks. But we didn't really think very hard about the generation side. GPT two was the first time that we were like, alright. Let's actually like, the samples we're getting from it, the things it actually generates, they're kinda cool. And I remember reading the in the GPT two blog post, we have this unicorn story where it generates some fictional story about a herd of unicorns, and it was just so cool.

Speaker 10:

It was like, wow. It, like, wrote a story that's actually kind of interesting. It doesn't totally make sense, but, like, there's something here. There's some real spark of intelligence within this model. GPT three was the first time that we had a model that was actually something people would it was just barely above threshold for something people would want to use.

Speaker 10:

And I remember working on the GPT three API. This was our first real product, and it was actually the hardest product the hardest project in total I've ever worked on because it just felt like maybe no one wants to use this model. We don't really know what it's useful for. And it certainly was the case that GBD three was a great demo machine. You can make really awesome just like tweets and, you know, cool little little apps, and it would give you quick answers, but it didn't feel very reliable.

Speaker 10:

And then g p d four was something that actually felt like it had true real world utility. It was above some threshold. It was something that was helpful for health. It was something that was helpful for you know, starting to be good at coding. And g p d five, I think, just sets a whole new standard for the reliability, for the utility.

Speaker 10:

Things like coding, I think, are just, like, clearly, you know, we're already on this trajectory of transforming software engineering this year, I think we're really on trajectory now to be revolutionized. So just really exciting to see that that whole arc.

Speaker 2:

Yeah. When did when did did the API opportunity, like, really click for you? Because I do remember companies in that era that, like, quickly unlocked the power of of the API and and and grew tremendously. When did that opportunity click? Because you said initially that you're you kind of had some, yeah, don't know, concerns, kind of doubts, how useful was it gonna be?

Speaker 2:

And then when did the consumer opportunity click?

Speaker 10:

Well, we, in 2019, 2019, had GPT-three. We knew we needed to build a product to be able to actually continue the mission, to be able to raise capital. But what did we want to build? Right? We're really here because we believe in AGI that's going to have this powerful, positive, transformative effect on society and we want to be part of it.

Speaker 10:

And so we thought, well, maybe we could build something in health, and then you realize, okay, well, we're going to sell to hospitals and we're going to maybe hire

Speaker 2:

We'll let other people do that.

Speaker 10:

Exactly, right? It's just like you have to go into one domain, and that means giving up on the g, the general. Right? It's like it feels like you're going to become

Speaker 1:

That's interesting.

Speaker 10:

A one particular thing, but we kinda wanna be supporting all industries at once. And so the idea was, let's build an API and let people figure it out. But this is totally not the way you're supposed to build a startup. Right? You're supposed to have a problem.

Speaker 10:

No one cares about the technology behind it. Add value to that problem. Focus on just that one thing. And so that's why that project was so hard. And in, you know, January 2020, February 2020, I that, you know, I, with the team, were going around trying to just find anyone that would be willing to try this API.

Speaker 10:

And we were driving to different offices in in San Francisco being like, hey. We have this cool model, and it was hard enough to get people to take the meeting, much less to sign up their company for it. It was actually very fortunate. We found it we found a couple of good partners, and it was fortunate that that happened then because March 2020, suddenly, that was COVID. We weren't driving around to people's offices to try to beg them to use this, you know, this this budding new technology.

Speaker 10:

So it's really six months worth of grind, right, of really trying to turn like, when we when we started with g p d three, I remember it was, you know, that the inference code was not very well optimized. It was, like, don't know, a 150 or maybe 250 per token or something, and we just optimized optimized, got it down to, like, fifty milliseconds per token, which, by the way, today's models run much faster than that, which is kind of amazing for me just, like, seeing how how much fast we're able to run them with much greater intelligence. And I remember setting two goals for the team. One was I actually find one customer who's willing to pay, so literally get a dollar in for this thing. And the second is get a use case that we use at OpenAI every day.

Speaker 10:

That first one happened within the first couple months. So actually, that moment, was like, alright. Like, this thing is probably gonna work. But in order to get there, we had to do a bunch of, you know, just scaling the API and and really, you know, doing doing the product work. But that second one took much longer.

Speaker 10:

Right? And that wasn't really until ChatGPT. And so if you fast forward a couple years, because this was, you know, mid twenty twenty when we when we first got that the API into the world, ChatGPT, we didn't release until November 2022. So you're talking, like, a decent a decent period of of two years there, a little bit longer. And I remember we were building you know, people have talked about we were going to call it maybe chat with g p d 3.5.

Speaker 10:

We had a a sort of precursor product called WebGPT that was built on 3.5 that we were literally paying contractors to use. Right? So this was all throughout 2022. We basically had the ChatGPT precursor that we had to pay people. They would not pay us.

Speaker 10:

We had to pay them to use this thing.

Speaker 1:

That's wild.

Speaker 10:

The moment for me that really clicked was actually when we finished training GPT-four. So that was 08/08/2022, which actually is, like, three years ago now. It's actually pretty wild to realize that, almost to the day. And we did the initial post train of g p d four, and, honestly, I had a bunch of bugs in there. It was, like, broken for a bunch of different reasons.

Speaker 10:

But the model was, like, extremely creative. It was actually really interesting. It took us, like, about a year and a half to get to the point that the creative writing of our models matched that initial one that was buggy for various reasons. And I remember, you know, we had an instruction following dataset that was post trained on. So it's really we had collected examples of here's a human asking for a thing.

Speaker 10:

Here's what the model should do. So it's really not trained to do multi turn. So I asked her a question. It gave a response. But then I was like, well, what if we just ask another question?

Speaker 10:

And it actually was able to leverage that full context. It actually was able to have a coherent chat. And the moment that we saw that, they were like, Okay, this thing is capable not just of being post trained to do this very specific thing, but it can generalize. It can kind of do the intelligent thing even though it wasn't directly trained for it. It was just so clear this was going to be the killer killer application.

Speaker 10:

And so then we were planning on launching g p d four in, you know, early twenty twenty three, and we had this chat infrastructure we've been working on, and it's so clear. Okay. Like, we're going to have to release the infrastructure and the model, and it's going to be this this amazing killer product. And so just almost as infrastructure ahead of getting the the real thing out, I, you know, I was excited for us to do ChatGPT, and that's why we did, you know, the and and see that come to life in November. So I think that for me, I I was really focused on GPT four as the model.

Speaker 10:

This is going to be the chat moment that's really gonna And kind of had missed the fact, because every time you see these new models, you just sort of see only flaws in the previous ones, and so missed the fact that GPT-three 0.5 was something that no one had really tried before in the broad sense of society, and that it was something that was already useful and that people would respond to.

Speaker 1:

Was GPT three kind of like the main pivot point for shifting the company towards LLMs? Because in the in the prehistory of OpenAI, there were a lot of other maybe expensive training runs. I I don't know how much I don't know how much financial risk was taken with like the the OpenAI five project or the robotics projects, but it feels like at a certain point, the the chat became like the main financial risk vector. So I guess the question is like when it feels like GPT-three was the moment when you shifted. I'm also interested in hearing about Ben Thompson called OpenAI the accidental consumer company.

Speaker 1:

And I'm wondering when that narrative set in for you. Like, what when when did it become clear that this was going to be a really, really powerful consumer application?

Speaker 2:

Yeah. Going from paying people to use your product to people saying, hey, we wanna give you money for this. Yeah.

Speaker 10:

Yeah. A very important transition, it turns out. Yeah. So yeah, it's great question. I would say that if you rewind to the beginning of OpenAI, there's many people who thought that, in retrospect, say that we set out to prove that scale is how you make progress in this field.

Speaker 10:

But it's almost the other way around. Scale was the thing that worked, right? That we tried a bunch of things that didn't pan out. And it really the first time we saw this concretely was in our DOTA project. I I remember my collaborators, Jakob and Shimon, trained the very first little agent on, like, 16 cores or something and left it running on their desktop, over the weekend.

Speaker 10:

And we came back, it was this, like, very, you know, sort of constrained mini environment, but the model was doing something smart. It was actually able to solve this kiting environment, and that was pretty cool. And then they and the team just kept scaling up. We had all these free cores that were just sitting idle on AWS at the time, they just kept throwing more compute at it. And every time they would do that, the model would just get better.

Speaker 10:

And so when you look at something like that, you're like, well, you just have to see where this goes. You have to push it until it hits the wall. And our goal with DOTA was actually to develop new reinforcement learning algorithms, because the common wisdom at the time was, well, the existing reinforcement learning PPO doesn't scale everyone knows that. But the question from Yaakap Shimon was, well, why do we believe that? Has anyone actually tested it?

Speaker 10:

And no one had really tested it. And so I think that that ethos of saying you have to push the existing techniques to the wall until they break, and then once they break, you actually have a baseline to overcome, and you win either way, right? Either it just exceeds all the humans in terms of the specific capability that you're trying to exercise, which was the case for DOTA, or it hits a wall, and now you have a real problem to solve. And so I think that ethos really got embedded in our DNA. And at the same time, I think that we were really thinking about how do we get to AGI?

Speaker 10:

And really, Ilya and I spent a lot of time thinking about that question of where is this company going and how do we actually achieve it? And you start to do some math in terms of the kind of compute that it would take to get to AGI, and you just start to realize you're going have to build really big computers,

Speaker 1:

and those are

Speaker 10:

extremely expensive. And so I think that from the early foundational results and thinking, we kind of realized the path that we're going to have to walk.

Speaker 1:

So it seems like there's been a few walls that we've scaled up through and then maybe hit them. There's been talk of, like, a pretraining wall. Now we're putting tons of resources and compute towards reinforcement learning. Is there a third is there a third scaling curve that we're going to be talking about in the next few years? Are we continuing to scale up those two primary vectors?

Speaker 1:

Is that too high level of an abstraction in terms of, like, how we should be thinking about just progress along the the vector of scale? Like, give me the up to date thinking on just the the fruits of scale.

Speaker 10:

Yeah. I'd say fundamentally, deep learning, I think that, you know, people talk about the bitter lesson. Yeah. It's almost this exploration into how do you convert compute into intelligence, right, through a you we have some particular techniques to do that that we're kind of constantly fleshing out. And the thing that's really amazing is if you rewind to, I don't know, even the 1940s for the McCullough Pitts neuron, which is kind of the precursor to neural nets, if you look at that paper, they have all these diagrams that actually look very similar to, like, the kinds of diagrams we draw now of multilayer neural nets and things like that.

Speaker 10:

Like, the basic idea of what we're trying to do has not really changed in almost, like, eighty plus years, which is just a wild fact. It means there's something deeply fundamental about the thing that we are pursuing, and that idea itself, I think, kind of came from trying to model the information processing of the brain. And it's imperfect and not an exact analogy to biology and all of these reasons that it should fail or that people have said this thing is doomed, but the results are undeniable at this point. I mean, some people try, but it's really hard to to kinda close your eyes and and sleep on this in my mind. And it's very interesting if you look at you can find quotes from the mid nineteen sixties of people trying to poo poo the whole direction saying that these neural net people have no new ideas, they just want to build bigger computers.

Speaker 10:

And you can basically say something very similar today. What we're trying to do one moment.

Speaker 2:

A little water break. Second.

Speaker 10:

Yeah. Exactly.

Speaker 2:

For all of us. Cheers.

Speaker 11:

Exactly. Cheers.

Speaker 10:

I'll tell you, we're all human.

Speaker 1:

A proof of humanity right there.

Speaker 10:

Exactly. So what we're all trying to do is find ways of taking compute and really harnessing it. And sometimes you hit a wall, but these walls tend to be ones that you can drill through. What we've found is every time you scale up, everything, all of your engineering, all of your sort of scale and variance, all these things, they get stressed to the next level. It's almost that the tolerances become tighter and tighter.

Speaker 10:

It's like launching a 10 x bigger rocket means you need to be, like, a 100 x, just more precise on everything, but it doesn't mean that the fundamentals of the science are different. So pretraining, there's definitely been a lot of discussion of data wall. Doesn't mean it's fundamental. Right? It just means that we need to be better and more precise at what we're doing.

Speaker 10:

There's RL, which has been something that has kind of come from spending a small amount of compute to much larger amounts of compute now. And then there is a third way that we're really harnessing compute, which is compute at test time. And we publish some scaling laws around this, and all three of these things multiply. Like, that's the amazing thing.

Speaker 1:

Mhmm.

Speaker 10:

And, of course, the compute and the harnessing of it is the fundamental goal, but you get these multiplicative effects out of all of it through the quality of your engineering implementation, through the quality of the data sets, through a bunch of the refining work that you do. And there's lots of different techniques and ideas, and that's what makes this field so rich and why progress is just going to continue apace.

Speaker 2:

What about on the infrastructure side? You guys have been busy scaling up. What what can you share on that front?

Speaker 10:

Well, so so I run I run a team called scaling at at OpenAI, and we really focus on building the infrastructure for scaling. And that this is in partnership with really everyone across the company. It's almost a misnomer that our our team is called Scalen because, fundamentally, this this whole team and effort is about scale. But what we really try to do is to both, on the physical infrastructure side, deliver as much compute as humanly possible, and that is in partnership with, you know, companies like Oracle, SoftBank, and others that we've been able to deliver just, like, increasing amounts of compute to OpenAI. But we're constantly thinking about how do we just deliver more FLOPs and do it more efficiently, earlier, cheaper, more power efficient, all of those kinds of questions.

Speaker 10:

There's the software infrastructure side as well and really thinking about how do you coordinate massive numbers of GPUs in order to work across one synchronous training run. How do you coordinate that for reinforcement learning? How do you deploy that into production and bring these models to life at massive scale? And I think that every single layer of the stack, there is innovation required, and that's something that's very easy to miss. Like, one way I think about research is that there is and this is kind of the view from Jakob, who's now our chief scientist that there's a research stack, and you can kind of think at the top of it is people running experiments and coming up with new ideas for how utilize data or something like that.

Speaker 10:

There's a middle of the research stack of people thinking about how do you take these different ways people are running experiments and be able to train in novel ways and put together the pieces differently. And then there's a bottom of the research stack, is like writing CUDA kernels to get the absolute max out of the GPUs. And at every single layer here, you get a multiplicative factor through innovation. So it all comes together as one big hole.

Speaker 1:

On scaling, I'm interested to hear about just if we think about like the impact of AGI or the impact of AI just being some sort of maybe you know, quantitative GDP metric or qualitative just impact and good. Is there an important factor of scale with just not even the flops that are going into the models, into the pretraining, into the RL, into the test time inference, but actually just the flops that are going into the usage of AI within humanity broadly. And I feel like that might maybe be the next, like, scaling curve that we're seeing as more people use models. They see improvements all over the fact. Like, is is that something that we should be tracking to see kind of the the instead of these, like, s curves, we wanna see, like, the continual exponential?

Speaker 10:

I think that's a great perspective. Right? Because at the end of the day, I mean, if you look at kind of the shift from something like DOTA, which we pursued in order to you know, we wanted to do new algorithm development, but, really, it almost validated how we scale up existing algorithms. But there was no illusion of delivering direct economic benefit from it. Right, to the current models where we are still we're starting starting to end the era of, like, pushing on these academic benchmarks.

Speaker 10:

Right? You look at things like the IMO at this point. Mhmm. Models are able to get gold medal on it. Like, these the hardest academic benchmarks that are available are sort of no longer a you sort of the the guiding the the guiding light of progress for these models.

Speaker 10:

To where we actually want to be is for AI to be helping everyone, right, to be something that uplifts humanity. And that's the final metric, right? It's how much does it actually benefit everyone? How much value does it bring to the world?

Speaker 1:

Yeah, not just health bench, it's actually how many people did you solve their healthcare problem, right?

Speaker 10:

Exactly, yes, yes, yes. And that's the actual goal. Yeah. And that's what's exciting, right? It's like we're moving from the lab to reality.

Speaker 1:

Yeah.

Speaker 10:

And I remember in the early days, as we were thinking about how do we measure our progress towards AGI, we always sort of dreamed that one day we would be able to measure it this way. And you can think of revenue maybe as a proxy metric for

Speaker 1:

Sure.

Speaker 10:

Value delivered to the world. It's not perfect, but it's at least something. Right? You can think of the distribution of, like, how much compute goes into it, how many people are using it. But fundamentally, like, what we're after is how much do we really uplift humanity through this technology.

Speaker 1:

Yeah. Mean, I might be misreading it, but I'm pretty sure, like, that was the Kurzwely and Kurz Ray Kurzweil philosophy was that, like, total number of flops getting getting immense, not necessarily all in one data center for one model. It was that it was that compute broadly would be so wide.

Speaker 10:

Yes. Yes. And I remember, like, on on on that chart, right, you can see, you know, total compute of all human brains

Speaker 1:

Yeah. Which

Speaker 10:

really suggest a particular vision of how the this technology will be rolled out.

Speaker 1:

Yeah. Distributed the phones count as as an impact. The WiFi router counts for the impact of the Internet just like the phone does, not just it's not just the big pipe that's going the the backbone of the Internet that actually matters.

Speaker 2:

Deep research hit product almost everybody I know at least in in the in the

Speaker 1:

Mark Industries says he's reading 30 pages of deep research a day basically. Loves it.

Speaker 2:

He's making books with it. But why have agents broadly come around a little bit slower than than people may have expected? Is it is it is it just that using computers is actually much harder? Computer use is just a really hard challenge or or, you know, I I think going into this year, everybody said this was the year of Are

Speaker 1:

you talking about

Speaker 2:

flight booking or something like that? Flight booking, but, you know, people people were saying 2025 is the year of agents and I would say that it's the year of deep research and and not a lot of these other sort of, like, broader use cases. Sure.

Speaker 10:

Well, 2025 isn't quite over yet. So

Speaker 1:

That's good.

Speaker 10:

That'd be my response. And I

Speaker 1:

Thank you.

Speaker 10:

I'm I'm very much on the I I think that progress in this field, the way that it tends to work is that if something kind of works with the current generation of models, it will be extremely reliable with the next generation of models. Mhmm.

Speaker 1:

Yeah.

Speaker 10:

And I think that where we've been is that deep research is the if you've rewound a year, that was the, we kinda had something working. And then, like, this year, it's been just incredible. And I think that agents, you know, specifically, like, computer use agents are something we've kinda had working. And, again, you know, the year is is not over. I think there's a lot of rapid progress to be made.

Speaker 10:

But I think that maybe part of it too is that the agents that we're about to see, I think, are a little different from maybe what we would have pictured five years ago.

Speaker 12:

Mhmm.

Speaker 10:

Like, I remember having a debate with some friends on do you want a agent that does the flight booking? Because the problem is it's actually a very high bar to beat the flight booking UI because there's so many preferences that are entailed in that. Right? And you really have to know kinda what mood you're in, like, are you okay with, like, taking the extra layover and all these kinds of questions. And that, actually, there's so much other stuff that happens in your life that is toil or drudgery or that's something that you're not an expert in, you're supposed to be.

Speaker 10:

Think about health, that every patient really is the doctor. If you're coordinating across multiple specialists, there's no doctor that helps you with that. Right? That that's really on you. And that there, you actually can have AIs that are just text only, that actually are able to add massive value and then freeze up your time if you want to go book the flights yourself.

Speaker 10:

And so I think that really finding the right problems that have high leverage, that really add value to people, and also thinking about the other side of how to make sure these agents are responsible with the trust that you put in right? That the more that you give an agent access to your email, the more you really have to trust that it's going to sort of do right with whatever your task is and send the right email to the right people and be able to segment your information and all of these kinds of questions. And so I think that there's both a practical how do you get to adoption, but also just like where are the most important leverage points in a person's life.

Speaker 1:

You also missed coding agents because it's been the year of deep research, but I feel like it's also been the year of coding agents. Yep. How is that developing at OpenAI? I've noticed that I'll hit o three Pro and it'll wind up writing a bunch of code for me and I didn't even ask it to. Then you have specific products for coding.

Speaker 1:

How do you see the evolution of software development evolve? How are you seeing OpenAI customers use coding tools? And how good is ChatGPT or GPT-five on coding?

Speaker 10:

Well, software engineering is definitely being revolutionized in front of our eyes. It's been happening. And GPT-five is the best coding model in the world right now. It's the default now in cursor, which I think is a a really huge statement of the quality of the model, and that it's just so good across, like, every function of writing code, understanding code base, being able to use tons of tools, being able to do agentic work that yeah. It's like, I'm not a front end developer at all, but actually, now I am.

Speaker 10:

Right? And I think that you are too. Right? If you just talk to the model, you can produce incredible things. And so I think that there's this real empowerment if you think about what computers were supposed to be.

Speaker 10:

Computers are supposed to be a tool that makes you more productive, able to do the thing you want. But then, somehow, when we started out with computers, we have to contort the human to the machine, writing assembly language and all these very abnormal things for a human to do, and that as we've moved to to tools ultimately, you know, in the current generation now, GPT five, suddenly the computer comes closer to you, right, that you just express your intent and you don't think about, okay, like, exactly which language and what, you know, version of different libraries, that the model is something you can delegate to. And so we are very committed to programming and to making our models continue to be the best they possibly can be.

Speaker 1:

Must a superintelligence be able to explain how to build superintelligence?

Speaker 10:

So it's it's a great question. So, I mean, I think that where we're going is a world, and we're already seeing it, where these models help us produce the next generation of models. They also help us really supervise tasks that are too hard for humans to supervise on our own. If the model writes a 10,000 line program for you, reviewing that is probably going to be quite burdensome. But if you can have a model that you trust, that maybe isn't as capable as the one that wrote all that code, or maybe there's a team of agents that work together to write all that code, but you have a team of reviewer agents, like, is the kind of thing that you can actually bootstrap trust.

Speaker 10:

And I think that this is this is, like, one of the most important things. And also, interestingly, 2017 is when we had the first language results. We also had some results or some vision on how you can actually bootstrap supervision beyond the scale of tasks that humans are able to supervise directly. And so I think that we're heading to a world where, you know, we now have these chain of thought models we've been advocating very strongly to preserve the integrity of the chain of thought, right? So that means don't directly optimize it to look good, even though there will be lots of temptation to do it for various reasons.

Speaker 10:

Really make sure that there's no pressure on the model to obfuscate its thoughts within that chain of thought, because then you can really see what it's up to. And I think there's further techniques to even make it more faithful and more rigid to what the internal monologue of the agent is. And so I think that there's actually a lot of promise in terms of interpretability, in terms of supervision, in terms of being able to scale to just like much more sophisticated tasks.

Speaker 1:

Yeah. I guess my question is like, there how much information in the world can be derived from first principles reasoning versus true secrets that can that need to be discovered by interacting with the world directly? Because I would it feels like it'd be very difficult to I'm just wondering about like how intellectual property interfaces with super intelligence or how like if you play this out a lot, how like there's all these like hard won Dorcas has talked a little bit about this with continual learning. There's all these little subtleties that maybe they're not secrets, maybe they're not true trade secrets. You don't think to lock them down, but they're just things that haven't been codified online or anywhere.

Speaker 1:

They haven't been given to anything that is serviceable by the model. And I'm wondering how is it is it just we need to build up new knowledge in every fact from first principles and and kind of go through the the history of humanity's pursuits of knowledge? Or do we just need to onboard more and more information? Maybe it's both. I don't know.

Speaker 1:

It's just something I've been noodling on.

Speaker 10:

Yes. It's a great question. I would say all of the above. Select all star. So I think that the answer is very similar to what it is for humans, right?

Speaker 10:

How does a human generate new knowledge? How do we accomplish new things? Yeah. First, you wanna be grounded in the wisdom of the past, You really want to understand what have people tried, what worked, what didn't work. You want to go and read the biographies of various people and understand those.

Speaker 10:

But you also want to try things out. You want to make some mistakes in a contained environment in a way that you actually can see the effect of your hypotheses, and then you want to be able to learn from those. And I think that being able to really start to scale up these systems and be able to integrate them with the world is a very big process and milestone that we are currently embarking on, to move from a world of totally hermetically sealed reinforcement learning environments to thinking about how do you actually put real world interaction in there. And you think about things like robotics, you're going to need to have that at some point. You're going to need to have some sort of interaction with the real world and to have models that are able to produce new materials, to be able to actually solve various diseases, for them to be able to really help people.

Speaker 10:

We already have models that are great at use cases like therapy, but to really get to the next level of something that can just really help every person accomplish more and accomplish whatever their goal is, it would be very helpful for that model to actually have some real world experience with doing that very thing. And so I think that that figuring out how to bring all of this together is ultimately what our mission is about, and we do this not in isolation, but really as part of a much broader community.

Speaker 1:

And it seems like it's advantageous to have the most dominant consumer app in that environment. So congratulations. Jordy, do you have the last question?

Speaker 2:

Last question. What what do you hope to see out of Washington DC in the next year, year or two? Not thinking super long term in terms of, you know, basically promoting innovation within The United States. Obviously, the admin cares a lot about AI and has been making moves, but but what else would you like to see or where would you like them to double down?

Speaker 10:

Yeah. I've been very, very impressed with how much the administration has engaged with the technology and really tried to figure out how can we help and ensure that American AI continues to lead and really sets the standard for the world. And I think that that is the the lens that I would really encourage thinking through. Right? It's like, this technology is changing very fast, and that fast plus government is not usually a ideal combination, but this is the reality that we have.

Speaker 10:

It's the opportunity we have. And I think that the question in my mind is less about any specific regulation or strategy, but it's really being calibrated. It's really having a very tight OODA loop, right, being able to react to, okay, we have a new model. These are the capabilities we see on the horizon. How do we make sure that we get the most uplift and benefit from it?

Speaker 10:

And thinking strategically about not just how do we do this for Americans, right, but how do we actually do this for the world and promote democratic values. And so to me, the most important thing is that motivation, right, is the question that is asked and the ultimate motivation behind what gets implemented.

Speaker 1:

Yeah. That makes a ton of sense. Thank you so much for joining us. Jordy, are you gonna hit the gong? For GBT five

Speaker 2:

and the whole thing.

Speaker 1:

Congratulations on the massive A historic day. And thank you so much for stopping by. We'll talk to you soon.

Speaker 2:

Thanks for joining.

Speaker 1:

Have a great day. Thank you for having me. Bye. Cheers. Really quickly, let me tell you about figma.com.

Speaker 1:

Think bigger, build faster. Figma helps design and development teams build great products together. And we are joined by Sarah Fryer, the CFO of OpenAI next. And we are going to bring her in in just a minute. The Gong is

Speaker 2:

still swinging.

Speaker 1:

The Gong's still swinging, and I'm gonna tell you about vanta.com. Automate compliance, manage risk, improve trust continuously. Vanta's trust management platform takes the manual work out of your security and compliance process and replaces it with continuous automation whether you're pursuing your first framework or managing a complex program. We need one more second. Tyler, any other questions that we should be asking for the OpenAI folks?

Speaker 1:

Anything top of mind? What's on the timeline? Is the timeline still in turmoil or has it settled?

Speaker 3:

So I I think the general vibe is like this model was not bench maxed

Speaker 1:

Mhmm.

Speaker 3:

But if you actually get to use it it's pretty solid. Cool. One thing it failed to UPN bench.

Speaker 1:

Oh it did.

Speaker 3:

It did not get the horse breed correct.

Speaker 1:

The horse breed. Wait. You see you have it? You have access

Speaker 3:

to Yes. Have access. But I've seen other things on the timeline. We can talk talk talk about it later, but it seems like a really good model.

Speaker 1:

That's amazing. Great to hear. Well, welcome to the stream, Sarah. Good to meet you. How are you doing?

Speaker 1:

Congratulations. A historic day. Thanks so much for taking the time to talk to us. How are doing?

Speaker 13:

I'm doing great. I mean, how could you not be doing great on the day when GPT five launches? It's been a long time in the making, and we're so happy it's out.

Speaker 1:

Yeah. Fantastic. Walk me through your role and what GPT-five, what this launch means specifically for you. And, yeah, well, let's just start there.

Speaker 2:

Finance has to be You guys have to be the unsung heroes at OpenAI. There's a lot of massive

Speaker 1:

bills coming in for crazy training runs, and you have to underwrite these against future revenues, and I'm sure you've developed many models to figure that out. But yeah, walk me through what your role at OpenAI and what today means for you.

Speaker 13:

Yeah, absolutely. So I'm OpenAI's CFO. But finance can be the unsung heroes, but they are an amazing team. So I'm going to shout out to them.

Speaker 2:

They're heroes to us.

Speaker 14:

They're heroes

Speaker 13:

a to world that we're all living in. And there are a lot of Bs on the end of a lot of the numbers that we look at. Look, what is our role? Number one is just making sure we have a healthy, high growth business. It's been incredible watching just, first of all, the number of weekly actives.

Speaker 13:

700,000,000 people using ChatGPT every week, and I'm assuming after today, we should see a very nice little bump in that number.

Speaker 1:

This is gonna be a gong heavy segment, Jordy. I think you're gonna have we have a lot of soundboard for the big numbers, so congratulations.

Speaker 13:

Keep going. And I love that I've never met a number I didn't like. Think the other part of the business that, you know,

Speaker 11:

It's and then too we have healthy to do this

Speaker 1:

to as clapping.

Speaker 13:

We have to do this balance of the consumer business, the enterprise business, and then API business, which I think of enterprise, you know, and balancing that out. So enterprise adoption has also been exploding. I probably do I mean, interestingly, as a CFO, I probably meet four to five customers a week. It's a part of my job I actually love. We have about 5,000,000 paying business users right now from banks to biotech.

Speaker 13:

I was talking to the CFO.

Speaker 2:

And so that number is individual companies?

Speaker 13:

That is seats at companies.

Speaker 2:

Seats at companies. Got it.

Speaker 13:

So what I would say about that number is it's crazy to have done that in just two and a half years. Enterprises, right, you gotta you gotta put your big boy, big girl pants on to go sell to an enterprise. Right? They wanna make sure that you have the table stakes of security, SSO for signing on, you have HIPAA compliance if you're selling to health care and so on. They want to know that other people have done it, so they're often looking for that case study.

Speaker 13:

But they also want to be, you know, the innovator, right, at the front. And so that to grow that scale of business in just two and a half years blows my mind. And it's not just big big businesses, which I could talk, you know, at length on, but it's also small mom and pop, you know, you know, literally the people who really keep the lights on in most countries are also gravitating to ChatGPT, which is wonderful. And then on the developer side, 4,000,000 developers have built in our platform. The question there is like, that could be a developer inside a big like Grab.

Speaker 13:

It also could be the next, you know, startup founder. That's Y Combinator getting going with the next multibillion dollar unicorn business. And so we see the whole gamut there, and that's important to us as well. It's it's very mission aligned, right? How are we going to get AGI to all of humanity if we don't do it through this ecosystem?

Speaker 13:

So a big part of my you asked my role a big part of my role is just keeping that business really healthy, making sure we always have the headlights on so people know the decisions they're making from a business standpoint. Huge part of what the team does. The other big part of my role is compute. I didn't talk about in my first breath, you all should correct me. I mean, it's making sure we think compute is a massive competitive differentiator.

Speaker 13:

I give so much kudos to Sam and the team, but particularly Sam, because no matter how big a number we look at, Sam always wants to go bigger. And he's been right.

Speaker 1:

It is He's never met a number he doesn't want to add a zero to.

Speaker 13:

That too. Maybe more than logarithmic. Maybe

Speaker 1:

two zeros.

Speaker 13:

And and he's but he has been very right. And if you know, you just had a long conversation with Greg Brockman. Think he does such a good job of kind of really explaining what a completely different world an AGI ed world is or an AI fied world is. And so I think when people get all cut around the axle of, like, you know, what is a gigawatt of compute? And oh my god, you guys want to have 10 gigawatts.

Speaker 13:

And that's more than the compute of like Ireland, since I grew up there. And then now you kind of look back on that, and you're like, those numbers already look small for a world where everyone will have access to intelligence. And we're really starting to see what that can mean when you look at the demos today around things like health care and education and so on.

Speaker 1:

Can you talk to me about non GAAP metrics and what you think is gonna be useful to track. We were talking to Mark Chen about this and he was saying, you know, DAUs are great, time on-site is great, but that's not as impactful of a metric for OpenAI as it is necessarily for a social network or an entertainment app. And and there can actually be some problems that come up with that. So it feels like there might be some tension in the organization eventually or or just publicly about, you know, what metrics are worth optimizing for. And then there's also the financial community that wants non GAAP metrics to track the health and progress of the business.

Speaker 1:

And then, of course, over, you know, decades, we see companies eventually roll back some of those non GAAP metrics and as as the business gets more complex. So how do you think about the development and and and sharing of non GAAP metrics? And what do you think is actually interesting and provides signal to the business and the and the investor community?

Speaker 13:

I'm kind of smiling to myself because when anyone normally says talk to me about non GAAP metrics, I can see like most of people's eyes roll back in their heads.

Speaker 1:

I live for non GAAP metrics. I would

Speaker 13:

love to do that. Please. Look, I think in a CFO seat, first of all, it's really important to think about input metrics and output metrics. Mhmm. And things like revenue, which is a GAAP metric, as well as a non GAAP metric, they're very laggy.

Speaker 13:

Yeah. Like, if you're spending your whole time focusing on the revenue number Mhmm. In an operator seat, like, you are completely missing what's going on with the business. So I push my team a lot to get out of kind of ultimately what the P and L looks like, and I'll come back to it though, and go way upstream and say, what are the true input metrics that tell us about the health of our business? And so I think it does start with that funnel of monthly actives to weekly actives to daily actives.

Speaker 13:

Because we do I mean, our mission is literally AGI for the benefit of humanity. So we know how many billions of people live on the planet. The fact that we're starting to be able to talk in billions and percentage of the world's population blows my mind. Today, 85% of our users are outside The United States. And I love that stat.

Speaker 13:

And in fact, if you go look at where where are the big populations of users, it just tracks global population. Right? It's countries like India, Indonesia, Brazil, Vietnam, like The Philippines. Like, go to anywhere that has big population. The US too, of course, but that will be your tracker.

Speaker 13:

So that's kinda number one when I think of an input mac metric. From there, on the consumer side, you're right. Things like time and app, I've actually always had somewhat of a love hate affair with. But I think in this case, because we're giving people intelligence, teaching them how to use that, I actually think, is where time and app does become important. And one of the things we've really seen with ChatGPT are people are spending more time with it.

Speaker 13:

Now, we balance that with things like mental health and so on, making sure that we're not creating bad things like we might have seen in prior eras of computing. But I think we're just getting started on that front. Beyond that, like, when we go into areas like the API, I don't look only at usage. Right? I can look at tokens per minute as a usage metric.

Speaker 13:

But I look at things like latency. I actually try to look at the elasticity of demand. We know that developers want performance. They want intelligence. But they also want to make sure the API is always up, and they want price.

Speaker 13:

And they're often willing to trade across those three things. It's a linear program, depending on what your use case is. And so I think it's important that we are offering things to developers that allow them to optimize across those three metrics, for example. So that's kind of your input metrics. And again, I could wax lyrical, but I won't.

Speaker 13:

But then you go to what you really asked. So investors on the other side, right, they want to see a P and L. They're like, I want to be able to compare you to other companies. Want to be able to create maybe a DCF. Like, I want to think about fundamental valuation for a company if I'm going to invest in it.

Speaker 13:

And so today, what I really try to push investors on is we are not a company that should be optimizing for free cash flow today because there's just too much opportunity.

Speaker 1:

Yeah.

Speaker 13:

Like that point about compute, we have to make a decision on compute today with an eye to what we're gonna need in two to three years. Because data centers don't just spring up overnight. Like, they're not mushrooms. They literally take time

Speaker 7:

and the effort.

Speaker 1:

Getting there.

Speaker 13:

Failed at, frankly, I would say is three years ago, we didn't have enough foresight to say how big could chat cheap petite because it didn't exist. Yeah. It's just a shame on us if we keep doing that over and over. So there can be a bit of a mismatch between our belief on revenue because we don't yet know the product versus the input, which is the cost today on compute. And so getting investors comfortable with the fact that there's probably losses for a period of time.

Speaker 13:

I say probably because ChatGPT, just generally, the revenue models continue to surprise to the upside. But at least for now, we should be in big investment mode. And then you kind of said it well, like, as companies mature, you move to more gap metrics. Right? If you look at, you know, the large, the Mag seven, many cases, they're looking at, like, real gap net income.

Speaker 13:

So the whole way down to the bottom of the p and l. We're just not there yet. And we should take advantage of that advantage because we can invest as a private company.

Speaker 2:

How do you think about timing fundraises? From my understanding or or or rumors, the last, you know, the most recent financing was very oversubscribed and at the same time, you're still committing to CapEx in the future that is a multiple of current, you know, the current run rate. And so you in the CFO seat, I'm sure there's you're you're trying to find this balance of like what does the business need today while, you know, not diluting the company, you know, too much knowing the the sort of growth rate of the business.

Speaker 13:

I mean, that's exactly right. That's the the art, not the science of it, is that, you know, we did just come off the back of closing out the the the sleeve of investment that we could take down in this current round led by SoftBank. And it was massively oversubscribed, which comes back to, I think, the market really waking up to the fact that AI is a generational opportunity. And the scale that it requires is like something people have not even seen before. Right?

Speaker 13:

It's people you talk about the internet or like the railways. They're good analogies or transistors, think Sam always goes back to they're good analogies, but I do think this is bigger than everything that's come before. There's taking down $40,000,000,000 which we just did in this round, that certainly felt like that gave me a lot of confidence. Appreciate that. A lot of confidence to then go out and do large compute deals.

Speaker 13:

We announced the large deal with Oracle, for example, and to be able to keep working with all of our supply chain: Microsoft, CoreWeave, Oracle, Nvidia, and so on. But at the same time, you know, in a world where our valuation has gone up, you know, at pace with our revenue, you do get an opportunity to keep coming back to market and not take that same dilution because you're getting that higher valuation for the work and the output that you've created. So it is a bit more of an art than a true science. I think for now, we will we will continue to need to fundraise in order to fund that compute. But I think we wanna start getting more sophisticated.

Speaker 13:

Like, just pure equity fundraising for everything is an expensive way to fundraise. And I think we're probably getting to the stage at a company where we can be a little bit more kind of broad in how we think about funding overall. And and even just working, frankly, with our supply chain because, you know, our success with bringing this era of AI into being is their success too and I think these companies are realizing that.

Speaker 2:

What about partners? Last question, partner selection on the compute front. There's not a lot of companies in the world or or or firms that can can really be a

Speaker 1:

You should update your LinkedIn title. We saw someone yesterday works for Discord is in charge of their cloud buying and the and the and his LinkedIn title was I have full responsibility over buying cloud our our entire cloud budget. And it was clearly like a huge flag, but I'm sure, you know, you're you're in you're in direct text message con you know, with every single person that's relevant in the industry. But

Speaker 2:

Yeah. But but I'm curious around like, you know, a lot of people have been excited about developing data centers over the last couple years in hopes to win

Speaker 1:

Oh, yeah.

Speaker 2:

Company, you know, the business of companies like OpenAI. But I think in when you guys are evaluating partners, I I imagine that scale is is such a is such a massive factor. And so a single small data center is not really gonna move the needle. You guys need to be thinking in terms of mega projects.

Speaker 13:

Yeah. I mean, think that's exactly right. I mean, started with our partnership with Microsoft. And it's kind of makes me smile now to go back and look at that original large fabric for pre training, because I think it was only in the maybe 20 megawatt size. And now we're talking gigawatts, even And just this you're right that when we think about, like, what is perfect compute for us, or strategically the right compute for us, we are definitely thinking about large scale.

Speaker 13:

We're thinking about flexibility, right? We're learning a lot about, pre training, post training, test and compute, even, like where the different kind of scaling is happening. We're kind of recognizing there's more of a blurred line, often between what people think of as inference. So investors always are like, your inference compute and your training compute. It's like, you know, literally, it's like vanilla ice cream and chocolate ice cream, when in reality, there's like a bit in the middle that is something We of also need to think about things like, where, you know, latency, where do we want to put our footprints around the world, that very global weekly active user base.

Speaker 13:

Right? As they use ChatGPT, you don't want to slow the model down. Right? The beauty of the intelligence is, like, the real time nature of it. And then when we get into big compute, like where there's lots of tokens being used, like deep research, image gen, video, as that comes online, like all the work you saw today, actually just even on voice, like that really quickly means that you gotta make sure your compute is near your And so it is a a big plan that's coming together.

Speaker 13:

But you're right, like small is just not that useful to us.

Speaker 2:

But What about pushing partners to take risks? From my understanding, you guys are pre committing to certain, you know, basically spend levels, but at the same time, I imagine you want people to say, here's what we know we're gonna need, but we want you to build, you know, this much capacity so that we we have the sort sort of incremental capacity built in.

Speaker 13:

Yeah. We wanna I mean, being extensible is really important. We do wanna see partners like I think Oracle OCI has done a really nice job of that, of kind of starting. We started with like one large it felt really large at the time, data center footprint in Abilene and Texas. And now that has really multiplied up into multiple sites that can all be connected.

Speaker 13:

And that's a good example of a partner who has the capability to start in one way, but to be able to show you a path to maybe 5x ing just in that in that single footprint. That said, I we are finding that as we go around the world, is an ability to go work with governments, for example. We just made an announcement in Norway, made an announcement in The UK. This is the first time in my professional career I've seen countries come to the table. I wanted to do commercial deals like wall to wall ChatGPT.

Speaker 13:

I think the government of Estonia put ChatGPT into all of their high schools. High school or I can't remember what's up in the university level. But that's kind of wowing. And hand in hand with that, they are viewing AI infrastructure as incredibly strategic for their population. And, you know, it's a whole other level of selling versus, you know, I've I've seen enterprise, large enterprises before, but never anything at this scale.

Speaker 2:

Last question. Whose idea was it to give every federal agency chat GPT for a dollar a year?

Speaker 3:

Yeah. Imagine I imagine that had to

Speaker 1:

get passed. Could have gotten more than a dollar. The CFO must be really upset here. $10. That's 10 times as much money.

Speaker 13:

This is one where I think it's really important. Opening is, in some ways, a US asset, a national asset. And we want to make sure we're accelerating our governments, like all of the resources as we think about Western democracy and so on, that we are absolutely putting our technology into those hands.

Speaker 1:

It's that guy, Kevin Wheel. He's been moonlighting for the US government. It's like, which team are you playing for, Kevin? Are you on OpenAI, or are you on the US government?

Speaker 13:

Kevin just did his basic training. I don't know if I'm allowed tell you that, but I was hearing all about it yesterday.

Speaker 3:

I know.

Speaker 1:

Saw some photos. They look great.

Speaker 13:

Yeah. It's a good thing.

Speaker 1:

He's gonna

Speaker 7:

be even better lot

Speaker 4:

with governments.

Speaker 1:

Yeah. That's great. Yeah. Amazing. Last question for me.

Speaker 1:

I I I you know, the open source model launched two days ago and there's this world where like you have this dominant the accidental consumer company, have this dominant consumer app that's generating so much revenue. Then you have b to b and enterprise and API, and that looks more like a cloud provider. But then is there a world where the Red Hat Linux of open source LLMs is an open AI division? And that and that there's actually serious revenue and profit that comes from helping companies implement an open source large language model like Red Hat built a pretty fantastic business for a long time on top of open source Linux implementations.

Speaker 13:

Well, like, yeah. I mean, I think it's the right question to being asked to be asking. I mean, I think step one was getting

Speaker 1:

Yeah. You got it out. Two days in itself.

Speaker 13:

Our second open source model out and getting seeing what that traction is, and then seeing what the community needs. I think it's important to leave space for a community to develop. Right? That is the beauty of open source is that ecosystem that develops. And that was true with Linux.

Speaker 13:

It's true in areas like crypto too. Mhmm. But I do think you'll find over time that as enterprises want to deploy it, like, I mean, now I'm dinosaurs myself, but when I was a, you know, when I was a research analyst at Goldman Sachs back in the day, I covered software and I covered I covered Red Hat, actually.

Speaker 1:

Yeah. Yeah. I really

Speaker 13:

All that growth. Wrote a research report called Fear the Penguin at one point.

Speaker 1:

Fear the Penguin.

Speaker 13:

The Linux being deployed. Yep. But then you started to understand that for an enterprise, you couldn't depend on patching and upgrading to happen via community model. Like you needed some of the rigor that goes with an enterprise business, where you kind of know when, you know, if you need maintenance, if you need a bug patch, and And so so that did allow Red Hat to grow an incredible business. So I don't know if it's us Yeah.

Speaker 13:

Or we'd be supportive of others, but I think we are so excited to see open source out there and getting incredible feedback. And I think we wanna do that ahead of GPT five to keep coming back to, like, we're here to grow this ecosystem.

Speaker 1:

Well, we'll give you market cap credit for it anyway, even if it's early stage. Well thank you so much for coming on. This is fantastic. We'll talk to you soon. Thank you, sir.

Speaker 14:

Appreciate it.

Speaker 13:

Take care.

Speaker 1:

Have a

Speaker 2:

good one. Cheers.

Speaker 1:

Bye. Up next we have Didi Credo from Kudo. I believe I'm pronouncing that correctly. Let me tell you about Graphite code review for the age of AI. Graphite helps teams on GitHub ship higher quality software faster.

Speaker 1:

You can get started for free at graphite.dev. And let's bring in our next guest. How are you doing? Welcome to the stream.

Speaker 2:

Welcome.

Speaker 1:

Oh, very clean background. I know it's probably virtual, but whatever you got going on looks fantastic. You look great. How are you doing? Are you excited about GPT five?

Speaker 6:

Oh, I'm so excited. It's it's awesome.

Speaker 14:

Break it down.

Speaker 6:

Actually exciting. Everybody's talking about the coding capabilities.

Speaker 1:

Please.

Speaker 6:

But no one is really talking about the code review capabilities, and I'm gonna talk about that today.

Speaker 1:

Yeah. Yeah. Break it down. How are you using it right now?

Speaker 6:

Yeah. So we just enabled it in our platform. Mhmm. It's a default model for both our ID plug in, our CLI, our Git plug in. And then, yeah, we're using it to generate very high quality code reviews, catch bugs before they hit production, help enterprises verify that their code is aligned with their best practices.

Speaker 6:

Mhmm. So, yeah, it's super exciting. I can share my screen and show a few things if, like, that makes sense.

Speaker 1:

You can. Everything you share will be live. It'll be a little Yeah.

Speaker 13:

Yeah. Yeah.

Speaker 1:

You could please. But I I I wanna know also while you're just getting that set up, I wanna know about, what changes materially do you think happened in GPT five specifically for code and code review? Do you think there's more data going into the model, more data going into the pretraining, post training, anything else? Any anything that you're noticing that you're like, oh, there's a specific upgrade here. They must have done something to get there.

Speaker 6:

Yeah. Yeah. Think it's a great point. So I think it's all of the above. So it's scaling of both the, like, the pre training, but probably a lot of the reinforcement learning.

Speaker 1:

Yep.

Speaker 6:

And basically using that at scale to verify that, code gets generated in high quality and then also, basically catching bugs. Well, again, and when you do it with reinforcement learn learning, you have the the actual actual ground ground truth. Truth. So once you scale that, you can get the model to be, to basically be a lot better at that.

Speaker 1:

How how steep is the power law right now in, in just programming languages? Is it basically all Python, JavaScript, and then, like, a really hard fall off? Or is it actually important for coding models if they wanna be adopted widely to be, like, truly multi language and get all the way down into the long tail of, the Rust and the and, you know, c sharp and all the different languages that are out there?

Speaker 6:

Yeah. For sure. It's important to I mean, the majority of the market is in the JavaScript, TypeScript, Python Mhmm. Like the majority of the early adopters, I would say. But then when you get to enterprise use cases, you get a lot of dot net.

Speaker 6:

You get a lot of Java. And the models are pretty getting pretty good at those, languages as well, for sure.

Speaker 1:

Are you excited about, I mean, how do you think about the difference between, like, the improvements to GPT five from the consumer's perspective versus at the API level. I always found it a little confusing that ChatGPT was available as an API and you could interface with the ChatGPT. I believe you could interface with the ChatGPT model via the API and and there's a little bit of like a blind blurring there. But are there features that you think are are cruft and you wanna kind of rip out for an a API use case? Or do you just say, hey, give us the kitchen sink and we'll we'll we'll work from there.

Speaker 1:

And it's actually helpful to have Yeah. You know, a coding model that can still have a web browser.

Speaker 6:

Yeah. Yeah. I think, basically, it's a lot about, we consume the model through the API, and it's really the same model that drives the consumer product.

Speaker 1:

Mhmm.

Speaker 6:

But as for us, since our use cases are a lot about the agentic use cases

Speaker 1:

Okay.

Speaker 6:

The more the model gets better at using tools, and gets better at kind of listening to very, very specific instructions. Following instructions is critical for the enterprise use cases. Because for us, unlike the bottom market, we believe that for enterprises, you need to have very specific, agents that are defined with specific set of instructions and prompts and tools and permissions. And the more the models get trained with that type of environment, the better they stand up serving the the enterprise market, which is really where we're focused

Speaker 1:

on. My my question is, I wonder, like you you said like very specific instructions are important. When are we gonna get an agent that I can just turn loose in a code base and say like, just go improve it. Like, just go hunt around, do like rewrite that. Like like when you get a good open source contributor on a team that just becomes nerd sniped by the project that you're building They will just go around and find little ways to improve.

Speaker 1:

This documentation needs to be a little better. Let's rewrite this test case over here. Let's add a little bit more, you know, functionality to this class or function. How far are we from that?

Speaker 6:

Yeah. I think the models are getting, better and better at that part of basically kind of running loose in a code base. Yeah. But they do need the guardrails in place. Mhmm.

Speaker 6:

And this is kind of where we're focused on. Like, the a lot of the talk in the market is around the code generation side. You know, let the agent loose and give it a task, and it's just gonna go around and run for hours and do and and do it. What we're seeing is that the real challenge is now shifting towards how do I verify that the code is aligned with the best practices? How do I make sure that it's well tested, well reviewed, doesn't break anything?

Speaker 6:

You know? So that that's, I think, the next frontier. And, really, developers going forward are not gonna write a lot of the codes by by hand. Mhmm. They're most spend they're gonna spend most of it their time reviewing code, and that's the next frontier.

Speaker 6:

That's what we're talk really, like, are here to tackle.

Speaker 1:

Very cool. Anything else, Jordy? No. Well, thank you so much for joining, giving us some extra context on the g p GPT five launch. We will talk to you soon.

Speaker 1:

Have a great rest of your day, and thank you for joining. Cheers.

Speaker 5:

Thanks. Cheers.

Speaker 1:

Talk to you soon. And let me tell you about ProFound. Get your brand mentioned on Chatuchi BT. That seems more relevant than ever. Reach millions of consumers who are using AI

Speaker 2:

to discover new

Speaker 1:

products and brands. I forgot to ask about this. We'll have to come back to this, but I wanna know if

Speaker 2:

Found powers MongoDB, Indeed, Mercury, DocuSign, Zapier, Ramp, Roe, Golan, Workable, Majuri, Eight Sleep, US Bank, Chime, Clay.

Speaker 1:

Okay. Okay. We got it.

Speaker 2:

They got some logos.

Speaker 1:

There is this question of like, okay, even if you're even if you're like, okay, GPT five is more incremental than more of a a more of an evolution than a revolution. It's like, okay, well then let's talk about how it affects every other business and every other aspect of the economy. What should you be focusing on? And is like, do the Do any of the updates from GPT four to GPT five change how you're positioning your brand for AI search? That's certainly an interesting question to dig into.

Speaker 1:

Anyway, we have Zach Lloyd from Warp coming into the studio. Welcome to the stream. Welcome back. Back.

Speaker 2:

Zach Attack.

Speaker 1:

Good to see you. He's

Speaker 2:

back. How

Speaker 1:

are you doing?

Speaker 15:

I'm doing pretty well.

Speaker 1:

Yo. Yeah. So, I mean, a lot of what stuck out to me, I'm mostly a consumer of consumer AI apps. I'm very excited about not needing to mess around with a model picker anymore. But take us through the biggest improvements from the software development side.

Speaker 15:

Yeah. I mean, it's a it's a major step up from the prior OpenAI models. It's I mean, it's doing agentic workflows and warp for a much longer period. It's just a smarter general model. Like, we evaled it against all of our benchmarks, and it's up there at state of the art, which is you know, from our perspective, it's it's awesome to have multiple competitive models that our users can benefit from.

Speaker 15:

So definitely a a huge improvement from GPT four one.

Speaker 1:

Yeah. So it seems like not not the, you know, clogged code killer, but certainly in in in the same conversation, in the same in the same football stadium for using sports metaphor.

Speaker 2:

How much you know, one thing that stood out is the cost reduction.

Speaker 1:

How much

Speaker 2:

do you think that developers will care about that versus just, you know, what it can do from from an output standpoint?

Speaker 15:

I think developers do care about value. So sort of like quality to cost ratio. I think it's the more you get into, like, the individual developer and the small team, the more that that matters. Whereas if you're at the enterprise level, I feel like it's it's a little bit less price sensitive. So, yeah, I I mean, you you you can see it as as different apps change their pricing, what the reaction of the developers is.

Speaker 15:

You've probably seen this with Cursor and seen this with Cloud Code. And so developers really, really are looking for something that's cost effective. So the the fact that the cost is a little bit lower is actually is a big deal.

Speaker 1:

Do you think we're in, the Lyft Uber 2015 arc where the prices are subsidized and the prices will go up? Do you think that there's a price war on the horizon now that the Frontier models seem to be similar capabilities? Do you think that someone will try and raise a bunch of money, cut prices a bunch and steal a bunch of users? Like, how do you think that plays out?

Speaker 15:

It's an awesome question. I mean, my hope is that we get to a world where there is price competition at the model layer. So Warp is very much at the at the app layer. Right? And so our value prop is, like, we can give our our users who are mostly developers the best model access.

Speaker 15:

And so to the extent that it's not one sort of model provider running away with that and having pricing power, it's better for us, just candidly. And so, you know, my my hope would be something like the model world ends up a little bit like g cloud, AWS, Azure. That's our best end state where all of these models are, you know, sort of similarly powerful and a little bit more commoditized. I don't think it's been like that, but it's go it's it's getting a little bit more like that. And so the more that that there's more than one show in town, I think that's generally good for warp and actually is good for developers because it will put competition the competition will put pressure to bring the prices down.

Speaker 15:

But I don't know. Like, I also think that people will will definitely pay for quality. And so if there is a, you know, a meaningful delta in quality on the frontier models, then I think that's like, whoever has the quality delta will will have a lead temporarily, but it's I'm not sure that that lead will be sustainable. We'll see.

Speaker 2:

How how do you think the developer community should plan around model deprecation over the next, you know, one to two years? Like, how much, you know, from I I I don't know that I've gotten a reaction yet from I I don't know if there's general frustration yet from people. You know, we've we've heard on the consumer side, Tyler on our team here loves four or five. Yeah. And so he was a little bit disappointed to hear that.

Speaker 2:

But it but it what kind of what what are you seeing on the developer side?

Speaker 15:

Yeah. I think it's a little bit different for people who are, like, building apps on LMs versus people who are using LMs as, like, like, accelerator to doing coding. And, like, you know, at Warp, actually, we do both. Like, we we're we're an application level stack. And, like, it's actually very easy for us to go to the latest model, and so it doesn't it doesn't really bother me.

Speaker 15:

I don't know I don't know what type of app you would be building where it's like it's really important that it's like GPT $3.05 or GPT 4 or something like that. I think, like, generally, we want the most intelligent tokens at the best cost. So I I don't I don't see that being, like, too big of an issue, honestly.

Speaker 1:

What about open source? Does that does that feel like something that will be in the playbook? Is the markup on closed source models high enough that there will be a significant price delta and or or is the Pareto frontier kind of indifferent to closed source, open source?

Speaker 15:

So if there was a a comparable open source option, that would be awesome. I think that the economics of it again, it doesn't it doesn't seem like a perfect analogy to me between, like, open source software and open source models. So open source software, it's like you have a big community of people who, you know, for the love of coding, are building a really awesome product. For open source models, it's it's like you just need a crazy amount of capital to train something that's on the frontier, and so I don't know that how that happens. And so what we've seen is, like, the open source models are competitive at the quality level that they're at, but the quality level that they're at is not the same as the frontier models.

Speaker 15:

And I don't really see why that would change. And so I don't know. In in Warp, it's like we we were serving some open source models, but they're just not they're not as good. And so there's, I think, a more limited use case for them right now, and I don't really see economically why that would change. In fact, I would be I would be surprised if anyone was spending billions of dollars to train a model and just kinda put out the open weights.

Speaker 15:

Like, I don't get this the business strategy there, but but maybe that will happen. That would be awesome.

Speaker 1:

Is there a world where you're like, this idea of, like, smarter smarter models either orchestrating dumber, cheaper models or, like, using or or distilling models into more narrow narrow formulations that can be run more efficiently. We've talked to a few companies that do this for businesses. Like, you just want a model that just filters for profanity and you can run it on, you know, a a a graphics card. And so it's basically super super cheap or super fast. I'm wondering about like in the coding world, coding agent world, any of that.

Speaker 1:

Like where where are the opportunities to kind of fan out and use an ensemble of models instead of just this hit everything with the smartest best. It feels like because of the funding environment, everyone can kinda justify, like a high cloud bill, but, and most people don't admit that it's hurting the bottom line, but it feels like at some point, it kind of has to eventually.

Speaker 15:

I mean, I think I think that's a very real thing.

Speaker 1:

Yeah.

Speaker 15:

Like, sense of even in warp, we don't use, like, the the biggest, most powerful model for every task. And so there are certain things, like, you know, for Warp, maybe for, like, deciding whether or not we should summarize a conversation is, a good example. So you hit the context window. You're like, okay. Is this is this a good spot to summarize?

Speaker 15:

Is this a good spot to encourage a user to start a new conversation? We use a much more inexpensive and also low latency model. Right? The other thing the trend is that these very, very powerful models tend to have much higher latency. And so we do a mixture of models, and that's totally a real thing.

Speaker 15:

But I I think for, like, the, like, the predominant use case as a developer is gonna be, I want to tell an agent to do something. I want it to be harder and harder. I want it to run for longer and longer. And to do that, it's like you you kinda want, in general, the most intelligent model. And so Yeah.

Speaker 15:

You know, until this until the models have a sort of s curve, like, type shape, I think that I think it's gonna be more of quality game than a cost game for most of these things.

Speaker 1:

Doesn't it feel like they have an S curve shape right now? Certainly does from a consumer perspective.

Speaker 15:

That's interesting. From a coding perspective, I feel like we're still accelerating. Like, the difference, again, between the last version of GPT and this version of GPT is probably bigger than the difference between, like, four point one and four and four and three point five.

Speaker 1:

Like Interesting.

Speaker 15:

It's a big deal.

Speaker 14:

Same thing

Speaker 15:

with the anthropic models, and I'm sure that we'll see something from Google where

Speaker 1:

Yep.

Speaker 15:

It's an acceleration. And I think that there is, like, a maybe an underappreciation of how much left there is to solve here. Yeah. Because when when you even when you're doing, like, a real coding task as a pro, like, despite all the demos you see on Twitter where it's like someone asks, you know, an agent to build an app, that's like a lower level of difficulty than doing what a pro developer does with one of these models. And the models still don't produce great code a lot of the time.

Speaker 15:

Like, there's a lot of kind of handholding that has to go into it. And I think I think that we are still seeing an acceleration in terms of the models actually becoming not just, like, okay, competent engineers, but, like, really, really good engineers.

Speaker 1:

Yeah. Do you care about benchmarks?

Speaker 15:

We care a ton about benchmarks. Like, we

Speaker 2:

But your own internal benchmarks or or

Speaker 15:

We we do both. So, you know, plug for warp, we're number one on terminal bench, which is the public, you know, terminal benchmark. And we're top five on SuiteBench, which is the coding benchmark. And then the only way, in my opinion, that an app at our layer in the stack can really improve is by measuring the progress. And so we have our own internal set of evals that we run across all these models as well, which are coming from, like, real use cases.

Speaker 15:

And that, again, is an advantage of being like a product that's in the wild that has a lot of users is that we can sort of see where the models are failing, where they're working. And so we're we're very big on that, actually. Yeah.

Speaker 1:

Awesome. Well, thank you so much for stopping by. We will talk to you soon.

Speaker 2:

Sure. You'll have a busy afternoon.

Speaker 12:

Shout out by the

Speaker 15:

way to OpenAI team. Very, very helpful in working with us to get GPT five to be awesome in warp. That's great. And one more shameless plug, it's we have a discount code for people who wanna try GPT five in It's $5 GPT five.

Speaker 1:

Okay.

Speaker 15:

Thank you for having me, guys.

Speaker 1:

Yeah. Awesome. We'll talk to you soon. Thanks.

Speaker 2:

Cheers. Bye.

Speaker 1:

Tyler, any updates from the timeline while you're thinking about what the latest vibe check is in the war between OpenAI and XC.

Speaker 2:

Linear is

Speaker 1:

a purpose built tool for planning and building products. Meet the system for modern software development. Streamline issues, projects, and product roadmaps. Go to linear.app to

Speaker 2:

get started. Choice for OpenAI.

Speaker 1:

You have something?

Speaker 2:

From Reggie James, front of the show, half of my timeline says this is the closest we've been to AGI. The other half of my timeline says we officially just hit AI stagnation. I love tech.

Speaker 1:

Well, we will be going deeper deciding whether or not this is stagnation or hyper intelligence takeoff. And we will be joined by our next guest, Riley from Charlie Labs.

Speaker 5:

Hey, guys. Thanks for having me.

Speaker 1:

Good to see you, Riley. How are doing?

Speaker 2:

What's happening?

Speaker 5:

I'm doing fantastic. Fantastic. We've been heads down with GPT-five and

Speaker 7:

How long have you had it?

Speaker 1:

How long did you get the preview? I feel like it it it, you know, it gets rolled out to early adopters a little bit earlier, but it's been weeks, months? How long have you had it?

Speaker 5:

We're a couple weeks, maybe two or three.

Speaker 1:

What was the first time

Speaker 2:

you did liking it.

Speaker 5:

Charlie loves it. And also, I love what Charlie does with it.

Speaker 1:

Yeah. What does Charlie do with it? What was the first thing you did with Chat GPT five?

Speaker 5:

Ran our evals.

Speaker 1:

Oh, yeah? How'd they come back?

Speaker 5:

Just really good. Mhmm. Much better than o three, which was much better than any other model we've formed before that.

Speaker 1:

Interesting. And yeah. So so let's zoom out. What what what do you do? What do these evals measure?

Speaker 1:

Walk me through it.

Speaker 5:

So Charlie is a TypeScript focused coding agent Mhmm. That operates much more like a human does. Mhmm. So less, like, IDE application terminal and more joins your GitHub and Slack and linear workspaces, it and interacts with the team the same way other humans do. And then our evals are a mix of code review because part of Charlie's job is to review PRs from humans as well as his own, and then code authoring, so opening PRs and pushing commits.

Speaker 1:

So so when you develop your own p your own evals, I imagine you try and keep those out of any training data. You want those to be held private. Is that correct?

Speaker 5:

Yes. And it's getting even harder with web access now because they're too good at finding things.

Speaker 1:

They're finding everything. Yeah. That's funny. And then and then talk to me about, the shape of those of the actual problems in the eval. Are you are you doing are there some easy questions, some hard questions, some some some extremely hard questions?

Speaker 1:

Like, are you formulating those? What's the shape of an individual task? Is it scored out of, like, a 100? How do you think about developing a good eval?

Speaker 5:

A mix of hard to very hard. The easy ones are just a waste of money and time at this point, especially with five. Like, there's a bunch that it's just not gonna get wrong.

Speaker 1:

Yeah. Yeah.

Speaker 5:

And then we're mostly doing the PR ones look kind of like SuiteBench in the sense that we're taking an issue to start with. But instead of giving the issue, like, in a Docker container already, we trigger a comment on the issue that says, hey, Charlie. Go make a PR for this. Mhmm. And then Charlie does his thing, and then the PR comes up.

Speaker 5:

And then we score that PR against a whole bunch of things like correctness to a known solution that's correct as well as code quality, testability, and some softer things like descriptions.

Speaker 1:

Who are the biggest who are the biggest customers or use or or users for, like, a TypeScript focused coding agent?

Speaker 5:

It's a wide range of mostly modern apps. Like, pretty much any web app these days is gonna be like a Next. Js type app. And then all the way into, like, back end. Like, Charlie himself is written in TypeScript.

Speaker 1:

Sure. Sure. Makes sense.

Speaker 5:

And there's very little

Speaker 1:

front end. Anything else, Troy? What else you got?

Speaker 2:

I just wanna say I love the name Charlie. It's one of my favorite agent names Yeah. That we've had on the show.

Speaker 1:

Yes. It's right up there with Pig and what was the other one?

Speaker 2:

Well, don't think that was an agent.

Speaker 1:

That was an agent. But

Speaker 2:

But yeah. It's a it's a good one.

Speaker 1:

Yeah.

Speaker 2:

Congrats on locking Yeah. It

Speaker 1:

What what what about what about cost and and that side of the business? Is there is there any movement there or anything that you where where you require movement or you need movement to really unlock new capabilities in the business or new markets?

Speaker 5:

Not really for us because we're operating as at a human level. We do value based pricing, so we charge per PR or per commit. Mhmm. And because that's comparing to such expensive actions that humans are doing, the challenge for us is more actually moving up to the promise than doing it cheap.

Speaker 1:

Yeah. Yeah. Are you having

Speaker 2:

But then but then doesn't the cost reduction announced today? Isn't that great for business?

Speaker 5:

Yeah. I mean, it's good overall, but, like, that's our problem is not that the models are expensive.

Speaker 1:

It's Sure.

Speaker 5:

That they're I mean, they're getting really smart, but I'll always take more.

Speaker 1:

That's great.

Speaker 2:

Never enough.

Speaker 5:

We like, for instance For token the August, we've been testing, and 98% of the code that got merged into our code base was written by Charlie.

Speaker 1:

Wow.

Speaker 5:

Not 30, not fifty, ninety eight percent. And that's coming through PRs. That's not like autocomplete, you know, an ID type thing.

Speaker 1:

That's crazy. Wow. Yeah. What, yeah, what does that mean for, like, the future of of like, who are you hiring? I imagine that you're still, you know, a an engineering heavy organization that's just puppeteering and orgus and orchestrating agents.

Speaker 1:

But where do you see, like, the future of software development as a career path going?

Speaker 2:

Yeah. Are are are new CS grads cooked? I

Speaker 5:

think if they get really good at using the AI, no. If they try and take an approach of getting really good at writing code by hand, for sure.

Speaker 1:

Yeah.

Speaker 5:

What we're mostly looking for hiring is people who are able to see things at a much higher level and plan further out. Because with tools like Charlie, you can write so much more code so quickly that it's, like, it's more important to see where you're going and take the right path than it is to be able to write it quickly.

Speaker 1:

Very cool. Well, thank you so much for stopping by. Good luck with the rest of your day, and congrats on a on an upgrade to everything that you do. Yeah.

Speaker 2:

Tell Charlie to have fun out there. Five.

Speaker 1:

Have some fun.

Speaker 15:

Thanks a

Speaker 14:

lot guys.

Speaker 2:

Talk to soon.

Speaker 1:

Chad, gotsomeq.com. Sales tax and autopilot spend less than five minutes per month on sales tax compliance.

Speaker 2:

Salestaxhq.com. A number of the fellas in the chat access to five.

Speaker 1:

Break it down

Speaker 2:

for says it's pretty good. The writing ability feels a little nerfed. Says the way it writes feels a little programmatic rather than sounding human.

Speaker 1:

Mhmm.

Speaker 2:

Reverts to using points even for things like blog posts and We pull it overly complicated language for simple stuff. Techno chief says, it's crazy fast.

Speaker 3:

Oh, that's good. And

Speaker 2:

Rat Ratliff says, yeah, I was just gonna say that very very very fast. Z Jean Ahmed says, junior devs are barbecued.

Speaker 1:

Tyler anything from your side? Before we talk to Guillermo from Vercel?

Speaker 3:

I think maybe a good way to to like vibe check at least on the timeline is that it's almost like a 4.5 kind

Speaker 4:

of thing.

Speaker 1:

Sure. Where it

Speaker 3:

comes out people are like this model totally sucks. Look at the benchmarks. It's like not it's not some massive improvement. It's like a fair, you know, not a step change at all. But then you you start playing with them and it's actually like, okay, this is actually a good model.

Speaker 3:

Yep. Like a lot of stuff I'm seeing people post like, oh, that's actually like really like interesting output, stuff like that. But seems

Speaker 1:

good. Can we do the green text eval? Green text bench? Yeah. Yeah.

Speaker 1:

The

Speaker 2:

TBPN intern.

Speaker 1:

Yes. Yes. Yes. Yes. We'll let you cook on that and then we will move on to our next guest Guillermo Rao from Vercel coming in to TBPM.

Speaker 1:

For the second time, great to see you Guillermo. How are you doing? I like the action hall. Thank you. Welcome to the stream.

Speaker 1:

How are

Speaker 5:

doing today?

Speaker 1:

Do you think GPT five could beat me, you, couple of the boys here on Dust two in Counter Strike?

Speaker 14:

Easily? Easily. Yeah. It depends on the frame rate. Right?

Speaker 14:

Like Yeah. On a long enough timeline, we're we're cooked.

Speaker 1:

We're cooked.

Speaker 14:

But we might frag it short term or we might be faster.

Speaker 1:

Amazing. Yeah. We got to I mean, I'm sure we'll get to GPT five but what's your reaction to the world model stuff from Google? Do you think do do you have an idea of where that's going as a product? It feels like a GPT two level technology, very much a research focused technology.

Speaker 1:

I'm sure OpenAI is working on something too, and a lot of the labs will work on it. But what what's your theory behind the the the generative video game world model stuff that's going on?

Speaker 14:

I mean, number one, super fascinating. Right? I think when when we think about the future, I always think about Jensen's the future of of applications will be that pixels are generated, not rendered. So as as much as we're really excited today that g p t five and v zero are really good at writing code that then renders interfaces Mhmm. I think it's also cool to dream of a world where we're just going directly from GPU to PixelGrid.

Speaker 14:

Right? And but if you remember, like, a couple years ago and maybe a decade ago, there there was a lot of excitement of video games that were gonna be livestreamed from the cloud.

Speaker 1:

Yeah. That's right.

Speaker 14:

Where your input, your keyboard you could have a very thin client. Your input, your keyboard, your mouse movement was gonna be dispatched to the cloud. Yep. Gonna have GPUs near you.

Speaker 1:

Google Stadia was big there, and

Speaker 14:

then That's right. One live was

Speaker 1:

in Microsoft's gonna be in the game and is still Microsoft is actually still pulling it, still pushing it very heavily.

Speaker 14:

Awesome tech, but not mass adoption. Yeah. But if you look at, you know, a lot of these technologies are being really successful in letting people get more creative and test things out. Mhmm. A lot of the use cases that we see for v zero and vybe coding are almost like a communication tool.

Speaker 14:

Like, I wanna prototype something. I wanna see what the what's possible. I wanna explore the latent space, and I think those world models are gonna be incredible just to inspire what the future of games could look like. Right? Just getting ideas for actually then shipping them in real, three d engine model.

Speaker 14:

I think short term. I think long term, all bets are off. Someone was just saying in the chat, you know, junior devs are are roasted, or barbecued. I think that's

Speaker 4:

not quite true.

Speaker 1:

Okay.

Speaker 14:

Same for, like, three d, engine developers.

Speaker 1:

Give us the bull case for junior devs staying off the barbecue.

Speaker 14:

So the bull case for, I think, people in general is that you move from I mean, the the progression in the industry has been assistant to agent Mhmm. To team of agents, agent orchestrator. It's still really useful to have a human be the one that's sort of, like, managing the team. Yeah. So you're moving from, like, junior dev to junior eng manager, especially as these tools become more agentic.

Speaker 14:

In in the new version of vZero that's coming up really soon, you're starting to notice that vZero sort of splits the task between a little team. Yeah. You have the designer of the team. You have the PM of the team that's sort of working on the spec, you have the architect, you have the engineer. I don't if you saw Cloud Code announced, I think it's like a slash security review.

Speaker 14:

Yeah. You think of the think of that as having a security team or a team of agents or security researcher at your disposal. So junior dev as, like, a vertical skill might be a little barbecued, but junior eng manager so I think it's just gonna be the junior dev is so much more powered in this world if you allow yourself to be and you keep up with what these tools can do, and and and I think you you stay, you know, at the cutting edge.

Speaker 2:

Yeah. I mean, the obvious bulk case is if you're like, if if someone's a college student today, they can learn to code truly AI natively. They don't have to say, oh, we're an AI native organization now. We have to up upscale and kind of retrain people how to think. They can just naturally start to think

Speaker 1:

I remember that there was that same Altman post about how we'll look back on, you know, 93% of humanity was subsistence farming. And if you ask those people how what they think about our email jobs, they'd be like, you guys are crazy. And it's almost like in the near future, midterm future, maybe even long term future, it's like the number of individual contributors will be extremely low and almost everyone will be a manager and you'll become a manager much faster. You'll just be managing agents and then you'll be managing people who manage agents. But the job of almost everyone will become managerial.

Speaker 1:

May maybe that's what happens. I don't know. I'm not a 100% on it, but that that that's what that made me think.

Speaker 14:

Someone asked me yesterday, you know, what do you think the future of, the market of monitors looks like? Like, does it stay flat? Do people get more monitors because they're Yes. It's hilarious. Like, dogecoin trader analyst.

Speaker 1:

Yeah. Yeah. Yeah.

Speaker 14:

When you like like In future,

Speaker 1:

everyone has the hedge fund six monitor setup.

Speaker 2:

In the future, everybody's just gonna be at work monitoring

Speaker 1:

Maybe on their phone. I mean, I've noticed that that what you know when I was an individual contributor, I had three monitors. I was programming on all the screens and now I wear I mean I use my laptop during the show and then most of my work is done on my phone phone calls and then and then firing off messages. Yeah. Maybe maybe we've actually shift away from monitors and go further into voice interfaces.

Speaker 1:

You oh, I called the lead of my agents and then that agent relays it to some I

Speaker 14:

got optimistic on voice by the way because I've now seen it. Yeah. I did what we're we're we're cooking on a on a better mobile experience for vZero.

Speaker 1:

Sure.

Speaker 14:

And I was going back and forth with my head of mobile, and he was talking to vZero, and I was writing down in a pretty fast typer Yeah. But he beat me with voice. Yep. Using the local model in the phone, so there's still the question of, like, edge latency versus cloud latency, kinda like what we talked about with three d. But I do think voice is gonna play an increasingly, exciting role in in programming, which is kinda wild.

Speaker 14:

I would have never imagined. I I've always been about, like, typing benchmarks in WPMs. Yeah. But voice is is coming.

Speaker 1:

Yeah. Yeah.

Speaker 2:

Yeah. How do you think about competition broadly in developer tooling, code gen? I mean, it it right now, it seems like there's just so much

Speaker 1:

It feels like massive TAM expansion moment. Every ripping. TAM

Speaker 2:

expansion moment, but at the same time, winners will emerge. Solvation. Yeah. You're you're playing to win. And, yeah, I'm curious.

Speaker 2:

You know.

Speaker 14:

Yeah. On some level, we're playing both sides of the bat. We what we announced today that's really exciting is v zero with g p t five support. Mhmm. So you can go to v0.dev/gpdfive, and we'll use g p d five in combination with our model pipeline that makes it really good at vibe coding, especially for nontechnical folks.

Speaker 14:

But we also, on the Vercel AI cloud side of things, we open sourced Basically, you can create your own vibe coding platform powered by any model.

Speaker 1:

Was joking about this with Tyler. Videcode me a vibe coding platform, please. That's right. I'd like one.

Speaker 14:

Make no mistakes. Yeah. Videcode me a billion dollar company.

Speaker 1:

Yeah.

Speaker 14:

No mistakes. But basically, we are giving people that. It's a starter kit.

Speaker 1:

Sure.

Speaker 14:

And by by the way, the fundamental question that a CEO asked me the other day was, is Bytecode a product or a feature? Or is it both? You know, it's TBD. The case for feature is okay. So there's gonna be lots of systems of record.

Speaker 14:

Mhmm. Think Salesforce

Speaker 1:

Yep.

Speaker 14:

Snowflake, Databricks. Mhmm. And increasingly, they're gonna incorporate cogen capabilities into their platforms. They can use a lot of these capabilities that we just open sourced, and you'll go to their the existing place where you have the data. Kinda like what we've talked about for decades of, like, are you bringing computer the data?

Speaker 14:

Are you bringing vibes to the data? Right? Yep.

Speaker 1:

Are you

Speaker 14:

bringing cogen to your own platform?

Speaker 1:

Yeah. And so it used to used to bring like a bill like a, you know, dashboard builder, and it would have a couple widgets. And now I could just potentially, if I'm plugged into some sort of data source, some system of record, I could say, vibe code this app on top of it. There's been some tool, retools played in this space, Zapier a little bit. But, yeah, I mean, this feels like, you know, we're getting we're not fully in the just the pixels are generated, but we're, you know, generative UI, generative application on top.

Speaker 1:

And being bespoke and ad hoc.

Speaker 2:

Also think it's important to understand the line between consumer vibe coding and just generating femoral software and websites and things like that versus enterprises, which will have a lot of different use cases. When I look at the when I look at the vibe coding market and I see businesses that are that that are almost entirely consumers just creating things for fun, I think that has Yeah. To be a tough business because it's a hyper competitive market and consumers are flaky. They'll create something, you know, for fun, but they'll churn in month two because, you know, it's not they're not running a real business. Whereas a a business knows, hey, we'll pay for this on on a long term basis because we're we have a use for it all the time from this product manager to an engineer over here to somebody in marketing, etcetera.

Speaker 14:

Yeah. The the other side of the of the equation is how do you make this by the coding tools work really well for enterprises? Frankly, the most surprising emergent thing that I've learned is just how much demand there is in enterprises for Vibe coding. And this is because a lot of the the traditional thing has been the people that understand the business are sitting over here. Yeah.

Speaker 14:

The people that understand the code are sitting over here, and their communication is fraught with peril. Mhmm.

Speaker 1:

Like, they

Speaker 14:

don't speak the same language. They kinda, like, resent one another. I love to tell this story. I was meeting with the CEO of a very successful company who's telling me that engineers, like, asking a feature to his own engineers, felt like petitioning the government.

Speaker 1:

Yeah.

Speaker 14:

Even though he's the CEO, like he's struggling to, like, make the case. And, and please, like, get me in your next sprint, get me this feature. So dive coding actually solves that problem. All of the PMs, designers, marketers, business users that previously only had access to what, like, Jira and, you know, to do a little bit of product management tools and writing p r PRDs and

Speaker 1:

So rude.

Speaker 14:

Those kinds of things. They they weren't able to ship PRs. They weren't able to, you know, ship software, and now they can. And so the the opportunity is, how do you actually make this secure? Yeah.

Speaker 14:

How do you make it high quality? How do you create a guardrails? And those are those are tricky problems, and I'm I'm really happy that some of them, are easy to overcome and at least for us. And some of them are active areas of research, but I think the the enterprises really have a a strong case for this.

Speaker 1:

Yeah. Can you walk me through, like, tool use? I mean, we were talking to the OpenAI folks about GPT five being, like, really, like, a summation of, like, standing on the shoulders of giants. You get a Python REPL. You get a web browser.

Speaker 1:

You get, you know, the ability to kind of run cron jobs now. There's voice and, you know, all sorts of different tools kind of wrapped up into one multiple models. You can trigger reasoning chains if it wants. It can do all these different stuff. And that's actually the benefit of, like, this isn't just a bigger model.

Speaker 1:

It's like a lot of it's a it's a next version of a thing. It's more like switching from the iPhone 12 to 13 than going from the iPhone to the iPhone three g. It's not just a new technology that's in there. But in the in the the world of vibe coding, what are the tools that you want to think about adding? I know that basically every vibe coding platform, you know, recommends a database.

Speaker 1:

But I was we were talking to Harley at Shopify yesterday, and there's a world where if I go to a vibe coding platform and I say, I'm building an ecommerce website, it should probably just be like, hey, I'm gonna do Shopify under the hood and I'll vibe code the landing page on top. But how are you thinking about the landscape of like tools that you could pull in full open because there's open source repos that are like full projects that you could pull in and then just start customizing on top of. It's kind of this big continuum.

Speaker 14:

Yeah. There's a couple layers. Yeah. On the foundation model layer, what do you want is a model that is exceptional at tool calling.

Speaker 1:

Mhmm.

Speaker 14:

Whether it has built in tools or whether you register them yourself Mhmm. This is a, like, sort of silent war that has been going on. Like, if you talk to devs, what are you optimizing for? Tool calling quality. Why?

Speaker 14:

Because to demystify the word agent, what an agent is, it's a loop of tool calling that builds up context over time. Mhmm. That's all an agent is. So let's to give you an example completely of b zero. B zero is becoming more and more agentic over time.

Speaker 14:

One of the things that it can do is it can take a screenshot of the thing that's building and reflect on it.

Speaker 1:

Mhmm.

Speaker 14:

So today, I I live vibe coded to an audience of web three and crypto engineers, and they told VZero, hey. Make this dark mode. And initially, VZero dust me dirty. He's like, he changes some things with dark mode, and then it kinda astonished me because I was like, ugh, I have to explain to this audience. It then takes a screenshot, looks at it, and keeps fixing it.

Speaker 14:

And I was like, this is literally a developer that's alive on autopilot. And the reason it's in autopilot is because he has access to these tools, like, looking at the web browser. Another one is research. I've I've coded an example of build me a Substack clone for cryptocurrency news, And the agent didn't know what the cryptocurrency news were, so he started doing research on the Internet of, okay, Ethereum passed certain price and whatever. So and then you're talking about the tools over the Internet.

Speaker 14:

So to demystify another topic, MCP is really exciting because it's a new protocol for registering tools that your agent doesn't locally have. So those tools that I just talked about, we gave them to VZero. Here's a deep research tool. Here's the, screenshotting tool. And those are will likely become the new services when you think about, like, AWS of today.

Speaker 14:

If if AWS was an AI cloud, which is kinda what we're trying to build at Vercel, like, you think a lot of those tools are gonna become as a service. Like, bring me deep research as a service, bring me browsing and screenshotting as a service, and so on. But then you have MCP, which allows you to okay. I need to sell something online. Alright.

Speaker 14:

So now there's an MCP for Shopify. Now there is an MCP for Stripe. Mhmm. There's even crypto MCP. So it's really exciting.

Speaker 14:

Like, now it's, like, the ultimate choice for a builder, and you don't have to go and learn all these things. You don't have to this is almost like a discontinuity of the Valley trend of, like, if we build amazing documentation, they will come. This is more so if the agent picks you, they will come. Right? And so there's a lot of, figuring out right now, like, how do I make my infrastructure?

Speaker 14:

How do I make my product to be loved by these agents? And the MCP promises to be one of these first, things that you are in control of.

Speaker 1:

That makes a ton of sense.

Speaker 2:

Last question. Someone on your team named Josh is in the chat. He wants to know what what does he need to do to get a Twitter badge?

Speaker 14:

Oh. Well, yeah. A 100 k downloads of the AI CLI. Think we've been talking.

Speaker 2:

Okay. Okay.

Speaker 14:

Got the good work. Thank you.

Speaker 1:

It's on. Got your

Speaker 2:

work cut out for you.

Speaker 1:

It's burned into the immutable record of this livestream and the future training runs.

Speaker 2:

Best of luck, Josh.

Speaker 14:

Accountable now. We're gonna hold

Speaker 1:

you accountable to that, Guillermo. Great seeing you.

Speaker 3:

Awesome. To see you.

Speaker 1:

We'll talk to you soon. Congratulations.

Speaker 2:

Talk soon.

Speaker 1:

Let me tell you about Fin dot ai, the number one AI agent for customer service, number one in performance benchmarks, number one in competitive bake offs, number one in in g two, number one in having an Irish founder.

Speaker 2:

That's right.

Speaker 1:

And we will invite our next guest to the stream from factory.ai. Welcome to the stream. How are you doing? Good to see you.

Speaker 16:

Hey. How's it going?

Speaker 15:

Glad to

Speaker 1:

see you. Great. Thanks so much. Kick us off with an introduction on you and the company.

Speaker 16:

Yeah. My name is Eno, cofounder, CTO at Factory. We are building a platform for enterprise software developers to perform what we call agent driven software development.

Speaker 1:

Mhmm.

Speaker 16:

So basically, more than just code, bringing agents into every stage of the software development life cycle. So think coding, code review, maintenance, incident response, documentation. We think agents should be a part of all of this, and we think that they should be driving a lot of that menial component while you think at the high level about how to plan and structure the work.

Speaker 1:

There's so many different like, enterprise is a narrow category. It's a, you know, oh, not consumer, I guess. But Yeah. It's such a wide it's such a wide category. Is there a beachhead?

Speaker 1:

Is there a certain type of project within within different industries or specific industry that's getting especially an especially large amount of value out of factory these days?

Speaker 16:

Yeah. Totally. I think that one thing that we see a lot and typically when we say enterprise, we're thinking greater than 1,000 engineers. Right? Like, 3,000.

Speaker 16:

And one reason why we focus on that larger scale, you tend to have these large organizations where people are the bottleneck is not code. Right? The bottleneck is how do we plan a migration of a 185 code bases to this new framework, and there are 3,000 developers that are gonna touch this over the next six months.

Speaker 17:

Yep.

Speaker 16:

And an SI just told us the quote is $80,000,000 to do it. And we have to figure out

Speaker 1:

how to not replatforming broadly is one of the major major tasks for many many enterprises. Right?

Speaker 16:

A 100%. Modernization and migration is huge.

Speaker 1:

Yeah. That makes a lot of sense.

Speaker 2:

How do you estimate the market that market size? And is that is that where you guys are leading with on the GTM side in terms of trying to find these legacy companies that are maybe not even using cursor yet. I mean, who who we we talked to the CEO of GitHub yesterday and what

Speaker 1:

50%, didn't he say?

Speaker 2:

Or was like at least half of their user base is not using any AI tools.

Speaker 1:

Yeah.

Speaker 16:

Yeah. Totally. I I think that the the thing that we hear often, we pretty much only deploy into companies that have already tried an AI native IDE or have an autocomplete tool deployed. And I think that the thing that we hear often is you you sort of hear, like, these numbers thrown around, like, five x, 10 x. And then in practice, when you adopt an AI IDE, you see 10%, 15%.

Speaker 2:

And so

Speaker 16:

a lot of people are sort of saying, like, what is the delta there? Like, what causes that transition? And our our sort of argument here is that there is a workflow change that's actually required to really adopt agents in the life cycle. Right? And so if you're just sort of, like, accelerating an individual developer, that you can go a little bit faster.

Speaker 16:

But if you are able to parallelize and automate at scale, that is going to be that larger introduction of change. And so if you imagine the market here, there are companies where, you know, five or 10% of global payment transactions run on some COBOL system that was written forty years ago. Every developer is gone, and they are it's a ticking time bomb. Like, at some point, it needs to go to Java, but there's nobody who even knows how to do that. And so those are the types of projects where the market is so enormous because, you know, half the business runs on this legacy system.

Speaker 16:

Hundreds of billions of dollars.

Speaker 1:

Put it all in Lisp. Skip Java. Go straight to Lisp.

Speaker 10:

Yeah. Exact.

Speaker 1:

Python. Right? Yeah. Python would be the logical one. I'm sorry.

Speaker 1:

We're running behind, so we're gonna have to cut this short. But I wanna know more about how the enterprise coding agent market will develop. We could see one world where we wind up with, you know, GCP, Azure, AWS, like, you know, pretty comparable competitive. They've all had really great margins. It's been this oligopoly.

Speaker 1:

There's another world where you could see more specialization. One of these companies goes deep into high security environments or oil and gas or financial environments or specializing based on specific programming languages. As as the market develops, like, how do you think it'll play out?

Speaker 16:

Yeah. Great question. I I think that what's very clear is that the bulk of very large enterprise has a lot of similar problems, refactors, migrations, modernization. So a platform like Factory is able to deploy into that and solve problems quickly. I I think that there's likely to be like that sort of eighty twenty where Yeah.

Speaker 16:

There are going to be these very specialized providers that only focus on one sort of problem, and that will represent maybe, like, 20% of what's out there. And so it won't be, like, necessarily black or white, but we do think that the bulk of enterprises have a lot of similar needs, especially when you just get across a certain threshold of number of engineers, scale of code base.

Speaker 1:

Sure. Sure. Yeah. I mean, we we we even see that with the the the clouds where, you know, obviously, there's the hyperscalers, but then there are neo clouds. And we talked to Armada where they'll they'll send you a shipping container with a bunch of racks inside and put it in stranded energy.

Speaker 1:

So there will obviously be the a long tail here. That's a great take. Thank you so much for stopping by. Have a great rest of your day. And enjoy the GPT five upgrade.

Speaker 1:

We'll talk to you soon.

Speaker 2:

Fun out there.

Speaker 1:

Really quickly. Let me tell you about Adio customer relationship magic. Adio is the AI native CRM that builds scales and grows your company to the next level and we will be joined by our next guest from augment. Welcome to the stream. How are you doing guy?

Speaker 17:

Great. Thanks so much for having me.

Speaker 1:

And that's his name by the way if you're listening. His name is Guy. I'm not just calling him Guy. Anyway, please introduce yourself and what do you do? What does your company do?

Speaker 17:

Yeah. So I'm Guy Guarari from Augment Code. I'm a co founder and the chief scientist and we build AI coding assistants for large teams with large code bases. And so you can use AugmentCode to do question answering, to do development, to do refactoring, to do migrations, all the tasks that you do except that our product understands your large code base really well. And so that means less prompting for you and, faster and better results out of the agent.

Speaker 1:

Today, GPT five launches. It's kind of a rising tide, feels like it lifts all boats. Every company gets access to it. We've interviewed a number of companies that are building on top of GPT GPT four five. Yes.

Speaker 1:

But but in general, how do you think you can use GPT five? Are there any pockets of value that you think you can uniquely take advantage of?

Speaker 17:

Yeah. Great question. So we've been we've been trialing the model for the past few weeks. And what we found is that the GPT five is a very thoughtful model. It likes to make a lot of tool calls.

Speaker 17:

Mhmm. It likes to ask clarifying questions of the user before starting to make code changes. And so the place where I reach out for GPT five is typically if I need to make large changes or if I'm trying to answer a very difficult question about the code base, I will let GPT five take a crack at it. It will churn for a while making lots of tool calls, just making sure it got it right, and probably find all the different places in the code where it actually needs to make a change. And so I will typically let it run-in the background and come back to it, and I will often get a high quality result out of it.

Speaker 1:

Are there any features or integrations that you're hoping GPT five will roll out in the future? We talked to a couple of people who are like, like, we want models that have access to as many tools as possible. And and you can see with the MCP boom, more people are trying to make their their services, their products accessible to these models. Is there anything that you see as potential low hanging fruit to just add to the capabilities?

Speaker 17:

So I think for us, we work hard on developing our own integrations and our own tools, building them into the product rather than relying on GPT five or other model vendors to do so. We have worked closely with OpenAI to improve the prompting around our tools so that the agent kind of works flawlessly. I think the thing that would be very nice, I think one of the previous guests mentioned screenshot tool. I think that's a very, yeah, that's a very nice way to close the loop on front end software development. Sure.

Speaker 17:

Just like we saw how on back end software development running the tests automatically really helps the the agent iterate until it gets to working code. So I think having more support for screenshotting and things like that that close the front end gap would be That's very

Speaker 1:

nice I to I wasn't aware that the that that screenshots weren't flowing through. I feel like when I've when I've triggered operator, I'm getting a a a view, a web view into the website. But I wasn't I wasn't aware that that wasn't like being passed through easily in the API and you still kinda needed to build that yourself. Where else we were just talking about this. Like, where are the biggest pockets of value right now for AI coding tools generally?

Speaker 1:

Obviously, everyone knows like the vibe coder who's just the designer who's learning how to use software for the first time. Then there's the experienced developer going from a 10 x to a 100 x with better code completion. Then there's the enterprise that's you know maybe doing replatforming. Where else are the interesting pockets of value that are maybe on the horizon to be unlocked with new models?

Speaker 17:

Yeah. So on top of everything you mentioned, certainly, the the inner loop of software development, that's where we've spent most of our time at Augment Co developing product for. Yes. You can have a senior developer starting using agents, starting to use multiple agents in parallel and unlock 10 x or more productivity gains. What we're starting to see now with our tools is the beginning of automating software development life cycle tasks.

Speaker 17:

So with with AugmentCode, we have a CLI tool now where you can take the full power of our context engine and the agent, the thing that really understands your code base, and you can start automating tasks in the background. And so we're seeing more and more developers saying, oh, this is great. Like, I can break out of the IDE now. I'm using the agent that's already familiar to me, but I'm starting to automate code reviews. I'm starting to automate incident response.

Speaker 17:

I'm starting to automate looking at production logs and automatically assigning tickets based on error logs that I'm seeing. All kinds of new automation use cases that we're seeing just because agents have gotten so good and kind of really understands your code base.

Speaker 1:

Are there are there high stakes pockets of software engineering work that most of the AI tooling has kind of stayed away from? I'm imagining like the the the high stakes database migration. Where where is the the the kind of sticky part of the industry? I was reading a blog post by someone who's doing like very advanced cyber security pen testing and they were saying like just the creativity of the models wasn't quite there yet to really come up with the to really act and embody like a white hat hacker who is going for a bug bounty. But where where are the pockets of still like intractability where I guess if you are, you know, in the the individual contributor, you love just just, you know, coding from scratch, that's where you want to stay for at least the next couple of months.

Speaker 17:

Yeah. I think still the attention of all the models we've seen and all the agents we've seen around making proper design and architecture decisions Mhmm. That's still high stakes and still the ability is not there. Because if you do complete vibe coding and you just let the agent go and do whatever it wants, in the beginning it looks amazing, the code works, and it's all really good. But once you get to low tens of thousands of lines, the bad decisions that were often made around the design and architecture start to show up and development slows down.

Speaker 17:

So that's where we still see a limitation of today's agents and where you still have to supervise the agent fairly closely in order to make sure that you don't get stuck later on. Perhaps this will change in a year, but today, I would say all these decisions that you make around how the code is structured still requires close supervision and still high stakes because it can really slow your project down if you let it go autonomously for long enough.

Speaker 1:

Yeah. That makes sense. Well, you so much for stopping by. We will talk to you soon. Have a good rest of your day.

Speaker 2:

Thanks very much. Cheers.

Speaker 1:

Let's check-in with Tyler on the timeline. Tyler's manning the timeline. How are the vibes? Are there any new posts that have hit the timeline? Are we still in turmoil or has the narrative settled?

Speaker 3:

I think vibes are are picking up a little bit. You're starting to see people post like oh this is something I made. Now you can see on Ella Marina it's number one No way. 35.

Speaker 1:

Wait wait wait so what's going on with the Polymarket then?

Speaker 3:

So Polymarket is still

Speaker 1:

Google heavy?

Speaker 3:

Yeah. Think I guess they're just pricing in Gemini three. Oh, okay. Everyone's still trying to sure I was actually very surprised to see that it was number one. Yeah.

Speaker 3:

Okay. But yeah. Maybe later we can show some of the posts.

Speaker 1:

Yeah. Yeah. Yeah. Yeah. That'd be great.

Speaker 1:

Cool stuff. Well in the meantime before our next guest let's tell you about Eight Sleep, get a pod five five year warranty, thirty night risk free trial, free returns, and free shipping. And we will have our next guest join us from CodeRabbit. How are you doing? Good to meet you.

Speaker 12:

I'm good. Good to meet you as well. Thanks for having me here.

Speaker 1:

What's your reaction to GPT five? How long have you been playing with it? What are the biggest improvements that you've noticed?

Speaker 12:

Yeah. I wish they're mind blowing. Right? I mean, we have been playing our team has been playing for, like, a few weeks now. Tested a few snapshots.

Speaker 12:

It's amazing. It's a generational leap, we would say. Like, we have been using OpenAI models. Like, I don't know how much you know about CodeAvir. It's been, like, couple of years we have been on OpenAI and Tropic.

Speaker 12:

And our product is a very reasoning heavy product, like, of the very few use cases where you have a PhD style problem and say, we have to do code reviews. That's what CodeRabbit does. Like, users open up pull requests. Our agent uses reasoning models to find issues like race conditions or security issues and so on. Sure.

Speaker 12:

So, yeah, so we've been testing GPT five on some of the hardest pull requests we have in our golden dataset. So we maintain a dataset where we track progress of different models and progress of AI in general. So we have, like, many problems that no model is able to use, solve so far. Like, I mean, in g p d five. But so far, has the highest score.

Speaker 12:

We would say it's, like, almost two x better than the next or three or Sonata or Opus at this time.

Speaker 1:

What's the customer value there? You think that, like, all the customers just notice that the product gets better? Are you gonna upsell folks? Are you gonna, like, how do you play this given that this model is now in public availability? Every company, every competitor can can access it as well.

Speaker 12:

Yeah. There's no upsell. Like, that's the thing with AI. I mean, for the same price or even, like, better prices, you're getting much more AI, much better BI. Like, that's the whole idea how fast this space is evolving.

Speaker 12:

So, yeah, from the pricing point of view, we don't see, like, this to be a, like, a separate plan or something in our product. I mean, the prod for the same price per month, customers will now just get better quality of results with CodeRabbit.

Speaker 1:

Mhmm. What's next for the business? What kind of customers are you going after? Who do you think has been on the fence and this release is gonna be the thing that gets them to actually jump into the world of AI.

Speaker 12:

Yeah. We could track the top line metric. Like, one of the things we track very closely in the company is, like, how many sign ups to the paid customers we get. Mhmm. And that number has been constantly improving since g p d four g p d four turbo.

Speaker 12:

G p d four, we actually dipped. And so there was a time when g p d four was almost like a Windows Vista of releases, like it's funny, like, how we kind of trusted the evals, and we thought it's the same model, but in in a way, was inferior in many ways. Then we saw a huge improvement after o one came out. O one preview was a game changer for us even at that time. Our conversion doubled actually.

Speaker 12:

You know? Right? I mean, so we went into, like, more like close to 30% success in getting the paid users. And now with g p d five, we're we can see another big jump in the in the number of people who start becoming paid customers and and how many people churn. So those are the real numbers.

Speaker 12:

Yeah. Like, one is, like, wipes, like, how people, like, respond to the model, and we get angry tweets or not even though that's the other part. But the other thing is, like, the actual revenues, whether it moves the needle for us, and that to me to be seen. Like, one of the things we have seen, even though you test these models in a lab, they be it's not like a huge dataset, but once you actually are in the wild, you see hallucinations, some of those scale issues at scale pop up. So those are something we'll still be observing over the next few days to see whether it's, like, spot only, like, eighty percent of the cases.

Speaker 12:

But then if the false positive rate, the hallucinations are too high, then also it's not a great model, but that remains to be seen.

Speaker 1:

Yep. That makes a lot of sense. Well, thank you so much for stopping by. Congratulations on a new new tool in the tool chest.

Speaker 2:

New toy.

Speaker 1:

We will talk to you soon. Have a great rest of your day.

Speaker 2:

Cheers.

Speaker 1:

Bye. Let me tell you about public.cominvesting for those who take it seriously. They got multi asset investing, industry leading yields. They're trusted by millions. Millions.

Speaker 1:

The the chat is going wild about public trading. The SPX six six thousand nine hundred. I think that comes from someone talking about, like, the non mag seven stocks or something. There's been, people benchmarking the the Mag seven versus the

Speaker 2:

The big news while we were live or earlier today, Trump signed an executive order that is opening up four one k's to digital assets and private equity.

Speaker 1:

What what's what's crypto doing? Is it ripping?

Speaker 2:

Bitcoin is up a couple points. Last time

Speaker 1:

This point, you know, where's it gonna go? It's already so high. I mean, this is like there there's been so many It

Speaker 2:

it could go up. It it could go down.

Speaker 1:

Yep. We'll

Speaker 2:

have to wait and see.

Speaker 1:

Tyler, anything else notable from the timeline? What have people built? I see this GPT five just one shot at a Minecraft clone.

Speaker 3:

Yeah. I think that's one of the cooler things I've seen.

Speaker 1:

Okay. So basically it wrote it wrote code to generate this game. It's not generating the pixels. You can do so many different things. Like you could generate you could generate a video, generate a world model, generate code that generates a game engine.

Speaker 1:

You could generate code that runs on Unreal Engine. I don't even know what they're using

Speaker 3:

What they now in on actual Chateapputy in there's like a native like it's like a music player. It's almost like GarageBand. Mhmm. You can say like, if you prompt to like build I I saw Sam Owen tweet about this. You prompt like to to do some kind of like beat or something.

Speaker 3:

Mhmm. It'll like make an interactive like garage band almost interface in there. That's cool. I was playing with that earlier.

Speaker 1:

Yeah. I I do wonder how many of these features that we're seeing like where does OpenAI want to keep things in the b to b world and let other companies build versus just build it as a consumer app like Yeah. Like will ChatGPT eventually just let me push your website? Like will it will it become a vibe coding platform? At least like a basic one.

Speaker 1:

Like Yeah. It's not it's not the most advanced coding environment but it can definitely write some code and execute it for you and do some stuff.

Speaker 3:

Yeah. Well, it's funny because like it used to be you would have a like so there was like GBT 3.5 or something and people on top of that built a vibe coding thing. So you could use that to build your own vibe coding thing. Yeah. But now you can just go straight from chat GBT to build your vibe coding platform.

Speaker 3:

Yeah. But soon maybe it'll just be the vibe coding platform.

Speaker 1:

Yeah. Surface area of this stuff is is very interesting. Clearly, they're going after health care and therapy. It's it's interesting that they've kinda stayed away from legal even maybe that's just the the the the dynamic of the sales process and the dynamic of that particular market. But, I mean, increasingly, you can you can just ask more and more questions of ChatGPT.

Speaker 1:

So the the consumer to business, like, bleed over, there's certainly a world where just giving everyone in your organization ChatGPT is a substitute for a bunch of different SaaS products. So it'd be interesting to see where that developed. What what are you thinking about?

Speaker 2:

Nir says there are concerns that the number used to represent our AI's intelligence does not in fact represent its intelligence. Worry not to address these allegation, we've added three new numbers.

Speaker 1:

Near. Yeah. Near is building something that's like not particularly benchmarkable. Right? Isn't it a companion?

Speaker 2:

It's beyond benchmarks.

Speaker 1:

Beyond benchmarks. Well, in completely other news, Anderol opens a Taiwan office and begin selling AI powered attack drones to Taiwan. Palmer Luckey has said he wants to turn Taiwan into a prickly porcupine. We're in the age of spiky intelligence. That spiky intelligence will be onboarded onto the AI powered attack drones and deployed in Taiwan to keep it safe.

Speaker 1:

What else is going on in the timeline while we wait for our next guest from OpenAI to join?

Speaker 2:

Spore says, raise your hand if you were not automated today. I'll raise my hand.

Speaker 1:

I was not automated today.

Speaker 2:

Not yet.

Speaker 1:

Yeah. We made it through. Sebastian Bubak says, here at OpenAI, we've crashed we've cracked pre training, then reasoning, and now we're experimenting with new set of techniques that maximally leverage their interaction. GPT five is just the first step in this direction. We're excited to incredibly excited to see where scaling this up will lead us, and it's the unicorn test, I believe.

Speaker 1:

And, the latest unicorn is really really good. That is that is a creative interpretation. And, I think it has a draw of this with like SVGs. Anyway, we can talk to our next guest about it.

Speaker 2:

Last post. Jira ticket says, went to the permanent underclass party and everyone knew you.

Speaker 1:

Anyway, back to the series interviews. Welcome to the stream Max. Good to see you. How are you doing?

Speaker 2:

What's happening? Nice to meet you guys.

Speaker 9:

Yeah. Doing well. It's a relief to have this launch out in the world. I think it's know, we've been working on this for the last few months now, and it's exciting to let the whole world see what we've had.

Speaker 2:

Yeah. Just a few months?

Speaker 12:

It's been

Speaker 15:

I don't know.

Speaker 9:

It's been a

Speaker 4:

little while.

Speaker 1:

What's the what what's the actual launch day like? Because you're actually getting this out into the world. The GPUs are on fire or about to be on fire, warming up. But is that is that out of your purview? There is a different team for that, fortunately.

Speaker 9:

So right. So I run I ran a lot of the research for g p d five. I don't necessarily handle the deployment, but I do get dragged in when the GPUs are on fire. Okay. I I think we're we're we're moderately burning right now.

Speaker 1:

Okay.

Speaker 4:

Okay. Like a

Speaker 9:

two alarm fire.

Speaker 1:

Yeah. Yeah. Yeah. Is it is it materially different? I mean, this is a launch day, but we'll probably discover like the Studio Ghibli capability once it gets out into the long tail of like, you know, hundreds of millions of people try it.

Speaker 1:

Someone comes out with some genius thing, then everyone's doing that, and then the GPUs. Because I I feel like the Studio Ghibli thing happened like a few days after the launch of Images in Chat GPT. It

Speaker 9:

it did. It was it was pretty fast, but it Yeah. Within about a week. Yeah.

Speaker 12:

I think,

Speaker 9:

in this case, we're gonna we're gonna see that here.

Speaker 1:

Okay.

Speaker 9:

I think coding, you know, if I had to take my bets for what the Studio Ghibli thing is gonna be, it's Yeah. It's coding. That's the place where I think g p d five is, like, most tangibly a huge leap ahead of g p d four, and ahead of o three.

Speaker 1:

Do you think there's a chance that that that the coding will mean a Studio Ghibli style meme or or kind of like and and what I mean by that is that is that, like, image generation is incredibly valuable in the context of, like, Hollywood will be using AI to chroma key and and rotoscope in a professional environment. But Yeah. What was special about Studio Ghibli was that anyone was making these custom images. And I could imagine a world where, you know, even going from like the levels.io example of like iVibe coded a flight simulator. If we wind up in a Studio Ghibli moment for coding, I would imagine it's, everyone built their own game today.

Speaker 9:

I think that's pretty much it. Yeah. Okay. So I I don't if you guys watched the

Speaker 2:

What what are

Speaker 9:

That was that was one of the things we had on the livestream. Like, you can just go into ChatGPT. If you try it right now, it might or might not work because the d p five rollout is still ongoing. But if you have five, you can just tell it like, basically, make me a game. Yeah.

Speaker 9:

And it will make it, and you can actually play it in ChatGPT.

Speaker 1:

That's amazing.

Speaker 9:

So I yeah. Discover that. And you don't the thing is, like, like with Studio Ghibli, right, like, for for Ghibli, you don't have to know how to draw to make it work. For this one, you don't have to know how to code.

Speaker 1:

Yes. But So can you share can

Speaker 2:

you share that chat and someone else can play the same game? How how does the kind of sharing mechanism work?

Speaker 9:

Yeah. There's a there You can do the share link. Yeah. We're we're I think going to try to make sharing for these a lot better over the next few days. Yeah.

Speaker 9:

That was p two after the p one and p zero of making the GPUs not completely melt.

Speaker 1:

Yeah. Yeah. Yeah.

Speaker 9:

But, yeah. I we will try to make it much more scalable.

Speaker 1:

Yeah. Yeah. I mean, the the Studio Ghibli thing is so interesting because it was it's not just that the model capability was there, but it's also like the prompt was two words, and and and it was so reliable that you always got a good result and you could personalize it. So even if it wasn't an I've seen people build Doom. I've seen people, you know, you can just buy Doom.

Speaker 1:

It's a real game. You can build it. But if you build it and I'm like, oh, that's cool. You did it in a vibe code environment or in ChattypuTi, like, that's awesome. But I don't necessarily wanna go do that for myself.

Speaker 1:

But as soon as it becomes personal, is what the studio I I had to see what I looked like as Studio Ghibli. I had to see what my favorite photo looked like. My favorite meme And looked like in Studio once that happens with games, people will will will eventually, you know, there'll be this like a memetic explosion and you'll see the GPUs will truly be on fire.

Speaker 9:

Yeah. I mean, I think even today, you could probably with GBD five do doom, all of the characters are like all the enemies are headshots of your friends, like

Speaker 1:

Here we're going. Now we're now we're real close. We're real close. It's gonna be something that's personal, something that, you know, you can express your own creativity through. Because I think people, they still latch on to that.

Speaker 1:

They don't just want, you know, a copy of what already exists. They want something new, and the Studio Ghibli moment was just new enough. Anyway, we should talk about actual research. We should talk about post training. What's the thing you're most proud of?

Speaker 1:

Like what can you give us on without, you know, immediately getting poached? What what what can you give us on on the actual innovation that went into GPT five from a post post training perspective? What are the like the the kind of keywords and and paths in the tech tree that we should be digging into over the next few years to understand how this works?

Speaker 9:

You know, I would say

Speaker 15:

the thing

Speaker 9:

that is most impressive to me about g b t five is how much getting all of the details right matters.

Speaker 1:

Mhmm.

Speaker 9:

Like, when I look at g b d five, you know, we had an early version of this thing a while ago that was kind of okay, but clearly did not meet our bar for revolutionary. Mhmm. And we're trying to figure out, you know, why why is that not as good as it should be. And the team basically just went off and did a deep dive over a couple of months of just completely rebuilding the post training stack for this model. And turns out that when you do that, you get what would have taken, you know, another order of magnitude worth of pre training improvements to to produce.

Speaker 1:

How much are you thinking in post training, in research about let's forget the benchmarks and just focus on user satisfaction like NPS score basically or like users user minutes or any of these other the real benchmarks.

Speaker 2:

Yeah. The intangibles about revenue.

Speaker 1:

But yeah. The feeling and the and the joy and the actual value that's delivered, because Studio Ghibli was a delightful moment. It wasn't a benchmark.

Speaker 9:

Yeah. I I think So that was something that we took very seriously for GBT five. Was like, look at what people are actually doing with chat g b t, and look at where the model is failing them. Mhmm. Either in the sense that the model is like, sort of like you said, it's not enjoyable to use.

Speaker 1:

Yeah.

Speaker 9:

And so we we did, I think, make a lot of progress on that.

Speaker 2:

Like Mhmm.

Speaker 9:

G b d five is much more engaging than our previous really smart models. Like o three, I I don't know if you guys talked to o three in the past. It's Yeah. A bit bland.

Speaker 1:

Sure.

Speaker 9:

And g b d five, I think, has a lot more character, is a lot more more interesting. But then also, like, I think for we really care about just actually being accurate. Like, giving if a user is trying to do something economically valuable with our model, we wanna make sure it lands correctly. Yeah. And so what we did there is just like, look at act the actual distributions of what people are doing with our models in the real world, figure out where the models are going wrong, build interventions to target it.

Speaker 9:

And that was where, you know, we got, I think, the most impressive improvements in g d five. Like, o three would just get things wrong and not tell you it wasn't sure it was it was incorrect, and g p d five is much much better about, like, actually being honest when it thinks it might not know.

Speaker 1:

Yeah. How explicit is are all the different pieces of the post training pipeline? Like, you have you have you know, post training. You have stop hallucinating. Give me the real facts.

Speaker 1:

You have make sure the text the the flavor the tone is pleasant. There's so many different things to optimize for. How much of that is, like, try and just blend it all up into one thing versus, like, explicit passes, chunk it out, like, split it up? How how much can you decompose the problem?

Speaker 9:

So, you know, my background is in reinforcement learning, and I think, you know, when you look at something like this, the magic is in the reward function. Right? It's in what you're actually telling the model to be good at. Yeah. And so, fixing things like hallucinations, to a huge extent, is essentially a function of just fixing the reward function.

Speaker 1:

Yeah.

Speaker 9:

Actually making it so that the model is reliably penalized for saying something that's false. And if you do that, all of a sudden, the model stops saying things that are false. Ditto for safety. Right? You know, on the livestream, Sachi talked a bit about the the way we've changed safety for this model.

Speaker 9:

And to a huge extent, it's it's just a function of we are we're actually putting out a paper today on the new safety stack for this model. And the core insight in that paper is just figure out what you actually wanna optimize for, which in our case is helpfulness conditional on not saying something that's actually dangerous or harmful. You know, write that down, figure out what that means as a reward function, then optimize it for it.

Speaker 1:

Mhmm.

Speaker 9:

It's really not magic at all. It's just again, it's it's what I said earlier. You gotta get the details right. You know, if at any part of that process you screw it up, the model will be unusable.

Speaker 1:

What's your current thinking on spiky intelligence? And is there some is there some flywheel that you can get started where you're identifying low points that aren't spiky enough and then you're like almost automatically setting up the infrastructure, the eval to then RL against to create a spike?

Speaker 9:

I think GPT five was a preview of what's possible in that respect in the future. Yeah. Mhmm. A step in that direction.

Speaker 1:

Do you think that there's a world where you get to a place where you're kind of it's weird because we're not hammering down the nails of the spikes, we're adding spikes.

Speaker 9:

Hammering up the spikes.

Speaker 1:

Metaphor that we're stretching a little bit too far. But is there a world where where where you can be doing post training or just adding capabilities in a more iterative cadence, so that as soon as you identify something the the response can be, yeah, we don't need to wait until GPT six to fix this. We can just add this capability because hey, we just found a pocket of users who are who are trying to do a thing, and they're not super happy with the results, and let's add this capability.

Speaker 9:

Yeah. I I think so. I mean, I think we, you know, we are gonna launch other models between now and GPT six. I think it's relatively common knowledge, but we do update the model in ChatGPT reasonably often.

Speaker 1:

Yeah. People talk about it all the time.

Speaker 9:

Yeah. Exactly. And, you know, I think we are now in a world where we can conceivably update that model and have it get materially better on capabilities too. Yeah. Not just on, you know, the personality is a little bit better than it was before.

Speaker 2:

Yeah. Going back to your note on the new paper that I guess you guys are releasing today, when you talk about optimizing for helpfulness, is there is part of that avoiding the model, you know, reinforcing? There's times when you wanna reinforce and and give kind of like confidence to the user that they're going down the right sort of like thought process and things like that. But then there's like a point where it can get too extreme in terms of maybe convincing a user of something that may be totally untrue. Is that is that what the paper gets at or or am I

Speaker 9:

just reading? So it's not specifically about this. Although I will say we we do explicitly train the model to not lead users down bad paths. That's something that I think we've started taking much more seriously over the last few months. As we've realized Sam talked about this a little bit, I think, back in May.

Speaker 9:

But ChatGPT is just way more important for people's lives now than it was a year ago or especially two years ago, and we do have to actually be very cognizant of what effects our models have on users. So, yeah, we we do very actively train models to not lead users down the right path. Don't fact check me on the releasing today. I know we're releasing it. I believe it is soon.

Speaker 9:

Soon. I think it's today, but I've also been in a hole dealing with launch all day.

Speaker 1:

Yeah. We're not big on fact checks here. We're big on the truth zone, which is just the vibes and

Speaker 9:

The vibes are we'll be publishing some information about the new safety setup.

Speaker 1:

At some point. That's great.

Speaker 2:

Yeah. I think I think a large part of the conversation around safety should be how reliant and and how useful the product has become to users and then the new level of care that you have to provide versus a while ago when it was just like people saying making a cute image or generating some text that they were gonna use in an email or an internal document and and realizing this this vector of usage, is this, like, companion confidant that is is becoming so prevalent.

Speaker 1:

Talk to me about post training for big big partners, enterprises, government organizations, what is transferring from the research that you're doing to something that can be offered as an enterprise level product?

Speaker 9:

Yeah. So we do OpenAI does partner with external companies to do essentially custom post training. Mhmm. That that is a thing that we do. And from that perspective, the stuff we do just directly transfers.

Speaker 9:

I'll also say that we we've put a lot of work into trying to make our models as general as possible. Mhmm. But to as large an extent as possible, if you wanna get really good results from our model, you can do it right on the API just by actually telling the model what you want it to do. Yeah. Right.

Speaker 9:

Like, g b d five, I think, is pretty comfortably our most steerable model ever. Yep. We we've heard a lot of really positive feedback about this, especially from, like, folks like Cursor.

Speaker 1:

Yeah. So if I came to you and I was like, I'm an enterprise and I need to generate a lot of Studio Ghibli's, you'd be like, what are you doing? Just prompt it. What are the examples of of companies organizations? Is it just is it just private information, private data sets that aren't available on on the open web?

Speaker 1:

Or is specifically, like, there is enough data out there, but there's just not the economic incentive aren't for your team to go and RL on, you know, gas station bench or whatever we're talking about here, hypothetically?

Speaker 9:

I think the answer is both. Yeah. It's definitely both. Yeah. Because, yeah, we're not going to target, you know, as you said, gas station bench.

Speaker 9:

Yeah. Of what people are doing with ChatGPT. Not not on our own right now, probably. Yeah. It's not mostly what people are doing with ChatGPT.

Speaker 1:

Exactly. But

Speaker 9:

if you have some application that's super valuable to you Yeah. Yeah. We can be convinced that it's important.

Speaker 12:

Yeah. Yeah.

Speaker 9:

Yeah. It's just not what our users are already trying to do.

Speaker 1:

What's the state of reward hacking and and and fighting that in in RL environments?

Speaker 9:

You know, I think we've actually made a lot of progress. There was some discussion of this around o three, that o three was like a little bit deceptive in ways that felt reward hacky, and Sure. GBD five is dramatically less deceptive than o three was.

Speaker 1:

What's an example of how that would manifest? Like, do you have like a canonical case study?

Speaker 9:

Yeah. I mean, the canonical thing is like you ask o three to write you some code and instead of actually writing some code, it changes the mean test so that

Speaker 1:

Changes the test case, right? Which is which is kind of hilarious. It's like one of the funniest things that AI has ever done. I understand it is very bad and it's not what we want, but it is just like, it's kind of cheeky in my opinion.

Speaker 13:

It's kind

Speaker 9:

of cheeky. It's also like, you know, I've I feel like if you spend enough time around real software engineers, they do actually do stuff like this pretty often.

Speaker 1:

I have 100% done

Speaker 9:

I I was gonna say I also have done that. For for for formal reasons, I won't say that I did it at OpenEye, but back when I well, I definitely did that.

Speaker 1:

Yeah. Of course. Of course. This is natural.

Speaker 2:

What do you think GPT six looks like? You guys You mentioned that you're gonna be shipping updates to five, but what are you most excited about? Where where are you most excited about going

Speaker 1:

And just really quickly, give us the date that GPT six launches?

Speaker 9:

Oh, man. Hopefully, we hopefully hopefully, six launches is a complete surprise to everyone.

Speaker 2:

I think that would be ideal.

Speaker 1:

Like a Beyonce album. Oh, yeah.

Speaker 2:

Hopefully, five just makes it and says, hey, it's ready now. You Yeah. Hit

Speaker 9:

think that would that would be a great thing for six, actually. I would love for Sixx to do all of the launch comms and to and to do the livestream. That would really great.

Speaker 2:

Live streaming is, that's the real AGI test. For sure. Right?

Speaker 1:

For sure.

Speaker 8:

I feel like

Speaker 9:

we're not that far off, actually. I don't know.

Speaker 1:

We're getting there.

Speaker 9:

I mean, video synthesis maybe, but, you know, talking through a script for thirty minutes, come on. Model's gotta be able

Speaker 5:

to do that.

Speaker 1:

For sure. Well, yeah, that'll be the the the the next Sora launch or something. We'd love to have you back on. But thank you so much for taking the time today. We'll talk to you soon.

Speaker 9:

Good to talk to you guys.

Speaker 1:

Congratulations. Cheers. Bye.

Speaker 2:

Congrats on the launch.

Speaker 1:

Let me tell you about adquick.com. Out of home advertising made easy and measurable. Say goodbye to the headaches of out of home advertising. Only ad quick combines technology, out of home expertise, and data to enable efficient seamless ad buying across the globe. And we have Scott Wu from Cognition coming in the studio for the fourth, fifth time.

Speaker 1:

I can't keep track anymore. Thank you for

Speaker 3:

taking the Thank you for

Speaker 1:

coming back. You.

Speaker 3:

It's great to

Speaker 8:

see you, guys.

Speaker 9:

How's it going?

Speaker 1:

It is fantastic. Gotta

Speaker 8:

be honest. Great week to be an application letter company, I gotta tell

Speaker 1:

you guys. I was about to say. You get a model.

Speaker 10:

Best thing

Speaker 1:

for you ever. Another win for Scott Wu. Wow. Wow.

Speaker 2:

Wow. 4.1.

Speaker 1:

Yes. So, yeah, how how big is this? Is it are we in the are we in the Uber, Lyft territory where, you know, you're gonna be, you know, in price competition between Anthropic and OpenAI going back and forth? Like, what what is the real benefit to your business right now from today?

Speaker 8:

Yeah. Yeah. For sure. So first of all, obviously, massive capability gains across the board. I think really, really impressive work that OpenAI has put together.

Speaker 8:

You know, people have talked about what's going on in the AI coding model race, and and I think by a lot of accounts, you know, Anthropic has generally been ahead for for for a lot of the last year, honestly. And I think at this point, OpenAI is is very clearly, you know, has very clearly caught up, and it's it's pretty neck and neck, I'd say, between the two right now. And so very exciting to to to see all this unfold and to see what's next. But but I think from our perspective yeah. I mean, code is is just such a such a core capabilities pill to use case, I'll call it.

Speaker 8:

Mhmm. And so, you know, being able to work with smarter and smarter models and do a lot of the work that we do, it just means that both Devon and Windsurf can be a lot more capable, a lot more intelligent, can predict what you wanna write or what you wanna do with a lot higher accuracy.

Speaker 1:

Yeah. It's it's almost it's almost, like, surprising that given the, like, cultural rigor at Cognition that you're not doing fundamental frontier research. So can you walk me through, like, what is the is the focus of being an application layer company? Is it is it UI go to market? I'm sure it's all of these, but in terms of the the hardcore software engineering, like what is important to get At some point, there's fine tuning and post training, but is that moving back into the purview of the foundation labs?

Speaker 1:

Or is there still work that you wanna do on top of the models or on top of the APIs?

Speaker 8:

Yeah. Yeah. It's a great question. Like, I I mean, I think the the core of being, you know, an an applied lab is is really just focusing on a very particular use case, on delivering real real just very direct results. And I think, you know, like, I I think the foundation labs are obviously, you know, incredible at training based models and and all this pre training and and and all of the work that they do there.

Speaker 8:

I think from our perspective, we we we want to work on a lot of very particular capabilities that apply to software engineering in particular, and then obviously, you know, run the whole stack from there to building a product, figuring out the interface, and and the UX, and then obviously bringing that to market and selling that. On on the capability side, there's a lot of particular stuff where you know, one way to put it is, I think the the base IQ is very much already there in the models, and you can see the the raw problem solving ability and I mean I mean, we've gotten some pretty insane results, you know, the gold medal at the IMO or all these other things. Right?

Speaker 2:

You called that, by the way.

Speaker 1:

Yeah. You called I think the first

Speaker 8:

I I mean, we were one point away to be fair a year ago. Right? So it was a it it it was on the way, I'd say. But but but but yeah. So so so, you know, you can really see the general intelligence improving with every single model model generation.

Speaker 8:

On the other hand, for Devin, obviously, you know, it's a very clear, like, step up in the general intelligence, but also you wanna be able to you know, if you ask Devin to to go debug your Kubernetes or to go and, you know, look look into your error logs and figure what figure out what went wrong or or or things like that. There's often a lot of very specific capabilities, and and that's where we find that, you know, the the post training of the RL is is is most effective. There and and and a lot of the kind of various work around the models that that turns out to be useful.

Speaker 2:

What about speed? A lot of people that have gotten access to GPT five are are, at least in our chat, are reporting that it just feels really, really quick. How how is that over time gonna impact the, I think a lot of people, you know, if they're using Devon today, task Devon with something and then maybe they go work on something else for a little bit or they're running multiple agents concurrently. But at some point, the agent could get so fast that you're just sort of like watching it and and work in real time and you actually wanna be engaged. But are we there yet?

Speaker 2:

Is it still a ways out? What do you think?

Speaker 8:

Yeah. It's a great question. I I think in general, I think async will continue on as a paradigm even as the models get faster and faster. One of the reasons that it should, by the way, is because there are a lot of real world thresholds that start to matter. Like, at some point, you're actually spending less time on token generation in the Devon life cycle, and you're spending more time on every time Devon runs the command to go install packages or Devon running the unit tests or, like, Devin pulling up the front end by itself or or or things like that that obviously take real world time.

Speaker 8:

Right? I think we are are honestly getting closer and closer to that threshold. But So so long story short, I I think, like, in the asynchronous mode, yeah, these things will get faster. You know, we'll see those gains, or we'll be able to spend a lot more time, for example, thinking about a single problem relative to the amount of, like, real world clock time that gets spent. I think for the synchronous use cases is where we'll see things really, really, you know, exploit with with speed, which is, you know, Windsurf and Cascade, for example, where where we see the speed gains really, really matter.

Speaker 1:

Speaking of windsurf, give us the update on the the chat wants to know about the windsurf tea and the eighty hour demand. How have the the buyout offers gone? What's the internal response been? Where'd that idea even come from?

Speaker 8:

Yeah. Yeah. Look. People people are stoked, honestly. Uh-huh.

Speaker 8:

And and I think I think from our perspective, it's, obviously, really important to to kinda just, like, unite and get to the point where we can just be one culture and and one kinda shared, set of values. And and this is how things are at Cognition is, you know, it's it's it's it's a pretty busy time. Like, we we we are at the inflection point of code, and we work like that too. Yeah. And and so I I think a lot of it for folks is is just kind of like, you know, we wanna make sure folk folks who who who who really wanna do this with us, you know, make that conscious decision to opt in.

Speaker 8:

And for for anyone who doesn't, obviously, we totally understand that there are a lot of talented folks that maybe that's just not the right thing for them right now or, you know, not at this time. And and so wanted to make sure that they worked well well taken care

Speaker 1:

of too. And to be clear with the buyout offer, that's on top of the actual acquisition deal that already went through. They already got the They're

Speaker 2:

already got fully vested. So so, yeah, I was thinking of the roller coaster. It's like you have OpenAI deal Yeah. Then the Google deal, the the the Cognition deal, and then they're like, wait, these guys work really really hard. Don't know if I'm cut out for this.

Speaker 2:

And they come back up again where they're like, wait, I can just go Yeah. You know, take a sabbatical and and figure out my next thing. So it's a great outcome.

Speaker 8:

Yeah. Yeah. No. And it's obviously is, you know, overall is a killer team that's been through a lot and so wanted to make sure that they're well taken care of.

Speaker 1:

Yeah. That's fantastic. Anything anything else you can tell us about the integration of Devon and Windsurf? How are the teams getting along? How do you see the products playing together in the long term?

Speaker 1:

Obviously, cross sell seems really obvious. They have the go to market team as well, but but how else are you thinking about the interaction maybe over the longer term there?

Speaker 8:

Yeah. Yeah. Yeah. For sure. Yeah.

Speaker 8:

A lot of obvious integration on the team, as you mentioned, with cross sell and so on. I I think the thing that's really exciting on products, which which I think actually comes along with the the these capabilities increases, is, you know, as the capabilities keep getting better, you start to take on harder and harder tasks with AI and with full agentic workflows. Right? And I think there's an interesting thing that happens where for a lot of the harder tasks, you really actually do wanna go back and forth between a synchronous and an asynchronous mode. You know?

Speaker 8:

And that's for a few reasons. You know, one of the reasons, obviously, is because there's a lot of review and and a lot of, like, looking at the pieces and and and thinking about the the you know, all all all the minutiae and the details of what you're implementing. I think another big reason for it is, you know, when you get started on on a larger project, you know, let's say you're you're sitting down as an engineer and you're saying, alright. I'm gonna go build this whole project today. You yourself don't actually know all the trade offs you wanna make, all the decisions that you wanna make, and so on.

Speaker 8:

Right? And so having a format where, know, for the decisions that need you to be there and you're involved setting the kind of the strategy or or figuring out high level what should happen, you're able to do that in a nice synchronous environment, which is naturally the the Windsor of IDE. Right? And then for the parts of the task that you can actually hand off and have an agent work on, you're you're giving that to Devin. Mhmm.

Speaker 8:

And figuring out how you go back and forth between those is is is super interesting. So wave 12 on the way soon.

Speaker 1:

Hope we'll

Speaker 8:

we'll have a lot more more to share.

Speaker 1:

Last question. Yeah. Hit the soundboard, Jordy, for that.

Speaker 2:

We're For wave 12.

Speaker 1:

Wave 12. Fantastic. Last question. We'll let you go. What is your probability that AI will get a perfect score on the IMO next year?

Speaker 8:

Oh. Interesting. So we by the way, we just had the IOI, which is the the programming version, like the programming Olympiad, when I think there's a good chance that we'll have a golden medal at the IOI for for this year announced as well. I think perfect score for

Speaker 1:

next We as year in humanity or we as in cognition?

Speaker 8:

As in humanity, yes. Yes. Yes. An AI perfect score.

Speaker 1:

Yeah. Okay.

Speaker 8:

Sorry. An AI gold medal. Right. Perfect score on the IMO next year, I think it's gotta be north of 50, honestly. I I would put it around like 75% or so.

Speaker 1:

Okay. We'll see. Well, thank you so much. I I will be following you closely. And good luck to you, and congrats on all the progress.

Speaker 1:

Very fantastic. We'll talk to you soon.

Speaker 8:

Awesome guys.

Speaker 2:

Thanks for having me.

Speaker 1:

Bye. Let me tell you about bezel. Get bezel.com. Your bezel concierge is available now to source you any watch on the planet. Seriously, any watch.

Speaker 1:

And we are joined by our next guest Claire Vo from Chat PRD. Welcome to the stream, Claire. How are doing?

Speaker 2:

What's going on?

Speaker 18:

I'm it's a fun day today, isn't It

Speaker 1:

is a fun day. What was your reaction to the stream? What was your reaction to GPT five?

Speaker 18:

You know, GPT five, the first thing I said and I got a little early access is I said, it's a developer for developers by developer. This thing is built to be a software engineer. You've seen a a long string of your guests come on and really speak about the coding abilities of it. And what I think is interesting about this particular model, especially because we're seeing them deprecate the old models in the ChatGPT experience, and we're seeing a lot of positive feedback. But I do think there are drawbacks to a model that's so clearly tuned to a developer use case.

Speaker 18:

And as somebody who's building an application that isn't focused on agentic coding, I have noticed some personality quirks that are gonna be really interesting to see how they shake out as we roll out this model to our users.

Speaker 1:

Walk me through those. What are the

Speaker 18:

Yeah.

Speaker 2:

How how much like how much time do you have to kind of move users over to five

Speaker 18:

Yeah. So I I mean, I think we have tons of time from the API side to move Yeah. Move users. And in fact, you know, our strategy at Chat purity is not to just upgrade to the latest model. I know Zach at warp said like, why wouldn't you want the latest intelligence?

Speaker 18:

And the reality is because we're doing a lot of business strategy and business writing, I actually wanna validate with our users that they're they're getting the quality of strategic thinking output writing that they really want. So we actually a b test every single model rollout and really evaluate for user quality, token generation, all those things. And, you know, looking early on, it yaps. Man, this thing just wants to go through tokens. Right now, I'm seeing four to 10 x the number of tokens generated between the, you know, four generation models and five.

Speaker 18:

Wow. And when you're in a business context, you do not always want longer Yeah. Words, you know? And so it'll be really interesting there. It is certainly focused on execution.

Speaker 18:

So I, you know, I've heard a lot from the OpenAI team. It's steerable. Yes. And. Its natural inclination is to drive you towards like, how, what, very tactical, very specific.

Speaker 18:

And so if you're trying to zoom back out at a, strategic level or focus on a business initiative, it's actually a little harder to tune in that direction. So, you I think there's a lot of positive things for me as somebody who uses AgenTic coding platforms, who writes a lot of code. It's my daily driver now. I love it. But for other use cases, I think it's gonna take some time to figure out if it really is optimal in use cases where intelligence actually isn't the differentiating capability.

Speaker 2:

Yeah. It's very interesting to think the the best the best product manager is not the one that writes the most the longest doc.

Speaker 18:

No. And you don't send your engineer into your executive meeting. Like, I and I I really am looking forward to the time where we're not getting these number based models where actually I can get like GPT developer

Speaker 1:

or

Speaker 18:

g t GPT strategist where they're pre tuned and trained for the role they're gonna play as opposed to general purpose, but clearly oriented towards a set of set of tasks. And I just think if you look at this model, it was oriented towards an engineer, software engineering, at least in my experience.

Speaker 2:

So have you been tempted to launch any type of agent, like agentic coding products? You are you guys are obviously response at Chet PRD responsible for creating documentation. And if you look at the other guests that have joined today, many of them are competing with each other in different ways and trying to own different parts of the stack. You guys have seemingly stayed really really laser focused and no one else is doing anything like you're doing at least on the show today. But talk about like picking your your lane and

Speaker 1:

Yeah.

Speaker 2:

Kind of like optimizing.

Speaker 18:

Yeah. We're integrated with a lot of those platforms. So a lot of the kind of like prototyping platforms, v zero dot dev, lovable, all those, we integrate. We just released our MCP. So I use Chat PRD pretty consistently inside Cursor through our MCP.

Speaker 18:

So I think of we we think of ourselves as the product pair to the AI engineer. Now what's really interesting about my experience with GPT five is the one place that actually does really well is technical specs. And that's a place where chat PRD has sort of bridged into engineering execution. Often, our product managers are generating a PRD or some sort of business document. They're actually going the next layer and developing a technical spec.

Speaker 18:

The g p t five technical specs fed into these agentic coding frameworks or prototyping frameworks outputs output much higher quality assets on that end. So I do almost think there's gonna be this kind of, like, right model for right use case, especially in our kind of business. And so we think of ourselves as integrating. The one thing I have thought about with GPT five, it's the first one where it feels really simple to just go ahead and roll your own agent encoding framework or prototyping framework inside of our application. So never say never.

Speaker 18:

It's something that we get asked for a lot. But we were we're friend we're good friends with almost all your guests on your show today. And so we like we like the role we play in terms of being the product manager pair to all these AI engineers.

Speaker 2:

Yeah. That makes sense.

Speaker 1:

What are you looking for next?

Speaker 18:

What am I looking for next? I mean, in terms of model capabilities, what I think is really interesting about OpenAI and why I'm really committed to the OpenAI ecosystem even though I test and use a variety of models is I think developer support is a real differentiator. So we spend a lot of time talking about model capabilities. And for application developers, certainly ones that are doing more complex applications like agentic coding, model capabilities really matter. Like Core IQ of the model matters.

Speaker 18:

But the other thing that matters, you know, somebody who has built developer tooling products, developer experience matters. The primitives in these APIs matters. And so what I'm really pushing the OpenAI team to think about, which is in addition to the core intelligence of the model, what are the developer tools you need around these models to really make them a platform on which a variety of applications can build. And I do think that OpenAI has disproportionately invested in developer experience, but I'm always looking for like, give me better out of the box tooling, give me more control over these models, give me more hosted services, all those things that as an application developer are just gonna make it easier to deploy these models of production beyond the core kind of intelligence of the models themselves.

Speaker 1:

What was your read on 4.5? Is there a world where, you know, I'm I'm I'm thinking about the the the product manager versus the engineer. You have your o three go crunch some really hard reasoning. And then you have four or five turn it into, you know, stronger prose or or like more, you know, a human language.

Speaker 18:

Yeah. So I did a lot of experimentation around four o, four five, and four one. Mhmm. Four five was my favorite prose writer.

Speaker 1:

Mhmm.

Speaker 18:

By far. It was was loved from a business writing perspective. I thought the prose was the most natural. It was really slow. Like Yeah.

Speaker 18:

Untenably slow. And so the the compromise we made in our testing is we ultimately ended up with four one as the fan favorite for business writing when we were balancing off both quality of prose and intelligence as well as performance, which for application developers is a real consideration. So I landed on four one. Four one is the model that's being tested right now against GPT five in ChatPureD. And one of the things that I have to go do now is figure out how to get chat GPT or GPT five to stop writing.

Speaker 18:

It it writes a lot and it only wants to write in bullet points. So I've gotta go back into our prompts and figure out how to direct it to be a little bit more business oriented.

Speaker 2:

Bullet point maximalist.

Speaker 18:

It's the new em dash. I'm telling you, you will not be able to stop seeing it. It it just all it wants to do is write a bullet point and call a tool like it I was using it in cursor and it just kept maxing out my tool calls. I'm like, you do not need to read 50 files to to do this. So I do think, you know, application developers are really gonna have to think about how they slot this into their current workflows.

Speaker 18:

There's definitely tuning that needs to happen. But I'm telling you, you're gonna see a lot of bullet points when this thing rolls out.

Speaker 2:

Yeah. In sixty seconds, where is product management going? A lot of people talk about the, you know, examples of product managers that are starting to ship code themselves, ship whole features, products, But I'm sure those are edge cases to date. But but where do you feel like it's going based on on your user base?

Speaker 18:

Yeah. I mean, it's gonna go one direction or the other. Product managers are either gonna develop the hard skills to do the design, the go to market, and the engineering job to some extent. Because some of these other jobs are definitely going away for product managers. Or my favorite use case, engineers and designers are gonna get tools like chat PRD or these prototyping tools or Cursor, and they're gonna be able to actually do the product management job.

Speaker 18:

And so what I think is we're gonna see a new type of role emerge, which is a much more generalist role where people maybe have a specialist capability and they're augmenting that product thinking or they're augmenting that technical thinking with AI. But I don't think there's gonna be product managers as they were, you know, five or ten years ago for much longer.

Speaker 1:

Makes sense. Well, you so much for stopping by.

Speaker 2:

Yeah. Great to have

Speaker 1:

you Thanks having We'll talk soon. Bye. Cheers. Up next, we have Brad Lightcap, the chief operating officer of OpenAI. Welcome to stream, Brad.

Speaker 1:

Also, Jordy, your post saying I'm updating my timelines. You now have four years to escape the permanent underclass has over 4,000 likes.

Speaker 2:

There we go. Absolutely. Bang. Thousand likes for every year. Love to

Speaker 1:

Anyway, Brad, how are you doing?

Speaker 2:

Brad, what's going on?

Speaker 11:

How are you? Good.

Speaker 1:

Congratulations on the launch. What are the biggest takeaways for today from your side? I'd love to know about what it actually means to be the COO of OpenAI. OpenAI does so many different things, consumer Internet company, API business, enterprise. There's all sorts of stuff, building data centers.

Speaker 1:

What what is your actual role?

Speaker 11:

My role is kinda whatever the company needs me to do. Okay. I play everything from, like, you know, PM when I need to to, like, you know, salesperson when I need to. That's kind of the fun part of the job for me. On this launch in particular Yeah.

Speaker 11:

It was really fun. I spent a lot of time last few weeks with customers, with partners, getting a feel for g p d five relative to what they were previously using. In some cases, those are OpenAI models. In some cases, there are other models. But, you know, I've been OpenAI a long time.

Speaker 11:

Been in OpenAI seven years. So I've seen g p t three. I've seen g p t four. And then to be able to see g p t five and, you know, just the I think the joy from from of people being able to use it in production and seeing how much better it is, that's the best part.

Speaker 2:

Greg told us earlier about the era of having to pay people to use the early versions of the product. You guys have come a long way since then.

Speaker 11:

Yeah. We had, like, three customers with g p t three or something like that. And so it was easy to manage, easy to talk to all of them. They actually were, like, tired of us calling them being like, is it is it good? Is it getting better?

Speaker 11:

And so now it's you know, we're fortunate that we've got more than that, but it's cool. I mean, the diversity of use cases, I think the number of things that people are able to use it for. We've got everything from the team at Amgen, you know, big pharma, life sciences using it for clinical workflows there. We've got teams at Ubers, you know, building it for customer support, teams at Notion and Cursor building it into products that people use every day. So I think that's the power of it is is it just more and more covers the service area of things people do, you know, with these tools.

Speaker 1:

I'm not sure how much you touch organizational design at OpenAI, but I'd be interested to hear your thoughts on how those companies that you mentioned should be thinking about AI changing their org structure? Is it sort of like a horizontal cross functional service layer like, you know, a finance team that touches a lot of different elements of the business? Or should most companies be thinking about standing up a dedicated, like, AI implementation team? How do we get a chat box on every product that we already ship? Like, how do you think about those trade offs if if you were talking to a, you know, a friend at a Fortune 500 company that was thinking about their AI strategy?

Speaker 11:

Yeah. You know, it's an interesting question. I think it was maybe said earlier on the show. The thing we see is just people can do more. And so there's, like, this much wider latitude that you get if you're an individual person at an individual company where especially as you get bigger, you know, maybe more bureaucratic organizations that have a lot of different functions, a lot of different levels, you have to rely on a lot of other people in the org to get stuff done.

Speaker 11:

You've gotta rely on your data science team to do data analysis. You've gotta rely on your design team to do mock ups. You've gotta rely on your marketing team to do copy. And I think what we see with AI is it just accelerates people to get to a great v one of everything. Mhmm.

Speaker 11:

So if you're a high agency individual and you wanna get stuff done, you're no longer gated on people that, you know, you otherwise would be. And I think that should enable organizations to move a lot faster, and I think it should enable the the people at organizations that really drive them to do a lot more. And we see that consistently. ChatGPT Enterprise, I think that is consistently what we hear, and we we seek those people out when we deploy ChatGPT Enterprise. We find those, like, you know, two or three people at the organizations who are just the, like, AI superstars and champions, and then try and actually use them as these kind of touch points for the rest of the org to learn from.

Speaker 1:

How are you you how are you personally using AI these days?

Speaker 11:

You know, I my biggest challenge I think day to day is context switching. If you look at my calendar from like top to bottom, it's like, I I joke, like, I, you know, with my wife, I, like, have to, like, show up to work, like, wearing, like, a lab coat, and then I, like, take the lab coat off and, like, put some, like, sunglasses on and a film school jacket, and, you know, then I'm talking to, like, a media company, and then I, like, take that off. And so I so I go through the custom changes, and I think what what I actually mostly use it for is just to help with bridging me from kind of thing to thing to kind of put me in the mindset of being able to work with customers, help customers. G b d five is incredibly good at this kind of structured reasoning of how do we actually take what is this very diverse set of things that models like g b d five can do and then apply them in domains that I don't think about every day. And so it gives me this launching off point to be able to talk with with, with leaders and with customers much more fluently about how we can help their organizations.

Speaker 2:

Within, let's say, a set of companies like the Fortune five hundred, what does AI adoption look like across the spectrum? Because I'm sure that there there's companies that you talk to that are truly, you know, opt adopting AI in the way that John was mentioning, like trying to become AI native, changing their entire organizational approach. And then there's companies that just wanna buy software to say that they can that they're becoming AI native. So what what does that spectrum look like in in practice?

Speaker 11:

Yeah. It is a wide spectrum. So at the top level, we're seeing just, like, amazing appetite for wanting to adopt tools for people. And I think that's, like, the easiest place to start. Typically, that's where we steer organizations if they're starting at zero is just give your people the best tools.

Speaker 11:

You may have seen we've, you know, we've grown ChatGPT work, which is our enterprise and team product from 3,000,000 seats to 5,000,000 seats now, from from June till till now. So, toward growth there, and we don't see any abatement in in demand there. If anything, it's accelerated from from last year. And so think people and organizations are starting to realize that, like, at a minimum, you need to make sure people have the best tools. What's cool about g p t five now is it also enables people to use the best tools at every point.

Speaker 11:

And so if you're in an organization, you're not fumbling with the model picker, you're not trying to figure out when to use a reasoning model, you're not trying to figure out kind of the art of prompting to get the perfect thing. All of that stuff is abstracted and it's kind of taken care of for you, and you can have confidence that your people are actually using the best models at any at any given point. Beneath that, it gets a little more complicated. So more and more organizations, I think, are starting to grasp how the tools can actually help in the business process. So whether that's in customer support, whether it's in research, whether it's in software engineering and data science, you're seeing these tools more and more adopted in the enterprise.

Speaker 11:

I think there's still a quality gap, though. I think we've we now are just breaking into what I would call the kind of era of models that have capabilities that are good enough to make a dent in the types of problems businesses care about. Businesses care a lot about things like reliability. Right? They think they care about accuracy.

Speaker 11:

They care about the the resiliency of the model to recover from tool use errors and to be able to string together these very long kind of multi tool, multi step workflows. So GPT five is a step on all those things, and I expect that that will enable us to be able to do more and more things in the business process.

Speaker 1:

Do you think those customers that you just mentioned will stick with this idea of, like, GPT four level workloads will stay on GPT four, and maybe there'll be cost savings, but those workloads will stick around for a very long time. And then you'll develop almost new capabilities, new workflows, new new workloads that will be additive, but the the enterprises will stick? Or will they wanna is is everything so fresh that they wanna just, like, rewrite everything with the latest and greatest?

Speaker 11:

More often than not, think it's the latter.

Speaker 1:

Think it's you

Speaker 11:

you wanna you wanna rewrite everything. One of the cool things we did here was we were able to keep the pricing on g p d five at the level of o three pricing. So, you know, if if you're cost sensitive, you don't really have an excuse to to not upgrade.

Speaker 1:

Mhmm.

Speaker 11:

G p d five is faster than than o three and and four one, so we've improved on latency for sensitive use cases that are speed sensitive, latency sensitive. And obviously, the intelligence bar has gone up. And so, you know, unless you've got really a very kinda narrow and specific specific workflow where you've got a model like four one that kind of is okay, there's really not a reason I think that people wouldn't upgrade.

Speaker 1:

Yeah. Do we need, like, a three-dimensional Pareto frontier right now that matches not just cost and and capability, but also cost capability and latency or something? Is that is that something that you're seeing a lot of demand from in the enterprise?

Speaker 11:

Yeah. A 100%. We actually measure it that way. So we we look at those three vectors and

Speaker 1:

Sure.

Speaker 11:

It's always kind of an optimization function along those three those three those three axes.

Speaker 1:

Yeah.

Speaker 11:

We think we found that here. It was actually in terms of where my work was over the last few weeks, it was a lot of I mean, it's a qualitative, you know, and kind of, you know, really, like, manual process of collecting feedback because everyone's got a little bit of a different preference, and we can only pick kinda one or two points on that curve. And so just trying to kind of dial customer feedback, namely developer feedback in for us on where that that balance of things are is a big part of our our process for picking picking all those all those points. And so we hope that we hope that people like it, and it it unlocks, you know, the kinda maximal use.

Speaker 1:

It's great.

Speaker 2:

Jordy, any How are you thinking about open source? Who, you know, who's been most excited to get access to it? And, yeah, where do you see it going?

Speaker 11:

Yeah. I mean, it's important to us. You know, I'm glad we've we've gotten this out. It's been a huge team effort. I think there was a kind of a thing that, like, you know, OpenAI doesn't like open source anymore.

Speaker 11:

It's like, no. We're just, like, really busy with a gazillion other things. So I think, hopefully, going forward, we've got more of a a a leaned advantage point on on on open source, but it it unlocks a huge number of use cases. I mean, if you think about kind of, like, you know, government use cases, you think about on prem, you know, use cases where you're you're handling sensitive data and very sensitive environments, you think about where you wanna run models on the edge. All these things right now are kind of inaccessible to us as a service provider to customers because we just don't quite have models that kind of fit at those points.

Speaker 11:

So this for us, we think, is huge TAM expansion, and we're excited to be able to work with enterprises on on implementing that model, which is is I think, you know, competitive hopefully with with our o three class of models. So

Speaker 2:

What is the landscape like for companies that are helping to implement OpenAI products at various enterprises? You have the, you know, big consulting groups that will give you an AI strategy. Maybe they'll try to take it a step further, but I imagine there's a cottage in industry of of, you know, firms that have sprung up to try to help organizations unlock the value beyond, hey. Let's just get everybody a seat with ChatGPT work.

Speaker 11:

Yeah. I think there will be this new industry that emerges that is kinda separate and apart from kind of the legacy set of SIs and, you know, consultants that is really AI fluent. They're very AI native. I think it's very hard to borrow, I think, paradigms from the last twenty years of software building and, you know, implementation that are gonna kinda map to what we're dealing with here. You're dealing with fundamentally probabilistic systems that are moving and increasing and improving at a rate, you know, of of now kinda collapsing to every few months.

Speaker 11:

And I think this the the the nature of use cases changes quickly. Where enterprises are focused on kinda deploying them changes quickly, and so I think it's just hard for kind of the legacy industries to keep up, frankly. We've had a lot of success working with some of this kind of new breed of SI, so the Distills of the world and others that really have been born, I think, in forging the fire, so to speak, of of this kind of new this new platform. And so we hope there's more of them. We'd we'd we'd be excited to work with anyone that that wants to work with us on it.

Speaker 11:

There's more business than we can handle, and so we're always happy to to spread the love.

Speaker 2:

Talk about the $1 ChatGPT product for the government. Were you involved in that at all?

Speaker 11:

I was involved in that. We we wanna do something that was meaningful for for US government. It's been a a real big focus of ours lately. I think our our view is the government has got to start to modernize. We've gotta make sure that the tools that we use in the private sector are also in the hands of folks serving us in the public sector, and we wanted to make that really simple.

Speaker 11:

So we made ChatuchPety, you know, basically equivalent to ChatuchPety enterprise free. It's a dollar per year per agency. Hopefully, we can afford that. And we wanted to make that, you know, available to anyone that wanted to use it and standardize through GSA. So we're super appreciative of the partnership with them and more, I think, that we can do on that front.

Speaker 1:

How is that different than just, like, if I'm a government employee, I can just go to google.com, and I have access to that and Google Oh, right now,

Speaker 2:

I Scott Scott Kapoor Yeah. Was saying that he can't

Speaker 1:

use Chateapizza. Yeah. So so yeah. Why yeah. Yeah.

Speaker 1:

Just just talk to me about how how it's different to to offer Chateapizza as an actual service with a contract that you're that you're, you know, vending in. You're actually they are a client versus just if you put up a website, every government employee can access the web to some degree, or would it be blocked? Like, why does it need to be, like, a deal at all as opposed to just, like, everyone just uses it?

Speaker 11:

Yeah. So part of it is just making sure that government employees can access it. So Sure. In some places, obviously, you know, you you can put blockers in place that wouldn't prevent access. Yep.

Speaker 16:

We hear a

Speaker 11:

lot of stories, by the way, of people, like, going out on their lunch break to their car in the parking lot and, like, you know, pulling up ChattyPT on their phone and, like

Speaker 1:

Of course.

Speaker 11:

Throwing a bunch of stuff in there just to, like because they know it'll get them through the day faster.

Speaker 1:

And Yep.

Speaker 11:

We've done work, by the way, with governments, with the state of Pennsylvania and other places where we've seen dramatic increases, you know, things like two to three hours a day saved per employee given the nature of the work that they do and and how helpful ChatGPad can be. And so this lets us have an interface into them as a customer. It lets our team engage with them in a a direct way. We can see how they're using the product and can help them use it better. And so that's that's very important for us is, like, we gotta build on that foundation with them.

Speaker 1:

And then presumably, it also allows the the government to define, like, security and privacy in their world as opposed to if you're just, like, some website out there. They their choice is only block or don't block as opposed to actually, know, communicate with you. This is okay to train on. This is not, etcetera. Like, keep everything private, etcetera, etcetera.

Speaker 15:

Yeah. I mean,

Speaker 11:

we don't we don't train

Speaker 7:

on on on

Speaker 11:

enterprise data at all. So Yeah.

Speaker 1:

And so you know,

Speaker 11:

you're safe there. But the yeah. I mean, for us, like, just being able to to treat them as a customer. Right? To treat them as a user, and you go you know, you mentioned earlier, like, we we were talking about kind of, like, there being these points of success at every organization that, you know, you've got people who are, like, way more sophisticated in using these tools than others.

Speaker 11:

We wanna be able to see those people and amplify them, and the government's no different. There are people that we've worked with in government who are incredibly sophisticated in how they use AI tools, and our goal is to get everyone there.

Speaker 2:

How do you think about the group of users that are active students? They've been on summer break. You guys have been busy over summer. You do you are you thinking about and and you recently launched I I forget the exact name for the product. I think it was, like, ChatGPT learning.

Speaker 2:

How are you thinking about that cohort and unlocking new capabilities for them this coming year?

Speaker 11:

Yeah. So we launched something called study mode, which was in our core chat GPT product, and it was a little bit of an experiment. We wanted to see if you change the way the model behaves when it can kind of when it knows you you wanna be in a learning mode, if that can actually enhance outcomes for students where we we have all these kind of studies that have been done very like anecdotally about ChatGPT's ability to to drive student outcomes and learning outcomes. So here we kinda took a little bit more of an intentional approach of you actually model the take the model and actually use it in a more Socratic style where it can actually kinda quiz you. It can withhold certain information that it wants you to be able to to empirically deduce.

Speaker 11:

It wants you to reason about problems, and it kinda reasons with you as a partner. So far, so good. It's it's really cool. And learning is kind of the killer use case of ChatGPT. And so I think, you know, to be able to actually launch something that is in some sense extends that kind of killer use case is it's been really cool, and and the student feedback so far even on summer break has been positive.

Speaker 1:

Well, we'll let you get back to your day. What's what's next on your agenda? Are you putting on the lab coat or the suit and tie and going to Washington?

Speaker 11:

Good question. You know, today, I'm I'm mostly with the team Okay. And and talking to customers, and maybe tomorrow, I'll get back to the lab coat. But Cool.

Speaker 3:

In the meantime, man.

Speaker 11:

Appreciate you taking the time to talk today. So.

Speaker 1:

Yeah. Well, thank you so much for taking the time to talk to

Speaker 2:

to see you.

Speaker 1:

We will talk

Speaker 3:

to Have you

Speaker 1:

a great rest And of your the timeline has been in turmoil because president Trump says he will be imposing a 100% tariff on all semiconductors coming into The United States. It started with widespread tariffs on chips and then turned into export controls. This is from the Kobe Yessie letter. Is this a red flag moment? Don't know why it had a red flag.

Speaker 2:

It felt like Ben was getting the Ben's

Speaker 1:

getting the flag. And potentially affected, but Taiwan says TSMC exempt from Trump's 100% chip tariff. Very unclear. The story is obviously still developing. And Dylan, a medic says, you're telling me that this level of monitoring the situation is free and it's a picture of you in front of the whiteboard monitoring the chat GBT versus the timeline Illinois has banned AI therapy making it the first state to regulate the use of AI in mental health services.

Speaker 1:

Interesting. Interesting headline that's coming out.

Speaker 2:

It's interesting because the product can just be used for Like the user can choose to do that. It's not necessarily it's kind of hard to ban outright. Like maybe you can ban it in a clinic clinical setting.

Speaker 1:

Yep. I wonder how they define this. There's probably a loophole if I know anything about how these bans are implemented. But yeah, maybe it's like if you're if you're in the clinical setting, you can't be you can't use it, but then people will just use it independently. I'm like, yeah.

Speaker 2:

Therapist just on their phone.

Speaker 1:

They're gonna be going to the car.

Speaker 2:

They're gonna be having They're gonna No. They're just gonna have it listening to the conversation. Yeah. They're gonna be like,

Speaker 1:

no. What should I do right now?

Speaker 2:

What should

Speaker 3:

I say?

Speaker 1:

What should I say? How does that make you feel? That's what was gonna tell you. Celsius nearly doubles revenue year over year. This is the energy drink.

Speaker 1:

Revenue of $739,000,000 versus 632,000,000 consensus. North America grew 87%. International grew 27%. But here's the real kicker. A Lani New acquisition is the primary driver of growth.

Speaker 1:

Lani New added $300,000,000 in revenue and retail sales are up. So wow, what performance. But yeah, I mean, that was the expectation when they bought A Lani New is that they would I I guess this is like the first moment they they rolled them in, probably. But huge growth for Celsius as they become multi product, multi multi consumer company. What else is going on in the timeline?

Speaker 1:

We have one last guest. I think you might have to hop on with Taipei. Yep. So feel free to jump when you need to. Tyler, anything going on on the timeline we should be monitoring?

Speaker 1:

We are of course monitoring the situation.

Speaker 3:

I've been So so when Max was on, was talking about like how you can like make a little game, right? So I've been working on like a Bloons Tower Defense game.

Speaker 1:

Okay. How's it going?

Speaker 3:

So it's going pretty well. I'm I'm making another change but then maybe I can screen record and share.

Speaker 1:

Yeah. That'd be great. You could share with the folks too. I like this post from Ray Sullivan. These GPT five numbers are insane and it's a chart of GPT version versus number and then once it gets to four it goes four point one four point two four point three four point four four point five.

Speaker 1:

So the the fifth one is a massive massive bar.

Speaker 2:

We need an analysis of the charts from today. It seems like there was multiple that were

Speaker 1:

Kind of odd or Odd. Hallucinated or or or off.

Speaker 2:

It's interesting that multiple of them snuck up.

Speaker 1:

Just in Sheets, the popular convenience store chain with 750 locations is now offering 50% off purchases paid with Bitcoin and crypto daily from three to 7PM. What a wild move by Sheets.

Speaker 2:

Well Well

Speaker 1:

Ben Highlax

Speaker 2:

is in the waiting Let's bring him in.

Speaker 1:

Let's bring in Ben Highlax. Are you doing? Good to see you.

Speaker 2:

Doing well. How are guys doing? We're doing well. I'm just gonna say hello. I gotta take off talk with Taipei.

Speaker 2:

Yeah. I'm gonna let John take it from here. Absolutely. I'll close out the Have a fantastic conversation.

Speaker 1:

Give me the update. How's the day been for you? What were your expectations? Did this meet exceed? Did it underwhelm you?

Speaker 1:

How are doing?

Speaker 4:

Well, so I've actually had access for a couple weeks. So we actually did a video. I'm not sure if you've seen it, but, OpenAI brought a couple of, folks from the Twitter sphere

Speaker 1:

Yeah.

Speaker 4:

Yeah. To their office a couple Just weeks

Speaker 1:

try it.

Speaker 4:

Some other folks. Yeah. Yep. Yep. Yep.

Speaker 4:

I think that, it pretty much exactly meets my expectation as far as, like, how, how it's been received. And I've tweeted about this as well, but I think that it's really, really good at, like, one shotting things. You know, I I think it's, like, it's better than I think other models we've seen. But I think it's actually sort of a distraction in a lot of ways. Mhmm.

Speaker 4:

I think that the things it's a lot better at are a, a lot harder to describe, and b, I don't think the the harnesses for it really exist yet. What mean harnesses? A lot. Harnesses. So, what the way I've been describing it is that I think I've seen, you know, web search existed in ChatGPT for a really long time.

Speaker 4:

Right? Like, it was able to, like, call a tool, search the web.

Speaker 1:

Yep.

Speaker 4:

Obviously, like, deep research was very different than that. Right? Like, what we saw was it was, like, actually, like, calling you know, searching the web. It was like reasoning about those results, changing its kind of course, like course correcting in the middle. So like intermediate reasoning is like what is the is the term for it.

Speaker 4:

And they really trained it how to search the web well. I think GPT five does that for, like, a whole plethora of tools. The interesting thing is that a lot of products, like, I think a lot of the AgenTek products that exist today were kind of built wrong. Like, they weren't built that they didn't build their tools the right way. And we've seen this before.

Speaker 4:

Like, if you look at, like, you know, the first, you know, kind of infrastructure for agents was LangChain, like, way back when.

Speaker 1:

Yeah. I remember.

Speaker 4:

'2 or three. Yeah. It was it was, you know, it was it was early, but it was wrong. Right? And so, like, anybody that you know, they they've iterated since.

Speaker 4:

Right? They have, like, lang graph. It was a better implementation. But the first implementation on lang chain was, like, again, early but wrong. And so if you built your product on lang chain, like, you had to, you know, significantly change it.

Speaker 4:

Yep. I think we will see a similar thing happen for GPT five. You know, it's not just like, you know, change the string and get, you know, from, you know, four out of five or something and push it and now you,

Speaker 2:

you know yeah.

Speaker 1:

Yeah. You know that meme about like, oh, like Sam Allman stood on stage and like just like, you know, killed 75 startups. Google just killed a 100 startups. Apple just killed Partifle with their new thing

Speaker 9:

or whatever.

Speaker 1:

Did any of that happen today? It feels like No. It it it feels like this is like the Lang chain needing to change their strategy. That happened a while ago. I haven't identified anything.

Speaker 1:

It feels like, you know, Scott Wu hopped on and said like, you know, great day to be an application layer company. The foundation models got better. It's, it's more tools in my tool chest. I'm extremely happy, and, and and I'm I'm more confident than ever. And I believe him.

Speaker 1:

I believe that he was he doesn't see today as, like, fundamentally needing to change his business model.

Speaker 4:

I I think that's true, actually. I think that, people have been you know, there's a lot of people building agents right now. I think a lot of them have not been feasible for some of the reasons that GPT five starts to address. So I think it is I think that what it means is that the entire architecture behind agents will get a lot simpler. Like, it feels like a a good day for people building applications.

Speaker 4:

Yeah. Yeah. It's not immediate that there's, some, you know, like, company or something that got killed today.

Speaker 1:

Yeah. Yeah. Yeah. I mean, in general, it feels like, you know, Dora Kesh up updated his timelines. There's just been a general idea that, like, we've we've maxed out pretraining.

Speaker 1:

We've kind of maxed out post training. We're now in the let's reap the reward of this, and we've seen it in Yeah. Like, the incredible financial performance, the incredible usage numbers. You know, millions and millions, hundreds of millions of people are using Chachapi thirty thirty minutes a day. I love the product.

Speaker 1:

And yet it feels it feel feels like Yes. The what have you done for me lately meme. Yeah. It's totally like, okay. Yeah.

Speaker 1:

We went from the iPhone four to the iPhone five today.

Speaker 4:

Yes.

Speaker 1:

Still really an important technology. Great company, but, like, I want another iPhone one.

Speaker 2:

Yes. Yeah. Yeah. Yeah.

Speaker 4:

I no. I totally get what you're saying. Yeah. Yeah. I think that, like, I I I wrote a piece about this with Swix, but Yeah.

Speaker 4:

I the it really actually changed the way I see that path to AGI. Like, I think before using it a lot, I kind of was like, okay, we need, like, bigger bigger models. They're gonna, like, get smarter or something. Mhmm. I think, like, I I had this realization.

Speaker 4:

So I was watching it, like, solve I had this, like, really weird, like, dependency conflict with Yarn. Like, we have, like, a mono repo. It's, like, the part of the problem also with this discourse is, like, the the sort of problems it gets good at solving are just, like, not sexy things to talk about. They're not things that even you'll understand. I'm like, oh, we have this issue with our like, the way we structured things and, like but, like, a a couple weeks ago, was watching it, like, I had this problem.

Speaker 4:

No other model would solve it. And, I watched it sort of, like, poke around. Like, started it running this, like, yarn y command in a bunch of different directories. In between, it's like reasoning and, like, correctly reasoning about, like, what what and why and what it was learning. And it you know, taking little actions in between, seeing what happened.

Speaker 4:

I think what I realized is that, like, you know, if you imagine, like, humans without tools, like, we never had any tools, we're never even able to write things down, like, would you be able to tell that we're intelligent? Would we have, like, you know, learned to speak, etcetera? Like, I I just, like, don't. Yeah. You know, even if we could not have ever invented fire, right, it's like it's like, where would we be right now?

Speaker 4:

There's that feels like there's a similar like, I actually think a lot of the next year is just going to be how do you get these models to do things better is, like you know, I think it's next year.

Speaker 1:

In your Yarn example, you you said, like, you were you were having it, I assume, GPT five, like, work on the problem. Was that wrapped in a coding tool? Did you just go to chat.com Yes. And give it your GitHub repo? Like like No.

Speaker 1:

No. Talk to me. Like, what was the actual user experience from your side?

Speaker 4:

Yeah. So this was in cursor.

Speaker 1:

Okay.

Speaker 4:

I think the Codec CLI, the new version of the Codec CLI, they just released today, is also really, really, really good.

Speaker 1:

Okay.

Speaker 4:

I think that you will really only see a significant difference in places where it can sort of, like, explore its environment is the way I would put it. Mhmm. Like, when I was watching it, like, go bounce around my repo, like like, felt almost like I was watching something navigate, like, a little, like, video game, like Pokemon or something. Like, that that's kinda what it felt like.

Speaker 1:

Like Yeah.

Speaker 4:

Yeah. It's kinda like, I'm gonna go over here. I'm gonna see this. Okay. Wait a minute.

Speaker 4:

That conflicts with what I just saw over here. Like, where should I go next? Do know what I mean? Like, the it it felt very novel is, like, what I would say. Yeah.

Speaker 1:

Yeah. Yeah. Yeah. What what so yeah. I mean, how are you using it?

Speaker 1:

What what what where do you see it going? Do you see it like just like a little bump of a tailwind today, or or what's your read on, like like, how you'll be using GPT five going forward?

Speaker 4:

I mean, yeah, there's two huge things. So, like, one thing that, like, really got missed today is that, they also released GPT five nano

Speaker 1:

Oh, yeah.

Speaker 4:

Which is, like, an incredibly good model, actually. So, like, we're not talking about it, but it's half the cost for input tokens than Flashlight or sorry. Yeah. I think it's actually half the cost of input tokens than Flashlight, and it's a really good model. Like, it's, like, four o level for a lot of, like, writing and stuff like that.

Speaker 4:

And so, yeah, we'll we'll be using that probably in the short term. I think it'll be interesting to see how other providers react. Like, I I'm sure Google will cut their prices as a result. Yeah. But it is the cheapest, like, hosted model, I think, that I I don't think anyone's serving any other model for those prices for that matter.

Speaker 1:

Yep. Yeah. That makes sense. What else are you looking for for the rest of the year? Probably no GPT six on the horizon, but what are you looking out for?

Speaker 1:

I mean, it seems like Google is expected to respond with Gemini three soon. But what else are you tracking in the in the world of AI these days?

Speaker 4:

It's a great question. I think that, yeah, that's gonna be wildly interesting. I think what Google does will tell us a lot. Yep. I think that they you've probably seen it, but the know, they released this, like, world model Yeah.

Speaker 4:

Yesterday. We're kind not talking about it anymore. I mean, like, if those videos I haven't tried it myself. If those videos are real, like, that's that's one of the most mind blowing things I've seen in the last, like, you know, decade or something. So, like, if that's real, like, that's extremely interesting.

Speaker 4:

And I think has all the stuff that's going on with role models right now has, like, huge implications for, like, everything, like, from robotics, just, like, so many different fields. So super, super interested in that. And the other thing is I actually just think that, like, again, I'm I'm actually really bullish on g d five. I think that the way it was received today is, like, just about how I expected it. Like and the reason is, like, when I say harness again, I'm like, I think that, like, Canvas and ChatGPT is pretty bad.

Speaker 4:

It's, like, my would be my take. Like, you know, it's it's a tough product to make, but, like, yeah. Like, it does really poorly with, like, long files crashes sometimes, like, that sort of like, I think that we don't have the the product layer around g p t five doesn't exist yet. So I think we're gonna see some really, really interesting products, that are built around it.

Speaker 1:

Yeah. It's always hard when you go from, like, a binary qualitative in your face improvement GPT. Like, ChatGPT was like, we passed the Turing test. And now the next test is, like, super intelligence. It self replicates.

Speaker 1:

It's smarter than every single person knows everything. It's like, the bar is like, we really moved the the goalposts. You know?

Speaker 4:

A 100%. I think that there was, like, a lot of, you know, discourse around the model as well, like leading up to it, which I think

Speaker 1:

didn't Yeah. Totally.

Speaker 4:

Help, you know? Yeah. Yeah. But like, the way that I would think about it is like, I think that, you know, depending there's some percentage of the way through automating software engineering that we've made it. Like, let's say it's like 70% or something.

Speaker 4:

75%. Yeah. The tough part is like that last, like, 25% is, a, the hardest it's like the least, sort of decipherable to, like, explain to people. Yeah. It's the least, like, universal.

Speaker 4:

Like, like, if I'm just like, oh, make a you know, one of the examples I did, I I made a personal website. It's, like, all Mac OS nine themed in, like, twenty minutes with g p five.

Speaker 1:

That's

Speaker 4:

fun. And so it's it's really fun. Right? You get it. Like, my mom gets it.

Speaker 4:

Like, I can show it. I I can share it. You get it. You know, my mom, I can't explain any of the, like, the very specific ways that two d five, like, helps in our specific code base, our specific problem, whatever. So I think that, like, it'll be less these launches will probably get less and less, sort of interesting from a soft like, from a what it does for software engineering as that gap gets closed.

Speaker 4:

Yeah. Like, I you know, what's the last 5% of software engineering? Like, I you know, like, I I it's probably not gonna be that interesting to me.

Speaker 1:

Do you think they'll be on an annual release cadence now? Like, Apple updated all of their iOS, all their operating system nomenclature to be like, we are now on '26 because it's the year. It's like a car model. Like like Jaguar

Speaker 4:

you can plan it. I don't think you can plan ahead. Like, that's the interesting thing is, like, I think that, you know, there there's people that say that GT 4.5 was supposed to be GT five.

Speaker 1:

Yep. Yep. Yep.

Speaker 4:

And, like, I think that it sort of came out and they're like, it's like, you know, it's I actually love 4.5. I think it's a really fun model. But

Speaker 1:

Well, it's true that, like, improvements come in many places just like with the with the iPhone, like, the latest iPhone, you buy that because it doesn't it's not just like the one with the new screen. It has a slightly better camera, slightly lighter, longer battery. Like, it's like an ensemble of improvements that then they add up. And I think that that feels like what we're getting here today and what we will get in the future is, like, this little like, we did a little extra RL over here. This tool is now sharper.

Speaker 1:

It has new capabilities. We added multimodal. Like, you know, the video generation got better and this feature

Speaker 9:

got better, etcetera, etcetera.

Speaker 4:

And I think that, like, what a model is is still going to change a lot and, like, how we value like, so just give an example. Like, four o was sort of this big thing, you know, where they talked about it being, like, natively multimodal Yeah. You know, taking in even, like, video at some point, video in, video out, like, audio in, audio out. Yeah. And, like, you know, you you haven't heard that from GT five yet.

Speaker 4:

Like, you can't talk to it on advanced voice mode.

Speaker 1:

That's interesting.

Speaker 4:

It doesn't doesn't doesn't generate image like, know what mean? It's there's no, at least yet, native image generation. We don't know much about how it works under the hood, but, like, it's still calling four o to generate images. Right? So it's like, do you start to see an unbundling of the these model capabilities?

Speaker 4:

Like, seems quite possible. Like, the best model for writing natural language might not or, like, writing, creative you know, creatively might not be the same model that writes, you know, really good Rust code. Like, it it might be different models. So I don't know. We'll see.

Speaker 1:

Yeah. Create image here is now tucked next to deep research agent mode, etcetera. Yeah. But I I I would hope that you can call that from the actual chat interface.

Speaker 4:

You can call it from the g t five chat. It's just using it's using, g t image one, I think, actually the name of the model. So it's a it's a dedicated image generation model, which I think is maybe four o. I I don't totally know.

Speaker 1:

Yeah. I I just I I I don't particularly care. I'm not looking for one model to rule them all. I'm fine if with models calling Yes. Different tools.

Speaker 1:

It seems fine. Yes. Anyway, fun day. Thanks for hopping on. Of course.

Speaker 1:

You soon.

Speaker 4:

Of course. Anytime.

Speaker 2:

Talk soon.

Speaker 1:

Have a good one. Bye. And that's our show today, folks. Leave us five stars on Apple Podcasts and Spotify. And thank you for tuning in to the GPT five giga stream.

Speaker 1:

We're on hour four and a half. We've enjoyed hanging out with you. Tyler, anything else from the timeline? Close it out for me. Timeline's still in turmoil.

Speaker 1:

If you want to

Speaker 3:

show the little game I made.

Speaker 1:

Okay. Yeah. Let's show Tyler's game. Can we do that? You got it?

Speaker 1:

Tyler's tower defense? Okay.

Speaker 3:

This was this was one shot.

Speaker 1:

Okay. Wait. What do mean? One shot. One prompt.

Speaker 1:

You said you were working on it.

Speaker 3:

I was, but then it's like What's

Speaker 1:

the definition? Oh, so you went back to a single prompt?

Speaker 3:

Yeah. I made a change, but then I've realized, okay, this is not as good. So I just went back to the first one.

Speaker 1:

Okay. No. Yeah. My my my question is, I mean, this this seems well, actually, like, it's it's that like, the game engine, I don't know what it's using under the hood, what it's do do you know did it write like WebGL code or do you write

Speaker 3:

like I think it's just it's just like JS.

Speaker 1:

Okay.

Speaker 17:

And it's

Speaker 3:

interesting, like, HTML canvas.

Speaker 1:

That's pretty crazy. Yeah. You'd think it would use some, like, two d engine off the shelf or something. But Yeah. My my question is, like, what that won't go viral because that is less impressive than just the tower defense app that I can get in the App Store.

Speaker 3:

For sure.

Speaker 1:

Yeah. But it's like maybe if I take my you you know how there's, like, ControlNet images went viral where people would take their corporate logo and then they'd throw that through ControlNet, and it would be like the TPPN logo overlaid over like a forest. And like the trees would look like Yeah. Or it'd be really cool. Yeah.

Speaker 1:

So maybe like it's tower defense, but it's my logo or something like that. And like the the the enemies are like like moving through something like that. I don't know. There's just gotta be a way to personalize it and make it so every single game is a unique snowflake that you wanna go and experience that one. You wanna look at it.

Speaker 1:

You wanna spend some time in it. I don't know.

Speaker 3:

Yeah. But It's hard because it's like it's still, you know, predicting the next token. It's not like image the four o image generation was like kind of a it wasn't novel, guess, because there was image generation.

Speaker 1:

Yeah.

Speaker 3:

It was like such a massive improvement. This is

Speaker 1:

Yeah.

Speaker 3:

Like, there's not any clear massive step change here. It's a little bit better in a lot of ways.

Speaker 1:

Yeah.

Speaker 3:

So

Speaker 1:

Oh, well. Well, we'll have to play with it more. Let us know what you think about GPT five, and we will see you tomorrow. Have a great day. Thank you so much.

Speaker 1:

Bye.