Chain of Thought | AI Agents, Infrastructure & Engineering | Why LLMs Are Plausibility Engines, Not Truth Engines

Dan Klein, co-founder & CTO of Scaled Cognition and ACM Grace Murray Hopper Award winner, breaks down why LLMs are fundamentally plausibility engines and how his team built APT1 for under 11 million dollars. He explains why multi-model checking fails, why benchmarks measure the wrong thing, and what it takes to ship AI that enterprises can actually trust.

Show Notes

Every few weeks at Microsoft, someone would build an AI prototype that blew everyone's minds. Three months later? Dead. "We can never ship that." Dan Klein watched this happen for five years before he decided to do something about it.

Dan is co-founder and CTO of Scaled Cognition, a professor of computer science at UC Berkeley, and winner of the ACM Grace Murray Hopper Award. His previous startups include adap.tv (acquired by AOL for $405M) and Semantic Machines (acquired by Microsoft in 2018), where he spent five years integrating conversational AI. His PhD students now run AI teams at Google, Stanford, MIT, and OpenAI.

At Scaled Cognition, Dan's team built APT1 (the Agentic Pre-trained Transformer) for under $11 million. It's a model designed for actions, not tokens, with structural guarantees that go beyond prompt-and-pray.

Dan makes the case that current LLMs are plausibility engines, not truth engines, and that the gap between demo and production is where most AI projects die.

Why prompting is a fundamentally unreliable control surface for production AI
How APT1's architecture gives actions and information first-class status instead of treating everything as tokens
The specific failure modes that kill enterprise AI prototypes within three months
Why stacking multiple models to check each other produces correlated errors, not reliability
How Scaled Cognition applied RL to conversational AI when there's no zero-sum winner
Why every S-curve in AI gets mistaken for an exponential — and what comes after the current plateau
The societal risk of systems that produce output indistinguishable from truth

Chapters
(0:00) Cold open: RL is about doubling down on what works
(0:28) Introducing Dan Klein and Scaled Cognition
(2:53) The demo-to-production gap: why AI prototypes die
(5:40) Why prompting is not a real control surface
(8:06) Modular decomposition vs. end-to-end optimization
(10:55) Are LLMs fundamentally mismatched with how we use them?
(14:26) What's wrong with benchmarks today
(20:27) APT1: building a model for actions, not tokens
(24:14) What makes data truly agentic
(28:02) Hallucinations as an iceberg — visible vs. undetectable
(34:16) Building a prototype model for under $11 million
(39:57) Applying RL to conversations without a zero-sum winner
(43:31) LLMs as a condensation of the web — and what happens when it runs out
(50:07) Reasoning models: where they work and where they don't
(53:04) Early deployments in regulated industries
(57:14) Why multi-model checking fails
(1:00:34) The minimum bar for trustworthy agentic systems
(1:04:07) Societal risk: when AI output is indistinguishable from truth
(1:13:33) Where Dan is inspired in AI research today

Connect with Dan Klein:

Scaled Cognition: https://scaledcognition.com
LinkedIn: https://www.linkedin.com/in/dan-klein/
UC Berkeley NLP Group: https://nlp.cs.berkeley.edu

Connect with Conor:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

More episodes: https://chainofthought.show

Thanks to Galileo — download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.

FINAL TRANSCRIPT
================
Speakers: Conor, Dan
Duration: 1:18:11
Total Words: 13574
Generated: 2026-04-06

---

[0:00] Conor:
RL is all about doubling down on the things that work and doing less of the things that aren't working. And it's really that simple. Try various things, go in the direction of the things that are working. So if you're playing go or chess, even if you're playing yourself, even if you're moving randomly, one side wins, one side loses. And that means you have a gradient. The one side was better than the other. The rules of the game were crisp and the outcome of the game is clearly defined.

[0:28] Conor:
We are back on Chain of Thought, everyone. I am your host, Connor Bronstein, head of technical ecosystem at Modular. My guest today is someone that many of you may know. Dan Klein is co-founder and CTO of Scaled Cognition and a professor of computer science at UC Berkeley, where he leads the Berkeley NLP group. Dan previously won the ACM Grace Murray Hopper Award for his work on grammar induction. And his former PhD students now run AI teams at Google, Stanford, MIT, OpenAI, and several other places, I'm quite sure. Dan also happened to be a serial entrepreneur. His first startup, AdaptTV, was acquired by AOL. His second, Symantec Machines, was acquired by Microsoft in 2018. And then Dan spent five years integrating conversational AI technology there at Microsoft. He is therefore deeply familiar with the challenges between building AI systems and actually shipping them at enterprise scale. This gap between demo and production is what led Dan to start Scaled Cognition, where his team has built APT1, the agentic pre-trained transformer, a model designed from the ground up for actions, which has performed extremely impressively on agentic benchmarks. I'm sure we'll get into it a bit. We'll dive into his approach, his viewpoints on today's research, and much more. Dan, so good to see you. Welcome to DataThon.

[1:45] Dan:
Thanks. It's great to be here.

[1:47] Conor:
Yeah, very excited for this episode because you have such a depth of experience in the industry and knowledge on the research side as well, which isn't always a guest profile we get to have a conversation with. I feel like it's usually one or the other often these days. So it's, I'm interested, how have you balanced that over the years?

[2:05] Dan:
Yeah, it's always a bit of a balancing act. But I think one of the things that's been really motivating for me is being able to look at both what is happening on the academic side in the research community to be able to see what ideas are developing there, see the cutting edge at the same time to also see what's going on in cutting edge in industry and the interplay between those. And I think having both of those perspectives has really been one of the main things that has unlock opportunities in my career because there are so many great ideas on both sides. And there's often ideas and needs which often don't get communicated often on the academic side. We don't know what the real problems are. And on the research side, there's just so many people thinking about so many ideas, the connections don't always get made. So it's really exciting when you can put those two things together and leverage that.

[2:53] Conor:
Well, hopefully we can bring some of those together in this conversation. And before we truly dive in, I do want to say a brief thank you to our presenting sponsor of this episode, Galileo. You can check them out at Galileo.ai for AI evals and guardrails. So, Dan, you've described experiencing this pattern where someone would single-handedly build something mind-blowing. Everyone would get excited, and three months later, it was dead. What are the specific failure modes that cause this major gap between the demo that we all get stoked on and maybe overhype at times and actually getting it to something where it's reliable in production?

[3:30] Dan:
Yeah, I mean, there are a lot of reasons. And some of them are things that probably everybody listening to this podcast are familiar with in terms of hallucinations in the system, where the system will do something plausible, but not actually correct. So it looks like it has the right shape. It's just not telling you the truth and stamping that out can be challenging. Or you build a system which does something reasonable. It's just not what you want. And when you try to control it, it's still just persists in doing its original behavior. So I think a lot of the limitations have to do with the sort of limited ability to control through standard control surfaces like prompting, as well as just the natural performance characteristics of many of the models out there. And in general, the models that people are mostly interacting with today are, of course, they're highly generative models, they're operating in a token by token way. And Ultimately, they are assembling plausible outputs on the fly. And it's so easy for those things to be plausible, but not true. The bottom line is we have built not truth engines, not reliability engines, we've built plausibility engines. And that's great for building a prototype. But to get to that last mile, to get to something that you're actually comfortable shipping and that has the guarantees and performance characteristics that an enterprise needs, that's really, really hard. And I think a lot of people don't appreciate how big that last mile can be in an enterprise complex.

[5:02] Conor:
And I think often we kind of forget that this is what we've built because they are so convincing, these plausibility machines. They often get the answer correct or most of the way correct. And we think, oh, this is just answering my questions. And it's important to remember that, yeah, this is next character prediction that is occurring here. And that plausibility piece kind of gets factored out at times. And I wonder if you think that part of the problem here is the control layer as well, where we are using prompting as this control surface for production systems.

[5:40] Dan:
Yeah, I think that's definitely part of it. I mean, one of the things that is mind-blowing about the recent progress in AI is that we have these models that, in terms of their breadth, in terms of just their ability to be able to make a credible approach to kind of any topic. Fundamentally, they are distillations of all kinds of human knowledge from the web. And so they do know about an incredible breadth of topics. And what comes with that is the ability to kind of prompt them in flexible ways. And that's, that's, it's really, it's really something that it's really something that has changed the shape of how AI is used. At the same time, it's not a control surface that has any kind of precise semantics. It's not a control surface that comes along with any kind of guarantee at all. And so when you put some words in a prompt, you are, you know, it's like a hint. You're requesting that the model do a certain thing you have in mind. And the process of putting your, what you want into words is not a perfect process. Natural language is full of ambiguity, but then There's nothing really that says the system will listen to that. And I think we've all sort of had these experiences interacting with prompts where you prompt the system and it doesn't do what you want. So you change the prompt a little bit and it doesn't do what you want and you change it again. And what is your recourse if the system isn't doing what you ask? You can reword your instructions. You can put your important sentences in all caps. You can add an exclamation point. You can add a second exclamation point. And sometime around where you're adding the third exclamation point, you kind of get the sense that maybe this is not the path to a robust and controllable technology.

[7:35] Conor:
Yeah, in traditional software, we've obviously built complex systems that we can trust because we are designing them with this very clear language where we break them into pieces with defined inputs and outputs. They have contracts built in and we have compilers that have been built to faithfully translate our code to the machine. But we don't have that same layer of guarantee with LLMs. And it does create this seemingly kind of core problem for a lot of use cases.

[8:06] Dan: [OVERLAP]
Yeah, I think that's exactly right. And there's a bunch in what you just said. One is you're exactly right. The main tool we have had historically for building complex, reliable systems has been this notion of modular decomposition, that you take a big problem, you break it down into smaller problems. Each piece has, as you mentioned, contracts. If you give this output input, we'll give this output And you can say something precise about what it will do, what it should do. And, um, that has been very powerful in terms of letting software engineering result in large systems that can actually have a big impact in the world. One of the things that has historically driven machine learning has been end-to-end optimization, which can be at odds with that. You want to reinforcement learn, go system or something like that. And you put the signal in one end and then a whole bunch of

[8:59] Conor: [OVERLAP]
We're

[8:59] Dan: [OVERLAP]
complicated

[8:59] Conor: [OVERLAP]
just saying, oh, did it work in the end or work better kind of.

[9:02] Dan:
Yeah, and you don't have that same tools. And so what's happened is, I think, just structurally, fundamentally, given how these systems have worked, we have started to build systems which do not have the same kinds of modularity, they don't have the same kinds of ability to guarantee what they will or will not do. And that's actually The primary thing that motivates me in general is this question of how can we make sure that the AI models we build are truthful, are trustable, are controllable, and that you can make guarantees about them. And at Scale Cognition, that is our core mission, really. If you distill everything else down, we are a model lab building models about which you can make guarantees on their behavior. And I think that's an important piece when you look at this context of wanting to use AI in settings where its actions are going to have consequences. I think one of the things that people when they talk about hallucinations may be underappreciate is that in many cases, a generative model, the hallucination is actually the product. So if you're doing image generation, and you give a prompt and you get an image out, you want an invented image, you want something novel, you don't want something copyrighted, regurgitated back to you. And the, the actual purpose of that technology is about generating something new. And then if you take those same sorts of techniques and you turn around and now you want them to be giving you facts, which are vetted and verified and taking actions that are adherent to a policy. This is sort of a underlying fundamental challenge and a mismatch between how these technologies are designed and the uses we want to put them to.

[10:55] Conor:
Is there just a fundamental mismatch between how we're leveraging large language models that were originally built for speech recognition, I mean, machine translation, as these sources of truth? How do we square the origination of LLMs with how we're using them today? And maybe this is a question where we can look to Ian LeCun's recent massive fundraiser where he says, no, LLMs are not the way. I think there's plenty of opinions here.

[11:27] Dan:
I've been thinking about LLMs for a very long time. I started in natural language processing, you know, quite a bit back and it's really wild how far LLMs have gone from being a very, very specific technology for a very specific purpose to a general layer that many people are using as kind of the operating system of general AI. Originally, when language models were designed, they had sort of one very specific purpose. Their job was to take language and score it. And so this was originally conceived in a context, for example, like a speech recognizer. If you have a system, maybe it's asked you a question and you're going to say yes, or you're going to say no, and the speech recognition system has to figure out what you said. Whether you said yes or no, those sound very different. And the difference between yes and no and O is going to be shown in what frequencies are present in your utterance at what times. And that's what the acoustic model of speech recognition did. It figured out of all the things you might've said, which ones acoustically in terms of their sound match the input. But then there was the problem that the acoustic model could tell the difference between YES and NO because they're pronounced very differently. but it couldn't tell the difference between no-n-o and no-k-n-o-w.

[12:54] Conor: [OVERLAP]
Rrrr.

[12:55] Dan: [OVERLAP]
I think this is a good answer to the question, and one of them is a bit of a weird word to answer a yes-no question with. The language model was meant to tell you that. It was meant to, you know, a classic example, which I'll have to hyper-articulate to make it clear, because that's the whole point here, is that the sounds in it's hard to recognize speech very similar to it's hard to wreck a nice beat. And if I say those quickly, they're going to sound the same. And it's the language model that comes in and says, hey, acoustic model of all of these possible hypotheses that you have in terms of a transcription, which of them seem like something someone might say or might say in this context. So language models were just there to score plausible from implausible. And that core aspect of being a plausibility box really has just scaled up and up and up. And now they capture plausibility in a much deeper way. The correlations now aren't local frequencies and correlations of adjacent words. Now they're all kinds of long distance knowledge, topics, syntactic, all the correlations that are in not only the structure of language, but the contextual meaning and the real world plausibility. of the words you're using. And so this is all now kind of put into one big bucket in the language model. And as a result, we've built a system where the plausible answers involve translating from one language to another or answering a question. And that's how it sort of become a general broad attack on AI.

[14:26] Conor:
I think a lot of engineers listening would love to know your thoughts on benchmarks and how they should be thinking about basically getting their graveyard of AI prototypes into actually shipped production. Is it a wait for better models situation? Is it a pick a better model based off X benchmark? Or do we need to kind of fundamentally rethink the architecture in here?

[14:48] Dan:
A couple great questions in there. One has to do with, what should we think about benchmarks? And the other is, to what extent should we just wait for Model M plus one to solve the problems of Model M? And on the benchmarks front, I think this is one of these scenarios where a lot of people feel like the only thing worse than the benchmarks we have would be to not have benchmarks. the history of artificial intelligence, there have been major advances since we moved towards a very data-driven, metric-driven hill climbing, where we measured one approach versus another based on how well they did. But of course, You got to get your metrics right. Not having a metric means you can be making a progress. Having the wrong metric means you can be diluting yourself into how much progress you're making or what kind of progress you're making. And the biggest challenge with metrics traditionally has been that at first they represent, you know, someone's guess at what the important problem is, and then full climb around a metric becomes a game in and of itself. And that that has several problems, one of which is maybe that metric doesn't represent what you actually want to improve. Another problem is you start to learn that data set. And we can talk about what that means to be overfitting a dataset because there's a lot of ways to overfit other than you accidentally trained on your test data. And so what will happen is the benchmarks kind of lose potency over time and also they're probably measuring the wrong thing in the first place. It's very hard to get that right.

[16:33] Conor:
So if benchmarks today are becoming counterproductive at times, what's truly going wrong and how do we fix it?

[16:41] Dan:
You know, I think there's sort of two aspects to it. One is, you know, when they do crash tests on cars and like most cars pass with flying colors and every now and then they'll introduce a new one and most cars will not pass with flying colors because they've optimized to make sure that the tests are covered. That doesn't mean the underlying issues, the tests we're getting at are all covered. So that's one challenge in benchmarks in general. But then there's sort of qualitative issues, like for example, Today in agentic systems, one of the common things, for example, if you take a benchmark like Tau, one of the common things people do is they'll say, all right, well, this contains a bunch of scenarios. I'll run my model on all of these scenarios. I get some right, I'll get some wrong. How well did I do? Well, obviously, more right is better, right? And setting aside all issues of, we can come back to particular issues of how people tend to overfit these things. Even just the idea, how many of these scenarios did I get right, is probably not what you want in an enterprise context. In an enterprise context, you want something like, for how many of these scenarios will I get it right every time, a hundred times in a row, in such a way that I could ship a product that covered that scenario? Because if you have one system that gets sort of 80% of the things right, 90% of the things right, we can't really ship those things. Because if 1 in 10 customers gets You know, you know, you, you tell them you booked their flight, but you didn't like, that's, that's a real, that's like, that's a, that's a showstopper. But if you have some other system that gets 70% of the scenarios right every single time, no matter how they're approached, well, that's hugely valuable. And so a metric like that isn't measuring consistency in a way that would help you line up with what can be shipped. And that's sort of independently of the fact that over time, you know, there are other issues with metrics getting saturated, even when systems aren't necessarily making progress.

[18:40] Conor:
Yeah, I had, I had a Hamel Hussain on the podcast last year, which was a great episode on evals for anyone interested. And he argued for this idea of mindset over metrics when it comes to AI engineering. And you're basically saying something related, but differentiated here, which is that we need to be thinking about benchmarks at different tiers, almost of does this relate to actually delivering value? What's the risk of contamination? You know, we alluded to that. And it's pretty clear that there is this widespread, I think, challenge around one, contamination in variety of forms, and two, around benchmarks that are nebulous or not necessarily important to delivering real world value. And maybe that relates back to kind of where we started the conversation, talking about the challenges of going from demo to enterprise production. Are we simply going after the wrong problems to some extent?

[19:39] Dan:
Yeah, I think in some cases we are. You know, for example, at Scale Cognition, our model APQ1, the thing we are focused on is building a model that can reliably and repeatedly do the right thing and follow policies and avoid hallucinations. And for us, the important thing is knowing that if it does something, it's going to do every variation of it. Correct. And so when we have metrics internally, we're very focused on that sort of like that sort of reliability. And I think every company needs to decide what they need and make sure the metrics actually capture that because you can very easily convince yourself that something is going to meet your needs when it's not.

[20:27] Conor:
Let's talk about APT1 and your decision to build a model from scratch. You designed it for actions instead of tokens. What does that actually mean in practice and how is APT1 actually architecturally different in a way that makes it more reliable for enterprises?

[20:44] Dan:
Yeah, it's a it's a great question. So I mean, I think part of the core problem with a standard LLM is a lack of a lack of semantics. So the prompt is a control surface. It's just a bunch of words, and maybe some of them are in all caps, and some of them have exclamation marks, and you don't really know what you're gonna get from a prompt. And this leads to sort of this general prompt and pray approach, which is not really, it's not a reliable path to a technology that you can ship.

[21:21] Dan:
Then on the output, what are you getting out? You're getting tokens, which tokens? Who knows? And so you can be empirical about it. You can say, I ran this and I tended to get X or I tended to get Y. But it's very hard to make any kind of statement about what the system can or can't do. And we think in computer science, a lot of our ability to ship things in high stakes environments really grounds out the ability to able to say this can't happen or if this happens this will always happen and so we just came to this realization that you are fundamentally limited operating just in terms of tokens in and tokens out that tokens don't have semantics but actions do right an action can be allowed or not information flows one way or another way and that um really um what you need is you need a system which, who sort of structure, that sort of architectural structure of that system lines up with the needs that you're gonna place on it. And if all you need is a stream of tokens, then sort of an autoregressive token predictor is great. But if you need to be taking actions, activating a bunch of APIs and certain data flows where there's restrictions on where information can come and go, and like suddenly you need more than just hints through a prompt. And so our model is designed with a different architecture, there are different control surfaces, and it allows you to make much stronger guarantees about what it will do. And that, of course, requires that you make a bunch of changes along the way, the data has to be different, the model has to be different. In deployment, you need a different stack and so on, but you need to do a lot of work. But ultimately,

[23:10] Dan:
trying to coerce these sort of noisy token based models into having crisp,

[23:20] Dan:
robust, guaranteed semantics, it's just, that is just a challenging design pattern to try to build a reliable technology out of unreliable pieces, and it's just not necessary.

[23:33] Conor: [OVERLAP]
So you've mentioned a few different major variations from, let's call it the traditional approach to LLMs right now. So one, data, that sounds like you're treating data differently. Two, model architecture. And three, it sounds like you're training approach as well. Can you expand

[23:52] Dan: [OVERLAP]
Yeah,

[23:52] Conor: [OVERLAP]
on the data piece first? If conversations aren't just text, you've got the human with their goals and the context provided there, and the agents are representing some other entity with different goals, and then you have your backend tools that you're looking to call and APIs. How do you create the right data for this training set?

[24:14] Dan: [OVERLAP]
it's such an important question. I think one of the things, the word, the word agentic gets thrown around. now. What is that? Well, sometimes it seems like everything's

[24:22] Conor: [OVERLAP]
It'll probably be in the episode title for being honest.

[24:24] Dan:
better now. Right. So what makes something agentic? Well, a big part of, you know, what does an agent do? An agent takes actions to maximize goals and sort of the classical AI definition. And so people often throw around like, oh, here's a kind of rag-based question answering system. It's agentic. What does that really mean? It means you ask questions in kind of a loop that's maybe conversational. But to me, that's not agentic. To me, agentic means there are going to be goals at play. There's going to be actions, and those actions are going to have consequences. Otherwise, people wouldn't be wanting them to happen. So what does training data have to look like? any agentic interaction of this kind is going to have humans, and of course the humans are speaking natural language with all of its contextual richness. There are going to be actions, APIs, tools, whatever form that takes, which can be taken in general in complex orchestrations. Often to get something done, you need to take a bunch of actions and some well-formed flow. And then there is the agent you're talking to who has their own goals. So if I'm talking to my Sparx speaker and I, you know, ask it to play a song, it's going to play a song. It does what I tell it. When I go to a banking environment, it's not just doing what I tell it, right? There are policies and goals, and it is a conversation interacting my goals, the system's goals, the ambient policies, the APIs. And so it is a complex thing. And one of the things that we spent a long time figuring out is how do you get the data you need that describes what people say in context, based on what they want, how an agent should reply, what actions should be taken, how that telemetry is all connected. And that's data that just doesn't exist. So we had to make it. And you asked about data. That's the piece there. The next piece is, how do you get your model away from only operating in this sort of unspecified token-to-token way? And fundamentally, what current systems are missing is, they're missing, kind of boils down, you know, to use a fancy word, it boils down to metacognition. As a human, somebody asks you a question, you kind of know, you either know the answer or you don't know the answer, or maybe you have a guess, but you're not sure. Systems don't really, do this when they're in like a next token prediction mode. They're just throwing out tokens which are based on the probability of contextual plausibility and so on. And in an agentic situation, you need to know like, okay, can I do that? What information can I do that with? And you have to make decisions about what you do and don't know and what you can and can't do. And so the flow of the actual model itself needs to operate over a different structure, not just cranking out a stream of tokens. And Once you've done that, there's a bunch of advantages because you're operating over the structure. You actually have a distinction in your model between what you do and don't know, what you can and cannot do, as opposed to the situation that gives rise to hallucinations, which is, I generated some tokens and I actually have no idea whether they're right or not. If they're right, we call them truth. And if they're wrong, we call them hallucination. But the model does not internally distinguish between those. A metacognitive model does. Where did this information come from? Where is it allowed to go? And so on.

[28:02] Conor:
It's interesting hearing this perspective on really breaking down how to think about a model because I think often we just treat a model like a black box that does a thing and we hope it's right and we try to give it the right data to make it more right. But even when we are doing things like evaluations and we are measuring for hallucinations, there is this element of challenge around ground truth, and it makes it hard to do evils effectively if you don't know what your ground truth is. And we're also very, as you pointed out, opinionated about what hallucination is when we think it's wrong, whereas when it has essentially still hallucinated something else, it's still making it up, it's just got the answer right. Uh, we are like, Oh, great. You got the answer, right? We're happy. You know, obviously you can feed, you know, just in time information and help drive better responses there. There are plenty of things you can do here, but how are you using APT one, the model that you've built to try to structurally guarantee a higher degree of accuracy, as you put it earlier.

[29:12] Dan:
That's a great question. So I would say the fundamental thing that's different is because APT is architected around information and actions,

[29:24] Dan:
and it has that, in that sense, this metacognitive ability of tracking its information and moving that around, it really just gives you And it gives information and actions and actual first class status in the model. When we talk about hallucinations, like what is a hallucination? Like for a word that's thrown around, it's actually pretty unclear what people mean. So for example, we'd probably say it's a hallucination if you had a rag system and you retrieved the answer. And then the system went ahead and said something different than the answer. What does that actually mean? Like maybe it reworded it, but it still means roughly the same thing. Where's the line? And so for as important as hallucinations are, I think people don't do enough of drilling into what causes them. What are they? How would we architect our models differently as we've done to avoid huge classes of them? And, um, And I think that's, I think that comes a little bit from like anthropomorphizing these models, that when it tells you something that's not true, you're like, oh, it's hallucinating. It's lying. It's not doing any of this. It's always guessing.

[30:31] Conor: [OVERLAP]
Yeah.

[30:31] Dan: [OVERLAP]
Many guesses are correct. Many guesses are not correct. And we add these labels. If the system had some explicit notion of the truth and made a choice to not use it, well, that might be a deception. But that's not what these systems are doing. They're just putting tokens out there and sometimes they're the wrong tokens. And it is very hard to architect reliability into a fundamentally noisy system. That's why we've built our system that

[30:56] Conor:
What are the different classes of hallucinations you've been able to avoid with APT1?

[31:02] Dan:
way. Yeah, it's a great question. And it's really funny because when we go and we often we're talking to enterprises and they will tell us, yeah, we're not happy with this existing solution, this other model, because the hallucination rate is way too high for us to be comfortable shipping. And we'll look at it. And what's funny is hallucinations are a bit of an iceberg in the sense that they see these hallucinations and rightly are worried about them. But when we go in and we actually look, often the hallucination is Often the hallucination rate is five times that. And that's because there are different kinds of hallucinations, many of which they're just hard to detect, right? Because if it's plausible enough, it's going to be indistinguishable from the truth. And one of my biggest fears with the road that we're going down with these sort of token-based autoregressive LLMs that do not have any additional semantic pairing structure is that ultimately we talk about plausibility engines. I worry that we are increasingly building systems that generate output that is indistinguishable from the truth. that is not a good situation to be in, right? And historically, when systems make mistakes, their output is distinguishable as such. And now it can be very, very hard. You go to, you know, you go to a system that's backed by an LLM, you ask a question, you get an answer, and it sounds totally solid. But sometimes people will notice a hallucination. So one kind is just like a factual mismatch. You ask for your bank account balance, and it gives you a number, and it's not your actual balance. together the tokens and they don't happen to correspond to the truth but then there are hallucinations like It tells you a refund policy that is a refund policy. It's maybe even a plausible and common one. It's just not your refund policy or it's your refund policy, but maybe for someone with a different, you know, frequent flyer status or something like that. And when you look at these as a developer and you're like, okay, okay, let's look, let's, let's look through this, this output. Okay. Looks right. It looks right and is right are not the same. It is very hard to detect hallucinations when they are packaged up sufficiently fluently. So one of the tools that we ended up building, mostly because we need our data to be clean, but we built this really rich suite of evaluation tools, which dig for hallucinations. And you find all kinds of stuff that looked close enough to true that people thought that part was right. And they got upset about some hallucination. They only see the egregious ones. they see the ones that are detectable. And I think that is a really big issue with AI technologies more generally, is that their mistakes can be very hard to detect for humans.

[33:50] Conor:
Another interesting element of all this is besides the differentiated architecture, besides the differentiated approach here, you built your prototype model for under $11 million, which is a fraction of what the Frontier Labs are spending. What does that say about the path forward around this push for bigger models? Where do you think the axis of improvement is?

[34:16] Dan:
So

[34:18] Dan: [OVERLAP]
whenever you scale things up, scale is usually good, right? When you scale up models, particularly in machine learning, they do tend to get better. And of course, since then, our more recent models are increasingly large. When you scale things up, they do improve. But they tend, on whatever axis you're scaling, there are eventually going to be diminishing returns. So we got really great gains by training on more and more of the web until we sort of, as a community, exhausted the high-quality material on the web. And we can come back to that if you want. But pretty much any direction you take a technology in terms of scale, it'll go up, but with diminishing returns. When you can head off in an orthogonal direction, you aren't yet to the point of diminishing returns with those new ideas. And so rather than technology always following this path where things just kind of grow and grow and grow without bound, which is often how it can seem if you are outside the development of that technology. Most technology is much more like a sequence of ideas, each building on the last and bringing you from a new operating point to another one, to a new scaling law. And so, for example, you see this in chip development, where it wasn't just packing more transistors. There was idea after idea after idea that kept the improvements of computation increasing. And the same thing in AI, like pulling in more and more of the web is great, but eventually you run out and you need some new idea. And I do think often people have trouble telling at the beginning of a new technology whether it's an exponential curve they're on or an S-curve. And

[35:53] Conor: [OVERLAP]
Ugh.

[35:54] Dan: [OVERLAP]
people think it's an exponential curve. It's always an S-curve. And eventually, as that starts to flatten out, you want to go in a new direction. So what direction did we go in? We went in a direction that moved away from this token-based view of the world into a kind of richer model space and You know, as a result, of course, we needed data. What was able to drive that and took us a long time to get right was to basically figure out how to do for conversational AI what RL has done for things like game playing or math or code more recently and how to generate that data. Because that's something that's very easy to do if you want to build a system for playing a combinatorial game like Chess or Go. It's a lot harder if you want to build data that is going to show you how to do agentic AI. So that's why the data is very important. But by taking the model in a different direction, we're able to get on a different operating curve. Because the current operating curve is primarily about breadth, and about a sort of horizontal broad intelligence, the ability to do specific vertical capabilities is incidental. And that can be powerful because it, as it gets better, it's kind of getting better at a lot of things, but it can just be incredibly inefficient on many axes in terms of it's just hard to get very far because of diminishing returns. It can be expensive to deploy. Now, I sort of think about this as like, um, Here's an example, you want to learn French, and so you just start reading books in English. And every now and then, there's some French words, and you write it down. And eventually, in the limit of just unlimited English books, you'll learn some French. But one, this is very wasteful. This is definitely not the efficient way. And two, you maybe won't actually learn the full space of

[37:45] Conor: [OVERLAP]
My grammar

[37:46] Dan: [OVERLAP]
French.

[37:46] Conor: [OVERLAP]
might be pretty

[37:47] Dan: [OVERLAP]
Yeah,

[37:47] Conor: [OVERLAP]
crap, or merde,

[37:48] Dan: [OVERLAP]
exactly.

[37:48] Conor: [OVERLAP]
I should

[37:48] Dan: [OVERLAP]
And,

[37:49] Conor:
say.

[37:49] Dan:
and, and so you're like, there's not this narrow little slice at great expense. And so what we've done instead is we've said, if you want to be incredibly reliable, we're going to build a model that is architected around reliability. And I really do think one of the emerging trends we're going to see is there going to be some things where there's horizontal intelligence. is really great having wide ranging conversations that match on different topics. And there's going to be places where much more vertical capabilities are going to be important, making sure that whether it's medical or banking or something like that, making sure that the system is doing, is doing the right thing, is following the rules and is not going to hallucinate and mess up an action where that's going to have a high cost. And I think those are just different operating points. And that's important. There's not just one curve.

[38:39] Conor:
This conversation around S-curves and where we're at deserves quite a bit of time. And we're going to get back to it, but I don't want to lose a thread that you brought up, which is this idea of, you know, studying, training LLMs off of, you know, chess or Go. the self-play approach. And there is a ton of research that shows that, you know, when we apply RLs, you mentioned earlier, we say, OK, great, go play 500 games of chess and all the ones that you win, we're going to, you know, uprank those a little bit and you'll slowly get better over time. Thanks to Galileo for sponsoring this episode. Their new 165-page comprehensive guide to mastering multi-agent systems is freely available on their website at calleo.ai and provides you the lens you need to understand when multi-agent systems add value versus single-agent approaches, how to design them efficiently, and how to build reliable systems that work in production. Download it for free at the link in the show description to discover how to continuously improve your AI agents, identify and avoid common coordination pitfalls, master context engineering for agent collaboration, measure performance with multi-agent metrics, and much more.

[39:57] Dan: [OVERLAP]
I

[39:57] Conor: [OVERLAP]
That can work. We've seen it improve quite a bit in a variety of systems. I mean, there's a recent AlphaGo, you know, learned by playing bad games of Go against itself. But how did you make this work for conversations with an agentic model where there isn't a zero-sum winner to pick?

[40:15] Dan:
think this is, in general, I

[40:19] Dan:
could take a step back and talk a little bit about what makes RL powerful. RL is all about doubling down on the things that work and doing less of the things that aren't working. And it's really that simple. And there are a bunch of mathematical ways to approach that. And there are a bunch of different kinds of data collection environments that all really qualify as RL, but they all have this property of try various things and go in the direction of the things that are working. So if you're playing Go or chess, even if you don't know how to play well, even if you're playing yourself, even if you're moving randomly, one side wins, one side loses. And that means you have a gradient, the one side was better than the other. And in that game, the reason you were able to do that rollout is because the rules of the game were crisp, you might not know what good move is, but you know what the legal moves are. And the outcome of the game is clearly defined. And in In that sense, you might need to play a lot of games to get good at chess or Go, but you have a gradient default. If you want to be using RL outside of that, and people do, and they, you know, if you use RL for something, whether it's code or math or something like that, the key properties that have made RL work is when we can figure out what is the actual reward function we're climbing. And that can be hard when it's not who won and who lost. And then what makes a well-formed instance this is as you mentioned it's easiest for a combinatorial game it's next easiest for something like math where if you find a solution you can sort of then plug that into a theorem prover or something to verify it or something like that and when you want to apply it elsewhere the thing that typically holds people back is synthesizing data does not mean synthesizing good data in general if you have a model build data and then train on the data like you may not actually have gotten anywhere. And so what we had to think about really hard was, in this situation, what's going on? There is a user, there's an agent, there are APIs, there are policies and rules, and there are goals. And how we can do simulation of that in a way that we can take it and put it in an RL setting and then drive the training of the model. So part of that is getting the data simulation, part of that is making sure quality is high enough. It's making sure the data has the right structure to then feed into this kind of model. And there are a lot of pieces to that. But at the core, it is the same thing. It's just, it has been a challenging problem to the research community to apply these sorts of methods outside of these very sort of discrete combinatorial domains. And that's one of the big things that we've been able to do in building our

[43:05] Conor: [OVERLAP]
super super interesting. I love the breakdown of basically how to set up a RL instance in a less clear domain as well. I think that'll be really interesting for a lot of our listeners. I want to get back to this S-curve piece that you brought up. You know essentially I think what you're saying is You know, we spent a lot of time having LLMs consume decades of human knowledge that was

[43:31] Dan: [OVERLAP]
Right?

[43:31] Conor: [OVERLAP]
sitting in digital form. We helped them structure it. We've created some synthetic data to help supplement it. And this created this massive phase change as we threw more and more compute at it and more and more data. But we have now kind of hit a saturation point where, okay, like, yeah, we can keep throwing more compute at this problem. Yeah, there are techniques we can do to make things more efficient. But it's hard for us to scale quality data at the level we need to. Synthetic data will only get us so far. I'm putting words in your mouth, but here's kind of my summation so far. So if we've

[44:07] Dan: [OVERLAP]
I

[44:07] Conor: [OVERLAP]
run through most of that golden data, what comes next? Is it just getting better at synthetic data? Is it something else?

[44:17] Dan: [OVERLAP]
do think it's a really important thing to realize when you look at how rapidly LLMs have sort of burst onto the scene and brought this kind of horizontal intelligence. It's a noisy intelligence, but like it's very broad. into play, and it seems like it happened overnight in a lot of ways. But I do think it's important to recognize that although this is very reductive, there's a lot of things that go into training these models, at the core they are a condensation of the web, right? And It took us some number of years here to figure out how to scale everything up to the point where you could just kind of compress the web down, but that's basically what drives them. And that's also why, to a large degree, lots of different models have pretty similar capabilities and behaviors, because to a certain degree, they all do share that same connotation of the web. what people do beyond that, the data and usage may then diverge. But that core is the same for many of the models that are out there. And, well, how long does it take to train a model like that? Oh, maybe it takes a week or something. How long does it take to, did it take us to scale up the machinery? Well, it took a couple of years for that. But how long did it take to write down all that human knowledge in this readable form. Well, that's been like 30 years of writing things down on the web. And then

[45:45] Conor: [OVERLAP]
And

[45:45] Dan: [OVERLAP]
how

[45:45] Conor: [OVERLAP]
even

[45:45] Dan: [OVERLAP]
long

[45:45] Conor:
more before that, that just made its way onto the web.

[45:48] Dan:
did it take people to discover that knowledge in the first place? Millennia. And so it really is a case that even before LLMs were here, people were already learning things kind of distilling that down into language to put it into books, taking those, those books, making them digitally available, getting them onto Wikipedia, like, you know, all of this process. just let us tap into this vast structured representation of human knowledge. Because every time we write down something we know in a book, that's a structured representation of human knowledge. Language is the abstractions that we have developed over millennia for communicating that knowledge compactly and having the right abstractions and the right mechanisms for communicating about them. And so it was kind of already there. And it's not that our, when people just scale up the kind of obvious autoregressive thing, it's not, when we talk about hitting diminishing returns, a lot of that is just, there's no more books to read, right? There's no more, like, why are they gonna get to the next level of theoretical physics theories? Well, you know, when somebody writes a book and throws it on the web, they will definitely get there. And the question is, what do you do with the fact that once you've consumed all of the declarative knowledge on the web, the golden stuff is kind of used up. So you mentioned synthetic data. Synthetic data can be very effective or very ineffective depending on the situation. So in a case like Go, synthetic data is great. In a case where models just kind of throwing together some output and you hope it's correct, it's probably gonna be too noisy. Like it's very important that the synthetic data be the right kind, that it be clean, that you be learning the right way with the right structure and architecture. And so there's a lot of things you gotta do right to get synthetic stuff to work. But yeah, we've really just gone through a lot of that core data. And so what you're gonna see, every technology sort of, it's always the same super cycle. It always looks like this. People are, people are stuck. They're working on, they're grinding forward on some benchmark, on, on some criterion and things are getting a little bit better, but the systems are getting complicated and it's hard to make progress. And, and then some new technology comes out. Like, oh, let's build a giant R regressive neural net. And suddenly a lot of Suddenly a lot of benchmarks start falling and this new technology, it's kind of great at all kinds of things. And it's broadly breaking records and it's this explosive advance. And not only that, it's doing it more simply than the systems people had before. Or you look like old machine translation systems in the neural system, like it's better and it's simpler. And so this new technology comes out and people are like, that's it. Like we're on this exponential curve. And then that starts to level out because there's diminishing returns from that idea. And then people start saying, well, how do we get better? How do we get past the limitations of this new idea? And then often the answer is you're missing something. Sometimes that's something you threw out from the last cycle. So for example, AI was very much based on like representation and search, like if I had natural language, I would take a sentence, I would parse that sentence, or I'd take my machine translation input, I would carve it up and move it around according to some rules. And then suddenly everything was replaced with this purely data-driven autoregressive system, and things got better. But then people are like, well, maybe we could do better than this by having some structure to what we try, and maybe we'll try a couple things and use the one that works, which is sort of the simplest version of a reasoning model. Well, I guess that used to be the only thing we did was do this combinatorial search over stuff and see what worked. And so what will happen is you don't get the exponential curve, you get beginning of the S-curve looks exponential, it levels off. And then some new idea, which may well be a reincarnation of an old idea, comes in to take you to the next step and then the next step and the next step. And ultimately our technologies all follow this pattern. And we're at that point now where just scaling up, it's not that it will never help, but the S-curve is leveled off and the big gains are going to come from new ideas that are orthogonal to what we're doing already.

[50:07] Conor:
one of those ideas that is being widely discussed is reasoning models.

[50:14] Conor:
What's your take on reasoning models and their potential to accelerate AI development again?

[50:22] Dan:
Reasoning models can be really powerful where they apply. They don't apply everywhere. So they're good at a couple things. They're good. And of course, like, just like RLM is many things under an umbrella, reasoning models are many approaches under an umbrella. So like a oversimplification, like a caricature of a reasoning model might be try a bunch of things and pick the one that worked. Well, when does that, when is that better than the kind of baseline alternative. Well, when you can tell which one worked. So for like a lock and key problem, try a bunch of tactics to prove this. But once I find the right tactic, the proof goes through. That's why reasoning can be powerful for math, because trying multiple things is valuable. You can tell which one worked. Other reasoning models are are kind of more focused on figuring out how to do things for the first time. That can be powerful. It can also be totally unnecessary. So if you think about humans and how we go around in the world and take actions which we hope are intelligent, Sometimes we are thinking very hard about, like within our model of the world, we're planning, we're thinking, and we're doing a lot of sort of advanced thought about what we're going to do. And then we go try to do things. That's good for when you're figuring something out for the first time. But once you've done something a couple times, you just know how to do it. And what used to be computation and planning becomes memory and experience. And now you can do it without solving the problem again. And in general, we don't want our AI models solving the problem again every time from scratch. Maybe that's great if you're the one selling tokens, because it burns tokens, but you want to ultimately, in humans, that's eventually compiled into our memories, into our environments. We have signs everywhere in the world, so we don't have to search the world every time. We just write down what was at the end of that hallway or whatever.

[52:29] Dan:
But then humans take this even further. there's like a chip of rock flying at your face, you just blink, you don't stop and think, huh, it would be bad if this rock hit my eye, I can move my head, I can close my eye, you don't do that, you just blink. And so as humans, we use different amounts of computation and meta-computation based on the situation we're in. Reasoning is a very particular kind of use of test-time computation, which works for some problems and absolutely is not the right solution for others, and that's your only goal is to sell tokens.

[53:04] Conor: [OVERLAP]
I'm curious to dig a bit more into the verticalization piece you brought up earlier. So you mentioned that one of the areas that scaled cognition has focused early on has been regulated industries with the idea

[53:21] Dan: [OVERLAP]
I

[53:21] Conor: [OVERLAP]
being, okay, this is where there is this high degree of accuracy that is needed. And, you know, APT1 is built with tool calling in mind. You're looking to reduce hallucinations. This feels like a great fit. You know, there's there's low error tolerance. What have you learned from your early deployments?

[53:42] Dan: [OVERLAP]
wouldn't say specifically that the thought, the first thought in my head was, well, let's do regulated industries. The first thought in my head was, I want to build models that you can trust. Like, what does that mean? Well, it probably means something about them when they represent information to you that it's true. And that has to do with hallucinations. It probably has something to do with being able to follow instructions. And as we sort of thought through what a model would need to meet this criterion of trustability, which connects to other things. Like I said, it connects to controllability, it connects to auditability and explainability. If you take a standard model and it does something and you say, why did you do that? It will then start spraying tokens at you that like purport to explain why it did that. But like, there's no reason that the explanation actually is isomorphic to the computation in any way in a sort of naive approach. And so that's another thing that we thought really hard about in the model is like, it'd be really great if a model's natural operation was left an auditable trail. And as we started putting these pieces together, we started to notice, well, this is going to line up particularly well with a lot of the needs in regulated industries, where you need to be able to have you know, a traceable, an audible trace of what you've done, and you'd be able to point to where you got the information and what policy you were following. And moreover, like, and more fundamentally, when the system takes actions, because again, an inventive system is one that ultimately takes actions. Those actions were regulated for a reason. They were important to moving people's money around, medical decision making, all of these are regulated for a reason. It's because the mistakes have a very high cost. And when I see that, I think, okay, well, this is a case where I don't need a model which from time to time exhibits some kind of, you know, creative genius. I need a model which will reliably do the right thing every single time. And to me, this is ultimately about trustability and our focus being intelligence is important, but the thing you hear a lot about is super intelligence. I am more focused on super reliability. That's

[56:17] Conor: [OVERLAP]
Mmm.

[56:17] Dan: [OVERLAP]
a good fit for industries. But one of the things we learned is it turns out pretty much every industry feels like if you take the wrong action, the price is too high, right? Like, okay, it's worse if you make a medical mistake, perhaps, than if you, you know,

[56:35] Dan:
charge somebody the wrong amount for an online shopping order. But that also doesn't sound great. Or if you give one person a refund to one policy and then deny it to the next person because you're just making up refund policies on the fly, what enterprise wants that? And so one of the things we found is that the regulated industries were maybe more sophisticated about how they thought about these mistake risks, but that pretty much across the board, all enterprises find it important to be able to trust their systems, to be, to obey controls and to take truthful actions. And that's where we are today.

[57:14] Conor:
Why not just put a second model on top? You know, I use Claude to code this Gemini, you go review the code.

[57:21] Dan:
you know, to crib the joke about regular expressions, now you have two problems. So you have a model and it's unreliable. And so you bring in another model that's also unreliable to check it. And again, now you have two problems. And so it's sort of a constellation or chain of model kind of approach with models checking models. And it can reduce errors, but

[57:46] Dan:
First of all, it's not very effective. So what can happen is, OK, you have multiple models. They can all fail. Sometimes one catches the error. Sometimes it introduces an error. And then you need a third model. And suddenly, things are very complicated. And you've got like 10, 15 models. And everybody's checking everybody else. It's burning a lot of tokens. It's probably taking a lot of time. So it's expensive and it's slow. But worse than that, it's just still very hard to reason about these things. It's hard to guarantee anything out of a set system with this complexity. And ultimately, it just doesn't actually work that well. That's partly because when you, you know, you have a conversation, that conversation may have 20 turns, let's say, Well, if you've got a model that's like 80% right, and it gets checked by a model that's 80% right, and you think, oh, that's pretty good, right? It's a pretty good model checking a pretty good model. But that means, OK, well, 4% of the time you make a mistake, assuming they're independent, which, by the way, they're not. When you have a hard instance, the model and the checker are going to fail in exactly the same cases. So 80% checking 80% is going to be like 82%. Um, but let's just imagine they are independent, but then every single turn of the conversation is another chance to mess up. And it just, the combinatorics just works against you. Sooner or later you mess up, you make a, you know, you make a mistake that ultimately you just, another architecture would have just avoided. So I see those sorts of approaches as fundamentally being an attempt to take the hammer you have. which is this kind of noisy, unreliable, horizontal model, and trying to hit a nail, which is about vertical, reliable, truthful behavior. And it's just a bad fit. So can you take these sort of chaining actions, build constellations and make things a little better? You can, but it just empirically doesn't work very well. And one of the biggest reasons why it doesn't work very well is because, as I said, an instance or a situation which is hard for one model will tend to be hard for all. This is a lesson we've learned in machine learning, sort of technology after technology, decade after decade, which is that when you have multiple systems, you hope for independent errors and you don't get that, you get highly correlated errors. And that really kills this approach. Plus it's slow and it's expensive. It's just not very good in practice.

[1:00:10] Conor:
It definitely gets pricey, which makes for some challenges for sure, especially as you scale.

[1:00:15] Dan:
And that's why even if you could make this work, it's still better to have a model that in its natural operation doesn't make the mistake in the first place. And that's absolutely one of the advantages of APT1 is it gets it right in its natural operation rather than after seven other models have signed off.

[1:00:34] Conor:
Is there a minimum bar that you think builders should be trying to hit as far as trustworthiness for agentic systems?

[1:00:41] Dan:
That's a great question. There's definitely minimum bars. I think they maybe move around. So I personally feel like the overall direction is, I am at a societal level that we're building technologies. where nobody knows what to trust. They're going to kind of trust maybe nothing, maybe everything. And that this attitude of, it's AI, maybe it's right, is going to leak out. And people are just going to accept, yep, everything we build is going to be built on Jello. And if there's one thing I want to get out to people, it's you don't have to build on Jello. And when you are building a system that your users are going to be coming to as sort of a representation of an interaction with your enterprise. You don't want it to just like YOLO out something that may or may not work. I think all the bars need to be stricter, but they do vary. So for example, like I said, agenda gets thrown around a lot. A RAG system is not really an agentic system. You ask a question, you get an answer. Probably if somebody asks some FAQ and they get a wrong answer back, that's probably not great. You probably don't want that. You probably prefer a system that isn't going to lie to you. But that's different than giving you the wrong medication or something like that, or lying to you about your account balance, and then you make some very bad financial decision. I think the bar can be a little bit lower. And what you see out there is a lot of people are trying these RAG systems, a very low stakes system, because they just know that the bar needs to be much tighter to actually ship a system that takes actions. We are trying to directly meet that need, which is a real agentic system that's really going to do things. Because once you do things, not all things can be undone and the consequences can be much higher. And so I think you ultimately enterprises and users have to ask themselves, like, what's my tolerance for screwing this up? And if the answer is like, yeah, whatever, then OK, yeah, whatever, ship whatever you want. But if you care, if you have a low tolerance for screwing up whatever product you're shipping, then you need to have tighter guarantees and.

[1:03:02] Dan:
Ultimately. That's our, first of all, that's our, that's our goal is to provide models that give those guarantees. If you look at, you know, say chatbots that are out there go and you talk to a, an LLM to just have some freewheeling conversation and the things like lying to you probably often that's just how they're architected. Like if they stopped and they said, I don't know. half the time, which is probably the truth, it would be a totally different experience. Maybe it wouldn't be as successful. That's just not how they're built. Even in terms of how they seem to train, like if you ask people what they prefer, if you're driven too much by human preferences, people prefer you to give the answer than to say, I don't know, if they think the answer is true. And often people can't tell. So there's definitely a preference for a certain style of these. It's just not a good fit for enterprises where what you need is a system that for the things it can do, it can do them right and it will do them right every single time. So I think it's just important for people to figure out what is their bar? What is their tolerance risk going up? And are they in a situation where they want guarantees? And if so, they should select an architecture and a model accordingly.

[1:04:07] Conor:
We've talked a lot in this conversation about, you know, what do builders need to be considering? What do AI engineers need to think about? What do researchers need to think about? What do enterprises need to think about? Those are all important questions. It's, you know, key topics for this show. Much of our audience fits in one of those categories, maybe all. But you raise something that goes beyond this technical idea. What does it mean for society when AI produces output that's perfectly fluent but potentially wrong, and there's no obvious way to tell the difference? What's your perspective?

[1:04:44] Dan:
Yeah. Like I said, one of my biggest fears is that, as a field, we are building systems which produce outputs indistinguishable from the truth. That can be very corrosive in a lot of ways to society. And there are a couple pieces to that. One is the technology. Like, maybe we should be putting a little more effort into building systems that actually tell you the truth. Systems that even have a first-class notion of whether what they're saying is the truth. So that's sort of the technological side. But I think an equally important, if not more important, is the social side. And it's really ultimately a question of digital literacy and what we're ready for as a society. So in

[1:05:26] Dan:
earlier era, in earlier eras of language technologies, typically A few things were different. First of all, you could often tell mistakes. So if you used a machine translation system and the output was not very fluent, that tended to correlate with it being not very faithful. So you're like, I don't know, maybe I don't trust this. There were signs. If you have a technology that always sounds right, but often isn't, it becomes very hard as a user to know what to trust. Another analogy I can draw is if you imagine, you know, how did we find information before the internet? You go to some library and you look things up and each book was, you know, to some degree vetted to the point where it became a book and you did all that searching and then suddenly there was online search and now I could just type my informational need and cut out that whole step. And now web pages would come back to me and One of the things we had to learn is, well, the bar for putting up a webpage with some information is different than the bar for putting together a book and finding a publisher and getting a library to actually buy it. And so the information wasn't as vetted. We had to learn to deal with that. However, it was still the case that when you typed in your search query, you would just get a bunch of possibly relevant documents, and some weren't relevant, and the search engine messed up enough that you knew that. And some of those webpages, like, they didn't look, you know, super trustworthy. And although the search engine automated bringing the sources to you, you ultimately needed to evaluate what was credible and what was not. If you go to chatgpt and you ask a question and you get an answer, this has all been disintermediated. You are simply getting the end product. Maybe it's right, maybe it's not. And the tools that we have developed sociologically, in terms of digital literacy, to tell the difference, do not apply. It's hard to tell what's the source, so you can't check the source. It's hard to build a correlation between disfluent or otherwise suspicious output and the quality of the information, because all the information has been made indistinguishable from the truth. And all of these things combine in, I think, a very dangerous way that gives rise to one of two things. Either people believe all of it equally, or maybe they believe none of it. Neither of those outputs is really great. I think this is a thing where we can do a lot sociologically. We can demand of our technologies that they actually put the information, that they be supportive of vetting to a higher degree than they already are. And That is something we need to navigate as a society. That takes time. These tools are becoming widespread at a faster time horizon than we are able to socially react or push back. to make them have a shape and to give people the understanding that, yeah, that sounds fluent, but that doesn't mean it's right. And it's only going to get worse as we start reinforcement learning more of these technologies, because when you reinforce and learn a system, you tell it

[1:08:42] Dan:
not just mimic the web pages, which is not necessarily the truth, you tell it optimize this function. And so, for example, if you have a shipping bot and you tell it optimize customer feedback, or asking for satisfaction or something like that, and you call up and you ask, where's your package? And the system looks it up in the database, and it's lost. And then the system knows, I can say, your package is lost, and then you're going to give me a thumbs down. Or I can say, your package will be there tomorrow, and I'll get a thumbs up. And I was told to get thumbs up. And now, this isn't as lying to you. And now we've actually crossed a line. It's not just that the It's not just that the true and the false look the same because the system can't tell. It's that the system has been explicitly optimized to produce things which are not actually what you want the behavior to be. So there is a whole path here that I think is important for us socially to be very aware of.

[1:09:37] Conor:
It's a really challenging problem. And I would actually argue we're seeing both the trends you mentioned play out in real time, where we have a group of people who have increasingly decided that nothing is real. And I would put this towards like the kind of nihilistic the disbelieving, the conspiratorial, we're seeing this trend massively digitally

[1:10:01] Dan: [OVERLAP]
Absolutely.

[1:10:01] Conor: [OVERLAP]
the last few years. And then we have the opposite trend, and I'm not going to get into demographics, but like, you have the credulous, you have those who accept anything at face value, who see any deepfake and don't think to consider, you know, is this something that really happened? They just go, oh, look at this here. At this point, we probably all have a friend who has sent us a video and then gone, oh, gosh, like that was fake or or you look at it and you go, this can't be real. And there's been a few of these that have gone viral as memes. There was the the glass bridge meme that went very viral, I think, around like Soros launch or something like that. There's been a few of these, and I do think our information ecosystem is in real trouble currently. And not only is that a societal challenge, but it's also an informational challenge for LLMs that are continuing to train off the public internet. And it's making it harder and harder for teams who are looking to train models to train systems to identify the right information and to find ground truth. And I do think it's important we acknowledge that.

[1:11:15] Dan:
Absolutely. And I, to your point, I completely agree that we're seeing both people believing nothing and people believing everything. If a model is sufficiently good at producing plausible output, those are actually the only things you can do, right? To believe only the things that are true requires that there be some leverage into making that discrimination, right? How can you tell what is true from what is not? And as each tool you have to distinguish truth from falsehood is removed by the technology, you are forced into one of those two camps. And that's, that's a big challenge. And I think it's worth pointing out that there are factors beyond just how people interact with information, like just people's use of these technologies encourages. So for example, if you have somebody who is, you know, discover that, oh, hey, wait a minute, instead of writing this like tricky email, I'm just going to stay the email I want. And, you know, a chatbot is going to write it for me, and then I can go send that off, you know, and it's done my work for me. That encourages this, this sort of reliance, this delegation of even the responsibility to the AI. And The way I would look at this is what happens is people think, oh, I used to be an author and now I'm not. Now, now the AI is an author and I just send it on. I think we have better options. Like as a society, we could really try to get people to understand when you use an AI in this way, you have not stopped being like part of the work chain. It's just, you've gone from being an author to being an editor. And while people have much more experience being authors, which they delegate, than being editors, which they just kind of don't have a lot of experience with. And being an editor is in many cases harder than being an author. And so people look at this and they're like, all right, well, I could learn this hard skill, or maybe they don't even realize they could do that hard thing of editing it, or they could just delegate it. And once you do enough of that, and once you start just saying, I'm not even the editor anymore, Suddenly, the system now starts to be that's just what you trust. If you're not editing its result, you're trusting its work over your own abilities. And that is corrosive in a supplementary way to the same issue that you were raising.

[1:13:33] Conor:
I think Dan and I could probably go on on this topic for another 30 minutes, but we've already been chatting for 70-ish, so I

[1:13:41] Dan:
OK.

[1:13:42] Conor:
want to be cognizant of his time. This is a bit of a sour note for us here, where we've talked a lot about these interesting technologies, we've talked a lot about their flaws. And I think we both share this worry about, you know, there's a lot to be done to make sure this works out on the positive. There's so much positive that AI can do for us. There's so much it can help us do in the world, but we're not being consistent about that yet. So what I'd love to ask you as we wrap up here is, where are you inspired today by what's happening in AI research and AI development? Where do you think we have major opportunities?

[1:14:17] Dan:
It's a great question. What I am personally most passionate about is, and the answer is the same, whether I'm talking about from an academic research perspective or from our mission, which is I am inspired about building models, which you can trust. And that means building models that are controllable, that are truthful, and where you can place guarantees about what they will do and sometimes more importantly, what they won't do. And being able to bring that sort of technology into the world, I think is very important. to having AI that can be a force for good. Because I think a precondition to that is having that trust, the reliability, the robustness, and the control. That is the core of what I think about all the time. How can we build models that have these properties?

[1:15:30] Conor:
And I know a lot of that thinking is going into what you've built over at Scaled Cognition, which everyone can check out at scaledcognition.com. But where else, Dan, can folks find you on the internet to learn more about your research, to hear your perspective? Where should they follow you?

[1:15:45] Dan:
Like you said, if you're interested in what we do at Scale Cognition, scalecognition.com. If you want to check out the work that comes out of my academic group at Berkeley, you can look up the NLP group at UC Berkeley. And, you know, I think it's really interesting how AI has changed in the course of even my career and certainly over AI existing. Originally, the biggest challenge with AI is nothing worse. That was the problem is everything worked too badly. And now a lot of the challenges are places where things are working, where they're having downstream consequences because of what they can do. And so I just really want to encourage people to be thinking about these issues, you don't have to build important systems on Jello. You don't have to accept that you either believe everything or nothing. We can take actions to improve digital literacy, we can take actions to push the technologies we build to have better properties and not just accept them the way they are. And to me, that's really the most important thing is tools you can trust. And that's really where we're focused.

[1:17:01] Conor:
Dan, thank you so much for the time today. It's been a great conversation. I really appreciate you for bringing so many great insights and for the wide ranging thought process you've provided. I hope our listeners, I'm quite confident our listeners will love this episode. So thank you so much.

[1:17:16] Dan:
Thank you so much for having me. It's been, it's been a ton of fun.

[1:17:18] Conor:
I'm so glad you can say that. We always love to hear that. And for everyone who is listening, if you haven't already signed up for our newsletter at newsletter.chainofthought.show, make sure you sign up to ensure you are getting all the best of this show and deep dive essays into some of the topics we discussed here, like the god 6,600 word essay I wrote about block and AI layoffs and what that all means. If you want more thoughts from our guests, if you want more thoughts from me, that is a place to go. Other than that, thank you so much for listening and be sure to leave us a comment and let us know what you thought about the episode, whether that's on YouTube, Spotify, LinkedIn, or wherever else you like to interact with Chain of Thought. We appreciate you immensely. Thank you for listening to another episode.