Chain of Thought | AI Agents, Infrastructure & Engineering | Beyond Transformers: How Liquid AI Is Rethinking LLM Architecture

The transformer architecture has dominated AI since 2017, but it’s not the only approach to building LLMs - and new architectures are bringing LLMs to edge devices. Maxime Labonne, Head of Post-Training at Liquid AI and creator of the 67,000+ star LLM Course, joins host Conor Bronsdon to challenge the AI architecture status quo. Liquid AI’s hybrid architecture, combining transformers with convolutional layers, delivers faster inference, lower latency, and dramatically smaller footprints without sacrificing capability.

Show Notes

This alternative architectural philosophy creates models that run effectively on phones and laptops without compromise.

But reimagined architecture is only half the story. Maxime unpacks the post-training reality most teams struggle with: the challenges and opportunities of synthetic data, how to balance helpfulness against safety, Liquid AI’s approach to evals, RAG architectural approaches, how he sees AI on edge devices evolving, hard-won lessons from shipping LFM1 and LFM2, and much more.

If you're tired of surface-level AI takes and want to understand the architectural and engineering decisions behind production LLMs from someone building them in the trenches, this is your episode.

Connect with Maxime Labonne:

LinkedIn – https://www.linkedin.com/in/maxime-labonne/

X (Twitter) – @maximelabonne

About Maxime – https://mlabonne.github.io/blog/about.html

HuggingFace – https://huggingface.co/mlabonne

The LLM Course – https://github.com/mlabonne/llm-course

Liquid AI – https://liquid.ai

Connect with Chain of Thought host Conor Bronsdon:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

00:00 Intro — Welcome to Chain of Thought

00:27 Guest Intro — Maxime Labonne of Liquid AI

02:21 The Hybrid LLM Architecture Explained

06:30 Why Bigger Models Aren’t Always Better

11:10 Convolution + Transformers: A New Approach to Efficiency

18:00 Running LLMs on Laptops and Wearables

22:20 Post-Training as the Real Moat

25:45 Synthetic Data and Reliability in Model Refinement

32:30 Evaluating AI in the Real World

38:11 Benchmarks vs Functional Evals

43:05 The Future of Edge-Native Intelligence

48:10 Closing Thoughts & Where to Find Maxime Online

Creators and Guests

Host

Conor Bronsdon

Creator and Host of the Chain of Thought Podcast | Technical Ecosystem Lead at Modular

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of my employer. This account is not affiliated with, authorized by, or endorsed by my employer in any way.

[0:00] Maxime Labonne:
So post-training is quite simple to define. So you turn a model that is able to do this autocompletion stuff into a useful assistant that is able to answer questions and follow instructions.

[0:19] Conor Bronsdon:
Welcome back to Chain of Thought. I am your host, Conor Bronsdon. And today we're going to be going down the stack, as I'm joined by Maxime Labonne. Maxime is Head of Post-Training at Liquid AI, where he is leading the development of their Liquid Foundation Models, which we're going to talk about: a very interesting and fundamentally different architecture that's challenging the transformer monopoly.

[0:41] Conor Bronsdon:
He's also a prolific open source contributor. Maxime is the creator of the wildly popular LLM Course on GitHub, with over 66,000 stars, and much more. But we're just gonna say, hey, Maxime, welcome to the show. Great to have you. Hey, Conor. Hey, everyone. Thank you for having me. It's honestly our pleasure. We had several listeners that reached out to us and said, hey, look, we'd love a deeper conversation with Maxime. We think he would be a great guest. And so it's a delight to have you on the show.

[1:09] Conor Bronsdon:
And we're going to get deep into architecture, some post-training techniques perhaps, and the trade-offs that have come from your experience with open source and so much more. But let's start with Liquid AI. In an industry where "Attention Is All You Need" has been gospel since 2017, you've gone back to first principles with Liquid Foundation Models, or LFMs. Can you walk us through

[1:35] Conor Bronsdon:
what LFMs actually are for those who don't know, and how they differentiate at an architectural level from other models?

[1:41] Maxime Labonne:
Yeah. Thank you for this question. So, Liquid has been working on architectures since the very beginning. This is something that we're very keen on, and we had this first version of LFM models, now called LFM1, that was released in 2024. And now we released LFM2 in 2025, and this generation is open sourced. We have a ton of models.

[2:10] Maxime Labonne:
I checked just before this recording, and we've open sourced 17 models since July, so in just, like, three months or something. And it's going strong. I can tell you we still have more. But to go back to the question about the architecture, what we wanted to do with these LFM2 models is to have an on-device LLM that is fast and accurate. And the current breed of LLMs

[2:38] Maxime Labonne:
is accurate, but it's not very fast. So we wanted to go back to the architecture level to be able to design something that was truly optimized for this kind of hardware: a phone, a wearable, etc. So the architecture that we ended up with is a hybrid that has some attention layers. It has six attention layers in the 350-million-parameter, 700-million-parameter,

[3:08] Maxime Labonne:
and 1.2-billion-parameter models. It also has a new component, which is a short convolution layer, and here we have 10 of them. So there are more convolution layers than attention layers in these models. And what we gain from it is that inference speed is a lot faster. And the memory usage is also a lot lower when you have long context, so your KV cache doesn't explode, because of the convolution layers.
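
As a rough illustration of the memory argument above, the sketch below compares the cache that grows with context length in a stack of attention layers against the fixed-size state of short convolution layers. The layer counts (6 attention, 10 convolution) come from the conversation; the 16-layer pure-transformer baseline, the head count, head dimension, hidden size, and kernel size are hypothetical values chosen only to make the arithmetic concrete, not LFM2's actual configuration.

```python
def kv_cache_bytes(n_attention_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Keys and values are cached for every past token in every attention layer."""
    return 2 * n_attention_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def conv_state_bytes(n_conv_layers, hidden_size, kernel_size, bytes_per_value=2):
    """A causal short convolution only needs its last (kernel_size - 1) inputs."""
    return n_conv_layers * hidden_size * (kernel_size - 1) * bytes_per_value


# Hypothetical dimensions, for illustration only.
N_KV_HEADS, HEAD_DIM, HIDDEN_SIZE, KERNEL_SIZE = 8, 64, 1024, 3


def pure_transformer_bytes(seq_len):
    # Assumed baseline: all 16 layers are attention layers.
    return kv_cache_bytes(16, N_KV_HEADS, HEAD_DIM, seq_len)


def hybrid_bytes(seq_len):
    # 6 attention layers plus 10 short convolution layers, as described above.
    attention = kv_cache_bytes(6, N_KV_HEADS, HEAD_DIM, seq_len)
    convolution = conv_state_bytes(10, HIDDEN_SIZE, KERNEL_SIZE)
    return attention + convolution


if __name__ == "__main__":
    for seq_len in (1_024, 32_768):
        print(seq_len, pure_transformer_bytes(seq_len), hybrid_bytes(seq_len))
```

Under these assumptions the convolution state does not depend on context length at all, so the hybrid's cache grows at 6/16 of the baseline's rate as the context gets longer.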

[3:38] Maxime Labonne:
And all of that while maintaining the level of quality of a pure transformer model. So those are the three metrics that we look at, and usually there's a trade-off between the memory usage, the inference speed, and the quality. And here, we managed to kind of raise this Pareto frontier of optimizing the model for the three tasks.

[3:59] Conor Bronsdon:
And it sounds like one of the key goals for Liquid is to enable your models to run extremely well on edge devices so that anyone around the world can leverage an LLM on their phone regardless of whether or not they have a subscription to OpenAI or whatever else.

[4:22] Maxime Labonne:
Yeah. Exactly. We started working with customers in 2024 and even a bit earlier than that, and we realized that the model that was the most in demand was our 1B model. And to be very honest with you, our 1B model, the first generation of LFM, it was a bit of an afterthought. We said, ah, yeah. Like, we can do it, so why not? Right? But it was not like a flagship model at all, and yet that was the most popular one. So that got us thinking that there was probably a really interesting market to explore because nobody is in this niche of

[4:57] Maxime Labonne:
small models that are really optimized for on-device inference, and this was the reason behind this choice of going with on-device LLMs for LFM2.

[5:12] Conor Bronsdon:
I love how unique the approach is, because I think most competitors here who are building models are really focusing on, okay, like, how big can we make this damn thing? How many data points can we get in it? And you're taking the exact opposite approach. You're saying, what do we need to do to get this on the minimum viable piece of hardware that someone is gonna have in their pocket? And I think that is going to deliver dividends long term.

[5:38] Conor Bronsdon:
And I've heard you describe these LFMs as being built with computational units deeply rooted in dynamic systems. Can you unpack a bit what this means in practice for people? Since I think some of us, and I'll put myself in this camp here, we hear that and we go, okay, like, what are we really trying to say here? Like, like, okay, great. We're trying to get it on an edge device,

[6:05] Conor Bronsdon:
but what's really happening here? What's the goal?

[6:11] Maxime Labonne:
Yeah. So this is the lingo of people who work on architectures. And what they mean by that is basically, yeah, going back to mathematical operators. And when you go back to mathematical operators, you realize that the attention layer is super strong, actually. So it's a very, very strong baseline. And that's another lesson that is very important when you work in this space is that

[6:38] Maxime Labonne:
theoretical gains might not be realized in practice. And it's not because you have an architecture that is supposed to run very fast because the algorithm, the math says so, that it's going to fully realize once you lower the architecture on the target hardware. That was a lesson that we learned with the first generation of LFM models. And for the second generation, something that we did

[7:06] Maxime Labonne:
right at the beginning was optimizing the models on the target hardware. So on a Samsung phone, for example. So we would not be misguided into believing that our operator is really good, it's very fast. No. Like, we could measure it and make sure that it's actually working in practice. And I think that part, plus having a ton of pre-training benchmarks, like over 100 different

[7:36] Maxime Labonne:
evaluations, also helped in converging on, like, the best architecture for this task.

[7:46] Conor Bronsdon:
And I think many of us are much farther, or much higher, up the stack, where we're using models in practice. Maybe we're coding with them. Maybe we're writing with them a bit. But beyond surface-level architectural understanding, I, like many folks, don't even know how to get involved in the model development space. Now that's not true of everyone. We have some very smart listeners. I'm not trying to call all of you out. But what I'm driving at here, Maxime, is I think you've developed a really unique and powerful skillset.

[8:14] Conor Bronsdon:
And I'm curious from your perspective, like, what led you into this path and to this, not just edge compute, but I think edge of the entire space, this frontier?

[8:20] Maxime Labonne:
Thank you. First of all, I think it's about, like, the curiosity of knowing how these models are built. I think it's shared by a lot of people. For example, Hugging Face released a very long blog post about how to pre-train models, and how to post-train them too, actually, and that is widely popular because I think there's, like, curiosity in everybody that uses LLMs. Like, okay. Like, how do you train one? Like, how does it work in practice?

[8:56] Maxime Labonne:
And if you're interested in that, my recommendation would be, yeah, you can go into this rabbit hole and try to explore the architectural side or maybe the post training side. Like, it doesn't matter, like, what you're interested in in in particular, but just, like, make projects. And that's how I really started with my own projects in the open source. And then

[9:19] Maxime Labonne:
it leads to new opportunities that you wouldn't have otherwise. So yeah, mostly driven by the curiosity of understanding how you make these models. And now I really enjoy this seat of being able to drive the training, drive the data, the evaluations, and iterate and release the models and see how they're used in real life by end users. I would love to ask you about just that. How are you seeing end users leverage Liquid models on their edge devices? So we started doing hackathons

[9:57] Maxime Labonne:
with Liquid. So if you're interested, know that we have regular hackathons all the time, and people are super creative, like, extremely creative. The last hackathon that we had was in Tokyo, and there's a team, that won the second place, but they had, like, a crazy idea. They said, we're going to fine tune the LFM model and put it on a bike, and it's going to be an AI bike.

[10:23] Maxime Labonne:
And at the beginning of the presentation, I was like, this is the most stupid idea I've ever heard of. Right? This is, like, so dumb. And at the end of the presentation, I thought, these guys are absolute geniuses. Like, who am I to even think that they're dumb? Like, they just, like, outsmarted me completely. And, actually, it's a vision language model built on top of LFM2-VL-1.6B.

[10:49] Maxime Labonne:
And what they did is kind of role play of a bike. For example, if you ask this model, what is the transformer architecture? It's going to tell you, I don't know. I'm a bike. And that's that's crazy. Right? You never have that in real life. Like, you never fine tune a model to refuse to answer a question. So people are extremely creative in the way that they use these models. This is just one example. There's another guy who made a an app on your phone,

[11:20] Maxime Labonne:
and it reacts when you use your phone at night, and it tells you to stop doing that, basically. So you browse Reddit, and it's going to tell you the best kind of subreddit is sleep and this kind of stuff. It's just popping on top of your screen. It analyzes your screen and the content that you see, and it creates a snarky message to tell you to stop using your phone.

[11:45] Maxime Labonne:
Those might not be, like, the most useful applications.

[11:49] Conor Bronsdon:
So are all of these applications snarky ones? We got a snarky bike. We got snarky Reddit. But you can test them. You can test them today. They're available online, and that's beautiful. Did they put giant googly eyes on the bike to indicate that it had a vision model as part of it? I think they did. I think they did. Alright. I may have to try that myself. That one sounds really fun. I have to admit, I think my wife would be like, what are you doing in the garage today? And I'm like, don't worry about it. Okay.

[12:17] Conor Bronsdon:
This this is fantastic. I I love to hear the creativity people are using this with. And if if folks wanna join in on these hackathons, is there somewhere they can go to get more information? Yeah. The official account of, Liquid AI on Twitter, on LinkedIn, you will find all the information there. So as much as I'm excited to have an AI bike, and I think I'm only beginning to scratch the surface of the possibilities as I add more and more googly eyes to it and indicate that I can really see everything,

[12:45] Conor Bronsdon:
I'm certain there are other use cases that that folks are having, and I've, in fact, heard, you know, several businesses that I talked to who are excited about this. Where are you seeing

[12:54] Maxime Labonne:
liquid models in the wild outside of the hackathon? Yeah. There are a lot of different use cases. Right? Because we have different modalities as well. So I talked about vision language for the bike and for the app. But we also have an audio model that is really good and is able to do this kind of audio foundation model where you can build applications on top of it.

[13:19] Maxime Labonne:
It can do speech to text, but also text to speech or speech to speech or text to text. It can do, like, any combination of those, and that allows you to build a lot of creative applications. For example, if you're in a car and you want to talk to an assistant, it can replace Siri and do a better job most of the time. And we also have Nanos, and these Liquid Nanos are small fine-tuned versions of our models.

[13:50] Maxime Labonne:
So for example, you have a data extraction model, and this data extraction model, you give it some text, and it will return a JSON object with it. You can specify the JSON object that you want to use. And an interesting application, because we've been talking about on-device deployment quite a lot, is that with these small models, like the 350-million-parameter model, actually, you can deploy it on GPU. You can deploy it at scale and do big data operations with it, and that unlocks a lot of applications that were not really possible before with generative AI models.
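
To make the "text in, specified JSON object out" idea concrete, here is a minimal sketch of the glue code around such an extraction model. Everything named here is an assumption for illustration: the `SCHEMA` fields, the prompt wording, and the helper names are hypothetical, the model call itself is left out, and this is not the actual interface of Liquid's extraction model. It only shows how a caller might describe the target object and check what comes back.

```python
import json

# Hypothetical target structure: field name -> expected Python type.
SCHEMA = {"customer": str, "order_id": str, "total": float}


def build_prompt(text, schema):
    """Describe the JSON object we want back (the prompt wording is an assumption)."""
    fields = ", ".join(f'"{name}" ({tp.__name__})' for name, tp in schema.items())
    return f"Extract a JSON object with the fields {fields} from the text below.\n\n{text}"


def parse_extraction(raw_output, schema):
    """Parse the model's raw output and check it against the schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"model output is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for name, expected_type in schema.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # JSON has a single number type, so accept integers where floats are expected.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
        result[name] = value
    return result


if __name__ == "__main__":
    prompt = build_prompt("Ada ordered item A-17 for 42 dollars.", SCHEMA)
    # Stand-in for what a model might return for that prompt.
    simulated_output = '{"customer": "Ada", "order_id": "A-17", "total": 42}'
    print(parse_extraction(simulated_output, SCHEMA))
```

Validating every response this way matters most in the batch setting described above, where one malformed record should be caught and retried rather than silently written downstream.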

[14:24] Maxime Labonne:
So for example, if you are in ecommerce, finance, all these kinds of fields, and you have a ton of operations, you can use this tiny model and do unstructured-to-structured text conversion with that. So that's one way of using them, and we have a list of these fine-tunes that people can download and play with to understand, like, how to adapt them to their use cases. We have RAG. We have function calling

[14:55] Maxime Labonne:
and many others. Are you seeing

[14:58] Conor Bronsdon:
devs leverage liquid models within agentic structures? This is harder to do with

[15:05] Maxime Labonne:
small models, right, because agentic capabilities are pretty much frontier, but it's possible to do function calling and tool use. It's possible to add them as part of a framework. I've got good feedback actually about the function calling capabilities of even the 350M model. So it really depends on your task. It will not be able to match cloud-model quality if you need some kind of complex workflows.
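
For readers unfamiliar with function calling, the sketch below shows the application side of it: the model emits a structured request naming a tool and its arguments, and the host program looks the tool up and runs it. The JSON call format, the `set_brightness` tool, and the `dispatch` helper are all hypothetical and chosen for illustration; real models, including LFM2, define their own tool-call format in their chat templates.

```python
import json


def set_brightness(level: int) -> str:
    """Hypothetical tool the assistant is allowed to call."""
    return f"brightness set to {level}"


# Registry of tools exposed to the model (names are assumptions).
TOOLS = {"set_brightness": set_brightness}


def dispatch(model_output: str) -> str:
    """Run the tool call emitted by the model, assuming a simple JSON call format."""
    call = json.loads(model_output)
    name = call.get("name")
    arguments = call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


if __name__ == "__main__":
    # Stand-in for what a model might emit for "dim my screen to 40 percent".
    simulated_output = '{"name": "set_brightness", "arguments": {"level": 40}}'
    print(dispatch(simulated_output))
```

The small model only has to produce the short structured call correctly; the complex part of the work stays in ordinary code, which is one reason a single narrow task like this can suit a very small model.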

[15:33] Maxime Labonne:
But we think of it more of a one task thing. You have one task that would a multi agent system. Exactly. A multi agent system would fit the description here. I'm curious more broadly about your perspective on agents.

[15:48] Conor Bronsdon:
Obviously, that's been maybe the topic of conversation for almost the past year now within the AI community. And yet, I think there are many folks, and I'll include myself on here with Karpathy and others, who think, you know, we're a little early here still. Like, there's a lot of work to be done. What's your perspective on this debate? Yeah. It's funny because people have different definitions of agents as well. So

[16:12] Maxime Labonne:
it's also a very tricky topic to talk about. I am a bit skeptical about the success of agents so far. I do like code assistants. I do like some workflows that are a bit more automated, but it feels like it's an engineering problem as much as a machine learning problem. And we mostly focused on the machine learning problem so far. So this is something that I would like to be able

[16:45] Maxime Labonne:
to power with LFM models on the phone directly and have some kind of simple agentic workflows. I think that usually agents are good when they're quite simple and straightforward. They tend to degenerate quite fast when it's about more complex workflows where there's a ton of reasoning involved. But just being able, you know, to directly change the settings of your phone

[17:11] Maxime Labonne:
with natural language or even just talking to it, I think it's already like a big improvement, you know, like adding, stacking up these little features is something that would change the way that we interact with these systems. And this is the core of this agentic, not really revolution, but evolution, in the way that we interface, with hardware. I think this is like the the main thing that excites me is understanding,

[17:41] Maxime Labonne:
okay, we can really give models for people to power their workflows. It can be agentic. It doesn't have to be, but this is a way to go forward. And I hope that it will not just be with cloud models, but we can also do it with more on-device

[18:04] Conor Bronsdon:
LLMs. Are you seeing Liquid models be used in RAG systems in particular, because of the injection of context that is occurring? I guess it probably enables Liquid models to succeed at these extremely high levels with great response quality while not requiring the massive amounts of pre-training and datasets that would otherwise be required and would kind of pull it away from being edge device capable?

[18:31] Maxime Labonne:
Yeah. RAG is a good use case because it's not too complex, and a small model can do it very well. We have a RAG-specific model actually to do it. And the other thing that is interesting in the RAG pipeline is that you have, like, the model doing the summarization or the question answering, but you also need at least one model to do the retrieval part.

[18:54] Maxime Labonne:
And the retrieval part is also super interesting. So we released a ColBERT model, which is a late interaction retrieval model. And the idea here is that you have three types of embedding models. You have classic embedding with BERT, where you give an input to the model and it will output a vector, and then you can do some computation with the vector. You can calculate the similarity with other vectors, and this is how you do embedding and retrieval.

[19:26] Maxime Labonne:
You have the cross-encoder or re-ranker architecture, where you have the query and the document at the same time, and then it's processed across all the layers, and then you get the similarity score. So this is good, but this is very expensive. And finally, you have this late interaction family, where what you do is that you take kind of the best of both worlds, meaning that you can precompute your vectors,

[19:57] Maxime Labonne:
and, also, you can do re ranking in the same model. So I think this is a very exciting way of doing retrieval, and we thought, okay. Since we have this super fast LFM two architecture, this makes a lot of sense to try to do it because with this, we can have a model that is bigger, so it's a lot better, but it's as fast as a model that is a lot smaller. So you kind of get, like, a super good trade off.
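
The difference between the single-vector and late interaction approaches described above can be sketched in a few lines. The code below is a toy, dependency-free illustration of ColBERT-style "MaxSim" scoring next to plain cosine similarity. The two-dimensional vectors are made up, and nothing here reflects how Liquid's retrieval model is actually implemented.

```python
import math


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query_vector, doc_vector):
    """Classic embedding retrieval: one vector per text, cosine similarity."""
    return dot(normalize(query_vector), normalize(doc_vector))


def late_interaction_score(query_tokens, doc_tokens):
    """Late interaction (MaxSim): one vector per token.

    For each query token, take its best cosine similarity against any
    document token, then sum those maxima over the query tokens.
    """
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


if __name__ == "__main__":
    # Made-up token embeddings. Document vectors can be computed ahead of time.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches each query token exactly
    doc_b = [[1.0, 1.0]]                          # only a blended match
    print(late_interaction_score(query, doc_a), late_interaction_score(query, doc_b))
```

Because the document token vectors do not depend on the query, they can be precomputed and stored like ordinary embeddings, while the per-token matching at query time gives some of the fine-grained comparison a cross-encoder would provide.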

[20:27] Maxime Labonne:
And, yeah, I think that RAG is a really exciting application for small models in general because you can get super fast retrieval thanks to it, and you can also get very decent quality, very good quality in terms of QA, because it doesn't require, like, a ton of reasoning.

[20:41] Conor Bronsdon:
I think the latency point here is particularly important because we see this with a lot of production RAG systems where, if they've tried to build on, you know, let's say the Anthropic API, for example, there are just latency challenges with that compared to using a smaller model that's hosted on edge or elsewhere, because,

[21:04] Conor Bronsdon:
if you're continuing to try to, you know, call out to other sources, I mean, depending on your use case, that may not be feasible. Like maybe this is a a just in time delivery system for someone who's in a call and they need an answer immediately based off of your document corpus, so they can help get a deal. They can't afford to wait a minute and a half or, or for the model to generate the response. They need it to be much faster than that.

[21:30] Conor Bronsdon:
So I I I love that you're thinking about that. It seems clear to me that smaller models, models on the edge, are key parts of how we're going to bring AI into so many other opportunities, whether that's robotics or whether that's, you know, simple chatbots. There's, there's a lot of opportunities here ahead. But let's talk about your specialty a bit post training.

[21:54] Conor Bronsdon:
This is where models can go from general purpose to, you know, really useful for specific applications. We, we've talked about a couple of those applications here, but it's also where a lot of teams struggle. You know, that infrastructure work you mentioned earlier can be a challenge for people. And then I think there are some fundamentals that may be slipped by folks. So let's maybe level set there.

[22:15] Conor Bronsdon:
How would you define post-training for our audience, and why has it become so critical in your view?

[22:26] Maxime Labonne:
Yeah. So post-training is quite simple to define. It's the step that happens after pretraining. So you turn a model that is able to do this autocompletion stuff into a useful assistant that is able to answer questions and follow instructions. So this is the idea behind post-training. It's okay. We have this model that is good at modeling language. Now we're going to turn it into something that I can actually use in practice. And to do that, we have a lot of different training techniques.

[22:55] Maxime Labonne:
The most useful one, like the first one, is supervised fine tuning. Supervised fine tuning is when you give the models the questions and answers that you expect, and this should closely mimic how users actually use your models in real life. And then there's a ton of other techniques. We can dive into them if you want, but more related to, like, optimizing for preferences, optimizing for reasoning,

[23:26] Maxime Labonne:
or having a teacher model that is able to distill some knowledge into the student model that you're currently training. How do you think about

[23:37] Conor Bronsdon:
the decision making around when to use different techniques? Obviously, supervised fine tuning is kind of industry standard at this point, but, you know, preference alignment and other other options you're mentioning here are also used. What are the trade offs that you consider, and and how do you make those decisions?
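
As a hedged sketch of the two families of techniques under discussion, the code below writes out the per-example objectives in plain Python: a supervised fine-tuning loss computed on the answer tokens of a question-and-answer pair, and the loss used by direct preference optimization (DPO), one common preference alignment method. The log-probabilities are made-up inputs, the function names are hypothetical, and masking the prompt tokens in SFT is a common practice assumed here rather than something stated in the conversation.

```python
import math


def sft_loss(token_logprobs, is_answer_token):
    """Supervised fine-tuning: mean negative log-likelihood of the answer tokens.

    token_logprobs[i] is the model's log-probability of the i-th target token;
    prompt tokens are masked out so only the expected answer is learned.
    """
    picked = [-lp for lp, keep in zip(token_logprobs, is_answer_token) if keep]
    return sum(picked) / len(picked)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    The policy is rewarded for raising the chosen answer's likelihood, relative
    to a frozen reference model, more than it raises the rejected answer's.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))


if __name__ == "__main__":
    # Two prompt tokens (masked) followed by two answer tokens.
    print(sft_loss([-0.1, -2.0, -0.5, -0.3], [False, False, True, True]))
    # The policy already prefers the chosen answer more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The practical difference shows up in the data: SFT needs one good answer per prompt, while preference optimization needs a chosen and a rejected answer for the same prompt.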

[23:53] Maxime Labonne:
Because, obviously, cost and time can be factors. No. Absolutely. I think it really depends on the end goal that you have. If you're creating a chatbot, you really want to optimize for preferences. This is not optional. It really makes a huge difference. So preference optimization techniques like DPO, direct preference optimization, or more elaborate ones like PPO

[24:19] Maxime Labonne:
can be a good idea. If you want to do a reasoning model that will output a reasoning trace, it makes sense to use reinforcement learning in the loop at some point, which can also be used all the time, to be fair. But once again, it really depends on the capabilities that you're targeting. Does it make sense for you to do it for math, for example? Not entirely sure. And with small models,

[24:47] Maxime Labonne:
it also raises a lot of other questions like, okay, will someone actually use a 1-billion-parameter model for math, ever? Do we need to actually care about this too much? So, yeah, depending on the target that you have, you might have a lot of questions like this.

[25:10] Conor Bronsdon:
And there's a common maxim here, which is garbage in, garbage out, really saying that data quality is everything. I mean, this is true for RAG systems. It's true for so many things. Frankly, it's true for education for humans. Right? And so data quality seems to be at the heart of effective post-training. How do you think about the data generation and acquisition process here?

[25:34] Maxime Labonne:
How do you identify what a high quality training sample is gonna look like? Yeah. It's it's the most important question, I think, when we talk about training models is how to define data quality. Because, as you said, this is the cornerstone of everything that we make. Training techniques are nice, but they will never replace good data, and having the ability to curate

[25:59] Maxime Labonne:
the highest quality data is essential here. So it's very difficult to define it. I think about it in, like, two or three categories. There's the accuracy of the data because, of course, you want something that is factual. If you ask a question, you want the right answer, right? So that one is obviously very important and sometimes very difficult to verify as well. And then you have the diversity.

[26:29] Maxime Labonne:
So this time, it's not about one sample. It's about the entire dataset, and you want to have a lot of coverage. It has to be, like, as diverse as possible. Diversity is so important that you might actually want to include wrong samples just because they're more diverse. And so, yeah, it's a difficult trade off to find sometimes, And there's a lot of other aspects. For example, multiturn conversations

[26:59] Maxime Labonne:
are really good, and all the data should be natively multiturn. Right? Chain of thought, or some kind of reasoning, at least, when you output an answer is also very good for the model to have this structure of, first, I think about something, and then I give you the answer. Because a lot of models, they start with the answer, and then they retrofit an explanation

[27:24] Maxime Labonne:
to justify why they chose this answer. And you see that it tends to degrade performance, of course, because they didn't have this compute budget to think about the end solution. So, yeah, a lot of different ways to approach this topic. I don't think that anyone has cracked what true data quality is about, because that would be, like, the most successful lab in the world.

[27:46] Conor Bronsdon:
Right. They'd already have won. They'd be at AGI, if we think that's happening. And I will say thanks for the chain of thought shout-out here. Obviously, it's the name of the podcast for a reason. We're big believers in this idea of don't come in predisposed

[28:02] Conor Bronsdon:
to what you're doing. Instead, work through the problem, gather your data, and, be willing to dive into complexity. And you alluded to complexity in data generation and its importance, having multi turn conversations. I know you've done quite a bit of work around synthetic data generation, which feels like a topic that, you know, people go, oh yes, of course, you know, we will do synthetic data generation,

[28:25] Conor Bronsdon:
but they don't necessarily dive deep into often in these conversations. When do you feel like synthetic data generation works well? And then where are you seeing it fall short today? It works well all the time, to be honest with you.

[28:40] Maxime Labonne:
Like, I think that 99% of post-training datasets are synthetic. There's not much human in them, maybe for the prompt actually, but that's pretty much it. The problem with synthetic data generation is about the diversity, because it's very easy to just collapse into something that you think is diverse because you have, like, a ton of different seed data or a ton of different ways to modify it. But, actually, it's the same data generation process. And because it's the same data generation process, it will limit the diversity.

[29:18] Maxime Labonne:
So to me, this is where synthetic data generation falls short. It's really in this idea that it's very difficult to reinject some diversity. There are some techniques that are super interesting. For example, there's persona sampling where you give the model a character to role play as. For example, you are like a truck driver and you're from this country and you have, like, this background, blah blah blah, and the model will then act as this truck driver.
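
A minimal sketch of persona sampling as described above: combine a few attribute lists into distinct personas, then prepend each persona to the same generation task so the generator model writes from a different point of view each time. The attribute lists, the prompt wording, and the helper names are hypothetical, and the call to the generator model is left out.

```python
import itertools
import random

# Hypothetical persona attributes, for illustration only.
OCCUPATIONS = ["truck driver", "nurse", "high-school teacher"]
COUNTRIES = ["Brazil", "Japan", "Kenya"]
BACKGROUNDS = ["loves chess", "grew up on a farm", "is learning to code"]


def sample_personas(n, seed=0):
    """Draw n distinct (occupation, country, background) combinations."""
    rng = random.Random(seed)
    combinations = list(itertools.product(OCCUPATIONS, COUNTRIES, BACKGROUNDS))
    return rng.sample(combinations, n)


def persona_prompt(persona, task):
    """Build the role-play instruction sent to the generator model."""
    occupation, country, background = persona
    return (
        f"You are a {occupation} from {country} who {background}. "
        f"Staying in character, {task}"
    )


if __name__ == "__main__":
    task = "write a math word problem drawn from your daily life."
    for persona in sample_personas(3):
        print(persona_prompt(persona, task))
```

Three short lists already give 27 distinct personas, which is the appeal of the technique: the seed task stays the same while the framing, vocabulary, and context of each generated sample change.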

[29:50] Maxime Labonne:
This is very surprising. This is work from Allen AI, for example, which discovered that it works even in math. You know? Like, your truck driver role play actually improves math performance. So there are these techniques to reinject some diversity, but still, if you want to cover some domain like math, you probably want

[30:15] Maxime Labonne:
many different data generation processes to be able to have a good coverage of it and not just one sampling technique that might just be restricted.

[30:29] Conor Bronsdon:
And this role play technique has been used in quite a bit of prompting that you'll see as well. If you see someone's prompt that they're posting on LinkedIn, "this is so great." Uh-huh. I mean, this was particularly true a few months ago, but often you'd see them say, oh, let me start by defining who you are. You're like, you are this incredible mathematician who can do every problem I need you to do. So we'll see people do that even just in their individual prompts that they're leveraging.

[30:54] Conor Bronsdon:
So it makes total sense that we're seeing that be successful within synthetic data generation and the diversity of the dataset as well. I'm curious to dive into a bit more of where you've seen challenges in this new generation of LFM models. As you were doing post-training, what have been the hardest problems to solve? You mentioned diversity. Are there particular areas?

[31:22] Maxime Labonne:
That's a very good question. I don't think, like, we had, like, one major problem that we really spent a lot of time solving. Because we had this experience with the first generation, I think it was a bit smoother this time. But there are definitely, like, some issues around, for example, function calling. Function calling is really difficult to nail down. I use function calling and tool use interchangeably. To me, it means pretty much the same thing.

[31:55] Maxime Labonne:
And for this, you need to have, like, a specific formatting, right, in your chat template, and you also need to have data that is extremely diverse, even more than for math, for example. We realized that function calling data is quite difficult to generate at the level of quality and quantity that we want to have. And here, the solution is pretty much the same. It's really about

[32:24] Maxime Labonne:
reading the data, understanding when you have benchmarks where this fails, trying to solve it with new data. And this is very much the iterative loop, the cycle that you have in post-training. You start with the data, and you have to spend, like, most of your time on the data, honestly. Then you go through training, you get the models, you get some evaluation, some feedback, and then you can use that to go back to

[32:49] Maxime Labonne:
the dataset and see how it's possible to improve the results. And, usually, what people don't do enough, in my experience, is reading the responses from the models and reading the samples of training data. There's no shortcut, from what I could see. You can try to ask ChatGPT to help you with that, but it's not going to be so good. Yeah. So you really need to do this manual groundwork of reading the data and understanding, okay, like, what is the problem here? Why did the model

[33:27] Maxime Labonne:
give the wrong answer? Or even, like, why is it the right answer? What makes it easy for the model to be able to succeed here when it was not able to do it for the other prompt? There's not a lot of magic here. It's a ton of groundwork and understanding of the data quality and complexity.
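
One way to make the manual reading described above routine is to pull a small random sample of both failures and successes from each evaluation category. The sketch below assumes a hypothetical record layout with `category`, `passed`, `prompt`, and `response` fields; the helper name `sample_for_review` is also an assumption and does not come from any specific evaluation stack.

```python
import random
from collections import defaultdict


def sample_for_review(records, per_bucket=2, seed=0):
    """Group eval records by (category, passed) and sample a few from each bucket to read by hand."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for record in records:
        buckets[(record["category"], record["passed"])].append(record)
    review = {}
    for key in sorted(buckets):
        items = buckets[key]
        # Never ask for more samples than the bucket holds.
        review[key] = rng.sample(items, min(per_bucket, len(items)))
    return review


if __name__ == "__main__":
    # Hypothetical records; real ones would come from an evaluation run.
    records = [
        {"category": "function_calling", "passed": False, "prompt": "p1", "response": "r1"},
        {"category": "function_calling", "passed": True, "prompt": "p2", "response": "r2"},
        {"category": "math", "passed": False, "prompt": "p3", "response": "r3"},
    ]
    for (category, passed), items in sample_for_review(records).items():
        status = "pass" if passed else "fail"
        for item in items:
            print(f"[{category} / {status}] {item['prompt']} -> {item['response']}")
```

Sampling passes as well as failures mirrors the point about asking why an answer was right, not only why it was wrong.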

[33:48] Conor Bronsdon:
I resonate with that because I think it's a very common problem in software engineering where everyone loves to submit a PR and get their PR merged, but, they may not want to spend a ton of time reviewing others' PRs. They may not want to spend a ton of time editing others' PRs. And that might be the most important part of the entire software development life cycle. And it's certainly something where I think all of us need to iterate and provide feedback, whether it's in our day to day lives, whether it's in our software we're building and clearly in our LLM

[34:20] Conor Bronsdon:
driven architectures. I love that you explicitly mentioned something that sounds like you're going to carry forward in your learnings for, I presume, the future LFM3, which is, look, we need to approach function calling this way. We're going to learn from this experience. What did you learn from LFM1? What learnings are you now taking from LFM2 that

[34:46] Conor Bronsdon:
you see fueling the future?

[34:48] Maxime Labonne:
Yeah, it's interesting. I think there's all the architecture work that is really important, like, besides post training. Right? I think there's a ton of inference work and compatibility with the open source community that we got a lot better at, and we want to double down on that. Because if you create your own architecture, the problem is that it's not compatible with anything.

[35:14] Maxime Labonne:
So you have to reimplement it from scratch in every library that you want to use. It can be Hugging Face Transformers, it can be vLLM, it can be SGLang, llama.cpp, et cetera. So this part was a big lesson, I think, with the first generation of models. With the second generation, it was still very costly to do it. So with the third generation, I hope that we can find operators that are maybe a bit more friendly

[35:44] Maxime Labonne:
to implement. And I think honestly, this might be one of the biggest lessons that we've had with the LFM series. But about post-training in particular, I think there's been a lot of lessons about data quality, but also going from copying training techniques that we see in papers and from, like, other people to making our own in-house ones that are customized for our models

[36:18] Maxime Labonne:
and also small models. I think this is really nice as a scientific topic because everybody is kinda focused on the big models, on the Kimi K2s and DeepSeek R1s of the world. And I think this is very interesting. Right? There's no issue with that at all. It's just that I think there's not a lot of interest in general for these small models, but they are truly interesting

[36:43] Maxime Labonne:
because you have a lot more complexity that you need to work around. Something that we found is, like, knowledge really depends on the number of parameters, and you can try to squeeze as much knowledge as you want, for example with knowledge distillation, into a 1B model. It will never be as smart as a 3B model. That's a fact. And you need to find fixes around it so it doesn't completely hallucinate

[37:10] Maxime Labonne:
when the user asks a simple question. So this is one of the core lessons that I think we'll drive forward with the next generations: this expertise in working with small models, knowing their capabilities, but also their pitfalls, and you need to play on their strengths, basically. I think that's the main lesson here.

[37:38] Conor Bronsdon:
I think that's a fantastic point. And too often, we see AI engineers come in and say, oh, I want to use the most powerful model. And actually it's about finding the right model for your use case and understanding your constraints. Maybe your constraint is it needs to be on an edge device. And so maybe you can only run that 1B model there. Or maybe you're actually better off with, you know, we mentioned multi-agent systems briefly earlier. Maybe you have three 1B

[38:05] Conor Bronsdon:
model agents that are working together and maybe that outperforms a more generalized agent. There's a variety of considerations here. And I'm sure that another one is occurring when you are doing your post-training here. As you're doing your evaluations of success, as you're understanding essentially what you want to deliver with each of these models,

[38:30] Conor Bronsdon:
it sounds like you're having to customize quite a bit from the more generalized benchmarks that we're seeing. Can you talk a bit about that process as, obviously, it's something that's near and dear to our hearts as well?

[38:47] Maxime Labonne:
Yes. So in terms of benchmarks, we have our own stack internally because there are a lot of really good open source evaluations, but you want your own to be able to be a bit more narrow in the skills that you focus on. For example, it can be a specific type of function calling that you want to be really good at because you think it's very important for small models to be good at that. An example of this is web search. We think that because you don't have the knowledge that is needed to ace some questions,

[39:17] Maxime Labonne:
it's fine. You can give a web search function to your 1B model, and that might allow it to just Google it, Google the topic of the question and retrieve the right answer. So this is the kind of evaluation that we want to design and focus on. And, also, we try to repurpose evaluations for frontier models that are a bit older, because now, you know, there are the frontier models,

[39:48] Maxime Labonne:
that create all these fancy evaluations, but it doesn't make sense to evaluate a 1B model on it. It would get, like, 1% at best. But the funny thing is that they age...

[39:58] Conor Bronsdon:
Have you had any Liquid models play a werewolf game or anything? I am curious. Not yet. Not yet. Okay. Okay. Sorry. I'm derailing, but I was just like, wait a second. I gotta know.

[40:11] Maxime Labonne:
But this is interesting, what you're mentioning, because I think that now small models might be able to do a better job at this kind of stuff. Because in the past, they were really bad at multi-turn conversations, for example. They were really bad at long context length. But as we progress, it's not only the frontier models that progress. I think there's even more progress in terms of small models. Like, they're catching up also quite fast. They're still going to be

[40:35] Maxime Labonne:
limited by some elements like knowledge, as I mentioned. Sure. Still, we can do a lot with them. And for example, if you have a benchmark where you need knowledge, then fine. Like SimpleQA, for example, from OpenAI. Fine. You can give it tool use, and then it becomes a kind of tool use benchmark instead of a pure knowledge benchmark.

[41:00] Conor Bronsdon:
And I know a lot of what has fueled this, as you alluded to, is open source evaluations, open source models, open source techniques coming, I mean, out of DeepSeek and many others that you're learning from, you're customizing, you're applying to your own viewpoints. You've obviously also been an incredible open source contributor, whether it's your LLM course, which, as I mentioned, 39,000 plus stars on GitHub. Maybe by the time this comes out, it'll be 40,000 plus. I don't know.

[41:24] Conor Bronsdon:
You've worked on open source models, tools, courses, books. What drives your commitment to open source, and how do you see it as integral to the AI and machine learning community?

[41:34] Maxime Labonne:
Okay. First, let me fact check here. It actually has 66,000 stars.

[41:42] Conor Bronsdon:
Oh, man. So this is actually funny because this is what I get for relying on an AI model to help me draft something. So clearly I pulled something. I didn't fact check the number of stars. I'll probably leave this in, actually. So you've released an incredible amount of open source work. You've got 66,000 stars on GitHub with your LLM course,

[42:04] Conor Bronsdon:
and you've done so many other things as well, whether it's models, tools, courses, books. What drives your commitment to open source, and how do you see it impacting AI and machine learning going forward?

[42:15] Maxime Labonne:
Yeah. So open source for me is quite selfish, honestly. It's mostly me trying to learn stuff online, and then it gets picked up a lot. Mhmm. But I think it's actually quite important to do it for yourself and not for other people because, well, this is your own interest. Right? And this is an incredible way of learning new things.

[42:40] Maxime Labonne:
When you write articles about a topic that interests you, I think this is the most powerful way of really learning about it, especially if it's a technical article where you also provide code, you also provide a notebook. There's a ton of times where I was sure I knew something, and then you write about it, and you're like, actually, I need to double check the implementation.

[43:00] Maxime Labonne:
It can be your own thing as well. Like, sometimes I have to reread my own articles to make sure I still understand something.

[43:09] Conor Bronsdon:
No. I feel like until I write or build something, I truly don't understand a topic. And if I write about it, I will very quickly understand where I don't understand things. I'll be like, oh god, these five areas, I gotta go dive deeper here. I totally get it.

[43:25] Maxime Labonne:
Exactly. And this is the beauty with open source work: because you know that there are going to be a ton of strangers looking at your work, you really do not want to mess it up, so you double down.

[43:36] Conor Bronsdon:
You're making me sweat over here, Maxime.

[43:39] Maxime Labonne:
You double down on it, but to be fair, like, people are generally nice online, except in some communities, but usually, they're really nice online. Maybe not on Twitter. Yeah. Twitter is not the worst, honestly, I have to say. But, yeah, this is a good way to learn, and it's the same with datasets and models and tools that I've made. I did them for myself, honestly. Like, the tools, I made them because I needed them. I needed a nice way to run benchmarks because it was too complicated back then.

[44:10] Maxime Labonne:
I needed a way to merge models automatically because I was writing an article about it, and I wanted to run ablations to understand better. Okay. Like, do I need to use this merging technique, this merging technique, etcetera? And it's funny because then it was used in scientific articles. Like, some authors reached out to me and said, like, oh, thank you. Like, actually, we run

[44:33] Maxime Labonne:
our experiments with your tool, and this is beautiful, like, because you started doing it for yourself, and then, yeah, maybe if there were better options, they would have used another one, but they used mine, so this is cool.

[44:46] Conor Bronsdon:
I think that's fantastic. Is there a particular open source project that you're most proud of?

[44:50] Maxime Labonne:
I tend not to maintain them very well, to be honest with you, but I would say... That's a lot of work. Yeah. Exactly. The LLM course is, like, what I'm proud of because it's more popular than me, honestly. People tell me, oh, you're the guy from the LLM course. Okay. I know now. And there's also the LLM datasets for people who are interested in fine-tuning. It's another repo

[45:16] Maxime Labonne:
that has a lot of datasets for different stuff. It can be for math, for function calling, and even preferences. So this one, I think, is truly useful, and I will try to update it soon now that I've talked about it.

[45:29] Conor Bronsdon:
Yeah. We're holding you accountable. I'm making you talk about it on the podcast, but we'll definitely link it too in the show notes, so be careful here. But as we wrap up, I want to look forward a bit. Obviously you have this unique vantage point. You're building next generation architectures.

[45:49] Conor Bronsdon:
You're working across the full stack in many ways, and you're deeply plugged into the open source community. What emerging trends in AI are you most excited about or most concerned about?

[46:03] Maxime Labonne:
I think something that excites me right now, and I've started thinking about it more and more, is how we're going to build operating systems in the future, operating systems for phones, tablets, wearables, laptops, whatever, because you probably cannot do it without AI anymore. And it feels like these tools are becoming so useful that either you do it on an application level and each application

[46:36] Maxime Labonne:
downloads and loads its own LLM or has an API call, but it's not always possible. Right? Latency, to your point, cost. Yeah. Exactly. And online connectivity. For example, if you do it in a car, well, like, goodbye. It's not going to be very useful most of the time. So there's a way that we might have a foundation layer of AI models. It can be an LLM, but it can also be other things that you provide to developers.

[47:08] Maxime Labonne:
So they can build their applications with this, knowing that they always have this model that they can call at an OS level. And, yeah, I think this is really, really exciting because it will deeply shape the way that we create apps. And I hope that in the future, the apps get a bit smarter and, you know, you don't have to fiddle with your settings. And, yeah, you just have better interaction

[47:35] Maxime Labonne:
with these systems.

[47:36] Conor Bronsdon:
I think that could be really, really cool. Maxime, this has been an incredible conversation. I've had a ton of fun. Thank you so much for diving deep with me. Two questions to close us out. One, where should listeners go to follow your work? You've got a lot of places where they can find you.

[47:50] Maxime Labonne:
Yeah. You can search Maxime Labonne on LinkedIn or Twitter. This is where I am most of the time.

[47:59] Conor Bronsdon:
Fantastic. And I'll recommend your GitHub website as well. I will link all of those in the show notes, but you've got a ton of great links and some blogs in there. The final question I'd love to ask you is what you're seeing in the future. So you've alluded to a few things like, oh, you know, obviously, you believe in this idea of models on the edge, of changing how we interact with them at times.

[48:22] Conor Bronsdon:
But what about new architectures? What about, you know, new tools that are coming out? What are you seeing in the coming years or months that you expect to change how we all think about AI and build with it?

[48:37] Maxime Labonne:
In terms of architectures, it's sad to say, but it's always a question of trade-offs. So having a new revolutionary architecture like the transformer seems to be unlikely to me, honestly. But there's a lot of interesting work with tiny recursive models and that kind of stuff for reasoning. And I like these models because they decorrelate knowledge from reasoning, and I think this is a powerful

[49:05] Maxime Labonne:
paradigm. The problem is I would like to integrate them into LLMs. So there's been this paper by François Fleuret called the Free Transformer that tries to integrate this reasoning engine into an LLM. And I think that could be, in terms of architecture, something that we see more and more because it's not just about building blocks. It's like another idea of kind of offloading the reasoning parts of LLMs into another component,

[49:36] Maxime Labonne:
and that might be a huge change if it's successful.

[49:37] Conor Bronsdon:
It's interesting you say that because I don't know if I believe there's gonna be a short-term change around transformer architecture. I agree with you. The next couple of years, I think we are where we are. There's gonna be, you know, adaptations, hybridizations. Obviously, we're seeing that already. Liquid and many others.

[49:56] Conor Bronsdon:
Well, actually not many others. Y'all are really at the bleeding edge here. But I do think long term, we're gonna see a change from the current transformer architecture. But I think it's a question of how far out is that? Is that ten years? Is that five years? Is that forty? And we've seen this happen throughout AI's history. Right? Because really it goes back to, I mean, back to the fifties and elsewhere where we've had these continual

[50:20] Conor Bronsdon:
moves forward. So I don't know. I look at how humans learn and how LLMs learn, and it's still so different that it feels like we are just scratching the surface of the possibilities to me. But maybe I'm a little too into my sci-fi here.

[50:39] Maxime Labonne:
No, I think I agree. Yeah. Like, there's definitely a better training algorithm that we haven't found yet, and that might be also even more important than changing the architecture really.

[50:47] Conor Bronsdon:
Well, like RL, for example, right, where it's like we basically go and say, like, the model succeeded on this route. Let's upvote everything it did in this route regardless of whether all the steps to get there actually worked as well as they ought to. And I mean, I look at, like, I don't know, silly videos of kids where they all have to reinvent crawling from first principles in order to

[51:14] Conor Bronsdon:
begin to move about the world. And sometimes they do it a little differently. Some people roll, some people are kind of scrabbling around before they, they start crawling. So yeah, I don't know. There's just such an interesting open opportunity here around learning. And then I will say it makes this field so fascinating because not only are we learning all the time,

[51:33] Conor Bronsdon:
but we're basically trying to teach machines to learn, right? So it's a lot of fun and, Maxime, I'll say I've learned a lot just talking to you here. So thank you so much for coming on the show. It's been a distinct pleasure. We will link your LinkedIn and everything else in the show notes, and we'll definitely link your Hugging Face profile as well. I know you've got quite a few things on there.

[51:54] Conor Bronsdon:
And of course, the LLM Engineer's Handbook. Listeners, if you enjoyed this conversation and it delivered the technical depth you're looking for, let us know. Please drop a comment on LinkedIn when you see this posted, or on Spotify or YouTube, telling us what resonated. And if there are other deep dives that you wanna see on Chain of Thought, we are listening. Maxime, we're gonna have to have you back some time too because this was a ton of fun. Thanks so much for joining. Thanks a lot.
