Chain of Thought | AI Agents, Infrastructure & Engineering

Loïc Houssier, VP of Engineering at Superhuman (the email client Grammarly acquired for $825M in July 2025), explains how his team retrofitted AI features into a product whose entire brand promise is sub-100ms speed. He breaks down the model-routing strategy, the eval framework his PMs own, and why his team auto-drafts every reply but refuses to auto-send any of them.

Show Notes

Loïc Houssier leads engineering at Superhuman, the email client Grammarly acquired for ~$825 million in July 2025. Before Superhuman he was CTO of OpenTrust (acquired by DocuSign), ran engineering at ProductBoard, and started his career in applied cryptography for France's defense industry, including work on nuclear submarine systems. Loïc joined Superhuman in early 2024 and within 30 days was leading a six-week sprint to ship AI Inbox.

Superhuman's brand is built on speed: every interaction under 100 milliseconds. LLMs do not run in 100 milliseconds. So Loïc walks Conor through how his team retrofitted AI into a product that was already winning without it: pre-caching context for the mobile voice feature, starting every feature on the smartest available model and only then fine-tuning down to cheap dedicated infrastructure, treating "look foolish" as a P0 bug class, and refusing to auto-send any email even when their agents could.

This is a practitioner's tour of what it actually takes to put AI on top of a product that has to stay fast, stay quiet, and never embarrass the user.

We cover:
  • The model-routing strategy: Opus and frontier models to prove a feature, then fine-tuned BERT classifiers on dedicated inference
  • Pre-caching voice and tone context separately from dictation to keep the mobile voice feature feeling fast
  • Why eval engineering at Superhuman is owned by PMs, and how a single "how much time did I spend in Waymo last month" query exposes the eigenvectors a feature has to cover
  • Why "look foolish" is a P0 bug class, and where the boundary between agent agency and agent laziness actually sits
  • How Superhuman's pod structure (PM, tech lead, designer) and a central AI platform team support aligned autonomy
  • Hiring for AI fluency: how interview questions are changing and what self-augmenting engineers look like
  • Pattern detection as the leadership skill that transfers from nuclear submarines to AI email
Chapters:
(00:00) Cold open: pattern detection beats new tools
 (00:18) Loïc's path: cryptography, OpenTrust, ProductBoard, Superhuman
 (02:13) Retrofitting AI into a 100ms product
 (04:08) Voice on mobile: pre-caching LLM context to keep the feel fast
 (07:46) Frontier first, then fine-tune: model strategy across features
 (11:04) The "double-dipping" trick that worked on GPT-4 and stopped working
 (12:25) Cognitive load and staying current as a leader
 (16:59) Balancing YC founder urgency with peer CTO grounding
 (19:28) Pods, AI Guild, and aligned autonomy
 (23:15) Managing models vs. managing people: delegation in reverse
 (28:27) The Waymo example: eigenvectors of evaluation
 (32:15) Day 30 onboarding: leading the AI Inbox sprint
 (35:04) Why email is the killer agent use case
 (38:51) Auto-draft, never auto-send
 (39:57) Agent agency vs. agent laziness
 (43:07) Hiring for AI fluency
 (45:55) Pattern detection is the leadership skill
 (47:21) Nuclear submarines as engineering reference points
 (48:37) Closing thoughts
 (49:38) Superhuman is hiring

Connect with Loïc:
Connect with Conor:
More episodes: https://chainofthought.show
Thanks to Galileo — download their free 165-page guide to mastering multi-agent systems at galileo.ai/mastering-multi-agent-systems

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of Modular. This account is not affiliated with, authorized by, or endorsed by Modular in any way.

[0:00] Conor Bronsdon:
What I know for a fact is that the one thing that is helping me a lot is pattern detection. At the end of the day, it's like a new set of problems, but the way to manage those problems are usually through people. Harness your management skills

[0:18] Conor Bronsdon:
Welcome back to Chain of Thought, everyone. I'm your host, Connor Bronston. My guest today is Loic Gousier. Loic is the CTO of Superhuman Mail. Many of you might be using it to send your emails. And he started his career doing applied cryptography and security research in France's defense industry, including work on nuclear submarine systems, which I might have to ask him about if he's allowed to talk about it, because honestly, that sounds fascinating. He previously was CTO of Open Trust, which was acquired by DocuSign, ran engineering at Product Board, and now leads the engineering org at Superhuman, which Grammarly acquired for around $825 million last July. What I wanted to talk to Luik about is the architectural reality of retrofitting AI into a product that was already successful without it. Superhuman built its reputation on speed, every interaction under 100 milliseconds, and quality. Then LLMs showed up, and the team had to figure out how to bolt on AI features without breaking the things that made the product special in the first place. Loic, welcome to Chain of Thought. Where are you joining us from today?

[1:19] Loic Houssier:
I'm from San Francisco. Thanks for having me, Connor.

[1:22] Conor Bronsdon:
It is absolutely my pleasure. I'm excited to talk about what happens architecturally and organizationally when you retrofit AI into an already successful product, because it's a really fascinating discussion. It's very relevant to a lot of engineers and product leaders today. But before we dive in, I want to give thanks to our presenting sponsors, Galileo, I believe I may have said this last episode too, but I believe it's going to be the last episode that Galileo sponsors with us after their long running commitment. I'm very thankful to them for being the founding sponsor of the show. You can check them out at Galileo.ai and they build evaluation tools that help teams ship reliable applications and were recently acquired by Cisco. So congrats to them. Luik, let's dive into the meat of our discussion. Superhuman's entire brand promise is speed. Every interaction under a hundred milliseconds, as I said, then you now need to integrate LLM features. How have you solved this tension architecturally?

[2:13] Loic Houssier:
Did we? I don't know if we did. Let's be like, set

[2:18] Conor Bronsdon:
I love the

[2:18] Loic Houssier:
the

[2:18] Conor Bronsdon:
honesty,

[2:18] Loic Houssier:
baseline.

[2:19] Conor Bronsdon:
honestly. Yeah.

[2:20] Loic Houssier:
No, but we're living in that constraint. We are living with that DNA in us. And here I'm sitting on the shoulders of the giants that built the company before me. I joined Superhuman a year and a half ago. And a lot of work has been done on the architecture of the product to bring that speed. And the speed is coming with some heavy architectural decisions. 100 milliseconds, you cannot rely on network for everything that you do, which means that everything needs to work offline. by default, which means that the business logic should work on your desktop, should work on your phone, so there's a lot of replication, and then you need to have all those asynchronous jobs to sync everything as soon as you're online, so that it feels good. The good piece is that email is asynchronous by design, so it's sort of like okay. But still, that was a huge endeavor for the teams initially to build that architecture. Think about search. You're on the plane, you want to search your email. How do you make it work without a network? It means that we need to have the data closer to you on device so that you can do those searches so um lots of lots have been said and done on this architecture before uh and and then ai and then ai and um changing completely the way we think about things because those android milliseconds how do you make an LLM, like the inference, I would say run at like under 100 milliseconds, doesn't work. It doesn't work. So we need to adapt. We need to adapt our stance. We need to understand that there are some limitations. That doesn't mean that we should not work on making it faster. A good example is like we just released our voice feature on mobile a couple of days ago. and we were not satisfied with the speed of the interaction because everyone feels fast when you use superhuman. If the transcription of what you said takes time and then is sent to an LLM for like the voice and tone and all of that, takes too much time, you won't use the feature. and it will feel so differently than the rest of the product that you need to work on it. So we had to make some changes. We had to think about pre-caching part of the LLM call to the transcription, providing the context, I would say, for your voice and tone independently than your dictation, I would say, pieces, for example. So instead of having one call with a huge context, we pre- I would say load the context as much as we can and boom, then we send just like a payload of the transcription. That's one example of things that we have to do before launching a feature because just making it work is not enough for our users and for us.

[5:05] Conor Bronsdon:
I love this example. It's really interesting and I'd love to understand a bit more about it. Are you able to share what you're using under the hood for transcription and more about how that process works? I think these kind of architectural details are fascinating when you consider how to approach product integration with LLMs.

[5:22] Loic Houssier:
So that was one piece that is like the speech to text and like getting your intent and like the pure dictation. What are you saying? Which is surprisingly the easiest thing that like so many ways to do it. You can even do it on device now with like some really good results. So that's not really what is hard. What is hard is to transform your dictation into something that makes sense and that the outcome of it becomes like a well-written email. We expect our users to say, oh, reply to Connor that, yes, I'm ready to come to this podcast and I would love to chat about him. Not sure about X and Y. Oh, no, in fact, I'm sure about X and just like giving some slots about my availability. This is the way you expect people to speak to their phone,

[6:11] Conor Bronsdon:
Right, I probably don't

[6:12] Loic Houssier:
to

[6:12] Conor Bronsdon:
have the whole

[6:12] Loic Houssier:
say

[6:12] Conor Bronsdon:
thing

[6:12] Loic Houssier:
an email.

[6:13] Conor Bronsdon:
written exactly in my head. I'm gonna edit

[6:14] Loic Houssier:
Exactly

[6:14] Conor Bronsdon:
as I

[6:14] Loic Houssier:
not.

[6:14] Conor Bronsdon:
talk.

[6:15] Loic Houssier:
Yeah, exactly. And when you think about this, so there's one, trying to understand what has been dictated. From that dictation, what is the intent? And sometimes it's less because you speak about something and then you're like, no, no, no, no, in fact, not that. And you can correct. So you need to have this intent being really well framed. And that intent being transformed into a draft email, and that draft email, understand the context of the thread, understand the context of how you voice and tone with that specific person. Because maybe you're answering your mom, maybe you're answering your boss. And the way you talk to your mom, and the way you talk to your boss, different people. Slightly. All of that needs to be a set of small breaks, but a lot of that can be pre-fetched, prepared for the LLM to be just waiting for the dictation. All of that is happening behind the scenes, so we did a lot of caching. which sometimes has a cost because maybe you hit the voice button, so we sent all the context already, but we're like, no, in fact not, I will write it for whatever

[7:22] Conor Bronsdon:
Are

[7:22] Loic Houssier:
reason. So the impact for us is like, yeah, maybe more cost. Maybe more cost to build this feature, but at the end of the day, the quality of the outcome is better than for our users. I mean, it's a no-brainer. So for us, it's a no-brainer.

[7:39] Conor Bronsdon:
you using a single model under the hood for this or is it one model for transcription and another that's kind of doing the rewriting? How do you have this set up?

[7:46] Loic Houssier:
So we have models that are dedicated to like every single use cases. So it's not like one to rule them all. We usually do that initially to understand how the feature can work potentially.

[7:56] Conor Bronsdon:
Eh.

[7:57] Loic Houssier:
So we always go with one expensive model, like the best of the models. Let's try if the feature can even work.

[8:05] Loic Houssier:
But that's part of the process of the development. We, I would say, iterate with that. I would say trying to find how it feels. And then we have different models to do things. So, yeah, pretty, pretty dedicated. Technically, I think we have some, ranging from some fine-tuned models that are just one specific use case to some not so smart models, but cheap, but they do one thing that is really good. And then some stuff like the agentic framework, for example, that we're using, we're using the latest version of Opus to do this. So depending on the use case, and usually we start by expensive, the best. And then if the feature is successful, how can we optimize? How can we drive the cost down? A good example being, so we have this auto label features. So, identifying if an email is an FYI, identifying if an email is a pitch, identifying if an email is from one of your partners, or if it's a newsletter, whatever. Classification problem. At first, let's use, at the time I think it was a 4.0 model, and doing the classification. Expensive, because we run it on every single email that you have. But then we start to understand what are the right labels that make sense, and for those 10, 12 labels that we know would be useful for our end-users, then you can, I would say, fine-tune dedicated BERT models and maybe running on inference into dedicated infrastructure that are just doing cheap inference at scale. So this is usually our mental model. Try with expensive, make it work, useful for the end-users, try to make it efficient.

[9:53] Conor Bronsdon:
I like this approach where it's like, okay, we can use a frontier model, a closed source model to nail in the draft of the feature. Let's get a demo ready. Let's try it out. And then, okay, how can we fine tune, I presume open model to then make

[10:06] Loic Houssier:
Yes.

[10:07] Conor Bronsdon:
this much more efficient for us cost wise. And it sounds like

[10:11] Loic Houssier:
Yeah.

[10:11] Conor Bronsdon:
potentially from a caching and latency perspective as well.

[10:15] Loic Houssier:
Yeah, totally. And yes, you mentioned open source models. Yes, at some point when our autolabels, like we knew which labels, it was just like a typical BERT classifier type of models. Let's do it with an open model and fine tune it a bit, put it on a cheap infrastructure, dedicated, so like not the typical infrastructure as a service providers or like those like LLM providers, but like dedicated So people want to know a bit cheaper to do the work. And it's good enough for those use cases. But we always start with like the most, not most expensive, that's the wrong way to put it, but like the smartest models to get like the best quality so that we can really try the feature. And if it's working with very high quality, then we can start optimizing the cost.

[10:59] Conor Bronsdon:
How do we then engineer what this smart model has already done for us, essentially?

[11:03] Loic Houssier:
Yeah, exactly.

[11:04] Conor Bronsdon:
And my understanding is that you often use double dipping instructions where you repeat key guidelines in both the system prompt and final user message to ensure that context gets carried through. Is that still accurate in how you're designing?

[11:17] Loic Houssier:
It's not still. It was used a lot when we were using especially the 4.0 models. 4.0 models, for whatever reason, reacted well with this approach. The base assumption was that the weight or how the model is parsing your context to provide the answer was overly weighting the beginning and the end of the message. So everything that was in between was sort of like lower what was awaited, so less meaningful for the LLMs. So if you wanted to counterbalance that, you were just making this double context and double request. And so everything that was in the middle becomes higher up and higher down. And we were seeing better results. But now it's a matter of where you put the instruction. Sometimes we're putting this instruction at the end. So there's some stuff like this that can help us bring better outcomes. Repeating the message was surprisingly good for 4 models. It doesn't work as much on newer models.

[12:25] Conor Bronsdon:
I love that you're iterating architecturally because I think it's easy if you're not careful to get stuck in a certain mindset of like, okay, this works for us in this last model series, but the model development is moving so fast, the available tools and techniques are moving so fast. How are you staying up to date and driving this iteration to ensure your team is actually continually optimizing?

[12:46] Loic Houssier:
Interestingly, I don't know if we do well there, but we had a session today with our engineers and talking about the cognitive load and how this is freaking hard to stay up to date, to not feel like a fraud, to have Twitter, sorry, X, open all day and seeing all those guys that are not touching the code for the last seven months. And one of my engineers said, I put hundreds of codes written in my hand this morning. How should I feel because of that guy? So the session was all about this introspection and how no one knows. No one knows. This is a paradigm shift. No one has the answer. There are some companies that are talking better about it, but I think that every single individual engineer is facing these moments of doubts. So to answer your questions,

[13:36] Loic Houssier:
pretty much all the team is always on Twitter. We complained about it. You don't know if half of it is true. but you're still detecting patterns. You're trying to understand, ooh, yeah, that specific GitHub repo is making some noise. And more people are showing use cases of these ways to build tools and skills and all of that. Oh, that setup seems to work in this. So we share ideas, we share stuff. And we have an internal guild, an AI guild, the same way you could expect to have a front-end guild, a back-end guild. We have an AI guild that is talking on a bi-weekly basis about everything they learned, like their pitfalls, things that were working, things that were not working, but that are working now with the new model, and things like this. But it's freaking hard. That's freaking hard. For me, I'm not in the code day to day, and I'm supporting the organization. This is hard and you see like example of companies, they move super fast and you're like, damn, do I move that fast? How can I bring that sense of urgency to the team? But they already have too much of that already. So how can I reduce the cognitive load so that they can focus on things? But they should work in parallel because that's the new standard. What about my factory? and all of that. Surrounded by questions and trying to focus one step at a time is probably the answer. But for sure, maybe to answer your question, we never take anything for granted. We know that every two weeks or three weeks there's a new thing that is coming. that might change the way we build, that might change the way we implement a feature, that might change the provider that we're using, because even the providers are changing super fast.

[15:25] Loic Houssier:
Trying to cope in a world where you need to be nimble and ingesting information is crazy hard. So I don't know. I don't know if I'm doing anything well there.

[15:35] Conor Bronsdon:
And I resonate so deeply with these emotions because I get to talk to incredible leaders like yourself on this show once a week. And every time I'm going, Oh God, okay. Am I, am I doing this right? Like, Oh wait, there's a new thing. I haven't done that yet. I got to think about it. And then to your point, there's just this constant stream, the information on the website formerly known as Twitter acts. And it is, I mean, half of it's BS totally, but then there are incredible innovations happening. There's amazing stuff. being done. I have a friend who just showed me his agent factory the other day and I'm seeing how he set it up and I'm going oh god like I'm so behind like a month ago I was feeling really good and now I'm like oh I need to improve. And it just it feels constant and it's hard not to emotionally feel that but I do try to remind myself that We're all on this learning curve together and no one has that much more experience in the current model development. Maybe they have a couple months more, maybe they have a year's more, but things are changing so fast that they like, if you lean in and you want to learn and you're excited about it, there is such, such an incredible opportunity. And honestly, it's one of my favorite things about doing this show is I get to talk to folks like yourself and just. have a conversation because I think it can be like really kind of scary almost if you, if you don't have those conversations.

[16:59] Loic Houssier:
Yeah, to double down on this, one thing that I love to do to have the balance is I'm deeply involved in the startup world and YC companies and all that. And I try to, for one, try to invest from time to time. But two, especially listening to those people that are founding companies, because they are seeing something. They have detected one pattern, two patterns, three patterns. they are dedicating something and they believe there's a space for them to be like the next anthropic or whatever. And you see those people and age can range from like teenagers almost to like more like gray hair people like me, but they share the same threat. They just don't care about what was existing before. They don't live in legacy. They just live in this new world because they are building from scratch and they build with that mindset in mind. And this is highly refreshing. It's creating a sense of urgency because every time I discuss with those people, I'm like, oh damn, I'm so cooked as a leader. They're moving so fast. They're thinking like so differently and everything. So I'm self-creating the sense of, as you said, I'm so behind. I need to catch up. I need to catch up. I need to catch up. But I balance that with talking with more people like me. and this time I'm not talking about gray hair, but much more like, okay, you're a CTO leader, you know, like in a tech space, your company has a success or like a relative success, and now you need to manage customers, you need to manage a code base, you need to manage some SLAs, you have some compliancy, and so you're facing, and you have like an existing team, And like, oh damn, it's not only two people anymore. So like, you cannot just do things the same way. And discussing with those people that are facing the same problems that I face is somewhat reassuring. Like, okay, I'm not the only one. I'm not the only one trying to catch up here. But having that balance, for one, those startupers, like very YC founders, like they don't care, they're already there. Like, okay, let me like try to piggyback. Come, talk to my team. Oh, I love your solution. Maybe not for us today, but maybe it will be like in three months. And this is also creating for my team a sense of what's happening on the market. Beside the semi like PS that's happening on X, those are real people building real stuff with some real use case that my team can be inspired of or at least understand that this is where this is going, which is so much better than just me talking and trying to be like the wise guy in the room.

[19:28] Conor Bronsdon:
Luke, I love that perspective because I think it is such a balance. And I'll say if you do find the next Anthropic, please call me. I'm happy to co-invest with you. You can just give me a call. I'll send you my number after this session's over. Let's refocus a bit on what you're doing with all that information. You're driving a lot of insights both from people on the bleeding edge, folks who are trying to work through this at scaled orgs. How have you adapted your team structures, if at all, to, you know, chase this new reality of software engineering?

[20:02] Loic Houssier:
I think we have a relatively standard organization. We're pretty lean, so I think that I have something like 13 direct reports, both engineering and product. PMs are reporting to me right now because we have a founder that is very product-centric, so I'm not claiming that I'm anywhere close to a CPO. but I think that I'm a good operator and working with a product-focused founder that works. So pretty lean organization, very pod-focused. So Acura Pod is led by a triad, PM, tech lead, and designer. So very, very typical. I have a platform team that is supporting the overall organization. My platform team is focusing on the core AI pieces. When I say core AI, it's like, you know, the router, like open router type of like internal implementation, like which LLM to use in what use case, how to flip from one LLM to the other LLM, how to plug your like evals in the system, working on the evals infrastructure so that the future teams that are not feature teams and they are focused on the experience. So I have a team dedicated to the calendar experience, I have a team dedicated to the mobile experience, I have a team dedicated to the triage experience, and a team dedicated to crafting email experience. So each of those are fairly autonomous, supported by a platform team that is working on the foundations of the AI pieces. Like, hey, you have an idea of what you want to build? You build it. We support from API standpoint, like prompt management standpoint, versioning and all of that, AVAS is associated to it, we don't care about that. You have like something there, there's a golden path, but each team is fairly independent. And on top, we have like a product-led growth organization that is helping. So to answer your question, I think the best move that we've done was one, to keep ownership of those teams, like you own it. This is like my EM on that world, he's a CTO. You own your partners, you own your providers, you own the shit. Same for the PMs. I expect them to know the YC companies that are working on this domain. Example being Mobile. You're working on Mobile? you need to know all the companies that are working on on-device models. Maybe it's not ready for us, but once it will be, you better know which company to work with. Maybe you have some POCs, maybe you have some incubation work. Your job to do it. So very pushing down at the team level this autonomy. So we call it like aligned autonomy. We call it like, anyone could would call it like aligned autonomy. So my job is to align the team and driving the autonomy this way. And just to finish, sorry Connor, I know I'm super verbose, but we have this AI Guild team that is like standardizing the way we use AI, working on the factory, identifying things that needs to change to be agentic first. And all of that seems to work okay.

[23:15] Conor Bronsdon:
Do you find that your experience aligning, I'll call them human models, people, helps you when it comes to getting our agentic models up to speed on what we need from them? I'm asking this somewhat jokingly, but I think it's important to acknowledge that a lot of what you're doing here is structurally similar, organizationally similar, to the work we need to do as engineers and architects to build great AI systems. You're creating autonomous nodes within your organization that will act as second, third, fourth brains for you around specific topics. You are spinning up sub-agents, so to speak, And I just want to remind everyone listening who has experienced leadership or has thought through these topics that this knowledge is applicable across these disciplines. And yes, there are clearly differences in how you manage a human and how you manage an AI. But the documentation work you do, the enablement work you do can be very similar in some cases. And it seems like you are taking the same iterative and I guess I would say specific approach Uh, in your product design where you're going, okay. You know, I have identified the best way to do this or the easiest way to do this with a frontier model. How do we then jump to what's the, what's on the bleeding edge? How do we then break this down into its component parts? And I'm curious to get your perspective. Do you feel like your. Leadership experience over the years has informed these decisions or am I conflating two things?

[24:48] Loic Houssier:
No, no, no. It's surprisingly thought-provoking, but now that I think about it, there's this delegation framework that we always think about, which is like, depending on the seniority of the person, at first you ask to do X, then you explain why you need X, and you ask them to do X because they better understand the context and why this makes sense. And then when the person gets more and more senior, you give them an explanation of the context and they might be able to understand what type of solution they need to bring. And the last space of delegation is you just give them the problem and they will figure it out. And the same approach comes when you think about it now, or you force me to think about it. The same approach, I would say, works backwards, interestingly, with models. Because we start with the smartest model, and we can be just dumb in the way that, hey, this is a problem. Okay. the smart model will figure this out, we'll get some great output, we can build on top of it. Great. But this is freaking expensive. How can we provide more context, more hardness in this Dumber model to get the same outcome and come back to moving backward from the cheapest model, but by providing way more hardness and way more context about things. This is an interesting point, Colonel. Thanks.

[26:16] Conor Bronsdon:
you

[27:01] Conor Bronsdon:
And I actually think it's something that's evolved for the last couple of years. So anyone who's followed me for a while knows that I, I've talked about this a bit where like, I don't know, call it two years ago. I was like, uh, you know, LLMs are interns for you right now. They're useful, but you have to really walk them through things. And that started to change. And now there are times where an LLM is more like a senior engineer and who can say, okay, like describe the problem space. Let me go figure it out. Um, and even at times going beyond that. So I think this is one of the biggest things that I've seen as a leap where I feel very comfortable now, even if I haven't given an extensive prompt saying, oh, I want to use auto mode or dangerously skip permissions or, you know, autonomous fully, just go do this, depending on the model, obviously. But it is important to vary it because I mean, the research you talked about there about basically growing employees to be able to take a problem and solve it themselves, that's the ideal employee you want to get to. But they are typically much more expensive because they know their worth in the marketplace. And the same is true of a frontier model. So. How do you evaluate success on the model side? Obviously, we have years of experience doing this for humans. Whether or not we're good at it, you know, we have modes for it, right? And I think there is an opportunity for us to get really good also, though I would argue maybe we aren't yet, at evaluating both frontier closed source and open source models using canonical queries. I know this is something that you're doing at Superhuman. What's your approach, Ben?

[28:27] Loic Houssier:
time, trying to understand what we were trying to do. So everyone was doing evals. So we were doing evals and sort of like using both real data, synthetic data and like input output. But pretty fast, we faced some issues because the

[28:45] Loic Houssier:
field of possibilities was too big. I will give you one example. we work on a feature to auto-draft a response to answer those questions that you may receive. Maybe it's someone pitching you, maybe it's a friend of you that is asking for your availability next week, maybe it's like, I don't know, whatever that can be. But when you think about it, like the possibilities, are tremendous. So we try to think about, and that's the role of the PM, interestingly, we're discussing that. Is it the role of QA? Is it the role of engineers? Is this the role of the PM? I think it's the role of the PM to identify the dimensions of evils that you want to solve for. So I will give you an example. Finding your availability and responding to someone with your availability. This is one specific dimension. I need a bunch of like, I was at test, like hundreds of tests, like of type of questions, different ways to ask for your availability, different ways to answer your availability with voice and tones and everything, and different type of availability that you have. But that's only one dimension. You can also receive questions about, like one query that we used that was fun was, how much time did you spend in Waymo last month? It's a very, I would say, simple query. It's a very simple question. So it's very easy to understand the intent of the person that is asking that. How much time? And the answer is you and your emails. So you need to understand what are the emails that are received from Waymo. And you receive a ton of emails. And Waymo is like sending me, sure, like receipts, but they're also sending like marketing content and all of like survey content and whatever like things. So you need to have this to be solved. Then you need to understand, oh, he said last month. And we know that LLMs are really bad with dates. So, like, what is last month compared to today? So there's some part of, like, that you don't want to let to the LLM and you want to say, no, no, no, last month means this. And, like, you provide the date. There's also the piece associated with finding a date, like the duration. So you need to find the duration. What is the duration? The concept of a duration. So there's a specific ontology associated to it. and then you have the duration and then you do summation. You need to summarize, like sum, all of those bills that you receive from Waymo to eventually answer your questions. So you take one query and you understand that to answer those query, for that specific query, you need to cover from an eval standpoint so many different dimensions. that you didn't have before. And it's not a matter of having like 10 times this or like hundreds of examples of this type of questions. It's going back to these eigenvectors, like what are those dimensions that would be used by most of the queries? And this is something that all the eval's application are not managing for you and we have to build on top of our own provider right now. So we have an eval's provider, but like all that level of abstraction, understanding what are those dimensions that you want to cover, can be then reused for other features. I don't know if I'm clear, but it was like Significantly painful, but we are happy to have this framework, because for PMs, it's framing what they want to cover. Like, what is the definition of DOM? What is quality? Well, once you have those dimensions that you want to cover, all of a sudden it's simpler. And because you focus on those eigenvectors, you don't have to focus on the full space.

[32:15] Conor Bronsdon:
I appreciate the explanation about your approach here and I love this example with Waymo because it does speak to a variety of tasks that have to be achieved to be successful and there's a clear right and wrong as well. And I imagine that designing for this was an early decision you had to make when you came into Superhuman because my understanding is that when you joined the company the very first day they had just kicked off a six-week sprint to launch the AI Inbox feature and that completely changed the product roadmap and ultimately the company's trajectory. How did you structure that sprint and then what did you learn from that experience that you've now applied into your approach over time?

[32:55] Loic Houssier:
The first thing is that that was the best and the worst onboarding experience that I've ever had. It was the worst because I had no clue, no time to ramp up, of course, but I had to help and support a team that will have to deliver a significant project. which is not easy, like you want to at least take the time to understand what you're talking about. But it was also the best experience because I had no choice but to be in the trenches with the team, acknowledge the fact that I don't know shit, pardon my French, and I hope it's okay for YouTube. But I didn't know anything. And I had to build trust with those people. And I don't have a lot of opportunities to do it. So from the get-go, it's like, guys, we're in this together. This is the mandate. This is the context. This is the mission. How can I help? And the reaction from the team was just phenomenal. And I think that me being direct and honest about what I can and can't do, and just managing the expectations with the leadership, supporting when needed, making the call in terms of like a move of people towards this project, was key. But yeah, like I heard about it, I think it was like January 30th, and I started like January 2nd, and I had a call from Raoul basically saying, Loic, we're changing the roadmap. It will be wartime when you start. Cool. Sounds fun, sounds fun. But at the end of the day, that was the best onboarding I could dream of because we were in this together and the team worked hard and rise to the challenge. Hopefully, I was helpful and useful and they were able to better test me in the first weeks to understand how I was reacting, how I was supporting. how I was managing up with Raoul, the founder, how we were working together. And after that project, I would say sweat and tears because it was like sweat, not sweet, sweat and tears. But after it, there was like a level of pride of the achievement that basically started the year on a very high note and build a strong level of confidence in our capability to execute. So it was really cool, actually.

[35:04] Conor Bronsdon:
It does sound like a very intense, but exciting and valuable onboarding experience, though I can imagine quite stressful. And I'm sure one of the things you were fighting during this experience is something I heard you describe, I think on the Legion Space podcast in December last year. where you drew this distinction between building tools versus agents and talked about fighting agent laziness. And anyone who listened to our recent episode with Anoush L. Ngovin over at AMD, you'll know that he called agents quite stupid and lazy. And I think this is something we've all experienced at times, despite how useful they are. I'm not trying to kill them. You have said, and we've heard people like Satya Nadella talk about the inbox as the ultimate AI agent experience for the future, but you're not building a single monolithic agent. Can you unpack what this direction for the company means?

[35:54] Loic Houssier:
Wow, there's so much to be said around this. The first thing about the... I do believe that email will become more and more important in this agentic world. Just think about the move from Cloudflare, where they basically, I would say, build this email for agent so that email can discuss with each other.

[36:16] Loic Houssier:
It's clearly the number one use case that open cloud or town are those very your personal agentic assistant are trying to solve for, it's like this email piece. Because email is draining. And this is what build the success of Superhuman. Email is so draining that people are willing to invest money and time to make it better. So yes, this is probably where people expect the most impact. It's also where people are the most fearful because it's very easy to look foolish. If you totally delegate your inbox to an agent, and you don't know what will basically be answered, or you don't know how this will be maybe archived without you knowing, and damn, you didn't reply to this like super important email of yours, it's both super important and super tricky. So to some extent, this is something that needs to be dealt with like high precaution, even if it's like the first use case that you're thinking about. I will give you one example but it's look foolish is part of our QA pattern. So like something that makes a user look foolish is a P0 bug for us. So it's a category of like QA bugs like look foolish because we archive an email and they missed it And it was something that was important. Paying school, for example. Oh, shoot. Or school email that is basically saying you don't have daycare tomorrow. We archive it. You don't know about it. Shoot, you forgot to pick up your kid at school. Not only

[38:02] Conor Bronsdon:
That's

[38:02] Loic Houssier:
you look

[38:02] Conor Bronsdon:
like

[38:02] Loic Houssier:
foolish,

[38:02] Conor Bronsdon:
a worst case scenario, honestly.

[38:04] Loic Houssier:
No, but this is giving you a little of how critical emails can be and how you cannot just be too fast in doing things. That's why building those AI features takes time and you need to think about all those dimensions that I was talking about to make sure that your features is happening well. For example, we do not want to send emails in the name of our users, because this is too risky. We do not send, we auto-draft. It's here for you to just hit send, but we won't send any mail for you, for now. For now, maybe one day, maybe one day, but like right now, our users are not ready to just let that go.

[38:51] Conor Bronsdon:
I understand that completely. I have to admit that my personal email is under Gmail. Please don't judge me. And when I was building some tools for it for honestly like post-production of the podcast to build out different lists of, hey, here I need to follow up with the guests, you know, let's automatically draft some stuff for them. When I did that, I explicitly set up an MCP server using the Google Workspace CLI, and I did not include the ability to send the email because I was like, I don't trust this. I need to have to review this. And yes, it's a bit of a headache, but continually I will find errors. And is it very convenient to have this draft made for me? And does it help me a lot with my workflow on the show and in other situations? Absolutely. But there is a high degree of importance to some of these emails, particularly ones from when it comes to like archiving your inbox or something like that, to your point around missing important emails. I mean, I started to tense up just hearing you talk about like, oh God, I miss picking up my kid from school or I missed a notification. Yeah, 100% understand. And I appreciate the attention to detail here.

[39:57] Loic Houssier:
And one thing, like, sorry to rebond on what you said about agent laziness, but there's like the agent laziness, but the other side of the spectrum, it's the agent agency. And if you allow, by design, some features like data dilation, or like sending an email, or managing your money, or whatever, what about the number of harness that you put around the agent? It's probabilistic. It's probabilistic. So there's one way or the other that it will basically mess up with your money, mess up with your emails at some point. And we had an example over the weekend where some harness was there to avoid the agent to do some, I would say, important things with money. like damn, we had the harness, yes, but like probabilistic. And right now, if you want to have like a real true gate to protect things, you don't expose them to agent. That's the only way. So very much aligned with you. And the agent laziness was a problem for us initially because agent were hesitating to do stuff because the model providers were like very, very like, asking always the user, should I do it? Should I do it? Should I do it? Should I do it? If you see Cloud Code, it has been designed to always ask you the same questions and you need to really change the configuration to say, no, no, no, don't ask me this, I'm good, I'm good. So I think that even the model providers are becoming less, like optimizing for laziness, because laziness means you decide and you can override, because if you have too much agency, you cannot override agency, it's done, it's over. And it's interesting, I made some mistakes about this agency, like in my own Obsidian setup in Cloud Code, in my second brain.

[41:48] Loic Houssier:
I have scars now, and I don't want my users to face the same.

[41:52] Conor Bronsdon:
Yeah, it's it's really interesting you bring that up. I have moved to keeping like all of my notes, all my personal context in a Git repo because I don't trust the agent not to screw it up. I want to be able to restore and I want to have like clear. Catalog of what's happened and it creates some building this, but yeah, I similarly have had some scars there. I'm like, OK, everything has to live and get if I need it. If it matters to me.

[42:19] Loic Houssier:
Yeah, and when you see that even PRs can be reverted, even without you knowing, like when you see the latest brands messing up with data correctness, this is scary. If they cannot do it,

[42:36] Loic Houssier:
it's tough.

[42:37] Conor Bronsdon:
This conversation is not meant to scare y'all. It is a very exciting time, but there are risks, like we have to consider them. And I appreciate the attention to detail you're providing here too.

[42:47] Loic Houssier:
Yeah, and I think that to go fast and be efficient, you need to understand the risk and the limits. The more you understand the risk and the limits, the faster you are, I believe. So that's why also we need our people to understand what can and can't be done. So that you just don't guess. You just don't guess. Especially when you manipulate people's email.

[43:07] Conor Bronsdon:
How have you changed your hiring practices to match this current era?

[43:13] Loic Houssier:
That's very interesting. So for one, SuperHuman male hire product engineers. Compared to SuperHuman Corp, Grammarly and Coda, we are optimizing for software engineers. So we have a DNA in that sub-business unit, if you will, that is very user-centric. So we have already like a hiring process that was different. A great software engineer was not enough. We needed to see someone that had an appetite for design, even if you're a back-end engineer. If you're designing a back-end system, if you're not asking about the user, the experience, and how this can be reflected in their own experience, you're probably not an engineer for super human make. So for one, we already had this very user-centric approach to things. But now we're trying to understand how we can bring that in. Because yes, we need the fundamentals. We need people to be able to debug. We need to have engineers that understand what they're doing and talk in depth about the work that they've been doing. But how to test AI fluency is something that we're still figuring out. I've seen that a town, especially like they've implemented, like they just bring their GitHub repo and they said to the engineers coming in, you plug whatever you want in there and try to build this and help us like go through your thought process. They look at the way the person is prompting, the person is like preparing the plans, the way the person is discovering the code itself to measure this AI fluency. We're not doing that right now, but it's about to start probably like this quarter. That being said, I can tell you how this is changing my own interview. So I ask people where they are in the journey, not to understand if they are using the latest tool, but how they change their approach, how open they are, how they tweak themselves without waiting for whoever the CTO is telling we need to use X, Y, or Z, how they are trying to self-correct, self-improve. Some people are just happy with whatever the company will be providing. And in this world, I think it's not enough. You need people that will augment themselves because there's no way one size fits all and will be supporting all engineers.

[45:41] Conor Bronsdon:
It's great advice. Um, and it makes me wonder, what would you tell other leaders who are listening about what you think the future looks like for building an engineering org?

[45:55] Loic Houssier:
You mean the future in two weeks or your future in

[45:57] Conor Bronsdon:
Uh,

[45:57] Loic Houssier:
two years?

[45:58] Conor Bronsdon:
let's try for like, uh, six months if we can. That's still a ways out, but yeah.

[46:03] Loic Houssier:
I don't have a clue. I don't have a clue. What I know for a fact, what I know for a fact is that the one thing that is helping me a lot is pattern detection. At the end of the day, it's like a new set of problems, but the way to manage those problems are usually through people. So at the end of the day, harness your management skills, like the noble way to say it, like the real management piece, like how to deal with people, how to convey a vision to people, how to change behavior, how to, I would say, support people in detecting patterns. I think this is the one thing. So it's interesting you mentioned the fact that I was working in the defense industry and I cannot talk about everything that I've done. But, and nothing crazy, but one thing that helped me understand is that your knowledge can be put into different areas. And one thing that I've done there can be applied to tech industry. One thing that you've done in Cobol with a team focusing on Cobol can probably be applied with people working with agents. At the end of the day, you're managing a set of people with different tools. The tools are changing, fine. But as a leader, don't focus on the tool. Focus on the core aspect of your job, which is helping people to find the right tool.

[47:21] Conor Bronsdon:
Well, uh, Loic, you brought it up, so I have to ask. Is there anything you can tell me about nuclear submarines that I don't know that you're allowed to talk about?

[47:29] Loic Houssier:
I think this is probably, so besides the international context and all of that, so I take everything that I say with a grain of salt, but for one, this is probably the one piece that is able to bring peace, because you never use it, you hide it, but you, I would say, fearful enough that you're frightening people so that they don't mess with you. So I think it's a good tool for peace. From an engineering standpoint, this is freaking crazy. It's a nuclear power plant inside a tube that is in the sea, moving like this. And engineers needs to work for this. So like, sometimes it's like, when you have some engineers with strong ego, working in tech, like, hey, there's some people that build some crazy shit out there. And sure, tech is fun, tech is like, very rewarding and everything. But like your ego, keep it for yourself. So like, just like the sheer amount of engineering that needs to happen to just have like a nuclear submarine, be safe for like the 200 people that will be in it. That's mind blowing.

[48:37] Conor Bronsdon:
Thank you for indulging my curiosity, Loic. It's been a pleasure having you on the show. Any closing thoughts for our audience?

[48:44] Loic Houssier:
Yeah, maybe one. We're struggling as leaders. You have access to mentors, you have access to like peer leaders and all of that. You have access, maybe like you're consuming content that is slightly different. You have more experience, you have gray hair, maybe, maybe not your team. So like, I think that, yes, it's tough for us as leaders, but I think it's even tougher for our teams and we need to be conscious of that.

[49:07] Conor Bronsdon:
a great note to end it on. And, Loic, thank you again for coming on and sharing your insights. I wish we had more time. It's been a distinct pleasure. I need to pick your brain more about the lessons you've learned building critical infrastructure for nuclear submarines, your experience with cryptography, and so much more about how you've managed acquisitions and are building teams today. It's been great chatting with you and definitely recommend anyone who is looking to Join a great engineering org that I suspect Luik probably has a couple of roles open or will have them at some point. So

[49:38] Loic Houssier:
Always, always.

[49:40] Conor Bronsdon:
what's the right website for folks who are interested in checking that out?

[49:43] Loic Houssier:
superhuman.com slash career, a bunch of roles opened. They are more than happy to be a referral, ping me with some context on LinkedIn. And I shouldn't be a stranger, should be fine.

[49:55] Conor Bronsdon:
Amazing. Well, Luik, thank you so much for coming on the show. It's been great having you.

[49:59] Loic Houssier:
Thanks for having me.