Chain of Thought | AI Agents, Infrastructure & Engineering

Michel Tricot co-founded Airbyte, the open source data integration platform with 600+ free connectors that hit a $1.5 billion valuation. Now he's building the company's next product: an agent engine, currently in public beta. His thesis is that agents don't fail because models are bad. They fail because the data feeding them is wrong: context poisoning is killing them.

Michel demos this live. A simple Gong query through raw API calls burned 30,000 extra tokens and took three minutes. The same query through Airbyte's context store ran in one minute and used a fraction of the context window. Conor and Michel dig into why RAG alone won't cut it, what a "context engineer" actually does, how Airbyte tracks entities across Salesforce, Zendesk, and Gong without embeddings, and whether the SaaS apocalypse playing out in public markets is overblown.

Chapters:
0:00 Intro
0:20 Meet Michel Tricot, CEO of Airbyte
2:27 Data Got Us to the Information Age. Context Gets Us to Intelligence.
4:48 How Context Poisoning Breaks Agents
7:49 Why Airbyte Customers Stopped Loading Into Warehouses
10:12 Live Demo: Context Store vs Raw API Calls
10:38 What Does a Context Engineer Actually Do?
14:14 RAG Isn't Dead, But How We Build It Will Die
16:41 30K Wasted Tokens Without Proper Context
22:22 Cross-System Joins: Zendesk, Gong, and Salesforce
26:12 The Open Source Agent Connector SDK
29:45 The SaaS Apocalypse Is Overblown
36:09 From Data Pipes to Agent Infrastructure
38:51 What Agents Need to Get Right by Summer
40:48 Memory Is Just Another Form of Context
43:07 Outro

About the Guest:
Michel Tricot is the CEO and co-founder of Airbyte, the open source data integration platform used by thousands of companies to move data between systems. Before Airbyte, he led data ingestion and distribution engineering at LiveRamp. Airbyte raised at a $1.5 billion valuation and offers 600+ free connectors. The company recently launched the public beta of its agent engine, which includes a context store, agent connector SDK, and MCP integration.

Show Links:
Thanks to our presenting sponsor Galileo. Download their free 165-page guide to mastering multi-agent systems at galileo.ai.

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.

FINAL TRANSCRIPT
================
Speakers: Conor Bronsdon, Michel Tricot
Duration: 43:42
Total Words: 7452
Generated: 2026-03-17

---

[0:20] Conor Bronsdon:
Welcome back to Chain of Thought, everyone. I am your host, Conor Bronsdon, Head of Technical Ecosystem at Modular. It's great to see everyone. My guest today is Michel Tricot, CEO and co-founder of Airbyte, the open source data integration platform that became a $1.5 billion company by doing something the industry said couldn't work: giving away 600-plus data connectors for free and building a business on top of it. But Michel isn't here just to talk about ETL pipelines. He's here because Airbyte has launched the public beta of their agent engine. The thesis behind it is one of the more interesting bets in AI infrastructure right now, and I think a great extension of the work Airbyte has previously done to build their data advantage. And that thesis is that the reason agents fail in production isn't the model, it's the context. We have all probably seen this. I know I am seeing it in many of my personal agent-building activities, where context poisoning occurs, or I provide too much context, or simply I'm pulling in too many tokens and suddenly I can't run the damn thing. Michel, welcome to Chain of Thought. Where are you joining us from today?

[1:21] Michel Tricot:
Hey Conor, great to be here. I'm actually in downtown San Francisco, very close to the Ferry Building, for those who know the area a little bit.

[1:30] Conor Bronsdon:
Fantastic location. I was just there a few hours ago and just got home to Seattle. So great to see you. I'm sad we couldn't do this in person, but I'm excited for the conversation we're going to have today about solving data pipelines for humans and also solving data pipelines for agents, and how they are fundamentally different but related problems. What it takes to actually make agents consistently effective at scale is an open question we're all figuring out together, so this should be a great conversation. And one of the things that certainly helps agents to scale is our presenting sponsor, Galileo. Galileo helps with evaluations and guardrails to improve your agentic reliability. Check them out at Galileo.ai. Michel, you've written that data moved us into the information age and context will move us into the intelligence age. That's a strong claim. What do you mean by that, and why does it change everything about data infrastructure?

[2:27] Michel Tricot:
Yeah, I think the first thing is trying to understand what everyone is trying to get out of AI by creating agents. One of the main goals is to delegate and automate many tasks that are currently owned by humans, and make sure you can have agents that work independently and autonomously on top of any kind of, of course I come from data, so any kind of data that your company might own and might generate. So look at the history of how data has moved: you move data into a warehouse, but all the context about the organization, all the understanding about how data relates to each other, that is the intelligence of the humans, the data people, working on top of these warehouses to understand what to make of that data. And when you come to AI, with agents leveraging the data, the idea is: is there a way to structure data and make it available to agents so that they can do part of that understanding themselves? Is there a place and a world in which, well, maybe you don't need to go through this very complex ETL pipeline where you need to clean the data, map the data, identify join keys between different data silos? And that to me is what I mean by powering more of the intelligence now. An agent has the ability to treat and process all that information and make relationships between different entities that exist in your systems. If you take a very simple example, it would take time for someone to map a field called "position" on one system to a field called "job title" on another system. This is something an agent can do automatically, because it's the same concept. And I think that's what it means to create a context store: giving enough metadata, enough information, enough description about the data so that the agent can autonomously create relationships between concepts.

[4:48] Conor Bronsdon:
And I've heard you make this distinction before between data and context, because I think we often use the term somewhat interchangeably and increasingly are realizing that if we just think about feeding as much data into an agent as we would like to, it actually can cause problems. Can you talk a bit about what failure modes you're seeing and how context windows and data pipelines are interrelated and intertwined?

[5:18] Michel Tricot:
Yeah, so the first thing is: data is stored across hundreds and hundreds of systems. That's the common ground for any kind of data access: you need a way to access that data. It used to be, and it still is, fit as much data as you can into a warehouse so you have a centralized view across all of it. And a phenomenon I see a lot with agents is forgetting that there is a reason why warehouses and centralization were built. It is to remove the complexity of accessing data, of reliably accessing it. You cannot run an analysis by connecting through an API. You will have issues with API rate limits; you will have issues with, hey, maybe they don't offer the right way to access that data. And I think here we see the same type of failure mode when it comes to agents. Yes, you have an API; you can have your agent connect to the API. But what happens when the API doesn't support the access that the agent needs to get to the data? So there are a lot of little things like that that, in a way, we're reinventing. And yeah, the warehouse is great for BI, but it doesn't have enough of the context that is necessary for the agent to actually create these different connections.

[6:44] Conor Bronsdon:
Yeah, and there are a lot of conversations happening right now with different agent builders. And I'll say this, this probably signposts a bit when we recorded, which is going to be a week or two before this actually releases. But currently, Twitter is all abuzz about "MCP is dead, go back to APIs." I think they're wrong, but I think it speaks to the fact that there is a lot of confusion about how to get data into our agents most effectively. And frankly, we haven't truly figured out the gold standard going forward. There are a lot of different methods; it can depend on what you're trying to achieve. Maybe you need a fallback method; maybe you start one way and move to another as you realize what kind of context and data you need to feed in. But let's get really clear on context and why it matters. We've talked a lot on the show about how context is crucial for agents. What are you seeing in the agent-builder work that Airbyte is doing that is driving your iteration around context and data?

[7:49] Michel Tricot:
Yeah. So the first thing, one of the reasons we started on this new product, is that we started to see a change of pattern in how people were using the current Airbyte product. Instead of pushing data directly into warehouses, which was, I would say, the golden path for people using Airbyte, they started to feed it into blob storage.

[8:17] Michel Tricot:
As we were chatting with these customers, we realized that they were not building an analytics and BI system. What they were building was actually an infrastructure to power agents, and they had already gone past the demoable stage of an agent where you just directly connect to the API; they quickly realized that, yes, that doesn't work. When people say MCP is dead: MCP is not dead. MCP is just an interconnect. It's just a way of connecting an LLM to an external system. You can make the LLM as complex as you want and as good as you want. Of course, if you just do a thin layer in front of an API, MCP is going to suck.

[9:02] Michel Tricot:
But it doesn't have to be the case, and that to me is where I see the change happening: people are realizing that yes, there is a need for post-processing that data so that the agent can work faster with more relevant data. Because if you get stuff from an API, you might just get a list of records that you might not need, and all of that is just going to pollute your context. And, you know, if we get to that, I have a demo that shows this very, very clearly. The moment you don't have proper handling of the data that is fed to the LLM, the LLM falls on its face and gives you the wrong results. So that to me is what people are realizing, and that's why I think we're moving more and more toward thinking about: how do I actually create proper processes to get my data into my agents, with all the guardrails and all the accuracy that we used to have with warehouses and databases?

[10:04] Conor Bronsdon:
I was honestly planning to ask you to showcase this demo a bit later, but if you want to jump in right now, I think we're on the topic.

[10:10] Michel Tricot:
No, we can, we can.

[10:12] Conor Bronsdon:
Let's show it off.

[10:12] Michel Tricot:
Yep. Maybe we can do that. Okay.

[10:15] Conor Bronsdon:
And while you're pulling that up, another thing I'd love to ask about is context engineering, this idea of context engineers, a new term that has emerged, which I've seen you be a proponent of. We've had prompt engineers, we've had automation engineers, we've had data engineers; now we've got context engineers. What does a context engineer do that data engineers don't?

[10:38] Michel Tricot:
I honestly think it's more a role for the next one, two, or three years, because a lot of that is going to become very mechanical and self-learning: how you actually make your context better and enrich it with additional data and additional rules. The same way we were talking about prompt engineering: the title doesn't exist anymore, but at a point in time, that was something where people needed to dive very deep into what a prompt should look like and how to optimize it for a better outcome. I think we're going to get to something similar with context engineering. And my guess, compared to pure data engineering, is that it's going to be tied more to pure software development. But it is still going to be people who are fairly technical, who understand how to pull the data and how to shape it, or help the agent shape it, in the right way. But if everything goes according to plan, agents should be able to do it themselves at some point.

[11:51] Conor Bronsdon: [OVERLAP]
Yeah, I think it's just a timing question. That's for sure.

[11:53] Michel Tricot: [OVERLAP]
Yeah.

[11:55] Conor Bronsdon:
Um, so let's show an example here. I know you've got an example of when data in context can fail an agent and how to solve it.

[12:03] Michel Tricot:
Yeah. So maybe I should just start with a quick run-through of the product itself, just so that it's easy to understand what is happening behind the scenes.

[12:16] Conor Bronsdon:
Perfect. Yeah. If you want to talk us through it, I think that works great.

[12:19] Michel Tricot:
So the way the agent engine works is: the first thing is, it needs to have access to all the places where you have data. And that won't feel too dissimilar from what Airbyte has been since the beginning: you just have a very simple way to start connecting any one of your systems into the Airbyte engine. Now, the thing that is different is that we offer what we call the context store, which is a way of centralizing the data and making all of it available not just directly through the API, but also through this context store. And all of that can be integrated directly through pieces of code that you can embed into your agent. So you just need to provide the right credentials, which connectors you're targeting, etc., and then you can make your agents basically data-aware.

[13:31] Michel Tricot:
These connectors basically operate in three different modalities. The first one, which is leveraging the context a lot, is what we call discovery and search, which is, it is the ability for the agent to discover what is the universe of data that is made available to it. What are the fields? What are the entities? What are the different documentation that exists on top of that data? The second one is really the ability to retrieve information. So getting, searching, etc, etc. And the last one finally is just how do you interact with upstream and downstream systems? So how do you write information back into these systems?

[14:11] Conor Bronsdon:
So let me ask a couple of questions here. I think one of the previous "X is dead" narratives that has emerged on X slash Twitter has been the idea that RAG is dead. And now people are going, oh, right, it's actually still useful. What about folks who would say, oh, well, RAG solves all of these data problems you're having here, just chunk and embed your data, it'll be fine? What's your response? Why are you building something more complex?

[14:35] Michel Tricot: [OVERLAP]
Yeah, because actually building RAG is very complex. That is actually the thing that is happening.

[14:40] Conor Bronsdon: [OVERLAP]
Underrated complexity there, for sure.

[14:42] Michel Tricot: [OVERLAP]
To be clear, the value proposition of RAG is: you have a query, you embed it, and you search with that embedding through your RAG database. Well, that works; you are able to figure out and get pieces of information. But what you quickly realize is that the accuracy is off.

[15:07] Conor Bronsdon:
And how do you correlate the data together? How do you actually search it? Are you using a basic search? Is it semantic search? Are you having relational databases in there? There's a lot of detail that goes in very rapidly.

[15:20] Michel Tricot:
What people start doing is, okay, they start chunking the things, but they start putting metadata on them to make the search more accurate. So they start saying, oh, this sales conversation was talking about pricing, but you know what, let me also annotate that chunk saying that Conor was on the call. So that now, when I search for data about what calls Conor was having that were talking about pricing, then you have the right search. But if you don't annotate the data with "Conor", well, you're not going to get anything. And that, to me, is what is happening with RAG: you realize that the accuracy is off and you try to fix it. You try to fix it before the data gets ingested, and you start building very complex machinery that is pulling structured data and unstructured data.

[16:09] Conor Bronsdon:
You create an infrastructure problem that you have to solve, essentially.

[16:11] Michel Tricot:
Exactly. Exactly. So RAG is not dead; it's just that the way we're doing it is probably going to die. And what the context store actually gives you is this ability to both look at all the metadata that is available and attach it to chunks and more unstructured types of data. Now what I'm going to do is show what it looks like. I'm going to do a bit of before and after. I'm going to start with the after, because the before takes a little bit of time.
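[Editor's note: a minimal sketch of the metadata annotation Michel describes. All names, fields, and data here are invented; in a real system the survivors of the metadata filter would then be ranked by embedding similarity.]

```python
# Why annotating chunks matters: an embedding-only search can't answer
# "what calls was Conor on that talked about pricing?" unless the chunks
# carry that structured metadata.

chunks = [
    {"text": "Discussed pricing tiers and discounts.",
     "participants": ["Conor", "Michel"], "topic": "pricing"},
    {"text": "Pricing objections from the prospect.",
     "participants": ["Dana"], "topic": "pricing"},
    {"text": "Onboarding timeline walkthrough.",
     "participants": ["Conor"], "topic": "onboarding"},
]

def search(chunks, topic=None, participant=None):
    """Filter on structured metadata first; a real system would then
    rank the surviving chunks by embedding similarity."""
    hits = chunks
    if topic:
        hits = [c for c in hits if c["topic"] == topic]
    if participant:
        hits = [c for c in hits if participant in c["participants"]]
    return hits

# "What calls was Conor on that talked about pricing?"
hits = search(chunks, topic="pricing", participant="Conor")
print([h["text"] for h in hits])  # only the first chunk survives
```

Without the `participants` annotation, both pricing chunks would match and the second, irrelevant one would pollute the context.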

[16:45] Conor Bronsdon: [OVERLAP]
That's fair, because it seems like the thought process is like, look, an agent needs access to maybe my Salesforce, my GitHub, my Zendesk. How can we actually get them there so it's an effective use of their time and they're not ruining their context window by pulling in too much data?

[16:59] Michel Tricot:
Absolutely. So here, what I've done is, on the cloud product, I've configured a Gong connector. So I'm going to run some analysis on calls, very basic ones for now. Let me make sure my MCP is properly configured.

[17:12] Conor Bronsdon:
This is a large percentage of agent engineering these days: is my MCP properly configured?

[17:21] Michel Tricot:
What we have here is, basically, I've pre-configured a connector on Gong. Gong is a very popular sales transcript tool that a lot of sales teams are using. And now what I've done is added this connector and made it available through MCP, through Cloud. One reason why MCPs are good is just that they're universal in terms of how you access resources. And I'm going to start by just showing what the baseline context looks like. And here, you look at it, you see that from startup, Claude uses like 16K tokens already. And now I'm going to ask it to do a very simple query: retrieve all my Gong calls since February 1st. A very simple and very common query that any agent might want to do if it wants to start running analysis, understanding objections.

[18:21] Conor Bronsdon:
So it's leveraging the Gong MCP, I see.

[18:24] Michel Tricot: [OVERLAP]
Yeah, and by the way, behind the scenes, when you see all these search actions, it is actually leveraging the context store. So it's able to convert "my" into who I am in the Gong system. After that, it finds it, and it's able to actually get the listing of all the different calls.

[18:43] Conor Bronsdon:
And the context store is this database that you're replicating hourly into a managed data store, so agents can actually search it instead of hitting APIs directly, to increase, basically, context efficiency. Is that correct?

[18:54] Michel Tricot:
That is correct. Especially, if we take Gong for example: when you want to search and get calls for a particular user, well, you cannot filter by the user. So you end up pulling lists and pages and pages of calls.

[19:09] Conor Bronsdon:
Intriguing. So their API doesn't actually allow you to filter by user?

[19:12] Michel Tricot:
No, it doesn't.

[19:13] Conor Bronsdon: [OVERLAP]
Hmm.

[19:13] Michel Tricot: [OVERLAP]
Some parts of the APIs do. So here, typically, we see: OK, 45 calls, took about a minute. What is interesting, though, is to look at the context consumption to get access to all this information. I forgot where we were, but we're basically at 37,000 now; I think we were at around 26K before. Now, the thing that is interesting is to look at what it looks like when you make your agent rely only on API calls. So here I have a version of it that is not going to be using Cloud, but that will still have access to listing and getting and retrieving information. So, context: still the same when it boots up, about 16K. And now let's ask exactly the same question.

[20:13] Michel Tricot:
And so here, you end up in a state where your agent has to work around all the deficiencies of APIs. And when APIs have deficiencies, it means that you need to build more to make your agent efficient. A typical example here: you cannot search for users. So what do you need to do? You need to go through all the pages, and you need to find who "me" is in the system. And what happens then is that people start saying, oh, then I need to replicate that data if I don't want to be paginating through all these pages. And so, bit by bit, they start building their own homegrown version of the context store, because they want that efficient access for these very simple queries.
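[Editor's note: Michel's pagination point can be sketched as follows. The API shape and numbers are invented; the point is that a missing server-side filter forces an agent to fetch everything and filter client-side, while a pre-built index answers the same question in one lookup.]

```python
# A hypothetical call log behind a Gong-like API that cannot filter by owner.
CALLS = [{"id": i, "owner": "me" if i % 15 == 0 else f"user{i % 7}"}
         for i in range(675)]
PAGE_SIZE = 50

def list_calls_page(cursor):
    """API endpoint: returns one page of calls plus the next cursor."""
    page = CALLS[cursor:cursor + PAGE_SIZE]
    next_cursor = cursor + PAGE_SIZE if cursor + PAGE_SIZE < len(CALLS) else None
    return page, next_cursor

def my_calls_via_raw_api():
    """The agent's workaround: pull every page, filter client-side."""
    requests, found, cursor = 0, [], 0
    while cursor is not None:
        page, cursor = list_calls_page(cursor)
        requests += 1
        found += [c for c in page if c["owner"] == "me"]
    return requests, found

def my_calls_via_context_store():
    """A replicated, indexed store answers the same query in one lookup."""
    index = {}
    for c in CALLS:                      # built once, ahead of time
        index.setdefault(c["owner"], []).append(c)
    return 1, index.get("me", [])

raw_reqs, raw_found = my_calls_via_raw_api()
store_reqs, store_found = my_calls_via_context_store()
print(raw_reqs, len(raw_found))      # 14 requests to find 45 calls
print(store_reqs, len(store_found))  # 1 lookup, same 45 calls
```

Every one of those 14 raw responses lands in the agent's context window, which is where the extra tens of thousands of tokens in the demo come from.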

[20:54] Conor Bronsdon:
I have admittedly done this very recently, so this is interesting to see.

[20:59] Michel Tricot:
And that's just the pattern, because you start with the API, and then you end up putting your finger in a big machinery where you have to rebuild data centralization. You need to build indices. You need to build searches. You need to build relationships between different concepts. And here, the very degenerate example is: well, I need all these calls, and what do I need to do? Because I cannot filter by my user, I need to go through every single call over the past month and a half. And I can tell you this takes about three minutes to run, so what I'm going to do is just show you something a little bit different, and we can see after that how much time it's taken overall. Because, well, it has to hit the API; it has to deal with API rate limits; it has to do a lot of things that it shouldn't have to do, and things that can actually cause your agents to break. So, good thing, this time it worked, and it's great. Normally it takes about three minutes, and often you get more than 45 calls back, because there is so much bloat in the context that it's unable to fully filter down to the ones made by me. And you see, in terms of token consumption,

[22:15] Conor Bronsdon:
Yeah, 62K.

[22:15] Michel Tricot:
We're basically consuming almost 30,000 more tokens than we were before.

[22:22] Conor Bronsdon:
Wow, yeah.

[22:22] Michel Tricot:
And that's just a very simple example. So now imagine when you have very complex agents. Yeah, so that's a little bit of the value of what it means to have good context. Now, the thing where it becomes even more interesting is when you start doing what I call joins across different systems. So for example, this one is very simple: I want to look at all the tickets that were created. Let's do yesterday, because today just started.

[22:55] Conor Bronsdon:
This is where, I have to admit, I haven't used Cowork that much.

[22:59] Michel Tricot:
Yeah.

[22:59] Conor Bronsdon:
I've been trying to become a power user of Claude Code, but I still need to integrate Cowork into my workflow a bit more.

[23:05] Michel Tricot: [OVERLAP]
Yeah, it's actually how I, as a CEO, like to dig by myself into how different parts of the organization are working. So I have my own MCP that I call the CRM MCP, which has Gong, Salesforce, Zendesk; there is a fourth one, well, whatever. But basically, what it's doing here is: it's able to pull things from Zendesk, then it's able to resolve the organizations in Zendesk into what they look like on Gong, and then it's able to just pull me all the different Gong calls.

[23:41] Conor Bronsdon:
The correlation is really nice.

[23:42] Michel Tricot:
Exactly. The correlation is so important for agents. But also, this is something agents are actually very good at, as long as you give them the ability to self-improve. If I'm searching for, I don't know, let's take Airbyte as a company, maybe it's misspelled on Zendesk, or maybe someone put "Airbyte Inc.", but on Salesforce it's called "Airbyte". Doing this type of correlation by just hitting APIs, your agent is in for a treat here. Because it might not work, the API might kick you off, the API might not have fuzzy search, it might not have embeddings. And that's really what you get: this ability to automatically create these different mappings between different systems.
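[Editor's note: the entity-resolution step Michel describes can be sketched with simple string similarity. `difflib` here is a stand-in for whatever matcher or embedding the real context store uses, and all account names are invented.]

```python
# Mapping "Airbyte Inc." on one system to "Airbyte" on another:
# normalize corporate suffixes, then fuzzy-match against known accounts.
from difflib import SequenceMatcher

def normalize(name):
    """Strip common corporate suffixes and case before comparing."""
    n = name.lower().strip()
    for suffix in (" inc.", " inc", " llc", " ltd"):
        if n.endswith(suffix):
            n = n[: -len(suffix)]
    return n

def best_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name`, or None if no
    candidate clears the similarity threshold."""
    scored = [(SequenceMatcher(None, normalize(name),
                               normalize(c)).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

salesforce_accounts = ["Airbyte", "Acme Corp", "Globex"]
print(best_match("Airbyte Inc.", salesforce_accounts))  # Airbyte
print(best_match("AirByte", salesforce_accounts))       # Airbyte
print(best_match("Initech", salesforce_accounts))       # None
```

The threshold is the interesting design choice: too low and unrelated companies merge; too high and a one-character typo breaks the join across Zendesk, Gong, and Salesforce.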

[24:33] Conor Bronsdon:
So tell me more about the background data architecture that's powering all this. You're basically reducing API calls to a schema search within the database?

[24:45] Michel Tricot:
Yeah, so what we do today is we have our own query engine. We have basically created a full data pipeline that hourly, or depending on how you've configured the context store, can replicate the data more or less frequently. And then we index that data. One thing we are going to do on the context store is support stronger embeddings, also for unstructured data. But right now, for structured data and very simple fuzzy matching, it works extremely well, and it allows you, even if you have unstructured data, even if you don't have embeddings, to always get the right transcript, because all this metadata is tied to that transcript and you can already get it. So now imagine you have the embeddings on top of that: your search is just so good. And in terms of data engineering, yes, it's very close to what Airbyte is known for: you connect to the systems, you load this data, and then you do some post-processing on the data to make sure that agents can have the most accurate data. And it's exposed either directly through an API, or through MCPs. And we also have, sorry, I'm going to share my screen again, one last time.

[26:12] Conor Bronsdon:
And for context for the audience, you have open sourced the Agent Connector SDK that powers a lot of this.

[26:20] Michel Tricot:
Exactly. And that's exactly what I wanted to show.

[26:22] Conor Bronsdon:
Perfect.

[26:22] Michel Tricot:
We also have a repository that contains all the agent connectors we have. Those can connect, read and write, from the API, and they can also interact with the context store. And if we take a very simple example, let's take, yeah, we just released ClickUp yesterday, so I think it's a good one. You can actually see all the types of questions or tasks that this connector can do for your agent. So you can almost automatically populate your agent with those, and it can understand the scope of operations it can perform. And then all the information about how you integrate it into your actual agent.

[27:11] Conor Bronsdon:
So how are you combating context bloat with these connectors? Because that's obviously a problem that I'm sure many folks have run into, where you just end up feeding too much information to your LLM and it begins to be uncertain about what it should be prioritizing. Or you simply burn too many tokens, as we showed earlier.

[27:31] Michel Tricot:
Yeah, so the search actually is very scoped. The first thing is that a lot of this is done on the back end, to make sure we only return what we believe is the most relevant information. That's the first part, and that's one thing you will have a hard time doing with APIs, because they will just return the whole block of information to you. Now, one thing we're building at the moment is: how do we create more of a data-retrieval agent? Something that gets a sense of what the task at hand is and can, on our side, figure out the different pieces of information you might need and make them available to the calling agent.

[28:14] Conor Bronsdon:
And these kinds of agent handoffs, I think, are increasingly important as we start working within agent teams. I mean, whether you're coding in Gastown or you're using parallel teams in Claude Code or something else, increasingly we need to hand off to other agents.

[28:30] Michel Tricot:
Yes, and that's really the solution. At the end of the day, we go back to a very standard engineering problem: if you have a monolith, it's very hard to control what's going on. If you just have one agent that does everything, you'll have context bloat, you'll have accuracy issues, and so on. It's better to spin up a smaller agent that knows what it does and does it super well. So: decomposition.

[28:59] Conor Bronsdon:
And I will shout out AGNTCY (agntcy.org) here, an organization I was part of building when I was at Galileo. We kicked it off with Cisco and many other great partners, and it's now part of the Linux Foundation. I think these kinds of open efforts, and there are others out there too, to ensure agents can connect and work together are going to be so crucial for the next couple of years, because we can't have these be black boxes.

[29:24] Michel Tricot:
Correct. Yeah. And also, not every company can build excellence in one specific domain.

[29:34] Michel Tricot: [OVERLAP]
So, you know, we see Firecrawl. Why does Firecrawl exist when every company needs to crawl the web? It's because they know how to create agents that crawl the web, and then they just hand the information off to you.

[29:45] Conor Bronsdon:
So this is a brief sidebar, but I do think the idea that this has driven a kind of SaaS apocalypse in the public markets is very overblown. There are many problems with SaaS companies: many of them vastly overhired, are bloated, and have issues. Many of them are in trouble. But for most SaaS solutions, if they get really good at what they do, if they drill down, why would I try to build that in-house? I don't actually want to build my own CRM. I want a good CRM that I don't have to worry about, that has support and training for my salespeople and everything else. So

[30:19] Michel Tricot: [OVERLAP]
Yeah.

[30:19] Conor Bronsdon: [OVERLAP]
it's interesting to see this narrative emerge. There's going to be a ton of custom software, but it's not always better to build versus buy.

[30:26] Michel Tricot: [OVERLAP]
Yeah, absolutely. I would say the only thing is it might not be the existing players; it might be other players that specialize in CRMs made available to agents. But I

[30:37] Conor Bronsdon: [OVERLAP]
Totally.

[30:37] Michel Tricot: [OVERLAP]
think it's about getting out of the habit of "that model was working" and asking, how do I reinvent myself to make it work in a world where agents are the only ones, or the main ones, interacting with my system? You need to really reinvent yourself, and that might cannibalize your existing business.

[30:56] Conor Bronsdon: [OVERLAP]
Yeah, that's super true. And not every business currently wants to make that pivot. I think Google is a great example here: they were slower to build Gemini and to build in AI features than they otherwise might have been because, as an incumbent, they have the best business in the world, one that spits out cash every month, every day, from this incredible search business, and they were very worried about cannibalizing it. So they were slower to market on the AI side, despite having driven a lot of the research, despite having

[31:25] Michel Tricot: [OVERLAP]
Yeah.

[31:25] Conor Bronsdon: [OVERLAP]
this incredible technical team. And it did give some first-mover advantage to folks like OpenAI and Anthropic and others. So it's going to be really interesting to see how this all shakes out. There's definitely huge change happening. I mean, the Block layoff is a great example. For folks who are interested, I wrote a long essay about it, like 6,600 words, way too many. You can check it out on my newsletter at newsletter.chainofthought.show, which is, of course, always linked in the show notes, or just go to conorbronson.substack.com. Block decided to lay off 40% of their engineering staff. We actually had their VP of AI dev tools on the podcast like a month and a half ago at this point, talking about how they're building their agent infrastructure with their open source agent, Goose. A lot of what they're doing, I think, is really tied into the philosophy Michel's talking about. Block basically said, we are going to redo our company with an inference-first mindset, with intelligence at the center. And I think that's what you're getting at here, Michel: we all have to reimagine how our organizations work, how we are working, how we're providing data. I think you're clearly thinking about this transformation in the right way. And we're going to keep seeing it in public markets as major companies completely reconfigure how they're spending money, whether it's on headcount, whether it's on inference, how they're staffing. It's causing a lot of disruption, and it's just getting started.

[32:44] Michel Tricot:
Yeah. I mean, the thing to think about is, whether the LLM is the future or not, we don't know what kind of new generation of models is going to come in

[32:53] Michel Tricot: [OVERLAP]
the future. The thing is, we used to rely on the CPU as the engine that makes our product work. Today we've shifted to the LLM, the model, as the thing that makes the product work. And so all the economics are changing. You buy your CPU, you have a piece of hardware, and you can use it until it burns out. An LLM is like a live system: you need to have your GPUs behind the scenes, but in a way it becomes your execution platform. And I think that's something that is very, very hard to rethink, especially when you have a very well-established business: I'm not building on top of a CPU, I'm building on top of an LLM that itself is running on top of a GPU. But who cares now about GPUs? I'm kidding.

[33:42] Conor Bronsdon: [OVERLAP]
Yeah, well, we're going to talk TPUs here soon, I'm sure. Let's get back to the design decisions you've made around agents and providing them context and data. As you've been building this context management for agents, this context store, you mentioned you have an hourly sync cadence as the baseline. That's a very specific design decision. For teams that need fresher data than that, what's the answer? And why did you make that decision?

[34:13] Michel Tricot:
The moment you need to cache and index information, you're going to have latency in how fresh the data is. That's just true no matter what type of context you're building; you have to.

[34:25] Conor Bronsdon: [OVERLAP]
Table stakes.

[34:29] Michel Tricot:
The hourly one, I mean, first of all, that's more like the default. People can get different plans and have faster ones. But it's also that what you don't want, in the example I showed, is to do too much by just interacting with APIs. APIs generally do a decent job at getting the latest version of the data. So the way we always think about it, and we build our own agents internally, is that we always have this hybrid approach: one where you get your understanding of the world through searches, through discovery. The context is generally not too old. But if you want to make sure you have the freshest data, you can also just pull that information directly from the API. APIs are very good at retrieving one object, as long as you know what that object is. So, you know, if I'm searching for Conor and I don't know your ID on Stripe, for example, I have to go through all of the users on Stripe. But if the cache tells me that Conor is ID 123, then I can go to Stripe and say, get me ID 123, and I get the freshest data. So for me, this hybrid thing is what's going to work.
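The hybrid lookup Michel describes, cache for identity, API for freshness, can be sketched roughly like this. Every name, ID, and function below is hypothetical; this is not Airbyte's or Stripe's actual API.

```python
# Hypothetical sketch: resolve an entity's ID from a local context store
# (refreshed on an hourly sync), then fetch the freshest record directly
# from the source API by ID. A point lookup by ID is cheap; listing and
# scanning every customer is not.

CONTEXT_STORE = {
    # entity name -> (system, object id), populated by the periodic sync
    "conor": ("stripe", "cus_123"),
}

def fetch_from_api(system: str, object_id: str) -> dict:
    # Stand-in for a direct API call (e.g. GET /customers/{id}).
    return {"system": system, "id": object_id, "fresh": True}

def lookup(entity: str) -> dict:
    """The cache resolves *who*; the API supplies the *latest* state."""
    key = entity.lower()
    if key not in CONTEXT_STORE:
        raise KeyError(f"unknown entity: {entity}")  # fall back to search
    system, object_id = CONTEXT_STORE[key]
    return fetch_from_api(system, object_id)

record = lookup("Conor")
```

The design point: the slightly stale cache is used only for discovery and ID resolution, so the freshness-critical read is always a single direct fetch.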

[35:47] Conor Bronsdon: [OVERLAP]
Okay, so you're tracking entities across these data sources too. So like the same person

[35:51] Michel Tricot: [OVERLAP]
Correct.

[35:51] Conor Bronsdon: [OVERLAP]
appearing in Salesforce and Zendesk, and maybe I'm in a Gong call too. And you're often doing that without embeddings.

[35:58] Michel Tricot:
Correct. I mean, we have a little more logic here, but yes, we do a lot of fuzzy matching, connections that are created between systems. That's the idea, yes.
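As a rough illustration of embedding-free entity resolution across systems (field names, the threshold, and the matching rules here are all invented for the example, not Airbyte's actual logic): normalize a strong key like email, and fall back to fuzzy name similarity.

```python
# Illustrative-only sketch: link records from two systems by an exact
# match on a normalized strong key (email), with fuzzy name similarity
# as the fallback signal. No embeddings involved.
from difflib import SequenceMatcher

salesforce = [{"Name": "Conor Bronsdon", "Email": "conor@example.com"}]
zendesk    = [{"requester": "C. Bronsdon", "email": "Conor@Example.com"}]

def norm_email(e: str) -> str:
    return e.strip().lower()

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link(sf: list, zd: list, name_threshold: float = 0.6) -> list:
    links = []
    for s in sf:
        for z in zd:
            # Exact normalized-email match is the strong signal;
            # fuzzy name similarity is the fallback.
            if norm_email(s["Email"]) == norm_email(z["email"]) or \
               name_similarity(s["Name"], z["requester"]) >= name_threshold:
                links.append((s["Name"], z["requester"]))
    return links

pairs = link(salesforce, zendesk)
```

A real system would layer more signals (phone, domain, account hierarchy) and handle conflicts, but the shape, deterministic keys first, fuzzy matching second, is the same.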

[36:09] Conor Bronsdon:
What drove you to reimagine Airbyte's approach in this way? Obviously, Airbyte became a unicorn back in, what, 2021, solving a different problem: this idea of an open source data pipeline company for humans is now being rebuilt around agents. From the incumbency perspective we talked about earlier, with Salesforce and Google and others having to make these tough decisions, I'm sure you had to make some of those decisions internally too, like, how do I change the direction of the company? What really landed you on, this is where we need to go?

[36:43] Michel Tricot:
Because, as everyone is saying, the problem today is not in the model, the problem is in the data you provide. And we know what it means to solve a data problem, because we've solved it for analytics and BI. The only thing we had to do was really rethink the persona we're addressing. We're not just addressing a human in front of a warehouse; we're also addressing a developer and an agent accessing that data. So there was a little bit of a shift in terms of adding on top of the product we already have, but it was more like rethinking the persona. The data pipes, we already had those, and we know those actually matter in this era. So I would say our

[37:30] Michel Tricot: [OVERLAP]
rethinking of the strategy, in terms of how we address that market, wasn't as hard as it might be for some other companies, whose whole business is done if they don't pivot. For us, it's actually more of a step up than a 180.

[37:48] Conor Bronsdon: [OVERLAP]
So it's more of how do we restructure what we're already doing so agents can access it more easily and just provide a different means of accessing the data that we're already

[37:54] Michel Tricot: [OVERLAP]
Yeah.

[37:54] Conor Bronsdon: [OVERLAP]
pulling together.

[37:56] Michel Tricot:
The big difference is that in the past, we've always been just a pipe. Now, because we want to offer that context store, we're also storing pieces of data on behalf of customers. So that's, I would say, a bit of a difference in terms of the business.

[38:12] Conor Bronsdon:
Interesting. Okay. Yeah, and I know before coming to Airbyte, you were at LiveRamp, where you were running a lot of data ingestion and distribution. It seems like that experience has informed this entire first stage of Airbyte, and now this second era that appears to be emerging.

[38:33] Michel Tricot:
Yeah, absolutely. I've understood that nothing happens if you don't have data flowing between systems. It is so critical. People don't know it's happening, but it's happening every single day. If you don't have the right data, nothing works.

[38:51] Conor Bronsdon:
So if we think a year ahead, and I know that's a long time in AI land, but let's try to push out a year: what does success look like here? What do you think agents need to be doing, and how do they need to be operating differently, hopefully enabled by your context work?

[39:10] Michel Tricot:
Yeah. To me, it's not so much about the agents; it's more about how we leverage them. That's what's going to evolve. I forget who wrote this, but there are these different layers of agent adoption, where you really start by, you know, if you're a developer, you go into your IDE, you just open up Claude, and you do pair programming with Claude. Then you go to the next level, and the next level. What I see us doing as a human race, in a way, just taking code as an example, is that we have shifted our systems to enable autonomy and to trust the agent. And I think this is more of a mentality shift: humans have a trust problem, where they expect nothing less than 100% success from a robot, when we ourselves are flawed. We do like 60% or 70%, we write software with bugs, but we expect an agent to write it without bugs. So what I'm hoping we get to, probably by the summer, yeah, let's give it three months, is that people realize that in code, it's okay for an agent to make bugs. We're not searching for something that is better than us right now. We're just looking for something that can create massive throughput and can self-correct. That, to me, is what we should be looking for. And whatever I can do on my end to enable it, I'll do it.

[40:48] Conor Bronsdon:
Fantastic. Michel, this has been an awesome conversation. I really appreciate you both showcasing some of the work you're doing and talking through your perspective on the space. One final question before I ask you for some links for folks to follow your work, and I know one of those is going to be your agent blueprint on Substack, which you've just started. I'm curious about your thoughts on memory management versus context management. I'll preview here that there's an episode, which may come out the week before this one or the week after, where I do a deep dive on memory management with Richmond Alake over at Oracle, focused on how we can help solve agent memory challenges. How do you think about the interrelation between agent memory and context, and how should teams be thinking about improving in those areas?

[41:37] Michel Tricot:
It's an interesting question. In my head, I try not to separate things too fast, because at the end of the day, memory is a form of context. Context is just an understanding of the world; that's what it is. And memory is part of your understanding of the world. If we want to use a human image here: I have context, like everything I've learned in school, but I also have memories, like when I was a kid, and all of that becomes pieces of information. So I'm trying not to differentiate the two too much. Yes, maybe the access, the window by which you remember it, is different, but at the end of the day, it's just

[42:19] Conor Bronsdon: [OVERLAP]
but recognize how interconnected they are.

[42:22] Michel Tricot:
It's just data that's going to land somewhere in your agent, and you have a way to query it and update it. That's what it is. Sorry, I'm looking at it very much from a data perspective: at the end of the day, it's just data.

[42:33] Conor Bronsdon:
No, no, I think it's interesting to get these perspectives. Yeah, really, really interesting. Michel, it's been a fantastic conversation. Like I said, where can folks go to follow your work and learn more about Airbyte's new agent builder and agent connectors?

[42:46] Michel Tricot: [OVERLAP]
Yeah, I mean, I would always start with our website, airbyte.com. Otherwise, as you said, I have a Substack; I might just give you the link after this. And LinkedIn.

[42:58] Conor Bronsdon: [OVERLAP]
I will link your Substack. I've actually cited a couple of the pieces in my Substack article about this. So that will

[43:03] Michel Tricot: [OVERLAP]
Nice.

[43:03] Conor Bronsdon: [OVERLAP]
come out. Don't worry.

[43:05] Michel Tricot:
And LinkedIn as well. I'm pretty active over there.

[43:07] Conor Bronsdon: [OVERLAP]
Fantastic.

[43:08] Michel Tricot: [OVERLAP]
Maybe I should be more active on X, but I have limited time.

[43:13] Conor Bronsdon:
Well, Michel, thank you so much for the conversation. Listeners, if you enjoyed this, please let us know in the comments or the rating and review. It means the world to us. We'd love to hear from you and we would love to hear who you think we should have on next. So thank you so much for listening and hope everyone has a great rest of your week.

[43:31] Michel Tricot:
Thank you, Conor.