Chain of Thought | AI Agents, Infrastructure & Engineering | Mastering Multi-Agent Systems

AI agents offer unprecedented power, but mastering agent reliability is the ultimate challenge for agentic systems to actually work in production. Mikiko Chandrashekar, Staff Developer Advocate at MongoDB, whose background spans the entire data-to-AI pipeline, unveils MongoDB's vision as the memory store for agents, supporting complex multi-agent systems from data storage and vector search to debugging chat logs. Chain of Thought is hosted by Conor Bronsdon.

Show Notes

Mikiko Chandrashekar, Staff Developer Advocate at MongoDB, whose background spans the entire data-to-AI pipeline, unveils MongoDB's vision as the memory store for agents, supporting complex multi-agent systems from data storage and vector search to debugging chat logs. She highlights how MongoDB, reinforced by the acquisition of Voyage, empowers developers to build production-scale agents across various industries, from solo projects to major enterprises. This robust data layer is foundational to ensure agent performance and improve the end user experience.

Mikiko advocates for treating agents as software products, applying rigorous engineering best practices to ensure reliability, even for non-deterministic systems. She details MongoDB's unique position to balance GPU/CPU loads and manage data for performance and observability, including Galileo's integrations.

The conversation emphasizes the profound need to rethink observability, evaluations, and guardrails in the era of agents, showcasing Galileo's family of small language models for real-time guardrailing, Luna-2, and Insights Engine for automated failure analysis. Discover how building trustworthiness through systematic evaluation, beyond just "vibe checks," is essential for AI agents to scale and deliver value in high-stakes use cases.

Connect with Chain of Thought host Conor Bronsdon:

Newsletter: https://newsletter.chainofthought.show/
Twitter/X: https://x.com/ConorBronsdon
LinkedIn: https://www.linkedin.com/in/conorbronsdon/
YouTube: https://www.youtube.com/@ConorBronsdon

Follow Today's Guest(s)

Connect with Mikiko on LinkedIn

Follow Mikiko on X/Twitter

Explore Mikiko's YouTube channel

Check out Mikiko's Substack

Connect with MongoDB on LinkedIn

Connect with MongoDB on YouTube

Check out Galileo

Try Galileo

Agent Leaderboard

Creators and Guests

Host

Conor Bronsdon

Creator and Host of the Chain of Thought Podcast | Technical Ecosystem Lead at Modular

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of my employer. This account is not affiliated with, authorized by, or endorsed by my employer in any way.

[0:00] Speaker:
If you don't have a way of observing what these agents are doing, if you don't have a way of evaluating them, and if you don't have a way of putting guardrails around their behavior, you just don't have an agentic system that's gonna really survive. Hey, folks. I am Conor Bronsdon, coming to you live from the Galileo offices in San Francisco, which are our offices for about three more weeks before we move to a new location.

[0:27] Speaker:
We are here with Chain of Thought, the podcast for developers building with AI. I'm Conor, the head of developer awareness at Galileo. And we're here for a conversation about how AI is changing, how agents are evolving, and this complexity that has been introduced as we begin to build multi-agent systems, as we begin to expand upon more basic RAG implementations

[0:54] Speaker:
to massive complex networks of LLMs interoperating with one another, self-evaluating sometimes, self-improving. And there's a new urgent challenge that is being faced by companies around the world, which is reliability, and the ability to have the trust needed to put these AI systems into production. It used to be easy when this was just an experiment, when it was fun, when it was new. And now people are going, Oh, what's the actual customer value? Can we trust this to stay within the guardrails we have? How do we ensure these systems are not just powerful, but also predictable,

[1:29] Speaker:
safe, and trustworthy? So, today we're thrilled to be joined by someone at the very heart of this intersection between data and AI, Mikiko Chandrashekar, staff developer advocate at MongoDB, and someone you may know from LinkedIn. Mikiko, thank you so much for joining us. So good to be here. Yeah. I'm excited to discuss the agent landscape, agent ops, and so much more that Mongo is at the center of here within this evolving data and AI ecosystem.

[1:56] Speaker:
So let's maybe start with that vantage point at MongoDB. I know you've only been there a couple months now, but you've obviously been involved in this space for so long. You've got this depth of knowledge about what's happening and working with developers across different industries. How do you see MongoDB pushing AI and data forward in the next couple of years? Yeah, totally. And to give a little bit of context to,

[2:21] Speaker:
you know, to that question, so, in terms of my background, like, what I like to tell people is that I've essentially had the joy and the opportunity to work at every single stage of the data to ML to now the generative AI pipeline. Part of my career was working as a data scientist. Part of my career was working as an MLOps engineer. And then, you know, more recently my job has been in terms of developer advocacy and education

[2:51] Speaker:
around how to build production ready systems. Any system that involves whether it's generative AI or whether it's classical machine learning or even analytics. That's kind of been the focus of my career. Bringing all that to my role here to, you know, help advocate for developers that are trying to build these reliable, production ready, generative AI applications and platforms.

[3:23] Speaker:
So that's what I do at Mongo. And one of the reasons why I joined is because Mongo already has a really rich history of providing one of the best document data stores in the industry. And when people think documents, they immediately think like PDFs, etcetera. But really, what MongoDB was able to figure out really quickly was how to create a database that essentially, you know, has a polymorphic schema,

[3:50] Speaker:
which allows you to be very flexible with how you design the data that feeds your application and to be able to encompass any kind of data. So that's the story of MongoDB, the V1 of MongoDB. MongoDB has also started moving into how can we enable developers at all stages of maturity to not just excel with their data store, but how can we now then help them leverage that data store to, for example,

[4:28] Speaker:
support RAG applications, to support agent applications, not just from the vector store that we have now on Atlas Search, not just from the memory stores that we also have available now as modules in LangGraph and a couple of our partners. But how can we do all that to help continue supporting developers building the next generation apps. Now, in terms of kind of,

[4:56] Speaker:
you know, how MongoDB fits into the industry now, we really see ourselves as the memory store for agentic applications and agentic systems. Totally. Yeah. And in terms of like what that actually means, you know, it's not just storing the data. It's not just storing, you know, vector data for semantic search and similarity in RAG, but it's also storing the chat logs,

[5:22] Speaker:
you know, things to help developers debug, trace any kind of errors, and to figure out how they can continue improving the underlying data, underlying processes to, you know, continue improving that user experience for their applications. Absolutely. And I'll say that's one of the reasons Galileo is really excited to integrate with MongoDB as a data platform,

[5:44] Speaker:
is seeing the incredible stuff being built with Mongo. There's so many cool agents that I'm seeing developers come up with. Are there particular use cases or particular moments of agents in action that you've seen people build with Mongo that get you fired up about where the future is going? Yeah, absolutely. I mean, you know, it's really interesting because I think, a lot of the

[6:06] Speaker:
you see kind of two schools of thought right now in terms of the digital sphere, the world of the internet. And one thought is, you know, that, oh, agents are this, like, overhyped fake thing that people talk about building, but don't actually build. Meanwhile, there are tons of companies that are building production-scale agents. So there's a number of customer stories on the MongoDB site

[6:37] Speaker:
as well. But we've seen builds from both really small startups. We have a few of those partners on our site. They're built on MongoDB. We also have a lot of blue chip, well known companies that have also started building agentic systems with Atlas specifically. And more recently, MongoDB announced the acquisition of Voyage, which hosts a number of very high performing

[7:09] Speaker:
embedding and reranking models. And bringing them into, you know, the same house means that, you know, any team, whether you're a solo developer that's building their, like, first million dollar app, or whether you're a team at an existing, like, twenty-year-old company in telecom, in finance, in medical, health care, you name it, they can all build, like, very similarly performing agentic systems.

[7:37] Speaker:
It's interesting that you bring up similarity in performance because one of the big challenges we're seeing with agents and AI systems in general is reliability. Obviously, nondeterminism is an incredible tool, an incredible opportunity of these systems, but it comes with risks about trustworthiness, getting to production consistently, and actually delivering on the promise of AI and the opportunity therein.

[8:06] Speaker:
What has been the approach for Mongo and yourself around how to improve the reliability of these systems, especially with the incredible data layer that Mongo provides? Yeah. Absolutely. And, you know, it's really interesting. So I recently gave a lightning talk at the AI Engineer World's Fair. It was about essentially how to solve memory for multi-agent systems. So, I had taken a poll and I asked everyone, Yo, raise your hands if you

[8:33] Speaker:
are building an agent. Okay. So basically 80% of people raised their hands. I said, Raise your hands if you, like, have an agent. Right? Or multiple agents in production. And roughly those same people raised their hands. Maybe there's one or two fewer. And then I asked them like, how many of you have gotten your agents to work, like, straight off the bat, 100% of the time? No one raised their hands. No one at all.

[8:58] Speaker:
You know? So to me that kind of it's a good cross section, especially because that conference, it represents the builders who are building at the edge. So to me that was very reflective of like, what a lot of teams are experiencing right now, which is that even if, for example, you have an agent that does a structured workflow, sometimes the answers it gives are going to be different. It might be one out of 100 runs,

[9:29] Speaker:
or it could be one out of 20 runs depending on how complex the task you have given that agent is. So, I think that's a really, really big challenge because I think your users and customers of whatever app or product you build, they have such high standards now for the customer experience, like the user experience. And most importantly, if you're dealing with really sensitive data, so for example,

[9:53] Speaker:
if you're dealing with an agent that internally helps put together financial reports, financial analyses, competitive intelligence (well, intelligence is important, but actually financial reports). That is a huge thing that, if you're a public company, you can't get wrong. Or even, we've seen some embarrassing examples I've recently seen. So right now it's summer and

[10:20] Speaker:
every publication in the world is putting out their, you know, here's our top 50 summer reading list. Something that happened apparently was that a few of those summer reading lists that came out of reputable sites and media channels were fake. The books did not exist. Wow. The books did not exist, and the reading lists are still published. And so when people went to go look at these books, they couldn't find anything. It's like, how? I mean, that's a relatively trivial example in that, you know, no one's,

[10:53] Speaker:
Yeah. The danger... No one's actively harmed, but the reputation risk is there. You know, those sites did actually take a ding on reputation because the common feedback people had wasn't necessarily that they used generative AI, for example, to write the copy, but was that no one did the editorial review to check if these books actually existed. So that's a relatively trivial example. Everyone knows the story about the guy who was able to get a, I think, a Ford truck for a dollar. Yeah.

[11:25] Speaker:
A [inaudible] support bot. Thank you. Lovely support bot. Fun story. You know, but what about, for example, a more serious case where you have an agent that is helping to put together initial diagnoses or doing triaging for people with severe health conditions? Or agents that do underwriting for loans, for insurance. Like, those are very serious

[11:53] Speaker:
cases where the consequences can be disastrous for people. Absolutely. And I will say, I feel like I'd be remiss if I didn't mention Galileo's wonderful case study with Magid, where we talk about how to solve these newsroom challenges. So go check that on our website at galileo.ai. But hallucination is both a feature and a bug (Yeah.) when it comes to LLMs because we want them to create new. We want them to try new things. We want them to be able to think their way out of problems, which is a form of hallucination.

[12:24] Speaker:
But when it happens in the wrong areas, when it happens outside of the guardrails we've tried to set (Yeah.), when it happens in a way that affects people's lives negatively, it's a huge issue. And there's a lot of approaches being considered about how to solve this. Obviously, we focus on the, like, observation, evaluation, guardrailing side of things. And I know Mongo is thinking a lot about agent ops as well, and how do the systems operate. Can you tell me a bit about

[12:50] Speaker:
the approach to agent ops and what your thinking is and how it's evolving? Yeah, absolutely. So to me, it's fascinating. So, you know, before the world of generative AI came about, I was working in MLOps, right? So let me approach it from kind of like two angles. So the first angle is in terms of I want to address some interesting trends that I see around,

[13:25] Speaker:
once again, the building of agents in the world. So there's one school of thought that is sort of like, well, you know, agents, because LLMs are nondeterministic and all that sort of stuff, you should kind of just let them do their thing, and you should treat them as the wonderful magical puppets that they are. Like, black box. Let it go. Do their thing. Right? And then it's okay. We'll catch the errors later and then kind of figure something out on the application layer. Right?

[13:53] Speaker:
But the other school of thought that I personally follow and hold on to is that agents are a software product. We have best practices that we've established with traditional software and applications and software engineering best practices there. And yeah, some things will change a bit with agents, but we should still approach them with a certain rigor. We should approach them with an understanding of, like, we have these best practices that we, for example, built out in the world of MLOps.

[14:29] Speaker:
Let's see how we can adapt them to agents, but let's not get away from the fact that they are still software products. Still code based. So in terms of, for example, how we're approaching it on the MongoDB side, one of the things that we're really good at is data. You know, it's storing data. It's helping developers access data. It's helping them search it, helping them to organize it.

[14:57] Speaker:
So that's one of the first ways we're approaching it is how can we store all the data that you need, not just to feed your applications, but also to help you understand how your agents are performing, what are the conversations that they're having. How can you then, for example, plug into observability and evaluation providers to then be able to understand them and do those traces.
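As a rough illustration of the idea above (keeping agent conversations and steps next to application data so they can be traced and evaluated later), here is a minimal Python sketch. The class names, field names, and event kinds are hypothetical, chosen only for this example; they are not a MongoDB or Galileo schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def _now() -> str:
    """UTC timestamp as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AgentEvent:
    """One step in an agent run (hypothetical shape)."""
    agent: str               # which agent produced the event
    kind: str                # e.g. "message", "tool_call", "error"
    payload: Dict[str, Any]  # free-form details; events need not share a schema
    at: str = field(default_factory=_now)


@dataclass
class AgentTrace:
    """All events recorded for one session."""
    session_id: str
    events: List[AgentEvent] = field(default_factory=list)

    def log(self, agent: str, kind: str, **payload: Any) -> None:
        """Append one event to the trace."""
        self.events.append(AgentEvent(agent, kind, payload))

    def errors(self) -> List[AgentEvent]:
        """Return only the events marked as errors, for debugging."""
        return [e for e in self.events if e.kind == "error"]

    def to_document(self) -> Dict[str, Any]:
        """Convert the trace to a plain nested dict, ready to store."""
        return asdict(self)


trace = AgentTrace(session_id="demo-session")
trace.log("listing_researcher", "message", text="Searching listings")
trace.log("listing_researcher", "error", reason="timeout")
document = trace.to_document()
```

Because `document` is a plain nested dictionary, it could be stored as a single document. With the PyMongo driver and a running deployment, for example, that would be a call such as `collection.insert_one(document)`; the sketch deliberately stops short of that so it runs without a database.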

[15:23] Speaker:
There's a few folks that are pretty, I would say, great voices or great advocates for that school of thought. So, there's Hamel Husain and I think Shreya Shankar. And, you know, I love all the stuff that they produce because they advocate for that rigor. I agree. For that eval approach. So that's one way MongoDB is approaching it: we do data really well. Let's

[15:49] Speaker:
bring that excellence and expertise to developers, and then let's figure out how to build tools around it so that they can kind of leverage all those different sort of abilities, data store, a vector store, a memory store. Yeah. And then I'd say the second thing that we're doing, and this is related to both Voyage and a lot of the recent improvements and features we shipped around Atlas.

[16:18] Speaker:
For example, embedding models are obviously very important if you're doing sort of a RAG-style workflow. But also reranking models are really important because you want the ability to, you know, feed the best documents, the best matches, you want to bring that quality in for what you feed into your RAG system. Right? So that's another way that we're approaching it.

[16:48] Speaker:
The Atlas Search team has shipped some really amazing features. So now developers, for example, can not just do full text search, not just do semantic search, but also they can do hybrid search where they combine the best of full text and semantic to create even, like, better pipelines. I love your perspective here because I completely agree with you. We can't always just let agents run amok.
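To make the hybrid search idea concrete, here is a minimal sketch of reciprocal rank fusion, one common way to merge a full-text ranking and a semantic ranking into a single list. This is a generic illustration in plain Python, not the Atlas Search API; the function name, the document ids, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from two separate searches over the same collection.
full_text_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that both searches rank highly (here `doc_b` and `doc_a`) come out ahead of documents that only one search found, which is the "best of both" behavior described above.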

[17:17] Speaker:
We have best practices we can put in place. We have things we can take from prior eras of software engineering. And, you know, I know there's this hype that software engineering is going away. And maybe someday it will. But I don't think it's happening anytime soon. I think we should be leveraging these best practices. I think we should take the learnings we already have and apply them to this current era.

[17:39] Speaker:
And I'll say Hamel is actually coming on the podcast, so I'm very excited to talk to him and learn quite a bit because I feel like every time I talk to him, I just gain so much knowledge. And he's actually contributed to some of how we've designed some of our new features on our platform as we've thought through this problem too.

[18:00] Speaker:
You know, one of the things we recently launched, that we're very excited to have working with agents that are leveraging MongoDB, is some new interfaces to help with the debugging, understanding, and identification of problems in multi-agent systems, including, you know, a graph view to trace, okay, like, what actually happened within this agentic

[18:21] Speaker:
workflow. And then, you know, a timeline view where we can look at multiple agents together and say, okay, like, who was communicating with who? When was it happening? Which tools were getting called when? It can really help with debugging and understanding where there are challenges, which you wouldn't need if you took the "let them run amok" approach, I suppose. But, again, I don't think you should do that. And then a message view, so you can really dive into and unpack

[18:47] Speaker:
a particular string of messages and understand the actual chain of thought of what occurred. And I think it's so crucial to create the reliability that we're talking about in order to enable enterprises to really go to production because, you know, we can't risk people's livelihoods or their health. And yet there are so many drudgery tasks that we can automate. There are so many

[19:11] Speaker:
hours of human time that we can free up for creativity if we apply these systems in the right way. And I'm curious from your perspective, what are the features that you would wanna see on a reliability side? Like if you could just wave a magic wand and make us build something, like what would it be that you would want us to build to help support this ecosystem?

[19:31] Speaker:
Oh, that's so interesting. So interesting. Let me think about that. So, in preparation for some of the research and talks that I've been doing for Mongo, especially around multi-agent systems, what's really interesting is that it's not that people aren't building multi-agent systems. Whenever people think, oh, this thing will happen in the next two years, I can guarantee you it's already happening now. Someone in the world is building it. But I think the difference between people who are building the products and the systems of the future

[20:14] Speaker:
versus the people who are building what is available now, there's just this gulf of experience and knowledge and practices. And sometimes, you know, it takes a while for the rest of the world to keep up. So, for example, multi-agent systems, right, in production. So it's just now that I think we're starting to see papers come out that are really, you know, they've taken that second school of thought, which is that if agents are

[20:43] Speaker:
software engineering products, can we apply, for example, scientific analysis to understand all the ways that they break? So, there's a couple of those papers out. Some of them I had referenced in my talk at the conference. One of them was Why Multi-Agent Systems Fail, which I thought was great. What the paper tries to do is it tries to provide a taxonomy

[21:07] Speaker:
of potential failures in multi-agent systems. So I think a lot of the focus around failure, oddly enough, has been around single-agent systems. You know, and in thinking about memory, thinking about, like, memory for single agents, people kind of focus on, like, okay, so you need, like, a data store, then you need, like, a vector store, and then you have a chat store where you save stuff, and then like, that's it.

[21:37] Speaker:
And a lot of this, like for example, chain of thought reasoning, has been about improving a single agent. But what we're seeing now actually is that people are building multi-agent systems. I think the advice and best practice that's developing now is, instead of having a single agent try to do a lot of complex tasks, have like a system of small agents, each one equipped to do, you know, one

[22:06] Speaker:
to three tasks. Make sure that you have criteria for how to evaluate the performance of those agents. And then you look at that team or that coalition or what have you holistically. Right? So, going back to that paper, Why Multi-Agent Systems Fail, that I thought was so good (and there's a few other papers that I'm happy to list and share with people),

[22:42] Speaker:
the implication was, one, that a lot of these failures are actually pretty predictable, and you could kind of classify them in this taxonomy. And then I think the second part was that all the stuff around how to make a single agent better, we need to kind of extend it to how to make a system of multiple agents better. Two types of memory concepts that

[23:09] Speaker:
we're starting to really think about and that are starting to come out in papers are, for example, having like a skills library and having like a concept of a blackboard. Blackboard memory. Yeah. So blackboard memory is where you have agents kind of come together to post like partial solutions, and essentially it's read-write, and they kind of can pick up the,

[23:32] Speaker:
you know, the trail and each can kind of bring their own unique sort of expertise or focus to help solve that problem. And then a skills library is the, okay, once a pattern has been established. It's a little different from a cache because a cache is like you store a query result, right, that you can kind of fetch. But with a skills library, it's like once your multi-agent system has like

[23:58] Speaker:
figured something out, instead of having them refigure it out each time, you essentially save that pattern to this like skills library. So going back to the question of like what I would love to see in like an evaluation observability framework is I'd like to start seeing a capture of those kinds of traces in something that's native to like those types of memory concepts, because I think those are going to be really important. I don't think people are really talking about it. We've seen examples

[24:32] Speaker:
of people doing some kind of like Canvas, Blackboard style thing. But at least what I predict is that, you know, all the existing understanding of memory for single-agent systems, where you just happen to have a group of them in a system, we need to evolve that to think about multi-agent systems where these agents are working together in tandem, and then to have

[25:03] Speaker:
native tracing and observability and evaluation on those agents. Because that becomes very different, right? Instead of each agent being compared against the same criteria, each agent has different criteria focused on its kind of specific skill set. And I also kind of see people playing around with different types of agent architectures.
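The two memory concepts described above, a blackboard where agents post partial solutions and a skills library where worked-out procedures are saved, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, method names, and the shape of the trace records are hypothetical, and the shared `trace` list stands in for the kind of native tracing asked for in the wish list.

```python
from typing import Dict, List, Optional, Tuple

# Each trace record is (memory type, operation, agent name).
TraceRecord = Tuple[str, str, str]


class Blackboard:
    """Shared read/write space where agents post partial solutions."""

    def __init__(self, trace: List[TraceRecord]) -> None:
        self.entries: List[Dict[str, str]] = []
        self.trace = trace

    def post(self, agent: str, content: str) -> None:
        self.entries.append({"agent": agent, "content": content})
        self.trace.append(("blackboard", "post", agent))

    def read(self, agent: str) -> List[str]:
        """Return what the other agents have posted so far."""
        self.trace.append(("blackboard", "read", agent))
        return [e["content"] for e in self.entries if e["agent"] != agent]


class SkillsLibrary:
    """Stores reusable procedures, not query results as a cache would."""

    def __init__(self, trace: List[TraceRecord]) -> None:
        self._skills: Dict[str, List[str]] = {}
        self.trace = trace

    def save(self, agent: str, task_type: str, steps: List[str]) -> None:
        self._skills[task_type] = list(steps)
        self.trace.append(("skills", "save", agent))

    def lookup(self, agent: str, task_type: str) -> Optional[List[str]]:
        """Return the saved steps for a task type, or None if unknown."""
        self.trace.append(("skills", "lookup", agent))
        return self._skills.get(task_type)


trace: List[TraceRecord] = []
board = Blackboard(trace)
skills = SkillsLibrary(trace)

board.post("researcher", "found 3 candidate listings")
seen_by_pricing = board.read("pricing")
skills.save("pricing", "price_listing", ["fetch comparables", "adjust for size"])
reused_steps = skills.lookup("researcher", "price_listing")
```

Every read and write lands in `trace`, so an evaluation tool could later reconstruct which agent touched which memory and in what order, which is what makes per-agent evaluation criteria possible.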

[25:27] Speaker:
So most of the time we think of agents in a cooperative sense. But for example, what if you had competitive agents? Yeah. It works really well actually sometimes. Yeah. You know, so that's my wish list. I love this. And I particularly like it because it directly relates back to something you said earlier, which is, hey, there are a lot of concepts from software engineering we can apply to agents.

[25:51] Speaker:
And I would even extend that and say, there are a lot of concepts from good organizational development and good management that we can apply to agents. Like, so you mentioned earlier breaking down agentic systems from, oh, this one agent that does everything to here are ten, twenty, however many agents that are doing small tasks together as part of a team. Oh, you know, teamwork, never talked about that at all. For that matter, look at software engineering. We've never taken epics and broken them down into stories Yeah. Broken them down into sprints,

[26:19] Speaker:
and put those on a board and said, okay, great. Like, here's a task we're gonna do here. That certainly is not a concept we can understand. We don't have to break large PRs down to small PRs in order to actually get them reviewed and done. Wild that we have to do that. Specialization, you talked about that too. And, I mean, clearly, that's not something we do within the economy or within teams.

[26:37] Speaker:
We certainly don't specify and diversify and customize our goals for those specialized people either. They don't have different goals based on their job functions, maybe. For example, like, not something we can apply to agents. Definitely can't change the criteria for those folks. You wouldn't, I don't know, think of, like, a sales agent, which is a crazy thought, and, like, a software agent as having different goals. Just a weird

[27:00] Speaker:
example. Absolutely. And I mean, like, you think about it too. Let's say, for example, you have an agentic system that is focused on real estate. So essentially an agentic system that will help not just couples, you know, or individual people with dogs. What's funny, I actually tried working on my own real estate tech startup. Oh, interesting. Yeah. This was a little bit earlier in my career,

[27:29] Speaker:
I was like one of five people plus a few contractors. And I was working on like the data engineering and data architecture and pipelines and all that. Data in real estate is wild. But as part of that, you know, we had to read a bunch of these papers, and this one paper I read was that the biggest reason for single people to buy a house was because they had a dog.

[28:00] Speaker:
That was one of the strongest indicators for someone who was single. [inaudible] Interesting. Was to buy a dog. I always think about that example. You have real estate anyway. You have a real estate system that is helping people find homes, but it's also helping real estate brokers find listings that they can then sell to these people. And let's say that's like a multi-agent system, right?

[28:26] Speaker:
What you could do, for example, is you can then start A/B testing and experimenting with each of these different agent personas. So let's say, for example, you have a listing research agent. All it does is go crawl different real estate listings, and MLS is another place where you could get some of this listing data. Or, you know, in that agent example, maybe somewhere like MongoDB might have that data. It's a weird place to put it. I don't know. Yeah. But let's say, for example, you wanna test out different language models.

[28:59] Speaker:
You wanna test out different reasoning models. You wanna test out different styles of prompts. So you have five or six different of these research listing agents. You could essentially do this parameter fine tuning. You can experiment with different prompts. Obviously the data for the real estate listings you could store in Atlas for MongoDB. And then essentially like you have a

[29:27] Speaker:
way to, you know, pick which is the best configuration, right? Which is the best type of agent? In order to do that, you need to be able to observe the performance, you need to be able to trace it, you need to understand why the performance was better with these agents. And then you can kind of, like, pick the golden star. Yeah. You know? And I think too much of that decision making has been purely qualitative so far. Whereas we want both quantitative benchmarks and qualitative feedback. And it's really when that comes together that you see highly customized, highly successful systems. Yeah. Absolutely. Obviously, we both care deeply, Mikiko, about reliable AI systems and enabling developers to build

[30:13] Speaker:
this bevy of agents that they're not only building already, but building in more complex systems and have these deep reliability needs. How does Galileo's integration with MongoDB help enable developers around the world to build more trustworthy and reliable AI agents? Yeah. Absolutely. I mean, so what it comes down to is I think there's,

[30:39] Speaker:
some important core components of a production-ready, reliable agentic system. At MongoDB, we essentially cover the data store for applications, both small ones and, if people want to scale worldwide, we definitely cover that. We also have the vector store for semantic search, and we have the memory store. So that's all great. But at the end of the day, going back to it, these are still products. These are still software products that are going to be released out into the wild and interact with real people. They also might be interacting with other

[31:16] Speaker:
agents, but ideally they're interacting with real people who, you know, have very real concerns and expectations when it comes to the products that they interact with and the kind of experience that they want. And I think if you don't have a way of observing what these agents are doing, if you don't have a way of evaluating them, and if you don't have a way of putting guardrails around their behavior,

[31:44] Speaker:
you just don't have an agentic system that's going to really survive. And it's quite possible that, you know, your company and product brand could also take a big hit. So, you know, I see the tools that Galileo offers as part of its platform as really helping ensure that, one, people can build the agentic systems that they need to, that they can scale these systems,

[32:13] Speaker:
and they can do so in a way that not just continues to improve the user experience, but can also help grow the companies and the brands that are building these products. So really, it's enabling trust, I think, by developers in the systems that they're building. Completely agreed. And I think that's why it's so important that companies like Mongo and Galileo also align to open standards like OpenTelemetry

[32:40] Speaker:
so that it's easy for us to work together and easier for developers to leverage data across these different systems. How do metrics for agents in particular factor in? Obviously, that's somewhere we've developed some agent metrics. Do you view that as an important cornerstone of what AI builders should be thinking about as they evaluate and observe these systems? Yeah. Absolutely. I mean, so

[33:06] Speaker:
for part of my life, I did competitive bodybuilding. And, you know, I'm also, generally speaking, a very growth-minded person. And I think the phrase, I forgot exactly how it goes, but it's that you can't improve what you can't measure. Yeah. I think about that a lot. What's measured is managed. There's a lot of these little... Yeah. So, I think

[33:34] Speaker:
vibe checks are fine, but I think at a certain point, you do need to put quantitative metrics and measures in place. Because what it comes down to is, most people are building agentic systems for a company, and most of the time it's probably a for-profit company or a for-profit business. You know, an agentic system is not a silver bullet for building a profitable

[34:09] Speaker:
company. But being able to figure out what's working and then being able to make those decisions, to be empowered to say, like, okay, this agentic workflow or this agentic feature seems to be doing really well, this one doesn't, so let's really prioritize what's working and then let's deal with, you know, what isn't. I think that's just crucial.
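
As a rough illustration of the point about deciding which agent variant or workflow is working from numbers rather than vibes, here is a minimal Python sketch. The RunRecord fields, the variant labels, the tie-break rule, and the sample data are all assumptions made up for illustration. Nothing here comes from Galileo's or MongoDB's APIs, and a real evaluation would likely track more than success rate and latency.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    variant: str      # hypothetical label for one agent configuration
    succeeded: bool   # did the run pass whatever check was defined for the task?
    latency_s: float  # wall-clock time for the run, in seconds


def summarize(records):
    """Group runs by variant and compute success rate and mean latency."""
    by_variant = {}
    for record in records:
        by_variant.setdefault(record.variant, []).append(record)
    return {
        variant: {
            "runs": len(runs),
            "success_rate": sum(r.succeeded for r in runs) / len(runs),
            "mean_latency_s": mean(r.latency_s for r in runs),
        }
        for variant, runs in by_variant.items()
    }


def pick_best(summary):
    """Highest success rate wins; lower mean latency breaks ties (an assumed rule)."""
    return max(
        summary,
        key=lambda v: (summary[v]["success_rate"], -summary[v]["mean_latency_s"]),
    )


# Made-up sample data for two hypothetical prompt variants.
records = [
    RunRecord("prompt_a", True, 2.0),
    RunRecord("prompt_a", False, 3.0),
    RunRecord("prompt_b", True, 4.0),
    RunRecord("prompt_b", True, 2.0),
]
summary = summarize(records)
best = pick_best(summary)
```

The quantitative summary is only half of what the conversation asks for; the qualitative feedback would still need to be reviewed alongside it.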

[34:35] Speaker:
You know, I think people need both. And I think this idea that because agents are powered by LLMs, or, you know, they can also be powered by multimodal models (we definitely have use cases for multimodal agents at MongoDB that we've seen), but this idea that because an LLM or a VLM or whatever is, like, the reasoning part of an agent,

[35:03] Speaker:
you don't have to use metrics to measure it because it's language, I think is totally false. I completely agree. I think we get too wrapped up in this idea of, like, oh, we're prompting with language and we're using NLP. You know, I'm a native English speaker, but English is not a very clear language. I can say things in many different ways. It

[35:28] Speaker:
often requires context to understand. Numbers are different if you apply context to them, and the same is true of code. And that's why I think it's important to have, like, code-based metrics, because there are other languages out there that have more clarity. And particularly as we look at human language and how it's evolved, the contextual elements of it are so diverse depending on what culture you're sitting in, what type of language you're sitting in. And so

[35:57] Speaker:
Yeah. Using that as a prism to look at, you know, software and agents, we have to understand the implications of how we apply these vibe-based, purely natural-language pieces of feedback versus also having context via numbers and providing more systematic, more scientific approaches. Yeah. Absolutely. And you see this too, especially with coding agents or copilot-style agents (Totally.) where the output is code,

[36:26] Speaker:
you can definitely measure certain things, for example, accuracy, efficiency, lines of code. Lines of code is a terrible measure, for the record. It is a terrible measure. Absolutely. But certain things, for example, the runtime efficiency of the code, all that other stuff, and, for example, the number of times someone accepts a suggestion or things like that, those can absolutely be measured.
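
To make the two measurable quantities mentioned here concrete (how often a suggestion is accepted, and how fast generated code runs), here is a small Python sketch. The shape of the suggestion log (a list of dicts with an "accepted" flag) and the helper names are assumptions for illustration only. They are not taken from any particular coding assistant or from Galileo's metrics.

```python
import time


def acceptance_rate(suggestions):
    """Fraction of shown suggestions the user accepted; None if nothing was shown."""
    if not suggestions:
        return None
    accepted = sum(1 for s in suggestions if s["accepted"])
    return accepted / len(suggestions)


def time_call(fn, *args, repeats=5):
    """Best-of-N wall-clock time for fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Made-up log: four suggestions shown, three accepted.
log = [
    {"accepted": True},
    {"accepted": True},
    {"accepted": False},
    {"accepted": True},
]
rate = acceptance_rate(log)

# Time a stand-in for a piece of generated code (here, the built-in sorted).
elapsed = time_call(sorted, [3, 1, 2])
```

Taking the best of several repeats is one common way to reduce timing noise; it is a design choice here, not something prescribed in the conversation.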

[36:49] Speaker:
Yeah. Lines of code is terrible. Yeah. I used to work at LinearB, a company that does metrics for software engineering in particular. And that was, like, the number one thing for the founders. Anyone who said lines of code, they're like, oh, I'm gonna get them. I know. I know. Lines of code is a very, how should I put this?

[37:14] Speaker:
It's a hot metric. It's come up a lot. And I think the typical pushback is always, like, well, you know, as you get more experienced in your technical leadership track, lines of code is not what you should be measuring. It should be impact, and sometimes that impact is not in code, and all that good stuff. But other things, for example, within coding agents, those should absolutely be enumerated.

[37:39] Speaker:
Yeah. Mikiko, thank you for an incredibly insightful conversation. It's been so fun having a chance to hear your viewpoint here, and I can't wait to see what you get up to at Mongo and what's next for you. It's clear that as the power of agents grows and as more multi-agent systems evolve, the need for a dedicated reliability framework that works hand in hand with data platforms like the incredible work we've done at MongoDB

[38:04] Speaker:
isn't just a nice-to-have. It's an absolute necessity for anyone serious about productionizing AI. So thank you so much for sharing your perspective and your research and your knowledge with us today. Thanks for coming on to talk about MongoDB and Galileo's partnership. Yeah, thanks for having me. I mean, we're super excited about Galileo's new agent reliability platform.

[38:22] Speaker:
I mean, there's just so much opportunity for MongoDB and Galileo to just continue deepening our partnership and, you know, enabling AI builders to scale with MongoDB and Galileo together. Absolutely agreed. And I know a lot of our listeners would love to follow along with your continued work. Where can they find you on the internet? Yeah, absolutely. So if they want to get in touch with me, you know, please feel free to do so on LinkedIn,

[38:48] Speaker:
Twitter, YouTube. I do have a Substack as well. I'll share the links. Perfect. We'll link them. Absolutely. And in terms of other work that's going on, we have an amazing developer relations, developer advocacy team at MongoDB. I have a number of my colleagues, Richmond Alake, Apurva, Jesse Hall, you name it, Anaya. So all these folks are really working on

[39:17] Speaker:
giving developers the tools that they need to build these systems. So I would encourage people to follow the MongoDB LinkedIn account. MongoDB has a YouTube channel as well, and we're present on Medium, Dev.to, all that stuff. So everything that we'll be producing, including our thought leadership pieces around agent ops, around solving memory for multi-agent systems, will be on those channels. Fantastic.

[39:44] Speaker:
For our listeners, you can learn more about building next-generation AI applications with MongoDB and Galileo at Galileo's new free AI reliability platform, available at galileo.ai. Check it out. Try it out. Give us some feedback. Build a great application. Build a great agent. Leverage Mongo as your data store, and let us know how we can improve. We'll have links to everything we discussed today in the show notes. That's all for this episode of Chain of Thought. Don't forget to share it with your friends if you enjoyed it. And, Mikiko, thanks again for coming on the show. Absolutely. Thanks for having me.


What is Chain of Thought | AI Agents, Infrastructure & Engineering?