Chain of Thought | AI Agents, Infrastructure & Engineering


Show Notes

Unlocking AI agents for knowledge work automation and scaling intelligent, multi-agent systems within enterprises fundamentally requires measurability, reliability, and trust.

João Moura, founder & CEO of CrewAI, joins Galileo’s Conor Bronsdon and Vikram Chatterji to unpack and define the emerging AI agent stack. They explore how enterprises are moving beyond initial curiosity to tackle critical questions around provisioning, authentication, and measurement for hundreds or thousands of agents in production. The discussion highlights a crucial "gold rush" among middleware providers, all racing to standardize the orchestration and frameworks needed for seamless agent deployment and interoperability. This new era demands a re-evaluation of everything from cloud choices to communication protocols as agents reshape the market.

João and Vikram then dive into the complexities of building for non-deterministic multi-agent systems, emphasizing the challenges of increased failure modes and the need for rigorous testing beyond traditional software. They detail how CrewAI is democratizing agent access with a focus on orchestration, while Galileo provides the essential reliability platform, offering advanced evaluation, observability, and automated feedback loops. From specific use cases in financial services to the re-emergence of core data science principles, discover how companies are building trustworthy, high-quality AI products and prepare for the coming agent marketplace.


Chapters:

00:00 Introduction and Guest Welcome

02:04 Defining the AI Agent Stack

03:49 Challenges in Building AI Agents

05:52 The Future of AI Agent Marketplaces

06:59 Infrastructure and Protocols

09:05 Interoperability and Flexibility

20:18 Governance and Security Concerns

24:12 Industry Adoption and Use Cases

25:57 Unlocking Faster Development with Success Metrics

28:40 Challenges in Managing Complex Systems

30:10 Introducing the Insights Engine

30:33 The Importance of Observability and Control

32:33 Democratizing Access with No-Code Tools

35:39 Ensuring Quality and Reliability in Production

41:08 Future of Agentic Systems and Industry Transformation


Follow the hosts

Follow Atin

Follow Conor

Follow Vikram

Follow Yash


Follow Today's Guest(s)

João Moura: LinkedIn | X/Twitter

CrewAI: crewai.com | X/Twitter


Check out Galileo

Try Galileo

Agent Leaderboard

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of Modular. This account is not affiliated with, authorized by, or endorsed by Modular in any way.

[0:00] Speaker:
I wanna say it's shaking up everything. It's shaking up the stacks. It's shaking up the protocols. And maybe I'm going too far here, but it's shaking up the market. A bunch of companies that were, quote, unquote, coasting, and now everyone is in the trenches and, like, no, no, no. This is it. This is the real deal. This thing can change the market in a way that's meaningful.

[0:29] Conor Bronsdon:
Welcome back to Chain of Thought, everyone. I am your host, Conor Bronsdon, and I'm delighted to have my cohost, Vikram Chatterji, cofounder and CEO of Galileo. Vikram, great to see you on this Friday. Thanks, Conor. Happy Friday to you. Yeah. And it's an especially exciting conversation we're about to have because we are lucky to have João Moura, the founder and CEO of CrewAI, joining us. João, so good to see you.

[0:52] Speaker:
Hey there. Thank you for having me. Very excited to be here and chatting today.

[0:56] Conor Bronsdon:
So much stuff for us to go over. Oh my god. No kidding. The AI agent space is just undeniably electric right now. We've got people trying to create new frameworks. We have enterprises scrambling to find their agent strategies, as I know a lot of folks are talking to you about. Developers are, I think, both excited and maybe a little overwhelmed by this year's pace of innovation.

[1:18] Conor Bronsdon:
And you, João, are right in the eye of that storm building CrewAI. So we're very keen to talk to you today and to discuss how to make great AI agents, whether it's at a startup, whether it's at an enterprise, and to explore this emerging AI stack, what it looks like, where we're headed, how companies can build reliable agents that deliver on the promise of customer value.

[1:41] Conor Bronsdon:
And we also want to get your thoughts on interoperability. There's so many different frameworks being discussed, whether that's Google's A2A or the AGNTCY effort with Cisco that we're both a part of, plus governance, and broadly how agentic workflows may reshape how software works and what we're doing on a day-to-day basis. So let's start with this broad piece of

[2:04] Conor Bronsdon:
what is the AI agent stack, and how would you define it and its core components today?

[2:12] Speaker:
I love that. That's such a great question. And by the way, given the scope of what we're covering, it's gonna be a five-hour episode. This is gonna be pretty chill. Yeah. I hope you don't have a hard stop here. No. But I gotta say that the idea of a stack is something that I started thinking about after I actually spent three days at, kinda like, another company's off-site.

[2:34] Speaker:
This company is huge, right? It was like 900 people in there, and I was invited because they were a partner and we were about to coach them on selling CrewAI so they could resell our enterprise licensing. And in there, I was talking with these people and I was trying to understand, like, alright, these guys are closing these insane

[2:56] Speaker:
deals, like $20,000,000 deals. And I was trying to understand, like, what is that product? And I started to realize a little bit about their product, and then I started to look at agents, and I was like, well, that's very similar. And if you look at the market right now, it actually is very similar. Like, you look at all these companies, they usually start on this

[3:18] Speaker:
discovery process where they have this desire to get enabled or educated, and with that comes a bunch of interesting technical questions. Like, oh, memory, or how about graphs versus events, and all the open source projects. And now those are interesting questions, but honestly, they are a commodity. Like, yes, you're getting memory. There's 20 different ways that you can do that. That doesn't really matter at the end of the day. But once those companies start to think about,

[3:51] Speaker:
alright, we're gonna be agentic native, right, and that's just a fancy way of saying, like, I'm gonna have hundreds or maybe thousands of agents running in production in x years' time, then the real questions pop up, and you start thinking about, well, how am I gonna provision them? How am I gonna think about authentication? How am I gonna evaluate them? How am I gonna measure them? And that,

[4:12] Speaker:
for me, goes into this idea that it's not one thing. These companies will need a suite, and it's a suite because the stack is many different things. And it's for all the agentic resources, not only the agents: the agents, the tools, the authentication, the authorizations, and everything in between. So I think it starts at the bottom, on the, you know, data management level, so your Databricks or Snowflake or BigQuery, and then you work your way up

[4:39] Speaker:
through LLM orchestration, authentication, connectors, and it stops at this idea of agentic apps, where you have a natural UI that you can actually use to interface with these agents. But, sorry, I'm gonna leave it at that because it's something that we can talk about, but that alone is a huge topic. I'm also curious, João, when you think about stacks generally, right, if you go back to the cloud time frame, it goes back to that IPA framework, in the sense that there's infrastructure, there's platform, and then all of that allows you to build applications on top. We are starting to hear a lot from the cloud providers, as well as a bunch of others, around how this IPA framework almost changes for agents. Like the

[5:25] Speaker:
infrastructure that you need for this. Obviously there's the compute side of things, but then there's also the whole idea of, how do you actually build out these agents at scale very soon? And then there's a platform side of this, where CrewAI plays a very big role: how do you help people build out these agents quickly? But to your point, the platform piece also starts to include what does auth look like, and what do the communication protocols look like.

[5:49] Speaker:
How much of the stack, the IPA stack, so to speak, do you think is gonna stay the same as what it was for the cloud, versus having to change fundamentally for multi-agent systems? I personally think that, literally, by the end of 2025, you're gonna have those agent marketplaces. We're gonna all be talking about

[6:07] Speaker:
don't recreate your own, I don't know, agent for travel booking. Just use these five agents out there. And so there's gonna be a marketplace. What changes and what doesn't change from a stack perspective, you think? I mean, that's happening. I heard it's happening. It's coming. There's many people, like, we'll have our own marketplace by the time the episode comes out, but there's other companies also launching their own. So I think it's right around the corner. Yeah. If I got to choose,

[6:36] Speaker:
I wish more stayed the same. I'm a simple kinda guy. I like protocols that work, that have been around forever. Like, HTTP and REST, it's my bread and butter. But I think, honestly, there's definitely a push for new pieces out there. Right? And that is changing a little bit. So I think some of the infrastructure remains the same. Yes. To some extent. But we're seeing even, like, companies questioning their infrastructure choices.

[7:04] Speaker:
A lot of companies are pulling back from the cloud. Like, yeah, I don't want a SaaS. I wanna self-host. And some companies are like, I don't wanna self-host. I want on-prem. I want a physical server. Like, you run some of this math, and there are companies that are betting on this being the way that things are gonna go. Honestly, I'm not 100% sure one way or the other yet, but there are big companies betting that on-prem and physical servers are gonna be the thing for the next ten years, because if you're running this at a huge scale and you can save 60%

[7:36] Speaker:
by running these cycles on a physical server, it might justify you doing it. I'm not sure yet if that's the way that things are gonna go, but I wanna say it's shaking up everything. It's shaking up the stacks, it's shaking up what's preexisting, it's shaking up the protocols with all the new protocols coming. And honestly, and maybe I'm going too far here, but it's shaking up the market.

[7:59] Speaker:
I mean, you look out there: a bunch of companies that were, quote, unquote, coasting, like, you're winning, you don't come out of the woodwork. Like Sergey, right, from Google. When did you see that guy come up in the news and give interviews? It has been forever. And now everyone is in the trenches and, like, no, no, no. This is it. This is the real deal. This thing can change the market in a way that's meaningful, so everyone is out there. You see Benioff out there. You see all the big players getting in the dirt. So to your point, going back to your question, I think there's a lot of things that will change.

[8:37] Speaker:
I think the new protocols, definitely MCP and A2A, kinda had a head start, and that's pretty interesting. I think MCP wants to expand now to also include agents. So it's very clearly gonna be, like, AWS against, like, Microsoft and Google, it seems, kinda battling for that protocol there. But I think a proper standard is gonna take, like, years to establish.

[9:02] Conor Bronsdon:
And interoperability has to be a priority here because as you're both highlighting, the opportunity here is to have thousands of agents potentially working together. And that doesn't necessarily mean only internal. You're often gonna have to interact with other companies' agents. You're gonna have, in fact, you probably already are in some cases on the Internet today. So

[9:25] Conor Bronsdon:
how do we ensure that this ecosystem actually has the right frameworks and the ability to deliver on this promise instead of becoming even more chaotic?

[9:38] Speaker:
The hyperscalers are gonna play a role in this. This is very clear. So there are integrations with them. So for example, CrewAI is natively integrated now with Bedrock Agents and Azure AI Studio. We just announced it with the Microsoft folks at the Build event in Seattle, and a few weeks ago with AWS. So I think the hyperscalers are gonna play a role. But beyond the hyperscalers,

[10:05] Speaker:
there are companies that are gonna have their own agents. Right? So if you really think about it, like, ServiceNow is a big one, SAP is a big one, and Salesforce is another big one. And, yep, companies are gonna use those agents. There's no way around it. Right? So right now, a lot of the work that we do is working with these companies, figuring out those connection points.

[10:25] Speaker:
And some companies are not ready to settle on a framework or, kinda like, a protocol, and they just wanna, alright, let's do, like, HTTP, and that's okay. So I think as long as there is interoperability, how we get there, the customers don't really care. But interoperability does need to exist. I mean, that's the one thing that we hear from everyone in the market: no one wants to get vendor lock-in anywhere in the stack. They wanna change LLMs at any point. They wanna change frameworks at any point. They wanna change, like, memory at any point. They just wanna have flexibility.

[11:03] Speaker:
Yeah. And actually, to that point of flexibility, let's transition from the enterprise down to the actual developer that's trying to build stuff. Right? We've been seeing for a while now that software engineers have finally gotten all these tools that they can just use to build stuff. But going back to that age-old question which they all have, it's, like, I can build out these pilots really quickly, and I can do really cool things in a hackathon. But as soon as we start talking to even startups here in San Francisco around, like, great, but this is a massive business that you can build. You can build agents to try to completely,

[11:42] Speaker:
you know, upend the entire supply chain market. They're like, yes, that's what we're gonna do. Day three of that, they're like, it doesn't quite work. We need to kind of figure out how to make this work. It has 100 more failure modes than we thought before with a simpler RAG-based system. This is the big bottleneck that I'm always very curious about, because there's a whole build stack, but then there's the actual productionization

[12:03] Speaker:
stack. And there's something to do with a new SDLC process, almost, for these nondeterministic systems. So maybe something to talk through is, what should developers be thinking about? Like, can you use your own existing IDE, your existing testing frameworks, etcetera? And where do you change things up in your SDLC process such that you can actually build at scale in a very reliable way with these multi-agent systems?

[12:35] Speaker:
Yeah. I think, honestly, as you said, it's a different mental model. Right? We're talking about this idea of a nondeterministic kind of framework. So if you think about traditional software engineering, independently of whatever language you use, you're talking about very strongly typed systems, where you very clearly

[12:55] Speaker:
know what is coming in, what are the transformations that are happening, and what's coming out, and with agents and, like, AI apps, you don't. So writing deterministic tests becomes a problem, and things like LLM-as-a-judge become way more important in here. So I think that's where some of the things that you folks are doing are very important, because

[13:17] Speaker:
if you're not measuring, you don't know if it's working. What we see is that people sometimes struggle at different points. They struggle to get it started because they've never done something like this, and usually we help them quite a lot, but then they also struggle to get it better, because they get into, you know, a short blanket kind of thing, where they cover one end, then the feet stick out in the cold, and they cover the feet and the blanket's too short to cover the top. So, like, in some cases they're operating without having, like, the regular

[13:52] Speaker:
CI/CD and testing that you would do in more, like, mature things. So I think, again, there's a stack out there. There are tools out there that can help you. But I think a lot of the engineers out there, they're not like you and I, living and breathing this day to day. They're still catching up, honestly.
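The contrast drawn above between deterministic tests and LLM-as-a-judge evaluation can be sketched in a few lines. This is an illustrative sketch only, not CrewAI's or Galileo's API: `flaky_agent`, `stub_judge`, `passes`, and `RUBRIC` are hypothetical names, and the judge here is a keyword-matching placeholder standing in for a real LLM call.

```python
from typing import Callable


def flaky_agent(question: str, variant: int) -> str:
    """Hypothetical stand-in for a nondeterministic agent.

    The same question yields differently worded answers, which is
    what breaks exact-match assertions.
    """
    answers = [
        "Your refund was approved and will arrive in 5 business days.",
        "Good news: the refund is approved. Expect it within 5 business days.",
    ]
    return answers[variant % len(answers)]


def stub_judge(answer: str, rubric: list[str]) -> float:
    """Placeholder for an LLM judge (assumption: a real one would call a model).

    Scores the fraction of rubric items that appear in the answer.
    """
    text = answer.lower()
    hits = sum(1 for item in rubric if item.lower() in text)
    return hits / len(rubric)


def passes(
    answer: str,
    rubric: list[str],
    judge: Callable[[str, list[str]], float] = stub_judge,
    threshold: float = 1.0,
) -> bool:
    """Gate an answer on the judge's score instead of on exact text."""
    return judge(answer, rubric) >= threshold


# What a correct answer must convey, regardless of wording.
RUBRIC = ["refund", "approved", "5 business days"]

a0 = flaky_agent("Where is my refund?", 0)
a1 = flaky_agent("Where is my refund?", 1)

exact_match_stable = a0 == a1                   # wording differs between runs
judged = [passes(a, RUBRIC) for a in (a0, a1)]  # both runs meet the rubric
```

Because `passes` takes the judge as a parameter, the placeholder could be swapped for a model-backed scorer while the same check runs in a CI pipeline.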

[14:10] Conor Bronsdon:
So let me posit this to both of you then. You've both alluded to some of the edge cases, some of the opportunities here. And obviously, CrewAI's doing some incredible stuff to increase the accuracy and reliability of agents, and Galileo is taking our own approach of, like, how can we provide evaluations, observability, and reliability tools for folks who are building agents.

[14:33] Conor Bronsdon:
But what are the other areas where significant innovation needs to occur for agents to realize their potential? Are there particular layers that are very nascent still or need additional standardization? I'd be curious to get both your opinions on this.

[14:50] Speaker:
Now, I can tell you that, for me, there's a few big layers. I think authentication is one. Very clearly there's a lot of people working on this. Right? Like, Okta is launching their own thing. Microsoft launched their Entra integration and all that. But it's still not very clear. It feels a lot like smoke and mirrors. Right? Like, yes, you now can fingerprint and you can authenticate,

[15:16] Speaker:
but this problem runs way deep. It's not only authentication, it's the scope of the access, and then dynamically changing that scope of access as the execution goes, depending on who is running this. Like, I mean, just a few days ago, right, GitHub had that huge incident where their new MCP server was giving access to private repos when it shouldn't. So I think,

[15:41] Speaker:
I think, Vikram, you kinda hinted at this, like, bringing these systems into production creates so many new vectors of attack that you don't watch for. Yep. 100%. I feel like there are a couple of these layers, Conor, but it does require cross-industry adoption of certain standards for this to actually happen. The one weird parallel that I talk about, and I'll do this in thirty seconds just to keep it short, is in the financial services space, honestly. Because,

[16:10] Speaker:
you know, there's a big interoperability problem amongst banks for sending money from one bank to the other. Right? And in Asia, across China and India, and in a couple of other countries across Europe, they built out an interoperability network across all the states and across the EU, across countries, and that was adopted by everybody. What that allowed you to do is send money from one bank to the other bank by just using a very simple protocol.

[16:37] Speaker:
I think the same thing needs to happen here, because you have to have these protocols which everyone can just agree to. The worry that I have on that side of the stack, on the protocol side of the stack, is the same thing that happened with models, where people are thinking of that part of the stack as something that I can own as a cloud provider. Right? And this is gonna allow me to have the wedge such that I can own the entire agentic market. I think that's a very shortsighted way of doing things. Right? It's kind of like how OpenAI just came out with a closed-source model. They're just gonna own this and monetize the hell out of this. Google came out with BERT way back, which was the right way of doing this in my opinion, and I am biased because I was part of that. But I feel like that was good, because now go, builders. Let builders build. Let people do more with this. So I feel like now the same thing is happening with frameworks, where you have A2A and a bunch of others, and the ones which start to get

[17:31] Speaker:
where you're taking a cross-industry approach to this. So you bring everyone together and start to adopt everything around this one framework. And that's gonna be very important on just the communication protocol side of things. So I think that's one side of the stack where, hopefully, by the end of this year, we see something kind of come up where everyone is just galvanizing around one standard. I think that's gonna be huge.

[17:55] Speaker:
But the other piece is, to João's point before, I agree with him. There's something around authentication which is really gonna be important as well, where you might see Okta coming in, but you're also starting to see certain security players come in, because now there's nonhuman identities from a security perspective, which is becoming a bigger and bigger thing. Again, Okta is playing there. A couple others are playing there. But if you look at it, it's kinda similar. Again, I go back to cloud, because cloud brought in the advent of Okta

[18:21] Speaker:
which actually had to be built out for that. Datadog had to come in for observability. You had, see, all sorts of cloud security companies come in and get bought by Cisco. So the same thing is gonna happen all over again, I feel like. The whole stack is completely free. But the biggest thing which worries me is the communication protocol, where I'm starting to see people get a little bit greedy around, like, I own this now. I own this. It shouldn't be that way. Somebody has to come in and say, like, we all own this. Let's work together as Switzerland, and let's come up with something which everyone can own as open source.

[18:48] Speaker:
Yeah. I feel the protocol for sure. And we had people ping us about, like, hey, let's launch this, let's do that. And, honestly, yeah, it's interesting to see. There's definitely a sentiment there. You can tell that people feel like if they launch the protocol, they get to own the market a little bit, and honestly, I don't think that that's the way that it would work. I think, honestly, even whoever launches a successful protocol,

[19:15] Speaker:
I don't think this person is gonna get much of an edge. If anything, they're gonna spend a lot of time working on this while other people are building. Yep. Exactly. I feel like it should just be some kind of an open source protocol. The MCP piece is interesting, but that's not the same as these communication protocols. I was very excited about the MCP piece because at least that allowed some standardization around tool calling, which is great.

[19:40] Speaker:
That was a huge unlock. That was such a big pain point, but I feel like now the next pain point, which has not been completely faced by people, honestly, is the multi agentic communication protocol. Very few people, and I think João alluded to this before, very few people in the enterprise are actually building multi agentic systems, true multi agentic systems. And so the pain hasn't been felt. But I feel like by the end of the year, with these marketplaces coming in, so many people will face that pain that we are gonna have to start working towards some kind of a communication protocol that we can all agree to and then start to say no to a bunch of these others.

[20:12] Speaker:
Agree. Agree.

[20:14] Conor Bronsdon:
What about the governance side of this? As we start to unleash more and more agents into our businesses, into our day-to-day lives, particularly as they get more complex with these multi-agent systems that are capable of more than maybe some of the junior coding interns that we started with in certain cases. There are so many second order and third order impacts that

[20:38] Conor Bronsdon:
can potentially occur here. And, you know, Vikram, you brought up financial services here and the idea of the SWIFT network approach. There are also a ton of regulations that come in there, and banks in particular need to consider that. But most industries have some regulatory frameworks to concern themselves with. What do you think developers and organizations that are building AI agents

[21:01] Conor Bronsdon:
should be thinking about when it comes to governance and, I guess, the actual not just observability, but also,

[21:11] Speaker:
maintenance of these systems? I'll be very curious to get João's thoughts on this, especially given where CrewAI sits in this space. And I know that you probably are talking to a lot of financial services too. It's actually fresh in my mind right now because just before this I was on a call with a top 10 bank in the US, and, first of all, they're building agents. Actually, I take that back. They've built a multi agentic system for their internal use cases already, which is amazing to see from just an adoption perspective, first of all. Just take a second to understand this. This is new technology. This is a top 10 bank in the United States, and they've already built a multi agentic system. So it's here and now and then some. But the second thing was their question was,

[21:52] Speaker:
about governance. Their question was, I have all these policies from my model risk management team, from my other teams, and my other stakeholders. I have auditability, and I have all of these other responsibilities to uphold. How can I take all these policies and actually bake that into my system so that I can actually show them exactly how these agents made their decisions, number one, which is more of a visibility protocol? And the second one is more about how do I bake these different policies into my system in the form of

[22:26] Speaker:
different kinds of evaluators. And that's kind of where, you know, Galileo has been playing quite a big role. We've been thinking a lot more about this. We don't believe the future of this industry is going to be a set of magic metrics, which has been, I feel like, very much generative AI 1.0 observability, which is like, I have these 10 metrics and I have these 15.

[22:46] Speaker:
That's nonsense. Like, we keep coaching people about how it's all about your use case and your business and how you can bring those policies and bake that into the system in the form of really high accuracy custom evaluators that you can build out. And I think that's where things are gonna go. And I think it's gonna get more and more solidified from a product process perspective in these companies

[23:09] Speaker:
around, you know, how do you build these high accuracy evaluators, and how do you bake that into your SDLC process and your CI/CD process? And that's, to João's point before about education and coaching, generally the coaching that we give them as well. Like, don't come in asking for, I don't know, a faithfulness metric. Again, like, it's ludicrous.

[23:29] Speaker:
Like, you ask the question of, what's important for your business, for your use case? Let's think about that more deeply, and then we'll give you that metrics IDE for you to be able to create that in two minutes, and then take that into your CI/CD. As I said, it's high quality, low cost, low latency, and scalable. That's why I feel like that move has to be made for the auditability, traceability, and governance side of things. But I'd be very curious to hear your thoughts, João, in terms of, like, when you're talking to these banks and telling them that, hey, you can build all these crews.

[23:58] Speaker:
I'll bet that their eyes light up in terms of the thousands of use cases they can unlock. But then they probably get to the side of, like, wait, but how do I get through governance and security and everything else? Yeah. I gotta say it's interesting because I think, like, highly regulated industries are moving exceptionally fast on this. Yep. I mean, the financial industry for sure, but also insurance companies are moving exceptionally fast. Like, there's a bunch of industries that traditionally move slowly,

[24:24] Speaker:
and they are jumping on this. And I think it's because, like, they see the potential, right, for the efficiency gains that they can have on some of these. Because usually, by the way, these industries mean a lot of back office work, so that means a lot of potential for automating. Again, we're talking with some of these banks, and we might be talking with the same bank, and they're

[24:44] Speaker:
building quite a lot, and it's interesting to see because I agree with you, and you see that even on the use cases. Right? When someone comes to us and asks, hey, what should I be doing, or what are other people doing? Usually, like, we disqualify them as a customer, and we don't spend too much time with them because they're not ready. Right? And even if they show up with a use case, if they don't know what success looks like, then what's the point? Yeah. So you gotta know,

[25:12] Speaker:
like, what are you gonna do first, then what success looks like for that, and then we talk about how we're gonna measure it. Because if you don't measure it, again, there's no point in kinda, like, building this out and not being able to tell if it's being successful or not. So I agree with you. I think, for success metrics here, you can do a few things with LLM-as-a-judge, where you're gonna ask, what is the quality and what is the hallucination? And that's pretty good, it takes you far enough,

[25:38] Speaker:
but you gotta go custom. There's no way around it. Right? For certain use cases, the agents are gonna be taking actions and you'll be measuring that. For other use cases, you might actually be measuring impact on the bottom line and how you're gonna measure that. So I do think that knowing what success looks like and then making sure that you implement measurements for that is what unlocks people to move faster. And this goes to your point

[26:07] Speaker:
way, way, way, way above, like, traditional evaluation for LLMs, where people are basically doing traces. Right? Like, oh, I do traces and I collect metrics around those traces. Well, that falls very short now when you're talking about agents. There's so many more layers in there that you wanna have observability on, and that you could measure and evaluate things on, that, yeah, you're missing out if you're settling just for, like, traditional kinda, like,

[26:33] Speaker:
LLM like tracing.
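
To make the custom-evaluator idea from the exchange above concrete, here is a minimal Python sketch of use-case-specific checks run over recorded agent executions and used as a CI/CD gate. Everything in it is an assumption for illustration: `AgentRun`, `ToolCall`, `evaluate_run`, and `ci_gate` are hypothetical names, not CrewAI or Galileo APIs, and the three checks stand in for whatever a given business actually cares about.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical record of a single tool invocation."""
    name: str
    succeeded: bool


@dataclass
class AgentRun:
    """Hypothetical record of one agent execution (not a real library type)."""
    task: str
    tool_calls: list[ToolCall] = field(default_factory=list)
    action_completed: bool = False
    output: str = ""


def evaluate_run(run: AgentRun, required_tools: set[str],
                 banned_phrases: set[str]) -> dict[str, bool]:
    """Score one run against business-specific checks instead of generic metrics."""
    called = {c.name for c in run.tool_calls if c.succeeded}
    text = run.output.lower()
    return {
        # Were all the tools this use case depends on called successfully?
        "required_tools_called": required_tools <= called,
        # Did the end-to-end action finish, not just produce a reply?
        "action_completed": run.action_completed,
        # Does the output avoid phrases a policy team has ruled out?
        "policy_clean": not any(p.lower() in text for p in banned_phrases),
    }


def ci_gate(runs: list[AgentRun], required_tools: set[str],
            banned_phrases: set[str], min_pass_rate: float = 0.9) -> bool:
    """Return True when enough runs pass every check to let a deploy proceed."""
    if not runs:
        return False  # no evidence is treated as a failed gate
    passed = sum(
        all(evaluate_run(r, required_tools, banned_phrases).values())
        for r in runs
    )
    return passed / len(runs) >= min_pass_rate
```

A pipeline step could call `ci_gate` on a batch of recorded runs and fail the build when it returns `False`. The threshold and the individual checks are placeholders to be replaced by the team's own definition of success.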

[26:35] Conor Bronsdon:
Yeah. We've actually recently released new capabilities around that as well, particularly targeted at trying to help builders identify and unpack these more complex use cases that you're alluding to here, João. And then also on the customization side, we recently released a family of additional Luna models, based on our in-house research team's proprietary work, that are designed to

[27:03] Conor Bronsdon:
create custom SLM evaluators that are especially useful for these kinds of high-risk production deployments you're talking about, and can enable customization at a high level and an in-depth focus for large enterprises like FSIs. In particular, we're seeing a lot of help from that when it comes to banks who need to create these custom metrics and these customized evaluation frameworks

[27:29] Conor Bronsdon:
that not only include LLM-as-a-judge, but also custom SLMs that they've fine-tuned with us, and then also feeding in expert feedback from humans to auto-tune these metrics and provide continuous learning through that human feedback. I'm particularly excited by this hybrid approach that we're seeing evolve and that we're seeing a lot of these major enterprises really start to think through and build out, because to me, that seems like the direction to go to enable these vast multi-agent crew systems

[28:00] Conor Bronsdon:
to really go into production as we get through the rest of the year.

[28:03] Speaker:
Great. I agree. I agree. And this idea of having, like, SLMs kinda validating and playing a role here, I think that's huge. The one other thing I would add, Conor, though, is with the adoption of 100 crews across an enterprise, right, what that does is it allows these developers to build systems that massively reduce operational expenses, right, at the bare minimum. Obviously, then there's an entire revenue component to this as well. So the value this unlocks is massive.

[28:36] Speaker:
We've also heard, and I'm curious to get your thoughts on this, João, about how, because of this, the complexity of the software system itself is just much more than what they've had before. At least earlier, I feel like they could wrap their heads around the fact that there's a prompt, there's a model, there's context, there's chunks, and I get it. It's, like, four or five things I can check for. I can figure it out. Now it's like, which tool was called? What was the quality of the handoff? Which tool was not called, and why? Did the action actually get completed in the end or not? And the bar for whether the action got completed or not is much higher than the bar for whether the chatbot actually responded with the right thing or not. And so, you know, what we're starting to see is that the number of failure modes has increased.

[29:20] Speaker:
And so I feel like the other transition that's starting to happen, or needs to happen now, at least from our customers, we keep hearing this, is that even custom evaluators and stuff are not enough. You almost have to start giving them insights into, here are the five to 10 things that are going wrong across all your crews potentially, or not even wrong. It's more like, here are the five to 10 optimizations you can make across all of these different crews, across all these different agents.

[29:44] Speaker:
Maybe you're calling five tools which are doing basically the same thing. This happens to me all the time when I'm building out apps. And I would love for my dev tool to just tell me that this is what's going wrong. It's almost like a Cursor, but for the reliability of these tools. So we're starting to see that transition also starting to play out, and that's something which we just, frankly, launched ourselves as well. We call it our Insights Engine, where we automatically tell developers, like, here are the five to 10 things that are going wrong, which it's probably gonna take you two weeks to notice, if that, and here's what you can do about it. I feel like that's gonna be much more important going forward than asking people to just try to figure it out, especially as the system is getting so complex. So I'm curious to get your thoughts about the complexity of the system and how developers should think about that. Yeah. I mean, the systems are definitely getting more complex because, at the end of the day, these agents, they're a layer on top of everything, and you can have them perform RAG and you can have them using these tools. So I agree with you that there is value in metrics that zoom in,

[30:44] Speaker:
like, but there's also maybe even more value in metrics that zoom out. Right? And we do add some of that on our platform as well, where you can zoom out and see overall, like, alright, let me see what these executions are looking like. How are my agents behaving? What are my best agents? What are my worst? Which of my agents hallucinate the most? Which hallucinate the least? What are the tools that are breaking the most? What is, like, my cache hit rate on some of these tools? Right? There's a lot of metrics in there that I think can be very helpful. But I agree with you. I think right now, what people are realizing is

[31:21] Speaker:
one thing is, like, playing and building the systems, and, you know, prototyping something in a hackathon is very exciting. But then it's not even about getting this to production. It's more about scaling this to hundreds and even thousands. That comes with its own complexity. I think some of our most advanced customers now,

[31:45] Speaker:
they're putting around five use cases in production a month, which for me is insane. And that adds up pretty fast. Like, honestly, do the math. Alright, I did five one month. That's okay. And now I add another five. Now who is controlling this? Like, how big is this team? Who is monitoring this? Who is making sure that this is working or not? So I think there is a layer there of, like,

[32:13] Speaker:
there is a stack, and part of the stack is these control plane features, yeah, that you are going to need to have in order to control your operation of agents as a whole. Yeah. Yeah. Agreed.
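
As an illustration of the "zoomed out" metrics described above (worst agents, tools that break the most, cache hit rate), the following Python sketch aggregates a flat log of tool invocations across a fleet of agents. The `ToolEvent` record and `fleet_summary` function are hypothetical and assume such a log is available; they are not part of the CrewAI or Galileo platforms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolEvent:
    """Hypothetical log record: one tool invocation made by one agent."""
    agent: str
    tool: str
    ok: bool
    cache_hit: bool


def fleet_summary(events: list[ToolEvent]) -> dict:
    """Aggregate per-tool and per-agent failure rates plus overall cache hit rate."""
    tool_calls: dict[str, int] = defaultdict(int)
    tool_failures: dict[str, int] = defaultdict(int)
    agent_calls: dict[str, int] = defaultdict(int)
    agent_failures: dict[str, int] = defaultdict(int)
    cache_hits = 0

    for e in events:
        tool_calls[e.tool] += 1
        agent_calls[e.agent] += 1
        if not e.ok:
            tool_failures[e.tool] += 1
            agent_failures[e.agent] += 1
        if e.cache_hit:
            cache_hits += 1

    tool_rate = {t: tool_failures[t] / n for t, n in tool_calls.items()}
    agent_rate = {a: agent_failures[a] / n for a, n in agent_calls.items()}
    return {
        "tool_failure_rate": tool_rate,
        "agent_failure_rate": agent_rate,
        # Highest failure rate wins; None when there are no events at all.
        "worst_tool": max(tool_rate, key=tool_rate.get, default=None),
        "worst_agent": max(agent_rate, key=agent_rate.get, default=None),
        "cache_hit_rate": cache_hits / len(events) if events else 0.0,
    }
```

The same aggregation could be extended with per-use-case grouping once the number of deployed crews grows. The point of the sketch is only that fleet-level questions become simple roll-ups once every tool call is logged with its agent and outcome.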

[32:30] Conor Bronsdon:
João, is this part of why Crew has been leaning into no-code, this rapid proliferation that we're seeing in how many use cases folks are building, and particularly as folks are building more coding agents too? It sounds like, from what I've heard you say, Crew is really thinking about this as, okay, we're gonna be an orchestration control plane versus

[32:57] Speaker:
always helping build the individual agents. Maybe we're gonna make that really easy for you to do with workflows that don't require you to do a ton of coding. But I'd love to understand more about that strategy and how Crew's approaching this. Yeah. I think some of our thinking is, if you really wanna transform companies like that, you need to democratize access to this. Right? It can't be something that just a subset of people is doing. You need to make sure that everyone in the company is empowered to do it. And right now, even if you give, like, no-code tools out there to people,

[33:28] Speaker:
it's weird. You look at them using it. And I don't know if you ever had this happen to you, but you probably have, like, some elder in your family. And do you know when they point you to the remote for the TV, and they're like, is it okay if I press this? Like, they're not sure if they should, if there's something that'll break, and they're gonna misconfigure the whole thing. So that's kinda what I see sometimes when there's nontechnical people trying to use these systems.

[33:56] Speaker:
And I think we can do so much better than that as an industry. So I think a lot of our focus now is, if you really think about it and you democratize the access for people to build these agents, if you can build, to Vikram's point, 50 amazing agents and you can put them in a repository that this company can now reuse, there's so many different combinations in which you can put them together that you can automate

[34:22] Speaker:
a lot of their processes. Right? And I think then that gets these companies to the follow-up stage, which we are still too far away to see, but I think it's gonna be somewhere in the future, and that is less about automating the processes and more about rethinking them entirely, and realizing that some of the steps in there are not even necessary anymore

[34:43] Speaker:
and what that means. So I think that's gonna be exciting, but our thinking there on no-code is, again, Crew has the open source pro-code option, where we have a dedicated team. We look at that as a product, and we keep investing in that. And that is very important for us. I think it's the main kinda, like, asset. It's a huge way for us to give back to the community.

[35:06] Speaker:
At the end of the day, like, if you really wanna, like, transform the market, you need to democratize access even further.

[35:12] Conor Bronsdon:
So as we democratize access to agents and unleash these hordes of interns and, you know, sometimes PhD-level folks on businesses, there's this massive need, as you've both talked about, to observe them, to understand their impact, to identify the, you know, pros and cons of what they're doing, and to improve and iterate on that. With that in mind, what's the philosophy

[35:39] Conor Bronsdon:
that you both think developers and other folks who are building these technical systems should take when it comes to testing and ensuring the quality, reliability and value of their agents.

[35:52] Speaker:
Yeah. I gotta say, it's funny, and Vikram will probably tell you the same: on day zero, they might not care or not spend too much time on it. They're just so excited. And then they deploy it, and that's okay. But then what happens is, they have this thing ready in production, then someone deploys and it breaks. Yep. And it's not that it breaks the new thing that they're doing. The new thing works,

[36:19] Speaker:
and it breaks something that used to work before, and now they have to roll back. And now they go into that, like, oh, now we need to run, like, 10 prompts to make sure that this works before we deploy it, and then it becomes a manual process, and that's the point where they're like, alright, we need some help here, either tools or people, something that is gonna help us with this. So I think that's kinda where things are tracking now. People are moving fast, very excited,

[36:47] Speaker:
and then they realize that as they start iterating fast, things start to break. And they break because they don't have measurements in place. They don't have observability in place. They don't have all those things in place that help them, to your point, see this coming from a mile away. Right? And even prevent those from happening. And that gets in the way sometimes. But I think it's gonna be a key

[37:11] Speaker:
aspect. Like, I keep telling people this. Sometimes some customers, especially on the first call, might ask the infamous question: why should I buy your enterprise solution? I can use open source. And I'm like, well, imagine what running a thousand agents in open source is gonna look like. And how are you gonna control that? Right? So I get it. I don't think open source is competitive

[37:39] Speaker:
with enterprise. I think they're complementary. You still use open source. Right? And you can use any open source. That's the beauty of it. So that would be a little bit of my take. I think there's definitely a component here of, if people don't start thinking about data from day zero, it's gonna come back to haunt them pretty fast, and they're gonna need to go around and figure it out. Yeah. I 100% agree. It's the day zero, first principles thinking.

[38:04] Speaker:
From a software engineering perspective, it's still the same concept. You have to go back to thinking about what's the use case, and therefore what are the different tests I need to build out, and then do some kind of scale testing preproduction, and then also, obviously, in production. The difference, though, and what I feel is interesting, is that all of this is still

[38:25] Speaker:
data science. Right? We've just lost the science in data science, because we've gone so far away from this being called machine learning, to now AI, to now agents. We've forgotten that this used to be all machine learning, literally a year and a half ago. And so the data scientists in the room look at all of this, and they're like, of course you need to look at the data. Of course there are rapid feedback loops. You talk to the software engineers, and they're like, there's no way in hell we're gonna look at the data. Once we productionize this, that's done. My CI/CD process should take care of that. So it's a little bit of that education and coaching, and that's why the term evals is becoming really popular among software developers, but a data scientist is gonna be like, welcome to the club, man. This is something we've been doing for years, and now everyone's all about evals. But this is the same thing. And so I feel like we're just layering on the age-old concept of data science, but educating developers globally about how to do that in a faster way as they're building these systems, because building these systems has become much, much faster because of tools like CrewAI.

[39:26] Speaker:
I just think the feedback loop piece, Conor, of going from identifying failure modes when you're trying to do even minimal scale testing, because the age-old thing in data science is, as soon as you put something out there in production, out-of-distribution starts on day zero. New kinds of queries start coming in, new failure modes are detected, and then you have to kind of fix it. So I think where this industry has to move very quickly is how do you start to identify those failure modes quickly and then auto

[39:56] Speaker:
correct the prompt or the tool call and all this other stuff, which is honestly all software engineering? How do you auto-improve that? And that's the piece which is different from what we had before, where in the machine learning 1.0 world, you had to go and contact, like, name your labeling provider, get the data there, and retrain your model with it. It used to take three months. Here, I think it's possible for us to just quickly have those feedback loops come in. I think that's gonna start happening sooner, and that's gonna be the big unlock for people to feel like, I can actually rely on this thing because my feedback loop automation system is just working fine. And that's where Galileo's goal is as well going forward. It's not about magic metrics. It's more about automatic feedback loop creation.
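
One small piece of the feedback loop discussed in this turn can be sketched in Python: replacing the manual "run 10 prompts before we deploy" step with a regression suite that grows whenever a new failure mode is observed in production. The sketch covers only detection and gating, not the automatic correction of prompts or tool calls. `Case`, `RegressionSuite`, and the substring check are assumptions made for illustration; a real setup would likely plug in custom evaluators instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Case:
    """Hypothetical regression case: a prompt and text the reply must contain."""
    prompt: str
    must_contain: str


class RegressionSuite:
    """Grows from production failures and gates deploys on zero regressions."""

    def __init__(self, cases: Iterable[Case] = ()) -> None:
        self._cases: list[Case] = list(cases)

    def __len__(self) -> int:
        return len(self._cases)

    def add_production_failure(self, prompt: str, must_contain: str) -> None:
        """Fold a failure seen in production back into the suite (deduplicated)."""
        case = Case(prompt, must_contain)
        if case not in self._cases:
            self._cases.append(case)

    def run(self, agent: Callable[[str], str]) -> list[Case]:
        """Return the cases the candidate agent fails."""
        return [
            c for c in self._cases
            if c.must_contain.lower() not in agent(c.prompt).lower()
        ]

    def safe_to_deploy(self, agent: Callable[[str], str]) -> bool:
        """A candidate is deployable only if it fails no recorded case."""
        return not self.run(agent)
```

Here `agent` is any callable that maps a prompt to a reply, so the same suite can be pointed at a new prompt version, a different model, or a changed tool configuration before it ships.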

[40:40] Conor Bronsdon:
So as everyone in this market continues to build more agents and, you know, work with Crew to build out new agents with low code, or expand upon their CI/CD processes with automated evals, there is just a massive amount of work that needs to get done. And a lot of that we expect to be done by these agents. But it also means we are on the cusp of a major transformation. In fact, we're already

[41:10] Conor Bronsdon:
in that transformation. João, you brought this up earlier, but even established enterprise companies are kinda freaking out. They're trying to figure out their play in the agent space. And the adoption curve of this technology is so fast compared to some previous waves. What does this mean for innovation in the next, I don't even wanna say beyond the next year, let's say through the next year? Like, what do things look like? Because I think they're adapting so fast. And in particular,

[41:39] Conor Bronsdon:
I'd be really curious, João, if you wanted to share a bit with us about Crew's approach and strategy for this next stage of the company and this next stage of the agentic revolution. And then, Vikram, I'd be curious about where you see Galileo

[41:54] Speaker:
fitting in with that as well. I think, yeah, you bring up a good point. I think it's very clear that this is gonna be a massive industry. I mean, the potential is there. It's not just me and Jensen saying it. Again, it's all the signals that are out there. Right? There's people getting out there. You go to anyone's website, there's agents all over it. Like, everyone's trying to capture some of that attention.

[42:18] Speaker:
That contributes to some of the agent washing that is happening, where people are just calling things agents that are not, and that makes life harder for everyone that is not educated yet, so they need to grasp some of that information. But I mean, this is changing the industry. One example that we are feeling: because of the open source,

[42:39] Speaker:
we get a lot of traffic to our website. Right? And now, going from, like, nothing to number 14 and growing, ChatGPT is the fourteenth biggest source of traffic and growing. Like, people are not even searching on Google. They're chatting and getting a link to CrewAI and jumping to our website. So, like, think about what that means. Right? And then, again, as these models get better and better, if you really think about it, like, hey, you have an agent that is doing the planning for a feature, and then you have agents that are coding it. Well, what happens to Jira in that way? Like, why do you need cards? Right? Like, again, I'm, like, extrapolating

[43:20] Speaker:
here to extremes, but you can see how some of these automations and these processes have a very heavy impact on what business used to be and how businesses have been run. So I think, from our point of view, one, it's a race to the bottom in terms of model prices. Model quality keeps getting better and better. Whenever it feels like we're hitting a ceiling, someone comes up with a breakthrough in one way or another.

[43:50] Speaker:
Right now, CrewAI is building for the future. Like, we're building features where people might look at them and say, this is not gonna work. Well, it might not work today. It's gonna work two months from now. So we're very much betting on the future, a little bit out on the cutting edge of how those things are gonna go. And

[44:11] Speaker:
I think that's a little bit of the overall vision that we're seeing in the market. And the potential is right there. I think this is a unique time window for companies that are operating in the industry, given how hot the market is. And what that means for Crew is land grab. It's literally closing as many customers as we can and helping them get to success.

[44:35] Speaker:
So a lot of what we're doing is helping these customers get their use cases out there, see the value, and step up. And education plays a huge role in that, and that's why we do so much content and courses and things like that. So I would say that's part of our strategy. And some of our culture is just pulling no punches and making sure that we are putting ourselves out there and shipping as fast as we can. Love it. Vikram, I see you nodding along to a lot of this. I know that resonates with you.

[45:07] Speaker:
It does. It definitely does. I agree that we are in a very unique time and place right now, and I think the responsibility, like, you know, CrewAI is doing a very specific thing, which is extremely crucial for the foundation of agentic development. Right? And Galileo's goal for the last four years has been to solve the measurement problem in AI, specifically unstructured data AI. The company started, like, twenty months before the GPT moment even happened. So we've been dreaming about this moment, where language models can actually be used by developers, because it unlocks so much.

[45:40] Speaker:
However, the push and pull, I think, tends to be: how much should you build for what customers or developers are asking for here and now, versus also looking at where the puck is going and starting to build for that? Our thesis has always been that it has to be, like, a sixty-forty. And so, you know, from a reliability standpoint, we see this world going in a direction where it's gonna be moving from

[46:04] Speaker:
magic metrics towards insights. It's gonna be moving from, you know, like building out test driven development towards actual automatic feedback loops from production. And, you know, having scale is gonna be really important. And people aren't talking about that enough, I think, on the reliability side. It's very much on offline evals. It's, you know, here's a playground, here are some metrics,

[46:27] Speaker:
try five different models, that'll work. But very quickly, to João's point before, you're gonna have 100 different agents talking to each other. That's one form of load testing and scale testing that's gonna start to happen. You're gonna have millions of different tasks that need to be completed all the time from all of your users. How you scale reliability and how you scale these tests to that extent becomes extremely important. And how do you control that kind of an ecosystem and sleep at night knowing that your brand is going to be safe, knowing that your users are gonna be safe? So those are the things that we talk about all the time at Galileo. That's why we launched our Luna 2, a family

[47:07] Speaker:
of models built on a lot of research, which is specifically focused on agentic evaluations, as well as our Insights Engine, where instead of a metrics engine, we can automatically give developers the failure modes before they even know that there's a problem, as well as automatically tell them what to do next. Right? And the next step from here is to do it for them. So that's the phased approach that we see, and I feel like it's not even an if question, it's a when question, and that's why we invest so much in AI research and infrastructure and not just a software stack for reliability,

[47:38] Speaker:
but that's the future that we see coming through for developers, and I feel like we all have our part to play in pushing the envelope so that this reality of multi-agentic systems can happen much, much faster than it otherwise could have. Yeah. I love that.

[47:52] Conor Bronsdon:
Well said by you both. It's such an exciting time and such an exciting space to be in. And I'll say personally, I am incredibly stoked to have Crew as someone that we're doing a lot of work with, whether it's our agent next event series, where, João, we're so excited to have you be a part of that. When this episode comes out, it'll actually be tonight, and then we'll have more to come after that, whether it's, you know, our planned webinars,

[48:17] Conor Bronsdon:
the integration work we're doing, the content, you know, having you on this podcast. It's just such an honor to have you involved with the work that we're doing here at Galileo, and we are, I think, just as excited as you about the potential of this industry and, you know, the work we can do together in the coming months. So, João, thank you so much for joining Vikram and me to share your perspective on Chain of Thought today. You called it at the top of this episode: we really need at least another hour or two for this conversation, so I can't appreciate you enough for joining us. Yeah. I love that. Thank you so much for having me. This is very exciting. And, yeah, I mean, excited for what we'll be cooking together.

[48:49] Speaker:
Yeah. And

[48:51] Conor Bronsdon:
and definitely go check out CrewAI if you haven't, listeners. crewai.com is the place for everything. You can find so much there. And João has so much interesting content he shares on his Twitter account, on his LinkedIn, and all over the Internet. They are doing some incredible stuff. Really interesting use cases you can find, fascinating blogs, technical depth. Definitely

[49:12] Conor Bronsdon:
keep an eye on them. And if you're looking to build agents, check out CrewAI. They are fantastic. To everyone listening at home, be sure to subscribe wherever you get your podcasts and check out the Galileo YouTube channel for more content, like maybe a webinar with CrewAI coming soon, events, deep dives, and of course, every episode of this podcast, where you can watch us interact with all of our incredible guests like João. João, again, thank you so much. It's been a pleasure having you on. Thank you. Have a good one. Thanks, João. Thank you. See you.