Chain of Thought | AI Agents, Infrastructure & Engineering

Sudhir Hasbe is President and Chief Product Officer at Neo4j, the graph database company powering 84 of the Fortune 100 (Walmart, Uber, Airbus) at $200M+ ARR and a $2B+ valuation. Before Neo4j, he ran product for all of Google Cloud's data analytics services: BigQuery, Looker, Dataflow, and led the Looker acquisition.
His thesis: the hallucinations we blame on AI models are really a data architecture problem. LLMs weren't trained on your enterprise knowledge, so handing them a data lake with 10,000 disconnected tables and asking them to reason is the wrong design. The fix is knowledge graphs: feeding the model a structured map of relationships, entities, and context so it can reason over meaning, not just vector similarity.
Sudhir breaks down the five capabilities knowledge graphs unlock for enterprise AI: GraphRAG (moving accuracy from 60% to 97%), semantic mapping across siloed systems, context graphs, agent memory, and multi-hop reasoning. He explains three architecture patterns customers are actually shipping, why giving an LLM hundreds of tools makes it worse, and what Uber, EA Sports, Klarna, and Novo Nordisk are doing differently.
This is the case for treating knowledge as infrastructure.
We cover:
  • Why enterprise AI needs a different playbook than consumer AI
  • The five data asset types every agentic system needs: system of record, historical, memory, context, and reference
  • How GraphRAG combines vector search and graph traversal to move from 60% accuracy to 95%+
  • Three architecture patterns: semantic layer only, semantic map plus domain data, full consolidation (the Klarna/Kiki model)
  • What context graphs capture that Salesforce doesn't: the Slack and email negotiation behind every deal
  • Why giving an LLM hundreds of tools drops accuracy, and how Uber uses knowledge graphs as a business validation layer
  • What Neo4j's Aura Agent, MCP server, and A2A support mean for developers starting today
Chapters:
(0:00) Why building a self-driving car is hard
(0:22) Intro
(2:03) Hallucinations as a data architecture problem
(4:31) From models-as-core to systems-of-knowledge
(6:13) Why data lakes fail AI agents
(9:15) The five data asset types enterprise agents need
(11:46) Where basic RAG breaks down: the Spotify metadata lesson
(16:00) GraphRAG: 3x accuracy, easier development, explainability
(18:47) Semantic mapping across the enterprise estate
(19:23) Three knowledge-graph architecture patterns
(22:42) Context graphs: capturing the "why" behind decisions
(25:33) Individual vs. organizational agent memory
(28:40) Multi-hop reasoning for fraud rings and AML
(31:52) Why there are no shortcuts in enterprise AI
(36:38) What happens when you give an LLM 100 tools
(39:19) The Uber example: knowledge graph as business validation
(44:42) First mile of a 26-mile marathon
(48:32) Aura Agent, MCP server, and the A2A protocol
(50:43) Where developers should start
Connect with Sudhir Hasbe:
Connect with Conor:
More episodes: https://chainofthought.show

Thanks to Galileo — download their free 165-page guide to mastering multi-agent systems at: 
galileo.ai/mastering-multi-agent-systems

What is Chain of Thought | AI Agents, Infrastructure & Engineering?

AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.

Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes weekly.

Conor Bronsdon is an angel investor in AI and dev tools, Technical Ecosystem Lead at Modular, and previously led growth at AI startups Galileo and LinearB.

Disclaimer: All views, opinions and statements expressed on this account are solely my own and are made in my personal capacity. They do not reflect, and should not be construed as reflecting, the views, positions, or policies of Modular. This account is not affiliated with, authorized by, or endorsed by Modular in any way.

FINAL TRANSCRIPT
================
Speakers: Conor Bronsdon, Sudhir Hasbe
Duration: 52:23
Total Words: 9175
Generated: 2026-04-15

---

[0:00] Sudhir Hasbe:
Why is it hard to build a self-driving car? Because you just can't get into an accident. It requires a lot of validation and a lot of work, right? For systems that are critical to businesses, being accurate is a necessity. It's not an option.

[0:22] Conor Bronsdon:
Welcome back to Chain of Thought, everyone. I am your host, Conor Bronsdon, Head of Technical Ecosystem at Modular. My guest today is Sudhir Hasbe. Sudhir is President and Chief Product Officer at Neo4j, the leading graph database company. They have more than $200 million in ARR, they are valued at over $2 billion, and their technology is embedded in 84 of the Fortune 100, including Walmart, Uber, Airbus, B&W, and many more. Before Neo4j, Sudhir ran product management for all of the data analytics services at Google Cloud, where he oversaw BigQuery, Looker, and Dataflow, and actually led the Looker acquisition. All of that has informed his thesis on how AI and data are flowing today: while we all talk about hallucinations in AI and blame them on the model (they're predictive text machines, after all), we need to stop treating this like a model problem. It's fundamentally a data architecture problem. It means we're not supplying the model the right context and the right memories. And the fix isn't better prompts or bigger models; it's knowledge graphs. Sudhir, welcome to Chain of Thought. Excited for this conversation.

[1:33] Sudhir Hasbe:
Same here, Conor, thanks for having me and looking forward to the conversation today.

[1:37] Conor Bronsdon:
Yeah, it's gonna be a lot of fun, but before we dive in, I do have to say a quick thanks to our presenting sponsors, Galileo. If you're building applications and need to understand why your models are behaving the way they are, Galileo's eval intelligence will help you debug, evaluate, and monitor your LLM systems. Check them out at Galileo.ai. Sudhir, you've argued that the challenges with hallucinations in AI are due to how we architect data for our systems. Can you explain your perspective here?

[2:03] Sudhir Hasbe:
Yeah, first of all, I want to decouple consumer AI from enterprise AI, right? That's one of the big things. A lot of these large language models have been trained on publicly available information. So of course they are much better trained on general information from the web, and they're used based on that. And the newer and newer models, with more reasoning capability, are getting better and better at answering questions and retrieving information. But enterprise AI is completely different. In enterprises, you want to use your enterprise knowledge to power the agentic or AI systems, and these models have not been trained on that information. And training a custom model, or even fine-tuning models, is actually an expensive proposition. You're not going to go ahead and give your enterprise IP and knowledge, in many cases, to some of these models. You may train some small models, but not for everything. Not everybody can build another GPT- or Gemini-like model; that's not where the world will be going. So that's one thing. The second thing is that models by default are non-deterministic. The generative technology was about predicting the next word. With reasoning capability, things are improving, but a model is still going to try to answer questions based on the information it has and come up with logic behind it in some cases, right? So my thesis, especially for enterprise AI, is that you have to figure out how and when to use the model, and what you're using the large language models for, which is actually language understanding, intent understanding, and reasoning capabilities.
Models are becoming really good at reasoning through things, but then you also have to figure out what your data architecture is for providing that enterprise knowledge to the agentic system, so that you can make the right decisions, audit those decisions, know exactly what you're doing, and be more deterministic in some of this decision-making. So that's my thesis: as an enterprise, you have to figure out what your data architecture is. Gartner talks about this AI-ready data concept, which is the same thing I'm trying to say: it's less about AI-ready data and more about how you give enterprise knowledge to AI systems so they can make the right decisions.

[4:31] Conor Bronsdon:
Yeah, when I interviewed Aishwarya Srinivasan last year talking about architecting AI agents, she argued basically that the shift in AI is that we're moving from models being the core of everything to the systems around the models, the harness, being extremely important, and that we've underrated that for the last couple of years. Your argument sounds like the next step: the shift from system architectures to knowledge architectures. Is that a fair description?

[4:58] Sudhir Hasbe:
Yeah, I think it is. I actually went and looked up that particular podcast; I've read about it too. Aishwarya is absolutely right. The model is one component of the full system, and how you architect the system is the next thing to figure out in building enterprise-ready applications. And what I'm saying is, as part of that overall system, you also need to think about your data architecture. How are you going to get at your enterprise data? It's siloed across hundreds of applications, if not thousands. And even if you put all of that into a data lake, or data swamp, it's going to be thousands or tens of thousands of tables, all disconnected. How do you bring all of that information together so that agentic or AI systems can reason over it and make the right decisions? You need to figure that out. And as part of system architecture, how these systems come together, the key question is your data strategy: how the data will be represented and constructed so that models can actually leverage it best in making decisions.

[6:13] Conor Bronsdon:
So one of the great things about data lakes or data swamps is how evocative that imagery is. But those clearly aren't working effectively for AI agents today.

[6:24] Sudhir Hasbe:
Yeah.

[6:25] Conor Bronsdon:
What's the right model of knowledge application or databases to provide AI agents so they can actually reason over them effectively?

[6:35] Sudhir Hasbe:
Yeah, I would say, first of all, data is siloed in very, very different systems. Even if you put it all into a data lake, having a lot of tables and letting agents go into the system, figure out what to do with the data, and find the relationships between the data themselves is not going to work, right? So there are different patterns and practices we see customers leveraging knowledge graphs for. One of the patterns we have seen is using the graph as a semantic map. We've been working with customers like EA Sports, and they did this: they use Snowflake for their data lake, they have massive amounts of tables, and users want to ask business questions and get answers. The big challenge is that it's all over the place. It's massive amounts of data, but you want to be able to figure out what the right data set is, what domain it is in, and what the interrelationships between different data assets are. In this case a lot of it is in Snowflake, but in many other cases it could be spread across different systems. So you use a knowledge graph to map your enterprise estate: what the dependencies are, what is going on in what tables and how they relate to other tables, plus your semantic model, the ontology, which is about what a particular word means. In EA Sports' case, when you say FIFA, it could be the FIFA 26 game or it could be last year's game; in the context of the user, what does that game mean? Americas could mean the US, it could mean North America, it could be everything. Understanding what those words mean, and having that in the semantic map, then helps agents make much better decisions.
So rather than throwing out, hey, here are all the tables and columns, go figure out what to do, you use a pattern wherein you say: here is your semantic map of what each entity means. Then you let the agent reason over that, and you can convert that into a system that actually gives you accurate results and accurate answers, right?
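The semantic-map idea described here can be made concrete with a minimal sketch in Python. Everything in it is hypothetical: the terms, senses, and table names are invented stand-ins for what a real ontology in a graph database would hold.

```python
# A minimal sketch of a "semantic map": before generating any query, an agent
# resolves ambiguous business terms against an ontology instead of guessing
# from raw table names. All entries below are hypothetical examples.
SEMANTIC_MAP = {
    "FIFA": {
        "default": "fifa_26",  # most recent title wins absent other context
        "senses": {"fifa_26": "FIFA 26 game", "fifa_25": "FIFA 25 game"},
        "tables": {"fifa_26": ["snowflake.games.fifa26_sessions"]},
    },
    "Americas": {
        "default": "north_america",
        "senses": {"north_america": "US + Canada + Mexico",
                   "us": "United States only"},
        "tables": {"north_america": ["snowflake.sales.na_revenue"]},
    },
}

def resolve(term: str, user_context: dict) -> dict:
    """Map an ambiguous term to one concrete sense plus its backing tables."""
    entry = SEMANTIC_MAP[term]
    sense = user_context.get(term, entry["default"])
    return {"term": term, "sense": sense,
            "meaning": entry["senses"][sense],
            "tables": entry["tables"].get(sense, [])}

hit = resolve("FIFA", {})  # no user context, so the default sense applies
```

The point of the sketch is the lookup order: the agent consults the map first, so the downstream query generator only ever sees one disambiguated sense and its concrete tables.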

[8:59] Conor Bronsdon:
So instead of a data lake, it's more about mapping all the inflows of that data in an effective way. So maybe it's the river of data that rolls into that lake that becomes the holding area is what you need to map so the agent can understand the inflows and how they relate to one another.

[9:15] Sudhir Hasbe:
Exactly. And it's not just about data lakes. The way I think about it is that every agentic system actually needs four to five types of data assets to make sense of the enterprise environment, right? Let me show you something I was putting together. Last week we had a customer conference, and I was trying to share how to think about the enterprise estate. You have a system of record of the present, which is the operational databases where you have real-time data. Things are coming in; when you do a transaction on your credit card, it goes into a system of record. That's where the latest information is. You need to be able to access that and know what it is. Then you have your analytical systems, which are systems of record for the past: the historical view of your data. You want to be able to get to that, and it is siloed in, let's say, Snowflake or Databricks or large data lake environments. Then you have memory, and we'll talk more about memory and context graphs, but memory is about the short-term and long-term state of the agents and agentic workflows. And then there's context: when you make decisions, everything that's not stored in the system of record or the historical system is spread across different systems and interactions, and you have to capture that. So what agents really need is this knowledge graph layer, or enterprise knowledge layer, which incorporates all of these different aspects: metadata, the semantic map, the domain information, and reference data, which you may need some kind of virtual access to. You are not going to move everything into one single system. Breaking silos doesn't mean you move everything into one new system and create another silo. It's more about, hey, you can do virtual access around it.
And then your ontologies and memory and all. So the whole idea is: how do you understand your enterprise estate and make it available to the enterprise agentic platform, with access to all those different types of data, but with one layer that understands all of it and maps it out? So agentic systems can come there and figure out where they need to go to do what activities.
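The five data asset types just described can be sketched as a small routing layer. This is purely illustrative: the enum values and keyword rules are assumptions for the sketch, not any product's real taxonomy.

```python
# Sketch of the five data asset types an agentic system draws on, following
# the taxonomy above. The keyword routing rules are illustrative only.
from enum import Enum

class AssetType(Enum):
    SYSTEM_OF_RECORD = "operational DB: real-time, present-tense state"
    HISTORICAL = "analytical store: system of record for the past"
    MEMORY = "short- and long-term agent state"
    CONTEXT = "the 'why' behind decisions, captured from interactions"
    REFERENCE = "shared lookup data, often accessed virtually in place"

def route(question: str) -> AssetType:
    """Naive keyword router: which asset type should an agent consult first?"""
    q = question.lower()
    if "right now" in q or "current" in q:
        return AssetType.SYSTEM_OF_RECORD
    if "last year" in q or "trend" in q:
        return AssetType.HISTORICAL
    if "remember" in q or "last time" in q:
        return AssetType.MEMORY
    if "why did we" in q:
        return AssetType.CONTEXT
    return AssetType.REFERENCE
```

A real enterprise knowledge layer would of course resolve this from metadata rather than keywords; the sketch only shows the shape of the decision an agent has to make before it touches any data.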

[11:46] Conor Bronsdon:
Yeah, I think we started with a simpler RAG system, which is what everyone was using a couple of years back, right? And we've since called it dead, but it's more that it's evolved. We started with, you know, chunking our documents, embedding them as vectors, doing similarity search. And a lot of teams are still running that playbook to some extent.

[12:05] Sudhir Hasbe:
Yeah.

[12:05] Conor Bronsdon:
But where are you seeing it break down as you try to access this entire corpus of enterprise data?

[12:09] Sudhir Hasbe:
Yeah. See, if your use case is very simple and your documents are very simple, you vectorize them. And what is vectorization? You're basically taking a large document, chunking it into smaller chunks, and storing them. Then you create embeddings, and when a user asks a question, you do a similarity search in the vector space. I'll give you a really good example. This was a podcast I was listening to, the CPO of LinkedIn interviewing the Spotify CTO. He gave this example and it stuck in my head; I always use it. Vector searches, similarity searches, have been done for a very long time at Spotify, he explained. But the problem they were facing was that in a vector space, when you say, I'm listening to this song, show me all similar songs, you may be listening to a pop song and suddenly the next vector similarity could show you country music or some other genre. Because in vector space, it's just mathematical similarity that you're looking at. Users were confused: I'm listening to this pop song, I want something similar, why am I suddenly seeing this completely different genre? So they did an acquisition of a metadata company that gave all the background information about songs, and they were able to use that metadata to say, no, no, no, when the user is listening to this, use this additional metadata as context to restrict what the answers look like within that domain. So this is one example which was really simple to understand. But we have been working with folks like L'Oreal, RocketMultigen, and many others. Many of them started the journey with pure RAG or vector search use cases, and the accuracy is around 60%, 65%.
There is different research done in that range. Using graphs and GraphRAG changes this. When I say GraphRAG, it is a combination: I can do a vector search and then apply a filtering layer, with all the other metadata as a graph layer on top of it. Or you can start with graph traversal queries and use vectors in combination. You're combining these two patterns to become way more accurate, and we have customers that are way above 90% accurate. And the next thing I have seen, which is even better: with reasoning engines, you no longer have to do just text-to-Cypher or pure metadata search as a pattern. Because reasoning engines can now be given tools, you can refine and gain accuracy over time by adding more targeted tools. In the case of EA, for example, if you know that questions about games and game performance are pretty common, you can create a dedicated game-performance tool. Anytime somebody asks about performance, the reasoning engine or query generator doesn't have to figure out what the SQL query across different tables should be; you have a single predefined tool that gives you exactly that. So the combination of moving from pure similarity search, which is never going to be enough because it's non-deterministic, to graph plus vectors, which is GraphRAG, with a lot of metadata and relationships around it, plus the ability to give reasoning engines specific tools, means you can easily get above 95%, 97% accuracy over a period of time.
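The vector-then-filter flavor of GraphRAG described here can be sketched in a few lines. This is a toy illustration, not any vendor's implementation: the songs, embeddings, and genre labels are all invented, and the "graph layer" is reduced to a metadata lookup.

```python
# Toy sketch of GraphRAG's hybrid retrieval: vector similarity proposes
# candidates, then graph-held metadata (here, genre) filters out
# mathematically-similar but semantically-wrong hits.
import math

SONGS = {
    "pop_hit_a":   {"vec": [0.9, 0.1, 0.0],    "genre": "pop"},
    "pop_hit_b":   {"vec": [0.8, 0.2, 0.1],    "genre": "pop"},
    "country_hit": {"vec": [0.85, 0.15, 0.05], "genre": "country"},  # close in vector space
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def vector_only(query_vec, k=2):
    """Pure similarity search: top-k by cosine, no metadata awareness."""
    ranked = sorted(SONGS, key=lambda s: cosine(query_vec, SONGS[s]["vec"]), reverse=True)
    return ranked[:k]

def graph_rag(query_vec, listener_genre, k=2):
    """Same search, but the metadata layer restricts candidates first."""
    candidates = [s for s in SONGS if SONGS[s]["genre"] == listener_genre]
    ranked = sorted(candidates, key=lambda s: cosine(query_vec, SONGS[s]["vec"]), reverse=True)
    return ranked[:k]

query = [0.9, 0.1, 0.0]  # a pop listener's query vector
```

With this data, pure vector ranking surfaces the country track because it is mathematically close, while the filtered ranking keeps recommendations inside the listener's genre, which is exactly the Spotify metadata lesson above.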

[16:00] Conor Bronsdon:
We know that knowledge graphs unlock capabilities for AI systems, and across conversations and writing of yours I've seen, you've done a good job of breaking this down into five capabilities: GraphRAG, which we've talked a bit about, semantic mapping, agent memory (if folks want a full deep dive on that, our recent episode on agent memory is a great one), context graphs, and then multi-hop reasoning.

[16:25] Sudhir Hasbe:
Yeah.

[16:25] Conor Bronsdon:
Can you break down each of these concepts? Maybe let's start with GraphRAG first. We've talked a bit about it. For teams that have built basic RAG and hit a wall, what does GraphRAG actually change about retrieval? And what are they getting that vector similarity search alone can't give them?

[16:43] Sudhir Hasbe:
Yeah, these are the five benefits that customers experience, which you're highlighting; I'll keep it on the screen so people can see. For GraphRAG, there are three main points that are really the biggest benefit. One is accuracy. As we talked about, if you're doing pure vector-based RAG versus GraphRAG, there's a lot of research on the web showing you can go from 60, 65% accuracy to 90, 95-plus percent accuracy. It's 3x better, especially if your questions are complex, the problems span multiple documents, or within a large document you're looking for things that are interrelated. For those kinds of complex queries, GraphRAG is 3x-plus better. So that's one thing. The second thing is that it's easier to develop with. It's super hard to visualize and understand a vector space, right? If you're looking at a customer and you said two customers are similar, and you did that with just a vector embedding search, it's super hard to visualize and figure out why, versus in a graph you have complete entities and relationships: you can look at the customer, look at what products they have bought, and give a complete explanation of everything around them. That's a big part of the puzzle. And finally, explainability and auditability. Whenever you make decisions, you also want to be able to explain why you're deciding something. It's not just that I can easily develop things and visualize them; I can also explain things in a much better way. So: higher accuracy, easier development, and explainability. Those are the three big advantages GraphRAG has over traditional vector RAG environments.

[18:47] Conor Bronsdon:
Thanks for that, that's a super useful explanation. And for those watching on YouTube, hopefully this great breakdown Sudhir is showing is helpful. So, okay, GraphRAG. Next we have this idea of semantic mapping of information, and we've talked about how this enables agents. Why is this so important? Obviously, it's interesting because it solves a problem we constantly hear from enterprises: our data is in 50 different systems and nothing talks to each other. You're not asking them to move all their data. You're asking them to map the river system from which all their data flows.

[19:23] Sudhir Hasbe:
Yeah, that's correct, Conor. Let me see if I can pull up one more slide; this is the one which is interesting. There are three patterns we see with our customers as they look at building agentic systems, three common architectures, right? The first one is what I call the semantic layer or semantic map architecture, wherein you don't move anything. You keep all your systems where they are, but use knowledge graphs as the primary way to understand your enterprise domain. You understand where what data is, where what decisions are being made, and what the reference information may be for some of the data sets you have. Between the reference data, your metadata, and the map of where things are, agents can go to this system to understand what the landscape looks like, but to actually make decisions they can go to the original systems where the data is. In this case the knowledge layer sits on the side; it's not part of the actual decision-making process other than giving agentic systems reasoning capability. The middle pattern is where you say: there are parts of the puzzle where the data is so distributed across multiple systems that without consolidating some of it, you just can't reason well. If your customer information is split between 20 systems and you have no single view of the customer, you need to bring those things together, along with their relationships, so you can reason over them consistently. An example: say you use Salesforce in your environment, as we do. You have Sales Cloud, the core CRM system, Marketing Cloud, and Service Cloud. All of these systems are different.
And if you want a single view, and then to decide, hey, when a customer calls for support, what have they got? Where did they come from? What products have they bought? If you want that kind of consolidated place where you can reason across multiple things, you could put some of that data into the knowledge graph layer. So that pattern is the semantic map plus some domain data. And then there is the other extreme, wherein you say: no, I want all my knowledge in a single system, and my agents will run over that knowledge graph and make all decisions. This is Klarna. If people haven't seen it, Sebastian, the CEO at Klarna, has talked about what they did with Kiki, their chat experience for all of their internal employees. They got rid of a bunch of SaaS applications and built everything on top of this knowledge graph layer. In this scenario, they did move all of the company's knowledge into one single knowledge graph layer and built everything on it. So there are three patterns, but what I mostly see people start with is either number one or number two, which is the semantic map combined with some part of the data. That's where we see a lot of adoption.

[22:42] Conor Bronsdon:
I think a question that folks who are maybe newer to agent development might bring up here is, OK, great. We've done the semantic mapping. We're starting to have an understanding of where information lives. How is this different from a context graph that I want to enable my agent with?

[22:58] Sudhir Hasbe:
Yeah, that's a great question. There's a lot of discussion about memory and context graphs, so let's talk a little bit about both, but let's start with context graphs. Context is very simple. Say you're in an enterprise B2B environment, selling to enterprises, and you have a CRM system; let's take Salesforce in this case. Salesforce is a system of record for decisions. What does that mean? There is an interaction happening between the salesperson and their customer: hey, we would like you to use our product. The customer comes back and says, my budget is only X, I can only buy at this price, but this is your pricing. There's a lot of negotiation back and forth between the customer and the salesperson, happening, let's say, in email. Then there is additional discussion between the seller and their internal management chain, maybe on Slack: can I give them this much discount? Seems like a really high-potential customer; in future they could grow into a massive customer for us. That interaction is happening in Slack. And finally, let's say sales leadership says, okay, fine, give them an above-normal discount so we can get them started. Now the system of record is going to store the decision: the discount approved was this, and this is what it is. But the context graph captures both the decision that was made, which should be recorded, and the context of the decision, which is otherwise not in the systems anymore, and which will be important for future decision-making by autonomous agents. So all that interaction in email and Slack: can you codify it and put it into the graph, so that you can now say why this customer was given above-normal approval?
And here is the additional context for it. The reason this is important for agentic systems going forward is twofold. You should do the same thing with agents and agentic systems, deciding what decision audits you want to keep. And it means you can make future decision-making much more accurate as we move forward. That is the whole idea of context graphs: recording not just the decision, but all the context around it, so your future decisions are much more accurate.
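A context graph along these lines can be sketched as decision nodes linked to the interactions that justify them. The node shapes, relationship name, and contents here are illustrative assumptions, not a real schema.

```python
# Minimal sketch of a context graph: the CRM keeps only the decision, while
# the graph also links the Slack and email exchanges that explain it.
nodes = {
    "deal_42":  {"type": "Decision", "summary": "above-normal discount approved"},
    "email_7":  {"type": "Context", "source": "email",
                 "text": "Customer: budget capped at X for this year"},
    "slack_19": {"type": "Context", "source": "slack",
                 "text": "Manager: approve above-normal discount, high growth potential"},
}
edges = [
    ("deal_42", "JUSTIFIED_BY", "email_7"),
    ("deal_42", "JUSTIFIED_BY", "slack_19"),
]

def why(decision_id: str):
    """Return the captured context behind a recorded decision."""
    return [nodes[dst]["text"]
            for src, rel, dst in edges
            if src == decision_id and rel == "JUSTIFIED_BY"]
```

An agent asked "why was this customer given above-normal approval?" can now traverse `JUSTIFIED_BY` edges instead of finding only the bare discount figure in the system of record.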

[25:26] Conor Bronsdon: [OVERLAP]
and as those decisions are made they become memories for the agents involved.

[25:32] Sudhir Hasbe: [OVERLAP]
Exactly.

[25:33] Conor Bronsdon:
How do you then differentiate the memory aspect for agents and run that in parallel or align to these other information graphs that we're already diving into?

[25:46] Sudhir Hasbe:
So I think about memory as a much broader concept. The context graph, or decision audits, is one part of memory. There will be other parts, which include things like all the stuff the agent is doing, right? Every interaction. And there is also this concept of individual memory versus organizational memory. I will give you an example. If you go to ChatGPT today, or any of the chat experiences that are available, you can go to the settings, look at personalization, and see what it has learned about you. It's like a personal memory; every interaction teaches it something. Mine already knows that I work at Neo4j, that I have two kids, and that they're going to college soon, because I keep looking for college information. And similarly, if you had an autonomous fraud agent, it's recording all the individual fraud decisions it's making, and it understands them. That's individual memory. But in enterprises, it's also important to think about organizational memory. It's not just that an individual agent did this thing with a transaction. Organizational memory is when agents communicate: say customer-service agents are talking to a particular customer, and the fraud agent, which is looking at different signals and making decisions, has recorded in memory, hey, we found this fraud, we did this thing, this activity was done. If the customer calls the other agent, it should be able to refer back and say: what happened with this customer, what was the transaction, did we block it, what was the rationale, what was the decision? Fundamentally, memory is a recording of the continuous state changes happening in agentic systems and with agents.
And when you look wider than one single agent or set of agents, broader across the organization, this gives you an ever-building map of the things happening in the organization, largely autonomously, right? So that's where it could be. Thanks to Galileo for sponsoring this episode. Their new 165-page comprehensive guide to mastering multi-agent systems is freely available on their website at galileo.ai and provides the lens you need to understand when multi-agent systems add value versus single-agent approaches, how to design them efficiently, and how to build reliable systems that work in production. Download it for free at the link in the show description to discover how to continuously improve your AI agents, identify and avoid common coordination pitfalls, master context engineering for agent collaboration, measure performance with multi-agent metrics, and much more.
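The individual-versus-organizational memory split discussed above can be sketched as two stores, one private per agent and one shared. The class and field names are hypothetical; a real agent-memory system would persist this in a database rather than in-process dicts.

```python
# Sketch of individual vs. organizational agent memory: each agent appends
# state changes to its own log and promotes shareable decisions to a store
# other agents can query. Shapes and names are illustrative assumptions.
from collections import defaultdict

class MemoryFabric:
    def __init__(self):
        self.individual = defaultdict(list)      # agent_id -> private state changes
        self.organizational = defaultdict(list)  # customer_id -> shared decisions

    def record(self, agent_id, customer_id, event, share=False):
        self.individual[agent_id].append(event)
        if share:  # only promoted events become organizational memory
            self.organizational[customer_id].append((agent_id, event))

fabric = MemoryFabric()
fabric.record("fraud_agent", "cust_9",
              "blocked txn 881: mule-account pattern", share=True)
fabric.record("fraud_agent", "cust_9", "model score logged", share=False)

# A customer-service agent can now see why the transaction was blocked,
# without access to the fraud agent's private working state:
history = fabric.organizational["cust_9"]
```

The `share` flag is the interesting design choice: it marks the boundary between an agent's private scratchpad and the decision record the rest of the organization can rely on.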

[28:40] Conor Bronsdon:
I like that you're mentioning the difference between a single agent's memory and the memory of an agent swarm, of an enterprise. And I'll definitely recommend once again for listeners that if you haven't already listened to our deep dive on agent memory that came out a couple weeks ago, highly recommend it. But I also think it's important we talk about how these concepts work together. So we've talked about context graphs, we've talked about knowledge graphs more broadly, semantic mapping, agent memory, GraphRAG. The thing that we haven't mentioned is multi-hop reasoning, the capability it provides, and how it interacts with these other types of information gathering and organization. Can you bring it all together for us, Sudhir?

[29:25] Sudhir Hasbe:
Yeah, let me talk about multi-hop reasoning, right? Multi-hop reasoning is nothing but: when a problem comes to the agent, can the agent follow the different paths it needs to go down, hop across different information sets, and make the decision? That's what it is. I was actually playing with Claude, trying to figure out some use cases for multi-hop reasoning, especially because financial services is such a big vertical for us. This may not be clearly visible, but hopefully it is. Here's what I did: I asked, if you look at fraud ring detection, what does multi-hop reasoning in that world look like? And just the graph model for this is going to be pretty complex. There are accounts, there are people, identities, corporate entities; there's a lot of complexity behind it. If you're doing fraud ring discovery, it's basically going to go from this individual, John, to his iPhone, to the other individual linked to that device, to their account, then to a transaction, and to what the additional mule account looks like. So it's able to hop between different data sets to say, is this really fraud or not? And the same way, if you look at anti-money laundering, it's going to jump from the legal entity to the account to the transaction. So multi-hop reasoning is nothing but the ability to jump between all these disparate data sets that live in different systems, rationalize over their relationships, and make decisions.
And so if you look at the full stack we offer, it's like: you will have an LLM or some model that allows you to reason through things and understand intent. You will have a set of tools available to the reasoning engine, which is nothing but a set of APIs or a set of things that we give. And there is this whole knowledge graph layer that brings all the data together, where you can reason over all of the complex data assets that you have. A combination of these is actually going to give you the best way to build agentic systems.
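The fraud-ring walk Sudhir describes is, at its core, a bounded traversal over a graph of people, devices, and accounts. Here is a toy Python sketch of that hop-by-hop search; the entities and relationship names are invented, not a real schema, and a real deployment would express this as a graph query rather than an in-memory BFS:

```python
from collections import deque

# Toy graph: each edge is (node, relationship, node). All names are illustrative.
edges = [
    ("john", "USES_DEVICE", "iphone-1"),
    ("mary", "USES_DEVICE", "iphone-1"),   # a shared device links two people
    ("mary", "OWNS_ACCOUNT", "acct-77"),
    ("acct-77", "SENT_TO", "acct-99"),     # acct-99 is a known mule account
]

def neighbors(node):
    out = []
    for a, _, b in edges:
        if a == node:
            out.append(b)
        if b == node:
            out.append(a)  # traverse relationships in both directions
    return out

def multi_hop(start, target, max_hops):
    """Breadth-first search: return the hop path from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Four hops connect John to the mule account via the shared device.
print(multi_hop("john", "acct-99", max_hops=4))
```

The `max_hops` bound is the "multi-hop" knob: two hops cannot connect John to the mule account here, four can.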

[31:52] Conor Bronsdon:
So this obviously requires a lot of setup. And I think there are folks out there who go, you know, I was promised this magic bullet that solves my problems, and now I have all this infrastructure work to do. And longtime listeners of this podcast will know that we've been talking about those challenges for almost two years now. But, you know, bear with me here. Obviously, maybe I can just do one or two of these and it'll be fine. The models are getting so much smarter. I'm not going to get the hallucination problems that others are, because the models are so much better now. What would you say to those folks who are maybe looking to skip a couple steps in their infrastructure buildup?

[32:34] Sudhir Hasbe:
Yeah, I was just looking at some data in the industry. 95% of AI projects are still not hitting production-grade deployments. And by other industry numbers, 60-plus percent never even go from POC to the next phase. So I think the problem is, for simple use cases, a simple document, I want to search something, it is easy to take the shortcut and figure something out. But in enterprises, when you try to solve problems which are more complex, it's going to require you to look at a system. I come back to Aishwarya's point. I agree with her. You need a systems approach, rather than "a model can do everything and I'm going to go solve for it." I think a great example of this is when we say coding is a solved problem, right? I love Claude Code. I've been using it for a few weeks now. I just love the ability it has. Building systems through Claude Code and tools like it is becoming easier and easier. It's much more accurate. But the amount of effort it took to go from just a generic model to coding as a very generic problem, to train the model and figure out what the post-training looks like, it took a while. For the last year-plus, people have been improving those models; they're getting better and better, and they're really good now. But that's again on a public data asset, where a lot of code is available in GitHub and you can go do things and learn. As we get into the enterprises, there is no shortcut to this. With a lot of new tooling for developers, it's becoming easier to think about these infrastructure components. But taking shortcuts in enterprise is not a good strategy. You know, we work with a lot of regulated industries: financial services, government. We also work with life sciences. One of our best examples is Novo Nordisk. I can tell you more about it.
But all of these different companies do need reliable AI systems that can be audited. So I would not take shortcuts. I would focus on accuracy. I think the bar right now is accuracy plus auditability, or explainability. And if those things are good enough for your use case, then great. You should build the simplest system possible to go get your results. But a lot of enterprise scenarios are more complex than just "let me do a vector search and find a document." That's not the only thing you do.

[35:15] Conor Bronsdon: [OVERLAP]
Yeah, I think it's very interesting to compare the risk tolerance of enterprise businesses that are serving major B2B customers or millions of users around the world to folks who are working with OpenClaw, for example. And I'm not trying to trash OpenClaw. Very, very cool project. I have a friend who's a maintainer there. He's now sometimes shipping a thousand-plus commits a day. And throughput isn't everything, but it's interesting to see their approach, where they're saying, look, we are going to refactor our code constantly. We're just going to keep building forward.

[35:51] Sudhir Hasbe: [OVERLAP]
Yeah.

[35:51] Conor Bronsdon: [OVERLAP]
We're going to churn code. We're just going to keep redoing it. And our speed and velocity are going to solve the problem. And they're able to do that because it's an open source project. They have an army of folks who are looking at security, and they have different rules around what they need to hit. Whereas if I'm a fintech company and I have major regulatory concerns, it can be a very different model for what my risk tolerance is and what I need to achieve in production consistently.

[36:19] Sudhir Hasbe:
Yeah.

[36:19] Conor Bronsdon:
And one of the big problems that comes up when this infrastructure isn't properly set up is hallucinations that impact customers and internal teams. Sudhir, what are you seeing happen when an LLM is given access to thousands of tools and no constraints on which ones to use for a given task?

[36:38] Sudhir Hasbe:
I think even the LLM companies would tell you, of course, everybody says just give it everything you have and let it make decisions. But that also expects us to be really good at what we are giving to these tools, which doesn't work. I have seen many of our customers tell us that when they give hundreds of tools to the same reasoning engine for a problem, it actually becomes worse and worse as you add more tools. Starting with a small set of tools, it's really good; keep adding more and more tools, and it starts getting worse and worse. And it's the same thing when you're interacting with these LLMs on a day-to-day basis: the more information you keep giving it once it has started answering something, the less accurate it becomes. That is a very common thing I'm noticing. And architects at multiple of our customer companies have brought this up. I was talking to somebody who used to be at Intuit and is now at Salesforce, and they brought it up: what they noticed was that you had to figure out the relevant set of tools for that particular use case.
So if you can limit the number of tools to a relatively good size, not unbounded, and relevant for that particular use case, then it becomes much more accurate as a system. So one of the other things is: how do we take the enterprise knowledge and provide only what is contextually required for that set of use cases? That second level is going to be really important, and it will allow you to have much better business validation for your use cases. "Get me customer information" as one step of an agent means one kind of information for fraud, and a different kind if I want to run a marketing campaign for that particular customer. They're different things, and if you had tools that were similar looking but for different scenarios, the model will not be able to reason over them. And the second thing, Conor, that I think is interesting: just giving tons and tons of stuff to LLMs is only going to drive a lot more tokens and cost. It's going to be cost prohibitive. Yes, you could give everything to it, but if it is unnecessary, irrelevant information, you're just increasing your tokens for no reason in the whole system. So it's accuracy plus cost management, the two sides of this, where having a good layer of a curated set of tools across your use cases is an important thing to focus on.
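The tool-scoping idea Sudhir describes can be sketched as a simple use-case-to-tools filter that runs before anything is handed to the reasoning layer. The tool names, tags, and the 15-tool cap below are hypothetical, just to show the mechanic:

```python
# Hypothetical registry: each tool is tagged with the use cases it serves.
TOOLS = {
    "get_customer_profile": {"fraud", "marketing"},
    "get_transaction_log":  {"fraud"},
    "freeze_account":       {"fraud"},
    "get_campaign_history": {"marketing"},
    "send_promo_email":     {"marketing"},
}

def tools_for(use_case, limit=15):
    """Expose only the tools tagged for this use case, capped at a small set.

    The reasoning engine never sees the full registry, which keeps both
    accuracy (fewer lookalike tools) and token cost (smaller tool schema) down.
    """
    selected = sorted(name for name, cases in TOOLS.items() if use_case in cases)
    return selected[:limit]

print(tools_for("fraud"))
```

A fraud request here never sees `send_promo_email`, so two "get me customer information"-style tools for different scenarios are never in front of the model at the same time.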

[39:19] Conor Bronsdon:
So what does a good business validation layer look like? How can a graph enforce that an agent's actions are valid according to business rules?

[39:29] Sudhir Hasbe:
I think my thing would be: based on your use cases, know what data assets and what tools are most relevant, map that, and then use it when you decide what tools to expose to the reasoning layer. The knowledge graph can be that place. For example, for fraud, here are the 10 or 15 most relevant tools for that use case. If that happens, before sending all the tools to the reasoning layer, you would basically say, okay, the intent is very clear, it's this one, here are the tools that need to go. Then it becomes that. I think there's a more complex version of this. Uber is one of our customers. They talked at one of our customer summits, and they were explaining that for them, it's way more complex. They called the knowledge graph layer a business validation layer, and the reason is, think about the number of jurisdictions Uber Eats has to work in. The Seattle area has different rules on payments, like the minimum wage. And when does that apply? When the trip goes from Seattle to Bellevue or other neighboring cities, these rules apply, and the neighboring jurisdiction may have different rules. All of these policies and rules are scattered across the current systems, where individual systems take care of them. Now, in the new agentic world, how do you make sure that if an agent is going to make a decision on a payment, it is applying these business validations in the right fashion? So that's where they are using the knowledge graph: converting these thousands and thousands of jurisdiction combinations into a graph, so they can use it as a validation tier. But that's the more advanced version of it. I'm saying even the basic stuff, about what are the use cases, what are the tools, how do you restrict them, and what decisions that particular agent can make within that realm, I think is an interesting place to start.
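As a rough illustration of a validation tier like the one Uber described, a payment decision can be checked against per-jurisdiction rules before it is committed. The cities, wage figures, and rule shape below are invented placeholders, not actual regulations or Uber's system:

```python
# Toy rule store: in practice these rules would live in the knowledge graph,
# keyed by jurisdiction, and the agent would query them before acting.
RULES = {
    "seattle":  {"min_wage": 19.97},   # placeholder value
    "bellevue": {"min_wage": 16.28},   # placeholder value
}

def validate_payment(pickup_city, dropoff_city, hourly_rate):
    """A trip touching multiple jurisdictions must satisfy the strictest rule.

    Returns (is_valid, required_floor) so the agent can explain a rejection.
    """
    floors = [RULES[c]["min_wage"] for c in (pickup_city, dropoff_city) if c in RULES]
    required = max(floors) if floors else 0.0
    return hourly_rate >= required, required

ok, floor = validate_payment("seattle", "bellevue", hourly_rate=17.00)
print(ok, floor)  # the stricter Seattle floor governs the cross-city trip
```

The point is that the agent's proposed action is checked against encoded business rules, and the returned floor doubles as the rationale it can record in memory.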

[41:37] Conor Bronsdon:
Let's talk about those restrictions a bit more. This connects back to something I recently spoke with Dan Klein from Skilled Cognition about on the show, where he talked about the gap between a prototype that works in a demo and a system you can actually ship. This is a very common refrain, obviously, but one of the interesting things he brought up was that prompting creates a vagueness layer, because it's less precise than using code to say, here's explicitly what to do. Often, if we're prompting, we're using human language, typically English. And English is notoriously a challenging language to learn, one where how certain things are explained and understood depends on deep context. It's part of why we're seeing developers often begin to do their prompting via voice instead, because you're able to much more rapidly provide additional context and kind of think through and talk through a problem, instead of adding your own validation layer internally as you write things out. When we're writing, we're distilling our knowledge, and maybe something is lost in translation there at times. So, as that vagueness layer, as Dan put it, begins to impact what's flowing into our agents and the orders we're asking them to pursue, there is this other side that you're talking about, where we need guardrails to ensure that any vagueness, incorrect information, or incorrect context that works its way in does not derail the activities we're asking our agents to do. And you, the Galileo folks on the show, and others have talked about this idea of policy-based guardrails. And you're saying we should encode them in the knowledge graph itself, it sounds like.

[43:24] Sudhir Hasbe:
Yeah, that's correct. And I will tell you that it all depends on the use cases, right? We work with one of the larger cloud platforms. They use us for all of their security graphs and security work. And if you are going to make a decision to shut down some account because you feel it's been attacked and there's an exposure of risk, you'd better be perfectly accurate about that decision, because shutting down an account is going to have an impact on some business. And that does require guardrails: what do you have to see to be absolutely sure, what are the tools allowed to do, what action do you take? It needs to be a really well-articulated flow of the business process, or business validation, that it has to go through. It needs to recheck, and all of that. Similarly, think about self-driving cars. Why is it hard to build a self-driving car? Because you just can't get into an accident. It requires a lot of validation and a lot of work. So for systems that are critical to businesses, being accurate is a necessity. It's not an option.

[44:42] Conor Bronsdon:
You've said before that we are in the first mile of a 26-mile marathon for agentic platforms. How do you see this stack and our approaches continuing to evolve? How do they need to evolve?

[44:58] Sudhir Hasbe:
I think it would be hard for me to predict everything. The marathon thing comes from like I love running and I have

[45:06] Conor Bronsdon:
You do have a ton of medals there, now that I'm looking. Okay.

[45:09] Sudhir Hasbe: [OVERLAP]
I've run four full marathons. I've done 60-plus half marathons. And one of the things I try to do every year, now that I'm getting old, is at least four or five halves a year; that's my goal. I've done two this year and I want to do more. So that's where it comes from. Mile one is too early to predict what mile 26 looks like. And my thing is, at least what I know is from when the whole agentic system thing came. I'll give you some history, right? When I was at Google in 2018, I announced this thing called Data QnA. It was supposed to be: ask any question of your data and you'll get an answer. We were so early, there were no LLMs. We had an internal system called Analyza, which is what Google was using in different experiences. And one of the problems it had was intent understanding. It couldn't understand your intent, so intent was hard-coded: whenever you said something, if this was the intent, you do this. It was purely rule-based, really not an extensible system. LLMs come in, and now we have this new capability where you can understand intent. It will hallucinate, but at least we knew what we were getting. And then with vectors, we had this RAG pattern, and we realized RAG alone was not good enough. Take another few months; 18 months later, we now know a few key components in the system that are going to be around. There's going to be a large language model that provides you with a lot of intent understanding and language understanding. It also now has all this reasoning capability, so if you give it a set of tools, it will be able to reason through your intent and find the right tool to do activities. So we know large language models are going to be important.
You also know that in enterprises you will need a data architecture, and that is going to be critical so that these reasoning models can actually make good decisions. That's where I believe you need to transform your data into knowledge through a knowledge graph layer. And then you have this ability to define additional context, additional stuff that is part of memory, or context and context graphs; that's a new kind of data that will be captured in the system. So I actually think those are the key components. For the next 18 months, I'm not sure what will change. The world is moving too fast, so I think we should do this again in another year's time and see where we have landed. If I look at the innovation happening with coding and how fast it is moving: I was just blogging about this, I wrote an article saying the cost of development has now gone to zero. The cost of storage went to zero; the cost of writing new code is going to zero. If that is the case, I don't know what innovations are going to come next, and how

[48:08] Conor Bronsdon: [OVERLAP]
Well,

[48:08] Sudhir Hasbe: [OVERLAP]
fast.

[48:08] Conor Bronsdon: [OVERLAP]
tokens are nothing, let's note, but I hear what you're saying.

[48:12] Sudhir Hasbe: [OVERLAP]
No, no, I get it. I mean, the incremental cost of writing code: the development time was the longest part. Now

[48:19] Conor Bronsdon: [OVERLAP]
Yes.

[48:20] Sudhir Hasbe: [OVERLAP]
it's like,

[48:20] Conor Bronsdon: [OVERLAP]
Yeah.

[48:20] Sudhir Hasbe: [OVERLAP]
if you have an idea, you can generate code very easily. You still

[48:23] Conor Bronsdon: [OVERLAP]
Velocity

[48:23] Sudhir Hasbe: [OVERLAP]
need

[48:24] Conor Bronsdon: [OVERLAP]
is increased

[48:24] Sudhir Hasbe: [OVERLAP]
velocity.

[48:25] Conor Bronsdon: [OVERLAP]
rapidly.

[48:25] Sudhir Hasbe: [OVERLAP]
Yeah, exactly.

[48:25] Conor Bronsdon: [OVERLAP]
Yeah.

[48:25] Sudhir Hasbe: [OVERLAP]
What I mean is velocity is growing like crazy, right? So we'll see where we land in mile five and mile six, I guess.

[48:32] Conor Bronsdon: [OVERLAP]
And I know part of what Neo4j is doing to prepare for this future is that you've launched Aura Agent, a no-code, low-code environment for building and deploying agents that are grounded in your knowledge graphs.

[48:45] Sudhir Hasbe: [OVERLAP]
Yeah.

[48:45] Conor Bronsdon: [OVERLAP]
We've also shipped an MCP server for integrating graph-based memory and reasoning into existing agent applications. How do you see yourselves and your company continuing to build for this agentic future in the coming months?

[48:58] Sudhir Hasbe:
My thesis is very simple, right? If you have your information, we will make it super easy to move it into knowledge graphs, to build the knowledge graph. And second, once you have your information in the knowledge graph, building an agent should not be a big deal. We can help you build the agent and deploy it to production pretty quickly. The only thing we need to understand at that point is your use case and what you are trying to achieve with that agent. With a lot of help from all the coding agents and assistants, we can help you generate the code the agent will need, a set of tools. We have a pre-built reasoning model. We internally use the latest version of Gemini to run as the reasoning engine. Gemini is really, really good for some of these enterprise use cases. We basically run that in the background, take the tools you have defined, and then for your use cases we give you a platform where within minutes you can build an agent and deploy it to production. And then we expose that also as an MCP server, because in a multi-agent system you may want to use the set of tools you have for this agent in other places. We are also planning to support the A2A protocol out of the box. So if you have multi-agent systems like AgentCore in Bedrock, or Gemini Enterprise, or something else, you should be able to use these agents in different platforms for composing more complex agentic workflows. So that's been our strategy: we want to be part of the ecosystem and get integrated, but for building the core agents that can make decisions and do things, we should make it super easy for developers.

[50:43] Conor Bronsdon:
For developers who are listening and want to leverage knowledge graphs to power their AI systems and their agents, where should they start?

[50:53] Sudhir Hasbe:
They should just go to Neo4j Aura. They can come in, and if you have data in one of your data lakes, Snowflake or Databricks, you will be able to get that into a knowledge graph in three clicks. In just a couple more clicks, you should be able to create an agent, enable it as an MCP server, go to Claude or any other platform where you can add an MCP server, just add it, start chatting, and do magic from there.

[51:19] Conor Bronsdon:
Fantastic. Sudhir, thank you so much for joining me today. It's been a great conversation. I appreciate you walking us through all the different aspects of knowledge graphs, your thinking on the future of AI agents, and much more. I really appreciate you coming on the show.

[51:33] Sudhir Hasbe:
Thanks, Conor. Thanks to you and thanks to all your audience for having me here.

[51:38] Conor Bronsdon:
I also highly recommend following Sudhir on LinkedIn, where he shares some fantastic insights. And be sure to check out Neo4j and see if it's right for you. And while you're at it, consider subscribing to our newsletter at newsletter.chainofthought.show, where we will be talking about agent memory, context graphs, and so much more in essays coming up in the next several weeks. A couple of them may actually be out by the time this show airs. We hope you enjoyed this episode, and thank you all for listening. Sudhir, thanks again.

[52:12] Sudhir Hasbe:
Thanks.