AI is reshaping infrastructure, strategy, and entire industries. Host Conor Bronsdon talks to the engineers, founders, and researchers building breakthrough AI systems about what it actually takes to ship AI in production, where the opportunities lie, and how leaders should think about the strategic bets ahead.
Chain of Thought translates technical depth into actionable insights for builders and decision-makers. New episodes bi-weekly.
Conor Bronsdon is an angel investor in AI and dev tools, Head of Technical Ecosystem at Modular, and previously led growth at AI startups Galileo and LinearB.
FINAL TRANSCRIPT
================
Speakers: Conor Bronsdon, Richmond Alake
Duration: 59:23
Total Words: 9350
Generated: 2026-04-01
---
[0:00] Richmond Alake:
Memory is the last battleground for anyone who operates in the AI stack. And here's what I mean. Everyone is trying to solve memory. Database companies, model providers: everyone has got a say on how memory should be modeled in agentic systems. But a single developer can build a memory system that can remember better than ChatGPT.
[0:28] Conor Bronsdon:
We are back on Chain of Thought, everyone. I am your host, Conor Bronsdon, Head of Technical Ecosystem at Modular. Before we dive in, a quick thank you to our presenting sponsor, Galileo. Check them out at Galileo.ai for your AI eval needs and much more. My guest today is really fantastic, honestly, and that is Richmond Alake. Richmond is Director of AI Developer Experience at Oracle. He's one of the most serious technical voices on agent memory right now. His AI Engineer World's Fair talk on architecting agent memory recently crossed 100,000 views, which may hint to some listeners at where I'm recording this and when. He also built the open source Memorizz library, and he's been running a 100 Days of Agent Memory series that has produced some of the most concrete examples and implementation content I've seen on the topic. Not to mention, he also co-created a course with Andrew Ng, which is not a small thing. Richmond, welcome. Where are you joining me from today?
[1:28] Richmond Alake:
Joining in from the outskirts of London, in a place called Hertfordshire, in the UK.
[1:35] Conor Bronsdon:
Richmond, I have to thank you for coming on, not just because I always love chatting with you, but you bring so much more charisma to this show simply from your tone of voice compared to most of my American guests. Sorry to them, but we can all be honest here. Great to see you. You have been diving into agent memory and have such a depth of knowledge on this topic. Most agent failures are being blamed on models, but often the culprit is the memory, the context, or the lack of it. But what's your actual claim around agent memory? Why does memory engineering deserve its own name, its own categorization, separate from prompt engineering or context engineering?
[2:23] Richmond Alake: [OVERLAP]
Yeah, that's a good question. So thanks for having me on the Chain of Thought podcast. It's going to be great; we're going to have a very good conversation. To answer your question, why does memory engineering deserve its own sort of attention? It's really because, over the years, I've spent some time in database companies. I'm an AI guy in a database world. And one thing I've learned from speaking to teams building AI solutions is that the failures and the difficulties they have are all around their retrieval pipelines. It's all around how optimal the RAG pipeline is, what retrieval techniques you're using, the quality of the information you're getting from your databases. These are the conversations I was having, and I started to realize that a lot of the agent engineers coming into the space, a lot of the AI engineers, AI developers, actually don't have that database knowledge. And to be honest, some
[3:29] Conor Bronsdon: [OVERLAP]
Yeah.
[3:29] Richmond Alake: [OVERLAP]
of them don't even want to learn that, right? But when you are able to communicate the importance of memory in agents, and then really define the job to be done in a way that AI engineers and agent engineers understand and care about, you start to get into the world of memory engineering. It's very simple to understand, but it's nothing new. It's actually an intersection between a bunch of roles that we have today: data engineering, database engineering, software engineering, agent engineering. And then you have this set of developers who are focused on how to implement systems that learn and remember and adapt. Those are your memory engineers.
[4:20] Conor Bronsdon:
And it seems like you see a clear progression here from prompt engineering to context engineering to memory engineering. Why this differentiation and what's your recommendation to developers who are thinking through these topics? How should they approach each one differently?
[4:40] Richmond Alake:
Yeah, so there are multiple dimensions to why the differentiation. So we can start from, let's look at job descriptions, right? The reason why you want a differentiation in terms of what your developers are doing is because when you're hiring for the developer, you want to explain what they're going to be doing on the job. So if, for example, with prompt engineering, people understood it as
[5:05] Richmond Alake:
just writing a bunch of prompts, some magic words that would make the LLM behave in a certain way. But that's not the job you ended up doing, because what you ended up doing was building RAG pipelines, looking at things like AI evals, evaluating your RAG pipelines, learning more on the context engineering side, and working with tools like LangChain, LlamaIndex, and so on. You actually don't do that much prompting anymore, especially with the increased capabilities of the reasoning models now. So one, really clearly defining the job to be done is one of the key advantages of splitting up these roles into several distinct engineering disciplines. So that's the first one, job description. Two, just having people in the field understand what you bring to the table, right? So when I saw the term context engineering becoming more popular, the definition around it is actually systematically selecting the context you put into the context window of an LLM while being aware of its limitations and constraints. That explains a more sophisticated job to be done in comparison to prompt engineering, right? Because now you're explaining to developers, or people that are outside the role, that this is an engineering discipline. It's going to be grounded in principles and design patterns and all of the richness of the software engineering disciplines that we've had over the years. It's not going to be a very simple task, right?
Then when you go into memory engineering, the term itself really encapsulates the fact that whatever engineer chooses to define themselves as a memory engineer, it tells people that you are very concerned with the abilities of these AI systems to remember and adapt, right? That's what you're communicating to people. So those are the reasons the distinct forms of these disciplines are important.
[7:57] Conor Bronsdon:
What about the failure modes? So obviously we've heard about context poisoning. What are the failure modes that you see really impact people as they build their agents but fail to realize the depth of the memory engineering they need to get done?
[8:13] Richmond Alake:
One of the failure modes is usually in the
[8:20] Richmond Alake: [OVERLAP]
mental model you apply to how you're building your agent, right? And that is not thinking memory first. That's a top failure mode: not applying the right mental model. Memory should be a first-class primitive in the systems you're building. If you want them to actually perform well in production, and you want humans to use them for long-horizon tasks, long communication, long interaction, then you need to start thinking memory first. So that's the first failure mode: engineers are not thinking memory first. And if you start to think memory first, you start to see every piece of information as something to be remembered or forgotten. That affects the way you think about your data models. It affects the attributes you put into what I call memory units, which are, think of them as the atomic unit of information representation within an agentic system. For example, when you start thinking about memory, and about information within your database, there needs to be an attribute that allows the information to be recalled and forgotten. You need to engineer that mechanism. There isn't an industry-standard way of doing it; there are different ways to do it. So you also have to evaluate the system against benchmarks out there like LongMemEval or LoCoMo or MemBench, and just keep evaluating against those benchmarks. So that is one of the failure modes: the mental model. Another failure mode, if I think very deeply here, there's a principle I like to communicate to people, which is: you don't delete information in memory engineering, you forget. Right?
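A minimal sketch of a "memory unit" with the recall and forget mechanics Richmond describes. All names and fields here are hypothetical, not from any existing library:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryUnit:
    """Atomic unit of information in an agentic system (hypothetical schema)."""
    content: str
    created_at: float = field(default_factory=time.time)
    last_recalled_at: float = field(default_factory=time.time)
    recall_count: int = 0
    forgotten: bool = False  # forget, don't delete: the record stays auditable

    def recall(self) -> str:
        # Recalling refreshes recency, which later scoring can reward
        self.recall_count += 1
        self.last_recalled_at = time.time()
        return self.content

    def forget(self) -> None:
        # Mark as forgotten instead of deleting the row
        self.forgotten = True

unit = MemoryUnit("User prefers weekly portfolio summaries")
unit.recall()
unit.forget()
```

The point of the extra attributes is that recall and forgetting become engineered, observable operations rather than side effects of row deletion.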
[10:14] Conor Bronsdon: [OVERLAP]
So really equating it to how a human would approach things.
[10:18] Richmond Alake: [OVERLAP]
Exactly. Don't delete, forget. And that helps you, one, understand the complexity of the job to be done. But two, it helps if you're working on mission-critical applications, or you're working in finance where you get audits. I don't know how many times financial institutions get audited in a year, but you're going to
[10:41] Richmond Alake: [OVERLAP]
get audits.
[10:41] Conor Bronsdon: [OVERLAP]
But regularly, yeah.
[10:43] Richmond Alake:
Exactly. So it really helps, one, from a use case perspective, but also from an implementation and engineering perspective. There are other failure modes, which are just typical of building systems and software engineering, but those two are the ones I'll say are top. One, not applying the right mental model when you start to build, which is: think memory first. And two, not applying the principles, one of which would be: don't delete, forget.
[11:13] Conor Bronsdon:
Was there a moment when you personally hit the ceiling with prompt engineering and said, oh, I need memory engineering to be its own discipline?
[11:22] Richmond Alake: [OVERLAP]
Yeah, so like I said at the beginning, if we say prompt engineering, right, we're saying just writing stuff into the LLM
[11:32] Conor Bronsdon: [OVERLAP]
Yeah.
[11:32] Richmond Alake: [OVERLAP]
to do something, right? Again, that was not the job we were doing, right? Because what we were doing was thinking about our chunking strategies, right? And prompt engineering doesn't even communicate to people that you're doing that. So I wouldn't say I was ever a prompt engineer. I was prompting, you put prompts, instructions, into these LLMs, but we were more concerned about: what embedding model are we using? What chunking strategy are we using? What retrieval strategies are we using? Are we using vector or lexical or hybrid? How do they work against our use case? We were not really obsessed with, what would I call it, is the LLM following the instruction accurately? Because that was really dependent on the information you provided it. But also, the reasoning capabilities of these models just made it so that you don't have to write as much prompt as you needed to before, right? Because they could follow instructions more carefully, it reduced
[12:37] Conor Bronsdon: [OVERLAP]
They're getting better and better,
[12:38] Richmond Alake:
that
[12:38] Conor Bronsdon:
yeah.
[12:39] Richmond Alake:
Exactly. So yeah, that's how I saw things. So was there ever a ceiling? No, there wasn't, because I never saw myself as a prompt engineer.
[12:49] Conor Bronsdon:
What does a well-architected memory system for AI agents actually look like? And I'll note here, if you want to share your screen and walk us through something or show us an image, you're welcome to.
[12:59] Richmond Alake: [OVERLAP]
Okay, let me see if I can share my screen. I always have some demo to show anyway.
[13:08] Conor Bronsdon: [OVERLAP]
yeah
[13:08] Richmond Alake: [OVERLAP]
So, honestly, you should put this part in the recording because Conor didn't even tell me to prepare for this. He was just like,
[13:14] Conor Bronsdon: [OVERLAP]
I should have, I should have, to be clear. Yeah.
[13:16] Richmond Alake:
just catch me off guard. But no, I always have some
[13:19] Conor Bronsdon: [OVERLAP]
I was, I was just thinking as we're talking, I'm like, oh, Richmond's got a demo. Of course he does.
[13:22] Richmond Alake: [OVERLAP]
I always have some application that I'm building, and because I always build my applications agent-memory first, they all encapsulate what I try to communicate to people. So let me share my screen here.
[13:35] Conor Bronsdon: [OVERLAP]
[13:35] Conor Bronsdon: [OVERLAP]
I think what Richmond's saying is he lives the work, not just talks it. Acronyms are great. Yeah.
[13:35] Richmond Alake: [OVERLAP]
Let's go. This is fantastic.
[13:42] Richmond Alake: [OVERLAP]
Absolutely, exactly. So, AFSA: this is a demo, and it's got a bunch of synthetic data. All the data is held in an Oracle AI Database. And I'm using AFSA to basically showcase what a very intelligent chat assistant would look like within a wealth management institution. Wealth management institutions are financial institutions concerned with handling the investments of high-net-worth individuals like myself and Conor. But
[14:18] Conor Bronsdon: [OVERLAP]
You are so kind for including me on that list, but I'll take it.
[14:21] Richmond Alake:
Well, I'm recruiting myself. But yeah, they manage the investments for massive wealthy families, and they do a lot of investment allocation. So this system is an agentic financial services system that can query different tables. So I'm just going to do something, just to make sure it's working. So let's see. Let's run what we call a converged search. And I'm hoping the servers are still running. So this is a converged search. One of the anti-patterns I always saw in a lot of engineering teams is that they would use multiple databases, right? They would use multiple databases within their agent architecture or their AI infra. And that's because the nature of data within production today is heterogeneous, right? You would encounter different types of data in production, and data can be represented in different... There's something happening on the screen. I am going to explain what's happening on the screen, but
[15:35] Richmond Alake: [OVERLAP]
I just want to...
[15:35] Conor Bronsdon: [OVERLAP]
No, talk me through it. Talk me through it. It's great. Yeah.
[15:38] Richmond Alake:
So it's one of the things where I saw a lot of developers leveraging different databases for different data types. So, I want to use vectors, so I go get a vector database. Okay, I want to use JSON, I go get a NoSQL database. Okay, relational, I use Postgres. And then, oh wait, I want to do graph now, okay, let me go get another graph database. But the thing is this: now you have four databases that you have to manage and synchronize data across. So this is what I try to showcase, right? I have this intelligent AI assistant called AFSA that can actually get me a client ID. I'm using the client ID to try to find connected accounts; connected accounts is going to be a graph traversal, right? I'm trying to find nearby clients, that's spatial search, and trying to get the relevant risk research, that is vector similarity. So one query requires different types of search. And what we have is this converged search. In this demo, I show how Oracle AI Database can handle different search mechanisms in one statement, right? Four mechanisms of search in one statement. And this reduces your AI infra to just one database. This is something that I communicate to a lot of developers. And we can see here, we've got the information of the account ID, and we have some other accounts that are related to this account ID. The relationship is on their risk profile and portfolio overlap; that was the relationship this account had with the other accounts. And then we have some other relevant search results that we retrieved. And this agent can use the tool; the converged search is a tool for the agent. But this is where memory comes in. Over here, the context window of AFSA is segmented and partitioned to allocate for different memory types.
So I talk a lot about agent memory and different memory types, but one thing I talk about now is memory-aware agents, where the agent is aware that it has different abstractions that represent different memory types, and included in the context window is how to use the contents of each memory type, followed by the content itself. So for example, we have the conversation. I tell it, this is conversation memory and this is the purpose of it, right? And then we have the conversation that we just had in the conversation memory and the results of it. But then we have the knowledge base, because it had to search. Oh, I'm just going to zoom in.
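The converged search Richmond demos runs in one Oracle SQL statement; that SQL isn't shown in the transcript, so here is only a toy Python sketch of the idea, one call that answers a query needing graph traversal, a relational filter, and vector ranking over entirely synthetic data:

```python
import math

# Synthetic stand-in for database tables (Oracle's converged SQL is not shown here)
accounts = {
    "A1": {"risk_vec": [0.9, 0.1], "connected": ["A2"], "region": "London"},
    "A2": {"risk_vec": [0.8, 0.2], "connected": ["A1", "A3"], "region": "London"},
    "A3": {"risk_vec": [0.1, 0.9], "connected": ["A2"], "region": "Leeds"},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def converged_search(client_id, region, query_vec):
    """One call combining three search mechanisms over the same data."""
    connected = accounts[client_id]["connected"]  # graph traversal (one hop)
    nearby = [aid for aid, row in accounts.items()
              if row["region"] == region and aid != client_id]  # relational/spatial filter
    ranked = sorted((aid for aid in accounts if aid != client_id),
                    key=lambda aid: cosine(accounts[aid]["risk_vec"], query_vec),
                    reverse=True)  # vector similarity ranking
    return {"connected": connected, "nearby": nearby, "most_similar": ranked[0]}

result = converged_search("A1", "London", [0.9, 0.1])
```

With separate vector, graph, and relational databases, each of these three lookups would hit a different system that you then have to keep synchronized; the converged approach keeps them against one store.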
[18:30] Richmond Alake:
So it has a knowledge base, right, because it has to go through the risk profile, and you can see it's searching through our semantic memory, which is the knowledge base. But we have workflow too, so this agent can actually record its steps. It's almost like when a human does a task: we are recording it, right? We're recording routines and skills, right? Now that skills.md is a thing. So I built this agent to record everything that it's doing in workflow memory. So it recognizes that there was a query that was given, and then the steps it took to run the query, and then the answer. This is experience for the agent in the next iteration. There are other things that I've included, like entity memory and summary memory as well. I can go deep into any of them if you want.
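A rough sketch of the "memory-aware" context window layout Richmond describes: each memory type gets a labelled section stating its purpose, followed by its content, so the agent knows how to use each one. The section names and purpose strings are hypothetical:

```python
# Hypothetical purpose statements per memory type, as instructions to the agent
MEMORY_PURPOSES = {
    "conversation": "Recent dialogue turns; use for conversational continuity.",
    "semantic": "Knowledge-base facts retrieved for this query.",
    "workflow": "Steps taken on past tasks; reuse as experience.",
    "entity": "Facts about people and accounts mentioned so far.",
    "summary": "Compressed history of older interactions.",
}

def build_context(memories):
    """Partition the context window into one labelled section per memory type."""
    sections = []
    for mem_type, content in memories.items():
        purpose = MEMORY_PURPOSES.get(mem_type, "No stated purpose.")
        sections.append(f"## {mem_type.upper()} MEMORY\nPurpose: {purpose}\n{content}")
    return "\n\n".join(sections)

ctx = build_context({
    "conversation": "User: find connected accounts for client C-42.",
    "workflow": "Resolved client ID, ran converged search, returned results.",
})
```

The key design choice is that the purpose text travels with the content, which is what makes the agent "aware" of its memory rather than just receiving an undifferentiated blob.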
[19:22] Conor Bronsdon:
Yeah, I would love to understand how you map these different memory segments into cognitive types aligned to humans, because I know you've done some work around identifying basically the model or patterns for that.
[19:37] Richmond Alake: [OVERLAP]
So I'm not a neuroscientist yet. I don't know.
[19:41] Conor Bronsdon: [OVERLAP]
Yeah, I'll give you a couple of years. We'll see. I've
[19:43] Richmond Alake: [OVERLAP]
So
[19:43] Conor Bronsdon: [OVERLAP]
seen your educational background.
[19:47] Richmond Alake:
One thing is, I read a few papers from neuroscientists, and I've spoken to some. I was at an event at AGI House in San Francisco. Yeah, AGI House in the Bay Area, actually. And I spoke to a bunch of neuroscientists. It was an event where we brought together neuroscientists and people that were into computational forms of memory and AI, and it was so interesting to see how these two worlds are coming together. Because we've always drawn inspiration, from a technology perspective, we've always drawn inspiration from nature, right? To build planes, we look to birds. Convolutional neural networks, from deep learning and computer vision, are inspired by human neurons, and inspired by research done by two neuroscientists called Hubel and Wiesel, who experimented on a cat to see how visual stimulus affected the visual cortex of cats. They mapped the activations of the cat's brain and learned a lot, and convolutional neural networks actually get inspiration from their research findings. The point is, we always look to nature, and nature inspires technology. So I did the same with agent memory. I looked at what neuroscientists say about memory in humans. Well, the first thing they say about memory in humans is that it's very complicated and it's not solved. We don't even know how it fully works yet. But one thing is this: we understand that memory is not just one thing. We have different parts of our brain responsible for different parts of memory, right? Or different types of information that we store and retrieve. So just by that, you can imagine your context window, if you're taking inspiration: okay, now I need to start thinking about segmenting the context window for different memory segments, just trying to draw parallels here, right? And that's why I've done it this way. So that's why it's memory-aware.
And the second thing, the inspiration I also got: there are four main forms of memory in humans, right? We have working memory, we have episodic memory, we have semantic memory, and we have procedural memory. And all of these can correlate to computational forms. So working memory is your context window; it's short-term, it's limited. Or it could be an external scratch pad, right? Semantic memory is just your knowledge base; institutional knowledge is semantic memory. Episodic memory, that is any information that has a timestamp, such as conversations, the interactions you have in your system. Then fourth, procedural memory can be the skills. We have skills.md files now; that's a form of procedural memory. So you can see the correlation that we're drawing.
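The four-way mapping above, written out as a plain reference table in code (wording of the computational side paraphrases the conversation, nothing more):

```python
# Human memory types and the computational correlates described above
MEMORY_MAP = {
    "working":    "context window or external scratch pad (short-term, limited)",
    "semantic":   "knowledge base (institutional knowledge)",
    "episodic":   "timestamped interactions, such as conversation history",
    "procedural": "skills and routines, such as skills.md files",
}

for human_form, computational_form in MEMORY_MAP.items():
    print(f"{human_form:>10} -> {computational_form}")
```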
[22:50] Conor Bronsdon: [OVERLAP]
No, you're just making me, I mean, yes, I have so many questions, but you are making me sit here. I'm trying to like map different things I do to these memory categories you've outlined. So I did not expect to go deep into neuroscience today, and I am very excited we did. And you're just making me think maybe I need to have like a full-on neuroscientist on the show now. Like this, okay, this is great.
[23:12] Richmond Alake: [OVERLAP]
Dude, if you do, I will watch that episode first. And I don't think we've gone deep. Neuroscientists would be like, yeah, we learned
[23:18] Conor Bronsdon: [OVERLAP]
Yeah.
[23:18] Richmond Alake: [OVERLAP]
that in life.
[23:18] Conor Bronsdon: [OVERLAP]
They're like, this is kid stuff, what are you talking about? Yeah. Oh, that's funny. All right, well, let's talk about a specific memory type, maybe procedural memory, for example. So it sounds like a lot of us would map this very easily to work we're doing around setting up a claude.md or an agents.md file to shape agent behavior. How do you fit this perspective into production architecture, and what are the recommendations around being successful with agent memory on the procedural front?
[23:51] Richmond Alake:
Yeah. So on the procedural front, procedural memory essentially would hold information that encapsulates maybe skills, though I don't want to use the word skills in different ways, so I would say routines, or ways to do tasks. If we bring things to humans, procedural memory in an organization would be like SOPs, Standard Operating Procedure documents, which describe how to do tasks. The way that we see this in production systems is this, right? In production systems, you don't want to be using files, essentially, to store your customers' information or any sensitive information. You want to be using more robust infrastructure components, like a database, right? And the reason is this: whenever you get to production, your system is going to be used by hundreds, thousands, and successful production applications are going to be used by millions of customers. That means there are going to be a lot of transactions hitting your storage systems. And when you start to think about scalability and availability, and even making sure information does not conflict, you start to realize that there is a whole amount of logic and engineering you need to build around these files to make them production-ready. But then you start to realize that you're building a database, right? When you start to put in things like ACID transactions. Because you can lock files, right? When a file has been accessed, you can lock it, and then another person can't edit the file. But there are other disadvantages to using files, like scalability. And I know the rave at the moment of recording is that files are all you need, or files and grep are all you need. But really, in production you've got to be thinking auditability, thinking scale as well, and all the things that database folks have been trying to solve for decades. So that's what I see in production.
So firstly, how do you apply procedural memory in production? Whenever you have skills, right, in POCs you have this skills.md. What can you do with these files? Well, you can store them in a database. You can store them in an Oracle AI Database within a table, and the table will hold all of the different data types, like I've shown you. You can hold the content of the actual file itself. You can hold the vector representation of the skills, so you can retrieve the right skill, or a number of skills, for an incoming query using similarity search, right? You can have other data types, like what's the relationship of one skill to other skills? Then you could do some form of graph traversal. The key thing is, you can take your skills.md files, however many you have, put them in a database with different projections and representations of the data, and using the right retrieval strategies, you can retrieve these skills when you need them. That would be a production paradigm.
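A toy sketch of that paradigm: skills.md content stored as rows alongside a vector representation, retrieved by similarity search. The embedding function below is a deliberately crude stand-in (character sums, not a real model), and all names are hypothetical:

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model (purely illustrative)
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalised

# A "skills table": the file content plus its vector representation per row
skills_table = [
    {"name": name, "content": content, "vec": embed(content)}
    for name, content in [
        ("rebalance", "How to rebalance a client portfolio each quarter."),
        ("kyc", "How to run know-your-customer checks on a new client."),
    ]
]

def retrieve_skill(query):
    # Similarity search over the stored skills; highest score wins
    qv = embed(query)
    return max(skills_table, key=lambda row: cosine(row["vec"], qv))

best = retrieve_skill("rebalance the portfolio for this quarter")
```

In a real deployment the table would also carry relationship columns for graph traversal between skills, and the database would provide the locking, ACID guarantees, and audit trail that plain files lack.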
[27:11] Conor Bronsdon:
I'd love to relate this back to something you brought up earlier, which is forgetting. You said this is something people often don't consider as they architect their systems. How does implementing controlled forgetting apply in this kind of production implementation?
[27:33] Richmond Alake:
As I mentioned, there is no industry standard way of doing this. For example, the way we measure relevance between data objects is using vector search and some form of relevance score, right? We have a relevance score, and if you use a vector search algorithm like IVF or HNSW, you can use that to look at similarity. You can use mathematical operations like cosine and Euclidean distance to measure the distance between two vectors. For forgetting, there isn't an industry standard, right? But there are approaches.
[28:11] Richmond Alake:
But there are projects. And one of the ones that is really easy to implement comes from a paper called Generative Agents, written by folks at Stanford; it was released in 2023. In that paper, they add an attribute to each memory unit within this agent simulation environment. And that attribute is a computation of three variables: the relevance, the recency, and the importance of the information. And with that weighted score, they can actually start to rank information and update information as they go through the entire system. So recency is timestamp-based, right? Relevance is just semantic similarity. Then importance, the way you can see importance would be maybe how many other memory units this memory unit is related to, or called by. So you would have this weighted computation that increases and decreases for each of these memory units. And that could add a natural way for information to be forgotten, because if information isn't retrieved or
[29:42] Richmond Alake:
it isn't a dependency to other memory units, that score will start to reduce
[29:48] Richmond Alake:
within that system.
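A compact sketch of that three-variable score, in the spirit of the Generative Agents paper rather than a faithful reproduction (the paper normalizes each component; the decay constant and weights below are made up for illustration):

```python
import time

def retrieval_score(memory, now, weights=(1.0, 1.0, 1.0)):
    """Weighted combination of recency, relevance, and importance."""
    w_rec, w_rel, w_imp = weights
    hours_idle = (now - memory["last_accessed"]) / 3600.0
    recency = 0.995 ** hours_idle  # decays the longer the unit goes unretrieved
    return (w_rec * recency
            + w_rel * memory["relevance"]    # e.g. cosine similarity to the query
            + w_imp * memory["importance"])  # e.g. how many units depend on it

now = time.time()
fresh = {"last_accessed": now, "relevance": 0.6, "importance": 0.5}
stale = {"last_accessed": now - 72 * 3600, "relevance": 0.6, "importance": 0.5}
# Identical content, but the unretrieved memory naturally fades in ranking
assert retrieval_score(fresh, now) > retrieval_score(stale, now)
```

This is what gives "forgetting without deleting" its mechanism: nothing is removed from storage, but unretrieved, unreferenced units sink below the retrieval cutoff on their own.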
[29:50] Conor Bronsdon: [OVERLAP]
So you've defined this fundamental building block of memory units and then talked a bit about some of the key relations it needs to have to other memory units as you
[30:00] Richmond Alake: [OVERLAP]
Yeah.
[30:00] Conor Bronsdon: [OVERLAP]
think through, you know, forgetting or remembering, I'm sure, as well. How does this operate within an agent harness where developers maybe wrongly assume that, oh, the model will take care of this? What do they need to be thinking about instead?
[30:19] Richmond Alake:
Yeah. Do you know, I don't think developers wrongly assume that the model providers will take care of memory. And this is why, because memory is the last battleground for anyone that operates in the AI stack. And here's what I mean. Everyone is trying to solve memory. Database companies, model providers, even
[30:48] Richmond Alake:
harness providers or framework providers. Everyone has got a say on how memory should be modeled in agentic systems, right? And it's an easy battleground for a research lab to operate in, and for a single developer to operate in as well. There isn't a plane within the AI landscape that has that sort of equality between the players. For example, a single developer can't train a GPT-4.5-like model, right? But a single developer can build a memory system that can remember better than ChatGPT. And some have, right? Small teams have built these systems. So that's why I say I don't think developers are assuming the wrong thing if they think the model provider is going to solve it. But what I tell developers is that they need to understand, at a very fundamental level, memory engineering and memory operations. And this is how it works in the agent harness. Just to define agent harness: an agent harness is basically all the software scaffolding and the control mechanisms you have in and around an agent loop. And an agent loop would just include an LLM that can take in information, observe the information, maybe do some tool calls, and then produce a final output. Now, in an agent harness, what you're going to have are memory operations for each of these memory types. And you can think of CRUD operations for each memory type: create, read, update. I was going to say delete, but we don't delete, we forget. So we have the CRUF operations, I guess: C-R-U-F, I don't know. We have those operations within the agent harness. And in the outer harness, and what I mean by the outer harness is the area, the engineering space, before you get into the agent loop, what you're going to need to do is pull in all the relevant memory. You need to pull in all the conversation memory, you need to pull in the knowledge base. You'll mostly have read operations there, right?
Then in the agent loops, you're probably going to have a lot of write memory operation because you're updating information that is coming out of the context window within the loop and even outside of the loop. So this is what I would say how it fits in the agent harness. I wish I had a diagram to show here.
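A rough sketch of those memory operations in code may help. This is a hypothetical illustration of the pattern Richmond describes (read operations in the outer harness before the loop, write operations inside it, and "CRUF" in place of CRUD); the class, function, and key names here are invented for this sketch, not any real framework's API:

```python
# Hypothetical "CRUF" memory operations (create, read, update, forget) for one
# memory type, plus the read-before / write-during pattern of an agent harness.
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """One store per memory type (conversation, knowledge base, etc.)."""
    records: dict = field(default_factory=dict)
    forgotten: set = field(default_factory=set)

    def create(self, key, value):        # C
        self.records[key] = value

    def read(self, key):                 # R
        # Forgotten records are excluded from retrieval but not deleted.
        return None if key in self.forgotten else self.records.get(key)

    def update(self, key, value):        # U
        if key in self.records:
            self.records[key] = value

    def forget(self, key):               # F: we don't delete, we forget
        self.forgotten.add(key)


def run_agent_turn(user_input, conversation, knowledge):
    # Outer harness: mostly READ operations, pulling relevant memory
    # into the context before the agent loop starts.
    context = [m for m in (conversation.read("history"),
                           knowledge.read("facts")) if m]

    # Agent loop (the LLM call is stubbed out here): mostly WRITE
    # operations, persisting what came out of the context window.
    output = f"echo: {user_input}"       # stand-in for a real LLM response
    history = (conversation.read("history") or "") + user_input
    conversation.update("history", history)
    return context, output
```

A single turn reads the knowledge base, stubs an answer, and appends the user input to conversation memory; `forget` leaves the data in place but hides it from reads, matching the "we don't delete, we forget" framing.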
[33:32] Conor Bronsdon:
Thanks to Galileo for sponsoring this episode. Their new 165-page comprehensive guide to mastering multi-agent systems is freely available on their website at galileo.ai and provides the lens you need to understand when multi-agent systems add value versus single-agent approaches, how to design them efficiently, and how to build reliable systems that work in production. Download it for free at the link in the show description to discover how to continuously improve your AI agents, identify and avoid common coordination pitfalls, master context engineering for agent collaboration, measure performance with multi-agent metrics, and much more. I know, I feel like I am in a really engaging class right now, and it is fantastic, I want to say. It actually draws me to my next question, which is: besides the option of listening to this podcast, which everyone should do, and your other talks, which they should probably also do, where should developers go to learn more about agent memory and how to manage it?
[34:39] Richmond Alake: [OVERLAP]
I will find you. I'm joking. [inaudible]
[34:43] Conor Bronsdon: [OVERLAP]
It's kind of true though, like if you go on LinkedIn and you're looking for agent memory, Richmond is going to pop up.
[34:49] Richmond Alake:
Yeah, yeah, yeah. Again, if you're looking for agent memory, do a Google search or go on LinkedIn. Hopefully, I pop up. And if I don't, you can follow me on LinkedIn, right? I'm always talking about agent memory. And I'm kind of taking it into this phase of continuous learning as well, because agent memory has a lot to do with continuous learning. And that's going to become a possibility this year or next year.
[35:12] Conor Bronsdon:
It's super exciting.
[35:14] Richmond Alake:
But if you want to... like, I mean it when I say, if you're a developer, I will find you, because that's my mission, right? I've been in this AI space for a good amount of years, educating developers in different forms. And that's what my focus is: really enabling developers to build AI applications in production, that skill of thinking about things abstractly and communicating to people that are non-technical folks as well by using human analogies. I just enjoy doing that. So yeah, I will find you.
[35:46] Conor Bronsdon:
I feel like we should just plug this entire conversation into Notebook LM and say, build me a fantastic diagram of everything Richmond said here, because I think it might give us a really interesting asset for you to use. And it also brings up a very specific question to me, which I think is something that maybe, you know, folks listening here would think is like, oh, I'm unsure on the boundaries. So, you know, if I am using, let's say, Claude, and I'm going through and having a conversation, if I talk to Claude for long enough, it will eventually compact the conversation. And it'll say, oh, I'm trying to help us keep this conversation going, let me compact it. When it does that, is that context management or is that memory management where it's actually forgetting things?
[36:33] Richmond Alake:
Compaction is interesting because... wait, context reduction is interesting because there are two ways you can reduce context: compaction and summarization, right? And one is lossy, the other one is lossless. Summarization is: I'm just gonna summarize the context window, throw away whatever we've done, and just put a summary in the context window. Compaction is: I'm going to maybe take a chunk out of the context window and store it externally, and then retrieve it when I need to, using some just-in-time retrieval mechanism.

Where do the boundaries lie? This is where. If you're doing summarization, you summarize your conversation. You click that compact button, or maybe the compaction mechanism is agent-triggered. That summary gets generated, hopefully with a lot of information compressed, and it gets put into the context window. All of that is context engineering. How do you know which information to compact, right? How do you prompt the LLM to do the right selection of signals and the right compression? That's context engineering.

But when you start to move the uncompressed information to an external store, you've given your agent the ability to do just-in-time retrieval. And what I mean is: even though you've removed the full conversation, you've placed an ID, right? A unique ID and a description, maybe a paragraph describing what's held under this ID, as a way for the agent to just search that ID to get the full conversation when it wants to. When you put that into a database, or even if you're using files, that's memory engineering. Because now you have to start thinking about: when do I retrieve this information? How quick should this retrieval be? And what retrieval mechanism should I be using? That's the boundary.
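The compaction path Richmond describes (move a chunk to external storage, leave behind a unique ID plus a short description, and fetch the full text by ID just in time) could be sketched roughly like this; the store and function names are invented for illustration:

```python
# Compaction with just-in-time retrieval: the context window keeps only a
# stub (unique ID + description); the full chunk lives in an external store.
import uuid

external_store = {}  # stands in for a database or file-based store


def compact(context_window, start, end, description):
    """Move context_window[start:end] out and leave an ID + description stub."""
    chunk = context_window[start:end]
    ref_id = str(uuid.uuid4())
    external_store[ref_id] = chunk
    stub = f"[compacted:{ref_id}] {description}"
    return context_window[:start] + [stub] + context_window[end:], ref_id


def retrieve(ref_id):
    """Just-in-time retrieval: the agent searches by ID for the full chunk."""
    return external_store.get(ref_id, [])
```

Once the chunk crosses into `external_store`, the questions that matter (when to retrieve, how fast, with what mechanism) are memory engineering; composing the stub back into the window is context engineering.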
[38:50] Conor Bronsdon:
This is fantastically interesting, and I am already regretting that I didn't schedule two hours with you, because I feel like I could get so much depth out of this. So we may have to have you come back on. Thankfully you still have a little time left here. I guess the key question I want to ask here, and you've maybe touched on this already: is agent memory fundamentally a database problem, then?
[39:15] Richmond Alake:
It should be a database problem. It should be, right? Database providers, storage providers are in the best position to solve this. But agent memory, if I should define it properly, because that will help with the answer... I was trying to see if I have a diagram somewhere that outlines where I think the boundaries lie between agent memory and memory engineering.
[39:44] Conor Bronsdon: [OVERLAP]
That would make a fantastic LinkedIn post. That's for sure.
[39:47] Richmond Alake: [OVERLAP]
Yeah, you know what? You are just in luck, Conor. As in, you've asked me to show you stuff, and I guess because I live this stuff, I just have it sitting around. So let me show you a good diagram, I think, as we go through things.
[40:06] Richmond Alake: [OVERLAP]
You can see my screen, right?
[40:08] Conor Bronsdon: [OVERLAP]
Yeah.
[40:09] Richmond Alake: [OVERLAP]
You can see this draw.io thing just floating around. And
[40:12] Conor Bronsdon: [OVERLAP]
Yeah. Yeah.
[40:13] Richmond Alake:
what I like to do is just draw diagrams and take my time, because I find that visual explanations really work well for people to understand what's going on. So here, this diagram is meant to show you where the boundaries lie between memory engineering, prompt engineering, and context engineering.

Here we have our Oracle AI database. But we also have all of this ingestion, all these documents coming into the database. Then we have all of the different data types, vectors included, that the Oracle AI database can handle. Remember, I showed you guys that before. I've kept some markers for the different engineering disciplines.

Over on the edge of the database, if you just imagine the boundary of the database, that's memory engineering. When you see agent engineers thinking about retrieval, indexing, storage, decay and forget logic, retrieval optimization, storage optimization, data modeling, that's memory engineering. But when you then go into the context window, you're going to see things like token budgeting, just-in-time retrieval, context retrieval, context organization and composition. All of that is context engineering. But you also have prompt engineering, which operates within the context window with its own techniques. And this is how I see the boundaries, right? It's how I'm seeing things, and this evolves over time as well. Any questions?
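As one toy example of the "decay and forget logic" item on the memory-engineering side of that boundary, here is a recency-based exponential decay scorer; the half-life and threshold values are arbitrary illustrations, not anything from the diagram or Oracle's database:

```python
# Toy decay-and-forget logic: score each memory by recency-weighted
# exponential decay and drop anything below a threshold at retrieval time.

def decay_score(age_hours, half_life_hours=24.0):
    """Exponential decay: the score halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def retrieve_fresh(memories, now_hours, threshold=0.25):
    """Keep only memories whose decayed score clears the threshold.

    memories is a list of (text, last_access_hours) pairs.
    """
    return [text for text, last_access in memories
            if decay_score(now_hours - last_access) >= threshold]
```

Real systems would likely combine a score like this with relevance signals rather than recency alone, but it shows where "forgetting" becomes a retrieval-time engineering decision.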
[41:44] Conor Bronsdon: [OVERLAP]
So many. My first one is: can I get a copy of this diagram to include in the show notes? Because this is fantastic.
[41:49] Richmond Alake: [OVERLAP]
Yeah, you can. And the thing is, because there are like a hundred things happening, I never finished this diagram and put it out there. I'll give you a copy of this and we can put it in the show notes. I don't know if it's fully complete, but anyway, I'll give...
[42:09] Conor Bronsdon:
Well, let me ask you this. How would you think about prioritization here? So let's say I am a newer engineer to memory engineering, and I am getting really excited about it listening to you talk, Richmond, which I think is very easy. I mean, it's happening to me. Where should I start? Where should I be trying to have strong impact initially?
[42:29] Richmond Alake:
Well, if you're a developer, forget about memory engineering, context engineering, all of these buzzwords. Your job is very simple: you're meant to build systems that work in production. That's it, right? That is your job.

Now, if you want to have the most impact, what you can do is, firstly, understand what it means for the system you're building to be valuable and useful in production. Does it mean that the interaction a user has with your agent can extend for multiple sessions or even days? Does it mean, if you're building a coding agent, that the agent could run for hours? Because if those are the parameters you're using to define success, you can find different engineering techniques that you can start to leverage. Security is important. Now, if you pick security, you start to go down the path of sandboxed environments that you need to leverage.

So forget about the engineering disciplines. They're going to change. Prompt engineering was sexy at one point, not sexy anymore. Context engineering is nice, not nice anymore. Memory engineering is up and coming, if I have anything to say about it. We'll see. It might be something else in a few years' time. But the key thing is, if you're a developer, you just want to build something valuable. So what are the things that really matter to your domain? Then just figure out what people are doing to make things work. And eventually we can define it with some nice terms.
[44:13] Conor Bronsdon:
And you recently wrote a blog post that I glanced at; I will admit to not having read the whole thing in depth. You wrote this blog post for Oracle comparing file systems and databases for effective AI agent memory management. And you essentially argue that file systems are great as an interface, but databases are a stronger substrate, and that there's a kind of categorical error many folks are making as they approach this, thinking, oh, I just need all these files. How does that relate to the retrieval decisions you're making as you build out the right database for your agents?
[45:02] Richmond Alake:
So once upon a time, I don't know if you heard this, but people used to say vector search is all you need.
[45:10] Conor Bronsdon:
It's always all you need. Yeah. Oh, this is it. This is the thing.
[45:13] Richmond Alake:
Yeah, it's always something is all you need or something is dead.
[45:18] Conor Bronsdon:
Yes. RAG is dead. RAG is dead. Everybody.
[45:20] Richmond Alake:
RAG is dead. Just like prompt engineering is dead, or something is dead, or something is all you need. We're either on two ends of this. We're always at the extreme of the spectrum.
[45:31] Conor Bronsdon: [OVERLAP]
It's because there's too much of a hype cycle. This is a whole other convo we could go into,
[45:36] Richmond Alake: [OVERLAP]
Exactly.
[45:36] Conor Bronsdon: [OVERLAP]
but yeah.
[45:36] Richmond Alake:
So we quickly found out that vector search was not all you need, right? It was a very key component of what you needed, but it wasn't all you need.
[45:47] Conor Bronsdon:
Is the phrase all you need simply an artifact of us using LLMs to generate text and therefore overhyping that text and it getting transferred into the ecosystem? I don't know. Anyways, go on, sorry.
[45:59] Richmond Alake:
When you think about retrieval strategies for your agent, I'm not saying that there is one solution that will solve all your problems, because that would be a lie, right? In the piece, I even give you situations and scenarios where you should use file systems and where you should use databases. Everything has its place in your application's maturity. If you're building a POC and you have to demo to some execs who told you to build this yesterday and have it ready tomorrow, for sure you can think about using a file system, because then you don't have to start thinking about tool selection and the different things that come with AI infrastructure. But once you start to get in front of your customers, once you start to scale, then you need to go to more proven systems, right? Files, again, files are nothing new. The philosophy of using files for everything was a thing in the 70s,
[47:02] Richmond Alake:
right? In AI, we love to resurrect things and act like we just discovered them.
[47:07] Conor Bronsdon: [OVERLAP]
And we tried them a lot. We tried them a couple of decades ago. It didn't quite work because we weren't ready. Now it's starting to work. This is great.
[47:12] Richmond Alake:
Yeah, and if you look at vectors, right... I remember when I was studying computer vision in university, everything was vectors, right? Matrix multiplication, convolutional neural networks... and convolutions are just matrix multiplications with different matrices, right? So
[47:35] Richmond Alake: [OVERLAP]
everything...
[47:35] Conor Bronsdon: [OVERLAP]
AI agent concepts are 50-, even 70-plus, years old at this point.
[47:40] Richmond Alake: [OVERLAP]
Yeah, so it's like we always rediscover things and reimagine them. So look, this is my advice to developers.
[47:53] Richmond Alake:
There is a subset of categories in technology where you have folks that are really solving a key problem: database, data storage, data retrieval. The longest-standing company that has been solving that problem is Oracle. Oracle is over 40 years old. So this is a company that understands data, understands how data should be retrieved and optimized. And we've seen all of the hype cycles, some before me and you were born. I don't even know what hype cycles we had back then.
[48:32] Richmond Alake: [OVERLAP]
I don't know what I'm talking about. There was the data era, there was the internet era, there was the cloud thing. It's all gone, and now we're in this AI era, agents, and we'll probably do quantum very soon. The key thing is, being at Oracle, we understand that change is constant, but to really produce value, you have to be a student of change. And the outcome of that is a database, the Oracle AI database, that gives you everything you need. And I know I said there is not... excuse me.
[49:13] Conor Bronsdon: [OVERLAP]
It's all you need, he says! Okay, I see you, Richmond. That's the clip. We're gonna start the episode with that, yeah.
[49:16] Richmond Alake: [OVERLAP]
Oracle is all you need. Oracle database is all you need. I'm contradicting myself. I'm gonna get flamed.
[49:24] Richmond Alake:
Yeah, Oracle AI Database is all you need. Just put that in the episode title, please.
[49:28] Conor Bronsdon: [OVERLAP]
Okay, okay.
[49:29] Richmond Alake: [OVERLAP]
I'm going to get flamed in the comments.
[49:32] Conor Bronsdon:
You know what? But your boss will be happy, so...
[49:34] Richmond Alake:
We've got a mission. Why are we getting so much attention? But look, there are a lot of retrieval strategies in the Oracle AI Database, like I showed you, right? I did a demo, and I showed you. And that could have easily been four different databases plus a bunch of MD files. You can go that direction if you want to, but this is the key thing: I'm a simple man. I'm all about reducing cognitive load, either for the LLM or for myself.

So just the same way we don't want the LLM to spend all of its reasoning capabilities thinking about information in the context window (what memory is this, what should I remember), by making it memory-aware and segmenting things the way humans do, we have to treat our developers the same way. I don't want our developers to start thinking: is there a new data type that's going to become popular very soon? Okay, do I need a new database? Okay, are files all you need? Okay, I need to start bringing in the files. I need this, I need that. It's like, no: one database, and we will handle it, and it keeps evolving. So that's what I'm basically presenting here. And I don't like to just talk the talk; I walk it, in diagrams and demos, and we have a bunch of resources so you can test things out yourself and make your tool choice. But yeah, experiment. This field is fun.
[51:09] Conor Bronsdon:
I think this idea of a converged database that Oracle is taking on is excellent from the standpoint of reducing cognitive load, as you put it. I think it was before the cameras started rolling that we were talking about this, but Richmond asked me, oh, how are you doing? And I said, you know, I feel like the last few weeks I have been surfing this wave, everything coming at me. And then I lost my surfboard this week, and I'm paddling desperately, trying to stay above water and get back on the surfboard. And Richmond said, very succinctly, I've felt that for one, two years at this point.
[51:46] Richmond Alake: [OVERLAP]
Yeah.
[51:47] Conor Bronsdon: [OVERLAP]
And I think for all of us who are involved in the AI space, there is a point where you feel this. Maybe you feel this all the time, because there is so much being thrown at you. There's so much new happening, so much exciting happening, so many new ideas, so many new techniques, and it can feel deeply overwhelming. It can give you too much cognitive load. Nobody can keep up with all the papers; there's a reason we're using LLMs to summarize a lot of this stuff. So when you can find the opportunity to be focused and simplify in a way that reduces your cognitive load, lets you manage your memory or your agent's memory, and reduces the context you have to be considering, I think that's a really fantastic pitch. Especially because we can look at human history, and we've been talking a bit about some of the comparisons, neuroscience, for example, as one area. There's a reason that we have done specialization for hundreds of years now, and there's a reason that we've seen such efficiency gains from it. And increasingly, as we've had these incredible gains with the generalized frontier models, we are now seeing that if you build the right harness (Claude Code is a great example that has gone extremely viral the last few months), you can deliver a lot more value, even without necessarily pushing the frontier capabilities that much further, by giving it the right context, the right memory management, and the right tools, specializing. And I think really this entire concept you're talking about, Richmond, of memory management and agent memory is very clearly an opportunity to drive specialization and therefore drive efficiency and, hopefully, results.
[53:30] Richmond Alake:
You're right that there's a lot happening in the AI space, and this is how I handle the noise: I just select a lane and I stay in that lane. And anything that doesn't relate to the lane, I try to ignore. Because if you've been in AI long enough, you're going to realize that you're going to pick a lane, but all lanes lead to the same destination, right? So you could have just focused on computer vision and deep learning, and eventually you'd have found your way to where we are today, right? Vision-language models, VLMs, or whatever we call them. You could have started in NLP, and eventually you'd have found your way here, right? You could have started in robotics, and you would have found your way here as well.
[54:19] Richmond Alake:
And that's the way I see things. I just chose agent memory. And I said, I'm just gonna stick with this. When I hear a signal or see something in the space, I ask: does this really fit in the lane that I've chosen? And if it does, I go deep. If it doesn't, I ignore it, because there is so much, right? I could be talking about quantum engineering next time I come on this podcast.
[54:46] Conor Bronsdon:
Good. I cannot wait to have that quantum discussion. And
[54:49] Richmond Alake:
I'm literally just busting out laughing.
[54:53] Conor Bronsdon:
this idea of convergence is not, you know, it's not a new one. The idea that machines and humans and knowledge are all converging into a single whole or a single opportunity. But it's definitely an interesting one to philosophically think about in the context of agents and specialization and memory and how we bring everything together. And Richmond, I just can't thank you enough for coming on the show today. It's been so fun to dive deep with you, and I am absolutely going to have to have you back. I'm confident you're going to have a new innovation for me in a few months here. So,
[55:29] Richmond Alake:
So
[55:30] Conor Bronsdon:
one, I want to just ask you as we close out here, and we mentioned your LinkedIn earlier, but where else should folks go to follow your work and to dive deeper, maybe into some of the courses and creations that you've already put out in the world?
[55:44] Richmond Alake: [OVERLAP]
I'm always going to be creating courses, so hopefully by the time this comes out, there's going to be a big course that I have. And I also do this thing, I call it weird, but it's not weird: I have a five-hour class that I live-stream on O'Reilly, right? It started off being two hours, and now it's a five-hour class on just AI memory management.
[56:12] Conor Bronsdon: [OVERLAP]
I thought you were reducing cognitive load, Richmond. What's going on here?
[56:15] Richmond Alake:
So what happened was, we started off with two hours, right? And I told O'Reilly, this was last year's summer, I said, hey guys, agent memory is really popular, developers are interested in this, and interest is gonna compound. And we started off with two hours. Then we did the class, and we didn't have enough time. We increased it to three hours. We did the class: not enough time. Now it's five hours.
[56:42] Conor Bronsdon: [OVERLAP]
I should have known better than scheduling an hour for this podcast.
[56:45] Richmond Alake: [OVERLAP]
Exactly. We did the five-hour class. Guess what, Conor? It's not enough time.
[56:51] Conor Bronsdon: [OVERLAP]
Damn it, Richmond.
[56:51] Richmond Alake: [OVERLAP]
Now we're turning it into a two-day class. Because we could keep going on and on.
[56:55] Conor Bronsdon: [OVERLAP]
Okay. Okay.
[56:56] Richmond Alake:
The space... again, I've chosen my lane, and that's agent memory. It's going to go into continuous learning, and that's my lane, right? So there are so many things happening in that particular lane that I am just so eager to bring to developers, and I'm so excited. So you can catch me on O'Reilly; I'm one of the instructors. You can search my name, Richmond, O'Reilly, AI memory, and you'll see one of my classes come up. I work at Oracle at the moment, so you can catch that too: the last Thursday of every month, I'm live-streaming on the Oracle Developers YouTube channel. So you can catch me talking about whatever I've discovered that month related to my lane. Those are two ways you can see me just talk for hours.
[57:45] Conor Bronsdon:
And don't forget Richmond's LinkedIn as a place to get micro-sized bites of all the knowledge in these courses. Richmond, this has been fantastic. I am already plotting to have you back to talk more about the agent loop and continuous learning. So whenever you're ready, you just let me know. I think that's gonna be a ton of fun.
[58:05] Richmond Alake:
Yeah, it'll be fun. Thanks for having me, Conor.
[58:07] Conor Bronsdon:
My pleasure, Richmond. And listeners, I really hope you enjoyed this lesson. And, uh, I called it a lesson, but really I meant to say episode, but I think it truly was a lesson. And I would love to hear from you about other topics you'd like us to go deep on. I mentioned a neuroscientist earlier when Richmond and I were talking; if someone knows a great neuroscientist we should talk to about the comparisons between agent and human memory, we would love to have that conversation. If you have other ideas for episodes, we always love to hear from you, whether it's in the comments on YouTube, Spotify, LinkedIn, Substack, or anywhere else you can reach the show. You can always find our episodes at chainofthought.show. And remember, if you haven't already left us a rating or review on Apple Podcasts or Spotify, or just dropped a comment somewhere, we deeply appreciate it. This engagement helps so many other folks find the show, and it helps us bring the next set of incredible guests to you. So thank you again for listening or watching, and Richmond, thank you so much for this great conversation.
[59:12] Richmond Alake:
Thanks, Conor!