Technology Explorations in Data & AI

In this Technical Exploration, Jonny and Tarik show how to build a fully functioning RAG-based AI agent using MindsDB, turning internal data from Postgres, Slack, or Google Drive into a queryable knowledge base powered by semantic search.
You'll see how to:
  • Convert internal documents into a semantic knowledge base
  • Make Google Drive & Slack data queryable in minutes
  • Build a custom AI Agent on top of your knowledge bases
  • Run MindsDB locally with Docker
  • Use SQL to configure agents, connectors, and knowledge bases
  • Expose your agent through a simple API for app integration
We also touch on:
  • Chunking & embedding strategies
  • Local vs. cloud LLMs
  • How MindsDB compares to full ETL approaches
Resources:
  • Demo code: https://github.com/datamindedbe/demo-technology-exploration/
  • Previous episode (PyAirbyte): https://youtu.be/eLUQrSqP-ns
  • Full playlist: https://www.youtube.com/playlist?list=PLJ_da7qdfL80rA7byzC_CmyrfJWjcCTnb
Note: This video is not sponsored or affiliated with MindsDB.


Chapters:
  • (00:00) - Intro: What is MindsDB?
  • (02:03) - Scope and dataset
  • (03:08) - Quick tour: connectors & UI
  • (06:34) - Components: Source, Knowledge Base & Agent
  • (06:55) - Agent demo + how MindsDB queries data
  • (08:36) - The code: getting MindsDB running
  • (11:27) - Q&A: embedding times & creating your agent
  • (14:03) - A Slack agent in 5 minutes
  • (17:19) - Multi-knowledge-base agents
  • (18:23) - Q: Integrating with SDK & MCP
  • (19:31) - Takeaways


Data & AI: Technology Explorations is a biweekly show from Dataminded. Each episode a Dataminded engineer demos a tool or technique worth knowing about -- working code, honest takes, no hype.

Music by Aleksandr Karabanov from Pixabay

Creators and Guests

Host
Jonny Daenen
Head of Knowledge at Dataminded
Guest
Tarik Jamoulle
Data Engineer at Dataminded

What is Technology Explorations in Data & AI?

Deep dives and practical demos on the technologies shaping modern data and AI development. Join the Dataminded team as we explore, unbox, and critically review the latest tools, from building AI agents and RAG systems to optimizing cloud costs and accelerating data pipelines. We cut through the hype to show you what actually works in real data engineering practice, complete with demo code!

Jonny Daenen (00:00)
How do you make your data queryable using natural language? We see lots of terms pop up, RAG systems, semantic querying, and we'll show you today with MindsDB how to set that up in a few minutes. Let's go.

Jonny Daenen (00:21)
Hi everyone, welcome to Technology Explorations at Dataminded. In this series, we give you an initial look at new or interesting technologies. My name is Jonny, Knowledge lead here at Dataminded, and today we'll have a look at MindsDB. And for that, I've invited Tarik. Welcome, Tarik.

Tarik (00:37)
Hi Jonny, thank you for having me. Happy to be here to talk about MindsDB.

Jonny Daenen (00:41)
So we will be building on top of what we did last time where we went over extracting data from Google Drive and putting it into a Postgres database and today you're going to show me MindsDB. Tell me what is MindsDB?

Tarik (00:44)
Mm-hmm.

MindsDB, we could call it an open source AI layer, which makes your data queryable using natural language. So what it can do is connect to a database, like Postgres, for example. And then you can add AI capabilities using SQL commands.

So it turns your data into a knowledge base, and then you can build chatbots to ask questions about it. MindsDB can also connect to multiple connectors, such as Slack, Notion, Databricks, and so on. So you can directly chat with your data. So you have two ways to use MindsDB: you can use it to connect to a Postgres database that you own where you already ingested data, or you can directly connect to the data that lives in a system and query that.

Jonny Daenen (01:49)
Okay, so MindsDB, if I understand correctly, is a layer that you use to fetch data from an existing system and then basically index it so that it's easily queryable by agents.

Tarik (02:01)
Yes, Indeed.

Jonny Daenen (02:02)
Is

Tarik (02:03)
What we're going to do today is create a knowledge base in MindsDB from the database that we have in Postgres. A knowledge base is like a semantic search engine over the data. So it combines an embedding model, vector storage, and metadata tracking. And then it automatically chunks long documents for better retrieval. So it's a way for the chatbot or the LLM to have a better semantic understanding of the data that is behind it.
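To make the chunking step concrete, here is a minimal Python sketch that splits a long document into overlapping word windows before embedding. The function name, the chunk size, and the overlap are assumptions chosen for illustration; the episode does not show how MindsDB chunks documents internally, so this is not its actual strategy.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears in full in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the document
    return chunks
```

Each chunk would then be passed to an embedding model and stored together with its metadata (for example the file ID and file name), which is what lets the agent cite its sources later.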

Jonny Daenen (02:32)
Yeah, can you remind us again, what is the data we have available?

Tarik (02:36)
So during the PyAirbyte demo, we ingested a couple of text files from Google Drive, which now live in the Postgres table here, in the Google Drive files. You have the names of the files here, and then you have the contents of the files here.

So we will vectorize these, create a knowledge base out of these files in the database, and then use a chatbot to ask questions about these files.

Jonny Daenen (03:05)
Yeah, okay. Let's have a look at what it can do.

Tarik (03:09)
This is MindsDB. Here, it's already running locally on my laptop, on localhost. And then you can see here, for example, if you wanted to connect to a data source, you can have multiple data sources in there where you can directly connect to the data through the API.

Jonny Daenen (03:16)
Yeah.

Tarik (03:32)
and then ask questions in real time. Or you can also create a knowledge base from a selected data source.

Jonny Daenen (03:41)
And when you say creating a knowledge base, does that mean then reading all of this data and making embeddings out of it or chunking everything? Okay.

Tarik (03:49)
Yes, yes, yes, and then you can select your own embedding models to do that, and we will see how we can use MindsDB to do it. You will notice that, for example, there is no Google Drive connector, which in... Yes.

Jonny Daenen (04:05)
Luckily you extracted that to Postgres.

And I did see a Slack connector. So I see a mixture between data systems and also software systems or SaaS providers. Okay.

Tarik (04:11)
Yeah.

Yes. Yeah, indeed.

I think there's also Teams, Notion. So we won't have time to explore everything, but there is a lot that you can do with it. You can also connect it to an MCP that talks to other data sources or providers. So it's a quite versatile tool.

Jonny Daenen (04:21)
Interesting.

Okay, so you could have MCP as a source. And so MindsDB could read from there. We have another video on MCP for those who want to explore.

Tarik (04:36)
Mm-hmm.

Yes.

We will go through the code later. But first, let's have a look at what is inside MindsDB. Here, as you can see in my data sources, I already ingested the files. And they are there: Google Drive files.

You can see the contents. This is basically what we see in Postgres. And then what we can do is create a knowledge base out of this. So it's here. And then here you have information about the name of the knowledge base and about the embedding model that we chose. This is the one that I use. This is my API key, which is hidden. And then you have metadata columns and...

Jonny Daenen (05:24)
Yes.

Tarik (05:29)
parameters and so on. And what we do afterwards is that we have an agent here. So it's nicely structured here. You have your agents, you have your jobs, you have knowledge bases, models. You can also use local models if you so choose, which is quite powerful, because if you want to do everything locally or you have sensitive files that you don't want to connect to a cloud-based LLM, you can also use it locally.

And then we have our agent here, which also uses a model and has built-in parameters, like the knowledge base it needs to connect to. You have a prompt template. You have a role: you answer user questions about that knowledge base, the Google Drive knowledge base here. You should add the ID. When you quote or paraphrase a chunk, you should write the ID and the source, summarize duplicates, and so on. So you're creating an agent with a specific prompt that connects to a knowledge base that you created, which is linked to files in the database, so that you can use natural language to ask questions about your files. Yeah, yes.

Jonny Daenen (06:34)
Okay, so I see three levels. You have your source; the data resides in the source. Then you have your knowledge base that is built up from data in the source, and MindsDB helps you create that. And then you have an agent level, which can interact with that knowledge base, but also using an external agent to answer questions.

Tarik (06:47)
Mm-hmm.

Yes, exactly.

So here I'm going to use the Google Drive agent to ask questions about my files. So here I will ask it to give me pros and cons about Microsoft Fabric.

So here it's using the Google Drive knowledge base because I've instructed it to do so. And it's doing SQL queries on the actual copy of the text. So here we have some pros and cons about Microsoft Fabric based on recent learnings. Here you see it's a fully managed platform, an upgrade on the Power BI platform, and so on. So you have a good view, and then you have the source: the ID of the file and also the name of the file, which is the Facts and Breakfast, Celebrating Fabric's First Year. And we know that it's actually here, right? So it took the information from this file here.

Jonny Daenen (07:48)
What I already wondered is: we see the query there, select star from, and it's trying to do a where clause with a specific string match. Is this translated into an embedding behind the scenes that we don't see, or is it actually trying to do a literal match?

Tarik (07:52)
Mm-hmm.

No, it's doing semantic search. Yes, yes, yes, yes. Yeah, indeed. It's doing semantic search and then looking over all the vectors in the Google Drive knowledge base and then selecting the file which is the most appropriate to the question that we're asking.

Jonny Daenen (08:08)
Okay, this is just not showing in the query. Okay, got it.

Okay, so the equal sign here is a bit misleading. It's actually doing a semantic search on the content column in this case. Okay, got it.
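The idea behind that "equals sign" can be sketched in a few lines of Python: instead of comparing strings, the query and every stored chunk are compared as vectors, and the closest chunk wins. The three-dimensional vectors below are hand-made stand-ins for real embeddings, and the function names are assumptions for illustration; this is a toy model of semantic search, not MindsDB's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, chunks, top_k=1):
    """Return the texts of the `top_k` chunks most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy data: (chunk text, pretend embedding). Real embeddings have hundreds
# or thousands of dimensions and come from an embedding model.
chunks = [
    ("Fabric is a fully managed platform", [0.9, 0.1, 0.0]),
    ("Claude Sonnet 4.5 has been released", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of a question about Fabric

best = semantic_search(query_vec, chunks)
```

With these toy vectors the Fabric chunk ranks first, even though the query never has to match its text literally.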

Tarik (08:28)
Yes.

Yes.

So maybe I can show you the code on how to get this MindsDB up and running.

Basically, we're doing a Docker Compose for MindsDB. So we are getting a MindsDB image from MindsDB. What is really nice about that is that you don't have to log in anywhere. It's open source, and you just download the image and do a Docker Compose up.

Jonny Daenen (08:57)
Okay.

Tarik (09:02)
and then you set up a username and a password and you're in. And you can use the full MindsDB experience. Here I tried to code a little bit. So for example, I have this Python file, which is called setup-db.py, and we are connecting to MindsDB here with our Python script. Then we are executing SQL statements in here to create the database and the knowledge base inside MindsDB. We are connecting here to the Postgres database. We are creating the knowledge base using a model name and API key. You can also define the storage.

If you're not defining storage, then it's going to be stored in MindsDB. But then if you lose the Docker image or if you do a Docker Compose down, it's in the cache, so it's going to disappear. And here, you can store it in Postgres. You can also populate the Google Drive knowledge base here from

Jonny Daenen (09:46)
Yeah.

Tarik (10:00)
Postgres table, and then we execute all this. So what does this do? As soon as I do make mindsdb and spin up my Docker image, it's going to run all this.
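The three statements such a setup script sends can be sketched as follows. This is a hedged illustration, not the contents of setup-db.py: the helper functions, the names `postgres_source`, `gdrive_kb`, and `google_drive_files`, the connection parameters, and the embedding model are all placeholders, and the SQL follows MindsDB's CREATE DATABASE, CREATE KNOWLEDGE_BASE, and INSERT INTO syntax as best understood here, which may differ between versions.

```python
import json


def sql_list(values: list[str]) -> str:
    """Render a Python list as a SQL list of single-quoted strings."""
    return "[" + ", ".join(f"'{v}'" for v in values) + "]"


def create_database_sql(name: str, engine: str, params: dict) -> str:
    """Step 1: register an existing system (here Postgres) as a data source."""
    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = '{engine}',\n"
        f"PARAMETERS = {json.dumps(params)};"
    )


def create_knowledge_base_sql(name, embedding_model, content_columns,
                              metadata_columns, id_column) -> str:
    """Step 2: declare an empty knowledge base and its embedding model."""
    return (
        f"CREATE KNOWLEDGE_BASE {name}\n"
        "USING\n"
        f"    embedding_model = {json.dumps(embedding_model)},\n"
        f"    content_columns = {sql_list(content_columns)},\n"
        f"    metadata_columns = {sql_list(metadata_columns)},\n"
        f"    id_column = '{id_column}';"
    )


def populate_sql(kb: str, source_table: str, columns: list[str]) -> str:
    """Step 3: load rows into the knowledge base, which triggers embedding."""
    return f"INSERT INTO {kb}\nSELECT {', '.join(columns)}\nFROM {source_table};"


# All values below are placeholders for illustration.
statements = [
    create_database_sql("postgres_source", "postgres", {
        "host": "localhost", "port": 5432, "database": "demo",
        "user": "demo_user", "password": "<password>",
    }),
    create_knowledge_base_sql(
        "gdrive_kb",
        {"provider": "openai", "model_name": "text-embedding-3-small",
         "api_key": "<your-api-key>"},
        content_columns=["content"],
        metadata_columns=["name"],
        id_column="id",
    ),
    populate_sql("gdrive_kb", "postgres_source.google_drive_files",
                 ["id", "name", "content"]),
]
```

Keeping the statements in code like this is what makes the setup repeatable each time the container is recreated.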

You can also create everything in MindsDB, in this SQL query editor here. For example, this is example code to connect to a Slack data source. So this is to show you that you can

Jonny Daenen (10:17)
Okay.

Tarik (10:31)
do it inside MindsDB using the built-in SQL editor and then creating all of this here. But you can also do it in a more, I would say, production-ready way, where you can have everything as code, where you can define the API keys and have them stored in a secret and secure location. Because here you would have to write, for example, the... Yes.

Jonny Daenen (10:44)
Yeah.

paste it in, yeah.

Tarik (10:55)
You paste it in here. For example, I'm creating a database for Slack and I'm going to have to paste in my token right in there, which is not a very secure way to do it. But if you want to take it further for a production environment, you would have all of these secrets stored in a secure store. So this is to show that you can create a knowledge base and you can create an agent using the query editor of MindsDB. You can also write it as code and run it when you create the image.
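One simple way to keep the token out of the SQL editor is to read it from the environment when the statement is built. The sketch below assumes a hypothetical environment variable `SLACK_BOT_TOKEN` and a hypothetical data source name `slack_source`; the `slack` engine and its `token` parameter reflect MindsDB's Slack integration as understood here and should be checked against the version you run. In production the variable would be filled from a secret store rather than typed by hand.

```python
import json
import os


def slack_database_sql(name: str = "slack_source",
                       env_var: str = "SLACK_BOT_TOKEN") -> str:
    """Build the CREATE DATABASE statement for Slack without hardcoding the token."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early instead of sending an empty token to MindsDB.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return (
        f"CREATE DATABASE {name}\n"
        "WITH ENGINE = 'slack',\n"
        f"PARAMETERS = {json.dumps({'token': token})};"
    )
```

The statement still contains the token once it is built, so it should be sent to MindsDB directly and not written to logs.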

Jonny Daenen (11:27)
I have a few questions on the three steps you showed here. So the first step was creating a database. That is then the source that you indicate, right? This is the source Postgres database. The second step I see is creating a knowledge base. So you set the stage for an empty knowledge base.

Tarik (11:27)
Yes.

Yes.

Yes.

Mm-hmm.

Jonny Daenen (11:44)
with specific columns I see and one specific content column.

And then the third step you do is load the data into the knowledge base. I expect that there, embeddings will start being calculated and, behind the scenes, it will load everything into the knowledge base of step two. Okay. And how long does this take? Because with embeddings, I can imagine if it does chunking or something behind the scenes, it can take quite a while, no?

Tarik (12:03)
Yes, yes, exactly.

Well, we don't have a lot of files. We have about 14 files or so. It doesn't take that long. We can do make mindsdb here. And it's going to start doing a Docker Compose up and checking for the Postgres that is running and also MindsDB. Here you can see it's successfully connected to MindsDB. And then now it's creating the database if it exists, of course, and voilà, it's done already. It was quite quick.

Jonny Daenen (12:47)
Okay and it reloaded these files.

Tarik (12:50)
Yes.

Yes. So for 14, for 15 text files, that was quite quick.

Jonny Daenen (12:58)
Yeah.

Right so now we have the knowledge base in place, it's filled up, but then we don't have the AI agent yet.

Tarik (13:04)
Yeah, indeed. So how do you create that agent?

You can create a new editor here. So what we are going to do is create an agent here. We'll name it Google Drive 2 transcript bot. And we can use GPT 4.1, for example. You define which data it needs to connect to, so the knowledge base, the Google Drive knowledge base, and then you can give it a prompt template. Here the role is what I explained before: you need to state the source, summarize duplicates, finish every response with a certain prefix, and give a concise and helpful answer.

So you just run it, and then here you have Google Drive 2 transcript bot, and it's directly usable here.

flexible.
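The agent definition described here can be sketched as one more generated statement. The helper name, the agent and knowledge base names, and the prompt text are assumptions for illustration; the `model`, `data`, and `prompt_template` parameters follow MindsDB's CREATE AGENT syntax as understood here, and older MindsDB versions use a different form, so treat this as a sketch rather than the exact statement from the demo.

```python
import json


def create_agent_sql(name: str, model_name: str, knowledge_bases: list[str],
                     prompt_template: str, provider: str = "openai",
                     api_key: str = "<your-api-key>") -> str:
    """Build a CREATE AGENT statement linking a model to knowledge bases."""
    model = {"provider": provider, "model_name": model_name, "api_key": api_key}
    data = {"knowledge_bases": list(knowledge_bases)}
    # Double any single quotes so the prompt stays one valid SQL string.
    escaped_prompt = prompt_template.replace("'", "''")
    return (
        f"CREATE AGENT {name}\n"
        "USING\n"
        f"    model = {json.dumps(model)},\n"
        f"    data = {json.dumps(data)},\n"
        f"    prompt_template = '{escaped_prompt}';"
    )


# Hypothetical names; the prompt paraphrases the rules mentioned in the demo.
agent_sql = create_agent_sql(
    name="gdrive_transcript_bot",
    model_name="gpt-4.1",
    knowledge_bases=["mindsdb.gdrive_kb"],
    prompt_template=(
        "You answer user questions about the Google Drive knowledge base. "
        "When you quote or paraphrase a chunk, write its ID and source. "
        "Summarize duplicates and keep answers concise and helpful."
    ),
)
```

Passing more than one entry in `knowledge_bases` is how the same helper would describe an agent that draws on several sources at once.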

Jonny Daenen (13:55)
So essentially in about two minutes you build a small RAG.

Tarik (13:58)
Yeah, basically. That's what's really nice about MindsDB.

What we did here, the whole concept of ingesting files from a Google Drive and connecting to Postgres, is rather a long step. What you could also do here is work based on what is in Slack.

So it will do some part of the ingestion itself. And then you can store everything in MindsDB. You don't have to connect to a Postgres database.

Jonny Daenen (14:27)
And so here you have step one, create database connection, that is connecting to Slack. In this case, we consider Slack a database or a data source. Then step two is for yourself to just see what channels there are. I assume this is for development purposes.

Tarik (14:30)
Mm-hmm.

Yes. Yes.

Yeah, mm-hmm. Yeah, yeah,

development.

And here we can see, for example,

Jonny Daenen (14:48)
So you essentially turn Slack into a database that you can query using SQL already.

Tarik (14:51)
Yeah.

Yeah. And then here we can see these are the names of the channels, yeah.

Jonny Daenen (14:57)
The channels, yeah. Cool.

Tarik (15:00)
So I want to get the messages from the learning channel here, which is this ID, so we can run it.

And then here we have all the text from the messages here. We're talking about Claude. Yeah, indeed.

Jonny Daenen (15:15)
Yeah, Claude Sonnet released.

Okay, so this gives you a way to query the data so you know what's looking right. And then in the next step you say, let's build a knowledge base and you provide the columns.

Tarik (15:20)
Voilà.

Yeah, indeed.

Yes, you're providing the columns. So here we created a knowledge base. It's already created, so we don't have to do it anymore. But then we can populate the knowledge base with the messages that we have in the channel ID that we defined, like so. And here it's going to take a bit longer because we have a lot of messages in there.

Jonny Daenen (15:49)
Yeah. And in the back, I assume it's doing all kinds of Slack API calls.

Tarik (15:54)
Yes, yes, but it's quite removed from what you're doing. So you can really quickly get to a point where you can ask stuff about your Slack messages and everything is really made easy.

Jonny Daenen (15:59)
Yeah.

Yeah, it puts like a SQL interface on top of your source system. This is really nice.

Tarik (16:14)
Okay, so the query was successful. So now we have a Slack knowledge base here.

And then we can create an agent that we will call Slack Assistant. And we define the knowledge base here.

So we'll call it Slack Assistant demo. Let's create it.

So voila, the query was successful and...

Jonny Daenen (16:38)
So we have a new agent now that is primed with this information.

Tarik (16:42)
Yes, this guy, Slack Assistant demo. And normally now if you ask a question, it will look into the knowledge base of Slack. So we saw.

Jonny Daenen (16:44)
Yeah.

Like if we ask, do you have news on Claude?

Tarik (16:56)
Hopefully, yes. Let's see. So it's using the knowledge base of Slack. Here's the latest news on Claude from the Slack conversations. Claude Sonnet 4.5 has been released. And then you have the source ID, the message and the...

Jonny Daenen (16:58)
Let's see.

Yeah, that's probably the user identifier, which you can resolve to the user so that we know who this is. Really nice.

Tarik (17:15)
Yes. Yeah. Woohoo.

Yeah.

What you could do is create an agent that has access to multiple knowledge bases. And there you have the problems that we discussed in the PyAirbyte video, where we were saying we have a lot of different sources, and we had a lot of knowledge in Google Drive, in Notion, in Slack. And PyAirbyte was kind of a parenthesis to get here, because we also need files from Google Drive. But for example, if you want to skip that whole ETL process of ingesting those files and putting them in the Postgres database, where you have more control, but it takes more time and more management and more resources, more overhead, you can also use the MindsDB integration connector, directly connect using the API, and create the knowledge bases using all the tools in there. And really quickly, you can get to a point where you have an agent using the LLM of your choice, asking questions about multiple knowledge bases that can be all put together.

Jonny Daenen (18:18)
Yeah, this is indeed quite a fast track to get to something quickly. And then it makes me wonder, we're running this locally. I would love to have this integrated into some other tool. So does MindsDB then also expose an API that we can use to query this agent you just created, if we don't want to use this user interface?

Tarik (18:23)
Mm-hmm.

Mm-hmm.

Jonny Daenen (18:40)
Because there's a tab called Code, is that something that

Tarik (18:45)
Query an agent to generate response to your questions using your connected data.

Jonny Daenen (18:48)
Perfect.

So it does actually allow you using the SDK. So suppose I have my Streamlit app somewhere running and I want to have my own interface and this as a backend, not with this UI, I could actually host this somewhere and query the agents directly.

Tarik (18:58)
Mm-hmm. Yep.

Indeed, yes, you could do that.
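As a rough idea of what such an integration could look like from an app, the sketch below builds an HTTP request that asks an agent a question through MindsDB's SQL query endpoint, using only the Python standard library. The base URL with port 47334, the `/api/sql/query` path, and the `SELECT answer FROM <agent> WHERE question = ...` form reflect a default local MindsDB install as understood here; the agent name and helper functions are hypothetical, and the official SDK shown in the demo may be the more convenient route.

```python
import json
import urllib.request


def build_agent_request(agent: str, question: str,
                        base_url: str = "http://127.0.0.1:47334"):
    """Build (but do not send) a POST request that queries an agent via SQL."""
    escaped = question.replace("'", "''")  # keep the question a valid SQL string
    query = f"SELECT answer FROM {agent} WHERE question = '{escaped}';"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/sql/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(agent: str, question: str) -> dict:
    """Send the request to a running MindsDB instance and return the parsed JSON."""
    with urllib.request.urlopen(build_agent_request(agent, question)) as resp:
        return json.load(resp)
```

A Streamlit app, for instance, could call `ask_agent("slack_assistant_demo", user_input)` and render the returned answer, leaving hosting and scaling of MindsDB itself as a separate question.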

Jonny Daenen (19:08)
Okay, this is really nice. Of course, you're not answering the question on how you host this and how you can scale this or anything, but in principle, this is a nice way of very quickly getting from Slack, which is a bit locked down in the silo towards making it queryable and making this into an agent and then turning this into an API. I think this is a really nice value proposition.

Tarik (19:14)
No, yeah, indeed.

Mm-hmm.

Jonny Daenen (19:31)
Maybe a quick conclusion from your side.

Tarik (19:34)
It brings you to value quickly. With PyAirbyte for Google Drive and MindsDB for Notion and Slack, I was able to query multiple sources from Dataminded in maybe one day of work or so.

I think it's a really nice tool, MindsDB, to turn a database into a knowledge base that you can query with natural language really quickly. It's SQL-based AI. It's open source, so that's really nice.

If you already have a Postgres database that you want to connect this to, it's really quick to get an agent to do that. So yeah, I think it's a really cool tool.

Jonny Daenen (20:09)
Yeah. From my perspective, this was a really nice demo. I especially liked the whole Slack setup. And my takeaway is you can build a RAG in a few minutes. You get a knowledge base, you get an agent and you get an API. So good for prototyping. Yeah. Right.

Tarik (20:24)
Mm-hmm. Yeah, definitely.

Jonny Daenen (20:28)
Thanks a lot for the demo, Tarik. Looking forward to seeing more of this and to trying it out myself.

Tarik (20:34)
Thank you, Jonny, for having me. It was nice to talk about that.

Jonny Daenen (20:37)
We have some code. We'll put it in the comments. And also if you have any questions for Tarik, also put them in the comments. Again, thanks for watching and see you next time. Bye bye.