Technology Explorations in Data & AI

AI agents get messy fast once you move beyond simple prompts. Context windows fill up with noise, agents start reasoning in loops, and suddenly you're dealing with brittle behavior and hallucinations.

Jesus walks through how Claude Code skills fix this -- packaging repeatable workflows into modular components that load only when needed. He demos two real examples: an Explain Code skill and a PR Review skill that forks context, limits tool permissions, and uses CLI commands to analyze pull requests.

Resources:
- Demo code: https://github.com/datamindedbe/demo-technology-exploration/tree/main/demos/agent_skills
- Anthropic docs: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
- Skills standard: https://agentskills.io
- Curious about MCP? https://youtu.be/fIr55-koOJQ

---

Full playlist: https://www.youtube.com/playlist?list=PLJ_da7qdfL80rA7byzC_CmyrfJWjcCTnb

  • (00:00) - Introduction
  • (01:28) - Demo: Skills in Claude Code
  • (05:57) - How agents work: from prompts to context engineering
  • (08:19) - What are Skills? (vs MCP, RAG, Commands)
  • (10:33) - Building your own Skill
  • (15:20) - Skills vs MCPs
  • (16:29) - What about hallucinations?
  • (17:07) - Specs and Anthropic's Skill Guide
  • (19:28) - Skillception: a skill to create skills
  • (20:34) - Is MCP history?
  • (22:50) - Sharing skills & wrap-up

---

Data & AI: Technology Explorations is a biweekly show from Dataminded. Each episode a Dataminded engineer demos a tool or technique worth knowing about -- working code, honest takes, no hype.

Music by Aleksandr Karabanov from Pixabay

Creators and Guests

Host
Jonny Daenen
Head of Knowledge at Dataminded
Guest
Jesús García Ramírez
Data Engineer at Dataminded

What is Technology Explorations in Data & AI?

Deep dives and practical demos on the technologies shaping modern data and AI development. Join the Dataminded team as we explore, unbox, and critically review the latest tools, from building AI agents and RAG systems to optimizing cloud costs and accelerating data pipelines. We cut through the hype to show you what actually works in real data engineering practice, complete with demo code!

Jonny Daenen (00:00)
LLMs are powerful, but once you start using them seriously, your conversations can get messy. Your prompts will get longer, your context gets polluted, and you start copy pasting things over and over. What if instead of rewriting them every time again, you could package them into something reusable? This is where skills come in, a lightweight approach to turn repeatable workflows into a modular and reusable component inside your agent. Let's have a look.

Jonny Daenen (00:35)
Hi everyone, welcome to Technology Explorations at Dataminded. In this series, we give you an initial look into new or interesting technologies. My name is Jonny, knowledge teacher at Dataminded, and today we'll have a look at skills, something that you can use to let your AI agent be more efficient and more reusable. For this topic, I've invited one of our experts, Jesus. Welcome, Jesus. How are you doing today? All good?

Jesus (00:57)
What's up, everyone?

All good, all good.

Jonny Daenen (00:59)
Before we get started, maybe a quick introduction about yourself. What's your role and how are you working with AI day to day?

Jesus (01:06)
Yes. So I am Jesus. I've been at Dataminded for over a year and a half, and I work a lot on creating AI use cases, let's say. But at the same time, I love to use every new shiny AI tool to improve my workflow, for better or worse. That's what I've been doing.

Jonny Daenen (01:24)
Yeah.

Okay. And so today you're going to show us how to build skills.

Jesus (01:28)
Yes, great. So we are going to use Claude because they are the ones that created the skill spec, even though now it's open source. In my day to day, I use Claude Code. I feel that it is the most advanced software at the moment for this, but it might change quickly in the future. And I have created a couple of skills, really, really small skills. One is to explain code. So every time that you open it, you say, explain this particular code to me, and it's going to do a few things to look at what you want from the repo. And then it's going to give you a nice report so you understand it better.

And then I have another one that is quite handy for engineers, which is to explain a PR. So for example, imagine that you open a branch, you make a few changes and then you want to craft a pull request. Then you can use this skill to make this process faster for you. And then you can manually check whether the description makes sense or not, but you don't have to type everything

is to re

Jonny Daenen (02:26)
Okay, so it gives you a review of your pull request before you get the human

Jesus (02:30)
Indeed. So what I want to do first, and it's going to be a bit meta, is use the explain code skill to explain the PR review skill and see what we get as an output. Because I have installed the skills in my repository, when I type a slash, I have a bunch of skills.

And here you see in the autocomplete this explain code. So this is the one that I will put in the repo. So then I toggle and I say, OK, hi, I want you to explain to me the PR review skill in this repo. So now it notes.

Jonny Daenen (03:03)
Okay, so the repository

contains already the code for a skill and now you're asking it to explain it.

Jesus (03:08)
Yes, so you see, it will start by finding the code. Of course, you cannot review something that you don't see. So it's going to find the code. Also, quite interesting: because it's Claude, I didn't even have to point to the proper folder. Based on this abstract, not super fine-grained description, it's going to find the skill. Once it finds it, it's going to try to understand the code at a high level, and it produces the explanation. And because of the way that the explain code skill has been created, it gives it in a way that is quite educational. Let's say that you have drawings and you see an analogy. So the PR review skill is a trained code inspector.

Here is what the skill looks like, which is a visual description. And then this is quite nice. It's like the workflow. You type review PR and the number of your PR, and this is what happens. We are going to use a subagent that is going to check the diffs of your current branch. It's going to read the files, and then it's going to give you a review. And then you see here that it's going to trace exactly, one by one, what's going on when you run this.

Finally, but this is more of an advanced topic, there is an argument that we put for the skill that says that you can only trigger the PR review skill when you call it yourself. Because for example, if you are talking in a conversation and then you say, hey, I want to review this and that, Claude might think, I need to trigger this skill, but it's not needed. You only trigger it yourself when you want to. And that's metadata that you can change.

Jonny Daenen (04:39)
So it's basically

who is in control. Either the human can call it directly with the slash command as you showed or the agent can decide to invoke the skill itself without having an explicit mention of the user.

Jesus (04:51)
And in this case, because we want to avoid false positives, we only allow the human to call it. The explain code skill, for example, is not the same. The explain code skill can be called by the agent: every time you say, hey, I don't fully understand this part of the code, it should trigger this particular skill for you.

Jonny Daenen (04:57)
Okay, cool.

I see, okay. So as a skill designer, you're a bit in control of how it gets called. I also see other more advanced things, like fork agents and allow tools. These are more advanced topics to design your skill around.

Jesus (05:19)
Yeah, basically, this is a bit more difficult topic. But the idea is that we don't want to pollute the main conversation. So what we do in this skill is that we're going to spin up a fresh subagent with fresh context and different tools that is just going to do the PR review. When the review is done, it's going to feed the information back to the main agent. So then you can read only the summary of the review and not the whole trail of finding the files, doing the diffs and these things. You don't need to care about that.
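The forked-context idea described here can be sketched in a few lines of Python. This is an illustration only, not how Claude Code implements subagents: `run_forked_review` and `fake_review` are hypothetical names, and the sub-agent's work is faked with plain strings.

```python
def fake_review(sub_context):
    """Hypothetical stand-in for the sub-agent: does noisy work, returns a short summary."""
    sub_context.extend(["ran the diff", "read the changed files", "grepped for TODOs"])
    return "2 issues found"


def run_forked_review(main_context, pr_number, review_fn):
    """Run review_fn in a fresh context; only its summary reaches the main context."""
    sub_context = [f"task: review PR {pr_number}"]  # fresh context, nothing inherited
    summary = review_fn(sub_context)  # intermediate steps stay in sub_context
    main_context.append(f"review summary: {summary}")
    return sub_context


main_context = ["user: please review my PR"]
sub_context = run_forked_review(main_context, pr_number=1, review_fn=fake_review)
print(len(main_context), len(sub_context))
```

The main context grows by a single line, while the diffing and file reading stay in the throwaway list.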

Jonny Daenen (05:49)
Yeah, okay. And so this is a really nice demo: your skill is already working to show what another skill does.

Maybe you can explain a bit what a skill actually is?

Jesus (05:57)
I think that skills are really important because nowadays we have moved from plain LLMs, let's say models that are like chatbots, where you have a question and they give you an answer, to agents. And agents is a hype word lately, but it just means that you have an LLM that works in a loop. You know, you have a function that you want to optimize, for example, give you the best answer.

And then you're going to try to loop for a while to give the best answer. That's really nice. But the problem is that the more loops that you do, the more chances that you are going to pollute the context window of the LLM. Like you're going to put too many details that are not important. So the LLM is going to get lost, let's say, like humans. If I give you too many details, you're going to get lost. So then because of

Jonny Daenen (06:38)
Is this basically what we then see when an LLM is thinking in the UI? It's basically going in a loop and keeps generating output.

Jesus (06:47)
Exactly, exactly. So LLMs have now been trained in a way to reason. So they go step by step thinking about the problem, that's the thinking. And then they act. Then they say, now I need to find the files, for example. I need to read the files. So then that's what you're going to see, which is the LLM interacting with the environment.
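The think-then-act loop and the way it fills up the context can be illustrated with a toy sketch. Nothing here calls a real model: `ToyAgent` is a hypothetical class, and counting words is only a crude stand-in for counting tokens.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Toy agent loop: each step appends a thought and a tool result to the context."""

    context: list = field(default_factory=list)

    def step(self, thought: str, action_result: str) -> None:
        # Both the reasoning and the tool output end up in the context window.
        self.context.append(f"thought: {thought}")
        self.context.append(f"result: {action_result}")

    def context_size(self) -> int:
        # Crude proxy for tokens: whitespace-separated words.
        return sum(len(entry.split()) for entry in self.context)


def run_loop(agent, steps):
    """Run every (thought, result) pair and record the context size after each step."""
    sizes = []
    for thought, result in steps:
        agent.step(thought, result)
        sizes.append(agent.context_size())
    return sizes


sizes = run_loop(
    ToyAgent(),
    [
        ("I need to find the files", "src/a.py src/b.py"),
        ("I need to read the files", "contents of a.py and b.py"),
    ],
)
print(sizes)
```

Every extra loop makes the context larger, which is the pollution problem the conversation turns to next.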

Jonny Daenen (07:04)
And is this a consequence of what was previously the chain of thought principle, where you basically say you have to do everything step by step?

Jesus (07:10)
Indeed. So basically what they have done, which is quite smart, is that they have, let's say, baked this concept into the LLMs. The providers have fine-tuned the models to actually mimic that chain of thought out of the box.

Jonny Daenen (07:24)
by putting a

Okay.

Jesus (07:25)
So that is super powerful, because now you can do really complicated stuff. However, you can pollute the context window. Basically, you have an amount of information that fits in the memory of your agent. Let's say it's quite ample. We have like 250K tokens in our models, which means you can feed books in there. However, a few studies have shown that the more you put in, the more chances that the LLM is going to do something random, like hallucinate or lose track. So that's why we have moved from this idea of prompt engineering, which was the idea of better prompting the LLM, of writing better instructions, to context engineering, which basically is the art of putting in this context window only the things that are needed. You know, the instructions that are needed, the files that are needed, the tools that are needed. And then you have a lot of kinds of tools. You have RAGs, MCPs, and then you have the latest, which is skills.

Skills are really powerful because, for me, they are a really nice way to do this context engineering in a way that is efficient, modular, and reusable. It's something that is local. For example, you have MCPs or RAGs. With MCPs you need to call something remotely, so it's not in your control.

And RAG is something that is quite technical. You need to implement a vector database and a few things. However, skills are just markdown files, metadata, a good structure, and then Anthropic or Claude is going to handle the rest for you in the background, let's say.

Jonny Daenen (08:56)
So basically the way I understand it is if you have a reusable process that you can explain as text, all the steps and the way you do things, you can put that in a markdown file and the agent will pick it up whenever it needs it.

Jesus (09:09)
Exactly. And it's not only the prompt: you can also add Python scripts, for example, or you can run CLI commands or whatever. These are just the instructions, and then the agent is going to be able to execute whatever you put there. For example, in the PR review, if we go to the definition of the skill, you're going to see that we ask the agent to run a few commands in the CLI to get what the latest commits have been.
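As a rough idea of what "run a few commands in the CLI" could look like, here is a small Python sketch around the GitHub CLI. The transcript does not say which commands the demo skill uses, so the choice of `gh pr view` and `gh pr diff` is an assumption, and `pr_context_commands` and `collect_pr_context` are hypothetical helper names.

```python
import shutil
import subprocess


def pr_context_commands(pr_number: int) -> list:
    """Commands a review skill might run to gather context (assumed, not from the demo)."""
    return [
        ["gh", "pr", "view", str(pr_number)],  # title, description, metadata
        ["gh", "pr", "diff", str(pr_number)],  # the actual changes
    ]


def collect_pr_context(pr_number: int) -> str:
    """Run the commands and join their output; needs an installed, authenticated gh."""
    if shutil.which("gh") is None:
        raise RuntimeError("GitHub CLI (gh) is not installed")
    outputs = []
    for command in pr_context_commands(pr_number):
        completed = subprocess.run(command, capture_output=True, text=True, check=True)
        outputs.append(completed.stdout)
    return "\n".join(outputs)
```

Keeping the command list separate from the execution makes it easy to see, and to restrict, exactly what the agent is allowed to run.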

Jonny Daenen (09:36)
And is that then also the difference between the previous "commands" where you basically just inject the prompts in your context window? Now you equip it also with the ability to run some things locally.

Jesus (09:46)
Exactly. The "command" is more of a reusable prompt, let's say, which is super useful. But a skill is a reusable, let's say, function for the agent. That's how I see it. It's like a reusable function. And it's defined by a markdown file and extra components.

Jonny Daenen (10:01)
And I also recall, the discovery of these skills is also a bit more lightweight because the agent only pulls them in when needed. And it's based on the metadata.

Jesus (10:09)
Exactly.

So I have an actual example of how that looks. They use this thing called progressive disclosure, which is quite powerful. So in the beginning of your context window, you have the system prompt. This is something that you don't control. And then you have loaded a bunch of skills, for example. Imagine that you have a skill that you have written that is used to read PDFs. The only thing that you keep

Jonny Daenen (10:32)
Yep.

Jesus (10:33)
there is the metadata. You only keep the name of the skill and when to use it, when to call it. That's the only thing that you need to track at this time. Then the message says, hey, I want to fill out this PDF. So then the agent figures that the PDF skill is going to be a useful thing to use. So you trigger the PDF skill. You see that it's now reading the definition of the skill. Hey, this is the definition.

Now, because the definition says, if this is triggered, you have to use this Python script to read the PDF, that is what is triggered, you see. And that's what is done in that part.
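Progressive disclosure, as described here, can be sketched as a two-stage lookup. This is a simplified model, not Anthropic's implementation: the in-memory `SKILL_FILES` dictionary, the skill text, and the function names are all made up for illustration, and the front matter parser only handles flat `key: value` lines.

```python
SKILL_FILES = {
    "pdf/SKILL.md": (
        "---\n"
        "name: pdf\n"
        "description: Read and fill out PDF files. Use when the user mentions a PDF.\n"
        "---\n"
        "Run the bundled Python script to read the PDF, then fill in the form fields.\n"
    ),
}


def split_front_matter(text):
    """Split a SKILL.md-style document into (metadata dict, body)."""
    _, header, body = text.split("---\n", 2)
    metadata = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, body


def build_index(files):
    """Stage 1: what always sits in the context, name and description only."""
    index = {}
    for path, text in files.items():
        metadata, _ = split_front_matter(text)
        index[metadata["name"]] = {"description": metadata["description"], "path": path}
    return index


def load_skill_body(files, index, name):
    """Stage 2: read the full instructions only once the skill is triggered."""
    _, body = split_front_matter(files[index[name]["path"]])
    return body


index = build_index(SKILL_FILES)
body = load_skill_body(SKILL_FILES, index, "pdf")
```

The index costs a couple of lines per skill, while the body (and any scripts it points to) is only paid for when the skill is actually used.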

Jonny Daenen (11:06)
Yeah, so basically every time it needs more information, it knows how this information is there. So it's kind of an index and it fetches new information by running fetch commands or maybe even Python commands to execute things.

Jesus (11:18)
Indeed. So this is really good because it's super efficient. It also lives on your computer. You have full control of this. You don't have to go through the internet, MCPs, authentication. You know, this was the problem with MCPs: they might be super powerful, but you need to go through a lot of boilerplate, and if you are doing things locally, like I need to read a PDF, you don't need all of this hassle.

Jonny Daenen (11:39)
Indeed, indeed. Okay.

And so how do I build such a skill myself? Is it a technical thing or is it something anybody can do?

Jesus (11:46)
Not at all. I would say that it's simple.

So we have this folder. This is the definition of what we have. As I said, we have two skills, the one that explains code and the one that reviews code. The first one is the easiest. So every skill needs this SKILL.md; that's the basic. That's the minimum thing that you need. And here you're going to have two things. You're going to have metadata. You see the name, explain code, and the description. This is what it does. So then the agent knows that it can use this skill when you ask about the code. That is what is always there with the agent. That's it.

Jonny Daenen (12:29)
That's basically what the agent always knows about, and it knows about the rest of this file. It's kept there on my disk.

Jesus (12:35)
Indeed. Then for example, this one. To be fair, you could argue that it's kind of a command, because it's prompts, let's say. It's like: when explaining code, start with an analogy, draw a diagram, walk through the code, and call out a common mistake, for example. If I were working with a junior engineer, and I wanted them to be able to catch up and learn, this would be a good way to understand.

So this is the bare minimum that you can do with a skill. But then you can get more advanced. For example, in this one, we have a few more interesting things. This is the one that we use to review a PR. So you see that now the front matter is going to be more complicated. Why? Because we have a couple of things. The first one, again, name, description. That's it. We now also get an input.

So we are going to get an input into the skill, which is the number of the PR. So then you are going to fetch only things for that particular PR. We are going to set this disable model invocation, which means that only the human can use it. And this is more interesting. So we are going to use context fork, meaning that we are not going to run it in the main conversation, but we are going to spin up a new, fresh subagent.

We are going to use an Explorer agent, which is an agent that comes out of the box with Claude and is basically meant to search for information, do greps and these kinds of things. And finally, and more importantly, because we know what should happen, we can also limit the tools that this agent is going to have. Instead of, hey, have everything that you can have, no, you can only do reads, greps, and globs, and also use the GitHub CLI to get information about that PR.
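To make the front matter fields above easier to picture, here is a Python sketch that holds an example front matter as a string and reads the two settings discussed: who may invoke the skill and which tools it may use. The exact key spellings (`argument-hint`, `disable-model-invocation`, `context`, `agent`, `allowed-tools`), the agent name `Explore`, and the `Bash(gh *)` pattern are my reading of what is said in the episode and should be checked against the Claude Code documentation and the demo repository; the helper functions are hypothetical.

```python
# Assumed front matter for the PR review skill; key names are not confirmed by the transcript.
PR_REVIEW_FRONT_MATTER = """\
name: pr-review
description: Review a pull request and report findings.
argument-hint: [pr-number]
disable-model-invocation: true
context: fork
agent: Explore
allowed-tools: Read, Grep, Glob, Bash(gh *)
"""


def parse_front_matter(text):
    """Parse flat 'key: value' lines (a real parser would use a YAML library)."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def allowed_tools(fields):
    """Tools the forked sub-agent is limited to."""
    return [tool.strip() for tool in fields["allowed-tools"].split(",")]


def model_may_invoke(fields):
    """False when only the human is allowed to trigger the skill."""
    return fields.get("disable-model-invocation", "false") != "true"


fields = parse_front_matter(PR_REVIEW_FRONT_MATTER)
print(allowed_tools(fields), model_may_invoke(fields))
```

The point of the sketch is the shape: a handful of declarative settings decide invocation, isolation and permissions before any instruction text is read.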

And then the description is a bit more complicated, but it's basically: review the pull request from these arguments. That is the input. For the context, you need to run these commands, the commands to get the context from the PR. So as I was saying, the subagent is going to be able to do this for you. And then there are checklists in here, like how should you do the review? What to look for? And the person defining the skill is the one that is in control to say, this is what I want the skill to do. This is what I want the skill to look for in the code. Because for example, I work a lot with infrastructure as code. So maybe there are some gotchas that I would always like to look for when I'm doing a PR with infrastructure as code. So then I put it there. So I put more fine-tuned definitions.

And then the outputs. Hey, every time that you give me the review, this is the way that I want to have the review. So we have consistency. And this can be as complicated as you want, because you can use scripts. For example, you can say, run this script every time that you run the PR review. Or you can have examples of good PR reviews. And these are things that you would add on top of this folder. So you would not use the same SKILL.md, but you would create new files and reference them in this SKILL.md.

Jonny Daenen (15:25)
Yeah, because I heard you need to keep your files lower than 500 lines. So if you have examples or so, you put them in a subfolder and here you put like a little local link dot slash and then the file name.
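The two conventions mentioned here, keeping the main file under roughly 500 lines and pointing to extra files with relative `./` links, can be checked mechanically. The sketch below is a hypothetical lint helper, not part of any official tooling; `check_skill`, `demo`, and the example file names are invented for illustration.

```python
import re
import tempfile
from pathlib import Path

MAX_LINES = 500  # guideline quoted in the episode
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((\./[^)]+)\)")  # Markdown links starting with ./


def check_skill(skill_dir: Path) -> list:
    """Return a list of problems found in skill_dir/SKILL.md."""
    problems = []
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    if len(text.splitlines()) > MAX_LINES:
        problems.append("SKILL.md is longer than 500 lines")
    for target in LINK_PATTERN.findall(text):
        if not (skill_dir / target).exists():
            problems.append(f"missing referenced file: {target}")
    return problems


def demo() -> list:
    """Build a throwaway skill folder with one valid and one broken reference."""
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp)
        (skill_dir / "examples").mkdir()
        (skill_dir / "examples" / "good_review.md").write_text("ok", encoding="utf-8")
        (skill_dir / "SKILL.md").write_text(
            "See [good](./examples/good_review.md) and [bad](./examples/missing.md).\n",
            encoding="utf-8",
        )
        return check_skill(skill_dir)
```

Running `demo()` reports only the broken link, since the file is short and the first reference resolves.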

Jesus (15:37)
Indeed, exactly as Markdown. And that's it. It's like, if you need to do this, go to that file. That's it. And the models are getting smarter by the day. So that's all that you need to say. They will say, OK, I need to read this. And that's it. And they can get as fancy as you want. But I have a couple of references on how Anthropic, the one that created this spec, advises you to write good skills: what are the best practices, what should you look for, for example?

Jonny Daenen (16:04)
Yeah, 'cause this is a minimal example. And so basically you showed that you have this front matter that controls how the skill is picked up and how it is actually initiated, like the subagent component and also the permissions on your local drive. It can execute these commands; these others are not possible. Then in the body, you basically say, this is how the job to be done needs to be executed. You say, these are the steps. These are the commands that you should actually call.

Jesus (16:06)
Yeah.

Jonny Daenen (16:29)
And this gives the agent then a way to do everything, but it's not like the mandatory way. It can still hallucinate, right?

Jesus (16:36)
Yeah, indeed. Yeah, that's true. Models can hallucinate. The people developing these models now have this in mind. They are putting a lot of effort into not allowing the models to hallucinate, but, you know, it's not an exact science. These are non-deterministic models.

Jonny Daenen (16:43)
Yeah.

Jesus (16:51)
My advice in that sense is that the more instructions that you put, the more brittle the logic is, and the more chances that the model is going to hallucinate. So that's why you have these reusable skills. It's like a function, you know: only keep what is necessary and be as short and clear as possible.

Jonny Daenen (17:07)
Yeah, okay. You mentioned a few specs. Could you point us to the specs that people can use?

Jesus (17:13)
Yes. The ones that created skills, Anthropic, have released this skill authoring best practices page, which is really, really long but really informative. So they start with, you see, concise is key, you know, this kind of thing, with the example of the PDF, and they give you a good example and a bad example.

Jonny Daenen (17:26)
Yeah.

Jesus (17:32)
The bad example is longer, but it is also more brittle. The concise one assumes that Claude knows what a PDF is and how to read one. You don't need to give stupid details. It's like, hey, use this to extract the PDF. That's it. You don't need to tell your life story. Here, for example, this is explaining what a PDF is. So for the agent this is like, I don't need to know this at this moment. I just need to know how to read it. That's it.

Jonny Daenen (17:54)
Yeah.

Jesus (17:55)
And then they have more on the metadata, and then they have the more complicated, fancier things, you know, this kind of progressive disclosure. So you see that in the main skill you have the references to all the other markdown files that the agent will then read if they are necessary. So you don't pollute the context. And then

Jonny Daenen (18:13)
And these don't need that front matter; it seems like the front matter is only relevant to your entry point of the skill.

Jesus (18:16)
Mm.

No, indeed, because you manage how these are called through your main SKILL.md. And then each skill is going to be a folder. So here we have the PDF skill. The basics: you need the SKILL.md. And then it's free; they are quite lenient about what you can do. I think this forms.md is also a reserved kind of file. I have not used it myself, but they have a few tips on how to do it. Like don't nest too many folders, this kind of thing. They go through what a good skill pattern is, or how to split the information, how to reference. One thing that I've seen recently is that they have done the same, but in a more visual way. They released this kind of, I don't know, nice PowerPoint, let's say.

Jonny Daenen (19:04)
More of a book almost.

Jesus (19:06)
Yeah, and it's planning and design, but it has a lot of examples. Testing, for example, is an important part of skills. With everything that you do with LLMs, testing is quite difficult because they are non-deterministic. You cannot do a simple two-plus-two-equals-four unit test. So it's something that you need to pay quite some attention to, to be honest.
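One common way around the "no two plus two equals four" problem is to assert on the shape of the output instead of its exact wording. The sketch below is a generic illustration of that idea, not a technique taken from Anthropic's guide: the required section names and the sample reviews are hypothetical.

```python
REQUIRED_SECTIONS = ("Summary", "Issues", "Suggestions")  # hypothetical output format


def review_has_expected_shape(review: str) -> bool:
    """True if every required section appears as a Markdown heading in the review."""
    headings = {
        line.lstrip("#").strip()
        for line in review.splitlines()
        if line.startswith("#")
    }
    return all(section in headings for section in REQUIRED_SECTIONS)


# Two runs with different wording but the same structure, and one that ignores the format.
run_a = "# Summary\nLooks fine.\n# Issues\nNone.\n# Suggestions\nAdd tests."
run_b = "## Summary\nMostly OK.\n## Issues\nOne typo.\n## Suggestions\nRename x."
run_c = "The PR looks good to me."
```

A check like this tolerates the model phrasing things differently on every run while still catching a skill that stopped following its output template.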

But yeah, they give you good examples on how to do it. Finally, one interesting thing, even though I don't like it too much, is that Anthropic provides you a skill to create skills. It's called the skill creator skill. So you can call it and say, hey, I want to create a skill to do this and that. And then it will trigger the skill creator, and it will ask you some questions and come up with the best skill using best practices.

Jonny Daenen (19:47)
Yeah.

But you don't like it?

Jesus (19:48)
My experience has not been great, but to start it's really good, I would say.

Jonny Daenen (19:53)
I think I used it at one point where I had a long conversation to let it do something. And then I asked it to make a skill out of this. And so it ended up with a skill that I could download as a zip. And then I had the same structure you had, but then I could start manipulating that myself. And I thought it was a pretty okay starting point, but that was from a UI perspective; that was using the main Claude desktop UI. It was not using Claude Code.

Jesus (19:59)
Yes.

Mm.

OK, yeah. Yeah, I mean, I think it's a nice lightweight wrapper, I guess. Like, if you point to the best practices and what you have in mind, you can get the same or even better. But it's good to start. It's a nice way to get a scaffold, and then you can take a look at it and tweak it however you wish.

Jonny Daenen (20:34)
Okay, a few questions. What is the relation according to you to MCP? When would you use an MCP server to solve for something or to add to your agent or when would you use a skill?

Jesus (20:44)
I think for me, I see the value of MCP every time that you have to interact with an external system, especially when you need authentication. For example, imagine that we have Notion in Dataminded and they provide an MCP so you can plug your agent into Notion. But of course, you need to give permissions to say, hey, I am Jesus. I have access to Notion. Notion needs to make sure that you are who you say you are. But apart from that, I would avoid MCPs.

MCPs are also things that are quite difficult to create. They are engineering heavy. So if there are MCPs out of the box, those are nice to use. But nowadays even that is changing: because agents can execute CLI commands, there are integrations that you can do as a skill, bypassing this idea, where the skill says, hey, use the CLI of Notion to do stuff.

Or use the CLI of Obsidian, for example, which is a local note taker. Obsidian has an MCP, and Obsidian also has a CLI. For me, it's a bit of an over-engineering thing to use MCP when it's local. You know, you have Obsidian locally, you have your Claude Code locally. Just use a skill and say, use these few commands. That's way more efficient.

Jonny Daenen (21:38)
Yeah.

Yeah. So do I read this correctly that you see MCP being deprecated in favor of the normal APIs and CLIs, and have those be used by a skill?

Jesus (22:06)
Hmm.

Not fully. I think, as a developer, skills are always a good way to go unless you really need this kind of more complicated integration. However, I am working on an end-to-end agent that does a bunch of stuff, and we do need MCP, or at least we need a way to connect two different systems

Jonny Daenen (22:27)
Okay.

Jesus (22:30)
and do proper authentication and keep the connection private in a private network, with JWT tokens and the engineering stuff that is needed. But if I am doing my day-to-day work or a hobby project, I will always try to use a skill.

Jonny Daenen (22:38)
Yeah.

Yeah, so for personal productivity and automation, you would definitely go the skill route. Okay.

How do you share these skills? You made this skill in a repository. It's just a folder. How can I now use them?

Jesus (22:56)
That's a good question. There are two ways. So for example, if you use Claude Code, they have their own way of versioning skills, sharing skills. So they use this concept of a marketplace.

So you can create plugins that are basically a bunch of skills together, for example, or agents. And then you can install them really easily. Another way is, yeah, you clone a repo that has a skill and you put it in your .claude folder. And recently, Vercel came up with a kind of marketplace, because skills were created by Anthropic, but they open sourced the specs. So now everyone, also if you use Copilot or OpenAI, ChatGPT, whatnot, can use skills. So that's quite nice.

Jonny Daenen (23:32)
Yeah.

Jesus (23:42)
And then here, Vercel came up with this kind of marketplace where you can upload your skills and then people can find them. For example, recently I wanted to create a video, and there is this library called Remotion to create videos. So I found this skill and I downloaded it. And here you can see what the things are that you have. I download it, and that's it. And you just start using it, and it's quite easy to interact with.

Jonny Daenen (24:06)
And in your local setup, could you maybe point us to the .claude folder and show where you installed them?

Jesus (24:14)
So it is .claude.

And here you see these are the skills. Yeah. But also you can go to your root if you want skills that are used across projects.

Jonny Daenen (24:17)
yeah, there is your...

Jesus (24:24)
In that folder you also keep track of the conversations and the subagents and permissions, so it's not only the skills, let's say.
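The "clone a repo and put the skill in your .claude folder" route boils down to copying a directory. Here is a minimal Python sketch of that step, assuming skills live under `.claude/skills/<skill name>` inside a project (or inside your home directory for skills shared across projects); that exact layout is an assumption to verify against the Claude Code docs, and `install_skill` and `demo` are hypothetical helpers.

```python
import shutil
import tempfile
from pathlib import Path


def install_skill(skill_dir: Path, target_root: Path) -> Path:
    """Copy a skill folder into <target_root>/.claude/skills/<skill name>."""
    if not (skill_dir / "SKILL.md").is_file():
        raise ValueError(f"{skill_dir} does not contain a SKILL.md")
    destination = target_root / ".claude" / "skills" / skill_dir.name
    shutil.copytree(skill_dir, destination, dirs_exist_ok=True)
    return destination


def demo() -> bool:
    """Install a throwaway skill into a throwaway project and confirm it landed."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "repo" / "explain-code"
        source.mkdir(parents=True)
        (source / "SKILL.md").write_text("---\nname: explain-code\n---\n", encoding="utf-8")
        project = Path(tmp) / "my-project"
        project.mkdir()
        installed = install_skill(source, project)
        return (installed / "SKILL.md").is_file()
```

Passing `Path.home()` as `target_root` would correspond to the "go to your root" option mentioned above, under the same layout assumption.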

Jonny Daenen (24:31)
All right, this looks really nice. I think people can try it out themselves if they check out the repository. Anything else you'd like to add, Jesus?

Jesus (24:37)
No, just give it a try. I think the barrier to try them is quite low. You don't need to do much engineering, but they can speed up your workflow significantly. That's how I feel about it.

Jonny Daenen (24:49)
Yeah. All right. Thanks a lot for explaining. There are many other aspects that we didn't cover, like marketplaces, hooks and more advanced concepts. We'll cover those in another video, where you will also showcase a bigger agent that you built using these skills principles. So thanks a lot for explaining this to us, Jesus. And thank you everybody for watching, and we'll see you next time. Bye bye.

Jesus (24:52)
Thank you.

Thank you.