Searching for null0

Summary

In this podcast, Kirk Byers and John Capobianco discuss the impact of AI on network automation and engineering. They explore the significance of ChatGPT, the challenges of inference, and the concept of Retrieval-Augmented Generation (RAG). John shares insights on using LangChain for building AI applications, and the role of AI agents. The conversation emphasizes the importance of adapting to AI technologies and the potential for enhancing productivity in network engineering.

Takeaways
  • ChatGPT marked a significant turning point in AI awareness.
  • Retrieval-Augmented Generation (RAG) enhances AI capabilities.
  • LangChain simplifies the integration of AI with network tools.
  • AI agents can automate complex tasks in network management.
  • Fine-tuning models can improve AI performance in specific domains.
  • AI can significantly reduce the time needed for project development.

Chapters

00:00 - Introduction to AI and Network Automation
01:42 - The Impact of ChatGPT
05:50 - Understanding Hallucinations and Inference
09:53 - Retrieval-Augmented Generation (RAG) Explained
14:42 - Building with LangChain
18:37 - Exploring Models and Local LLMs
22:55 - Exploring Fine-Tuning and RAG Techniques
25:34 - Integrating AI with Network Data
29:34 - The Rise of AI Agents
34:28 - Modernizing Code
39:53 - Future Directions for Network Engineers

Reference Materials
Selector https://www.selector.ai/
John Capobianco YouTube Video on "Multi Agent AI for Network Automation" https://www.youtube.com/watch?v=8GwSIRGae10
LangChain https://www.langchain.com/
LlamaIndex https://www.llamaindex.ai/
Streamlit https://streamlit.io/

What is Searching for null0?

A podcast about AI and network operations: how network engineers and operations teams can benefit and improve their productivity by using LLMs and associated tools.

Kirk Byers: 00:00
Welcome to this podcast on AI and network automation. I have John Capobianco here, and we're going to talk about various topics in AI: AI for network automation, AI for network engineering. I'm just going to get started, John: why don't you give me a little background on yourself, where you're working now, and what kinds of things you're doing?

John Capobianco: 00:19
Sure. Great, Kirk. Hey, thank you for having me and for letting me kick off this podcast series; it really means a lot to me, and I'm really excited to have this discussion. I'm currently with Selector AI as a product marketing evangelist, and I sit somewhere between helping with the product and customer needs and use cases, and sharing and getting the word out about the technology at a few different levels. At a high level, that's artificial intelligence for network automation in general. Closer to my actual role, it's really talking about this idea of a network language model and a copilot for network infrastructure and more. Prior to that, I was with Cisco for about 3 years. I started as a boot camp instructor and then moved into a technical artificial intelligence leader role for about a year and a half.
I've actually been doing AI for network automation for about 2 years. ChatGPT is going to celebrate the 2-year anniversary of its public release on November 30th, just a few weeks from this recording. That was the inflection point for me. I got caught up in that early wave: on the release date, I signed up and had an account immediately. I was part of those first million users in 5 days or whatever it was.

Kirk Byers: 01:42
At least in the near term, and obviously people have been working on this for much longer, I view that ChatGPT release as the watershed moment when lots of people realized, "Oh, wow, this is really, really amazing!" Was that your watershed moment, or was it more incremental? How did you come to see that this is a big deal?

John Capobianco: 02:10
The crossover for me was going from, I'd hesitate to say playing with it, but chatting with it, prompting it, and trying it out through the graphical user interface, the chat interface, to being granted API access. ChatGPT came out in November; I got API access the following February.

John Capobianco: 02:40
Immediately, I was like, can I connect this to a PyATS job somehow? As soon as we were given API access, that's where my mind went, because that's my field of study and my field of interest, and it seemed like a logical thing. I can use PyATS to parse and get JSON back.

John Capobianco: 02:58
Can I just ask the AI to interpret that?

Kirk Byers: 03:02
Right.

John Capobianco: 03:02
My very first use case was, let's say, testing interfaces with PyATS. So that's several loops, and you're testing 17 different values per interface, per loop, per device. Then I'm interrogating it through my own code, with thresholds. If output drops are greater than 0, fail the test. That sort of thing.

Kirk Byers: 03:29
Yep.

John Capobianco: 03:29
I thought, what if instead I could take the JSON per interface and just ask an artificial intelligence, “Is this interface healthy?” and build that into my loop because it's just another API call. Right?

Kirk Byers: 03:42
Yep.

John Capobianco: 03:42
It was remarkable. That was a jaw-dropping "Wow!" Then I started to have some fun with it: explain it like I'm five; if I did get output drops, what would cause it? You poke and prod at it more and more.
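
[Show note: a minimal sketch of the pattern John describes here, not his actual code. It assumes a pyATS testbed file, an OpenAI API key in the environment, and illustrative device and model names.]

```python
# Ask an LLM whether each interface looks healthy, per John's description.
# Assumes a pyATS "testbed.yaml" and OPENAI_API_KEY are set; the device name,
# model name, and prompt wording are illustrative.
import json
from genie.testbed import load          # pyATS / Genie
from openai import OpenAI

client = OpenAI()
testbed = load("testbed.yaml")          # hypothetical testbed file
device = testbed.devices["dist-router-1"]
device.connect(log_stdout=False)

interfaces = device.parse("show interfaces")   # structured dict per interface

for name, data in interfaces.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a network assistant. Answer HEALTHY or "
                        "UNHEALTHY with a one-sentence reason."},
            {"role": "user",
             "content": f"Is this interface healthy?\n{json.dumps(data)}"},
        ],
    )
    print(name, "->", response.choices[0].message.content)
```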

Kirk Byers: 03:59
Yeah. I did similar things in various forms. I played with OpenAI's straight API calls, and they have something they call structured outputs, where you can basically provide a Pydantic model. You describe what data you're supposed to get back and what it's supposed to look like. Then you go query some network device and just dump the data into OpenAI.

Kirk Byers: 04:26
You basically say, “Hey, parse this and return the output.”

John Capobianco: 04:37
Right.

Kirk Byers: 04:37
Then, like you said, it knows about it. So if you give it a whole bunch of IP information or VRF information, you can start asking it questions like "Construct me a JSON list of all the IP addresses that you saw" (or what have you) that you can then do additional things with, and that definitely starts to be like, "Oh, okay. This is pretty amazing. Right?" Because you're not really doing anything more special than saying, "Here's what the data should look like; extract it for me."
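
[Show note: a sketch of the structured-outputs pattern Kirk describes, assuming the OpenAI Python SDK's parse helper; the Pydantic fields, file name, and model name are illustrative.]

```python
# Define a Pydantic model for the data you expect, dump raw device output
# into the prompt, and get typed data back.
from pydantic import BaseModel
from openai import OpenAI

class Interface(BaseModel):
    name: str
    ip_address: str
    status: str

class InterfaceTable(BaseModel):
    interfaces: list[Interface]

client = OpenAI()
raw_output = open("show_ip_int_brief.txt").read()   # raw CLI text from a device

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Parse this and return the output."},
        {"role": "user", "content": raw_output},
    ],
    response_format=InterfaceTable,
)

parsed = completion.choices[0].message.parsed
print([i.ip_address for i in parsed.interfaces])     # e.g. every IP it saw
```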

John Capobianco: 05:03
Right. I found some edge cases, but there's no need to Jinja template anymore. You can send it a blob of JSON and say, "Give me a CSV-formatted version of this data," and it will happily convert that for you. You no longer have to write that Jinja2 template to do the conversion. There were just so many things. It really lit a childlike wonder in me. I was reinvigorated, and I'm clearly enthusiastic about it. But I tried to keep up with the shifting tides, because one of the early things that I faced was this challenge of hallucinations.

John Capobianco: 05:50
For anyone wondering, there are a few terms that maybe we should clarify. There's the training of a model: that takes hundreds of thousands of GPUs all synchronized with a backend network, it trains for weeks or months, and it costs tens of millions of dollars. But then there's the inference, and I'm interested in the inference side of things, potentially even fine-tuning, which we can talk about a bit later.

Kirk Byers: 06:20
Yep.

John Capobianco: 06:18
Let's leave fine-tuning aside. With inference, when I prompt it, it's a closed book. That's a nice analogy: it's been trained, and you can think of it as a massive book full of billions of parameters of information, human text. Well, it can tend to hallucinate, get things wrong, and be incorrect, sometimes wildly incorrect. So there needed to be adjustments in the approach to inference, and that's where we start to get into this retrieval-augmented generation idea.

Kirk Byers: 06:57
I'd add one thing before we go into RAG there. Beyond hallucination, straight-out getting things wrong and making stuff up, there's also an aspect of variance. Meaning, if you ask the same thing a bunch of times, especially in a lot of systems, you really don't want fundamentally different answers each time. There's also this aspect where sometimes it almost seems like the AI is being lazy: I'll ask it something, and it'll basically give me a third of the answer, and you have to tell it, "No. I want the full answer."

Kirk Byers: 07:42
But there's this aspect of variance, where too much changes when you're querying what you view as the same system every time.

John Capobianco: 07:52
Yeah. I would completely agree. It's not atomic, it's not idempotent, you know what I mean? It's a very different form of output. Now, you can try to adjust the temperature setting...

Kirk Byers: 08:10
Can you explain what you mean by the temperature? Because I had to look that up just the other day to see what it actually meant.

John Capobianco: 08:17
Right. Well, I don't know the mathematical underpinnings or what it's actually affecting, but there is an idea of a temperature when you're making the API call. One advantage of using the API over the GUI is that there's a temperature between 0 and 1, and you can make it more 'creative.' That would be the best word for it. It can be more creative in its answers or more stringently adherent to just answering the question.
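
[Show note: a minimal sketch of the temperature setting John mentions, sending the same prompt at two values; the model name and prompt are illustrative.]

```python
# The same prompt sent twice with different temperature values:
# 0 keeps answers tighter, higher values are more "creative".
from openai import OpenAI

client = OpenAI()
prompt = "Suggest possible causes of output drops on a WAN interface."

for temperature in (0.0, 1.0):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"temperature={temperature}:\n{response.choices[0].message.content}\n")
```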

Kirk Byers: 08:49
This directly relates to that variance part I was mentioning earlier. Maybe sometimes you're troubleshooting, debugging a complex problem, and you try the strict answer and don't really get the answer you want. So then you want slightly more creative suggestions for what might be causing this weird issue. It goes out into more of the edge cases for what might potentially be causing it.

John Capobianco: 09:18
There was sort of a mini revolution within the AI revolution itself, and I think a lot of people may have heard of RAG approaches already, but it really is fairly self-explanatory. It's going to retrieve data from an external source to augment the generation of the inference, the output. So think of it as an open-book exam. We can actually supply a PDF, or in my case what I was trying to do, and successfully did, was supply, say, a routing table or a 'show ip interface brief'. Because, and I'm going to start talking about a tool here, I found a wonderful open-source framework called LangChain, which itself just celebrated its 2-year anniversary. So this is very new stuff if you haven't heard of LangChain. There is also, to throw out alternatives, another framework called LlamaIndex that you can start with in the cloud, and it is a very easy way to get started with RAG.

Kirk Byers: 10:22
Both of those, and correct me if I'm wrong here, are Python libraries that basically enable you to assemble a set of these building blocks, RAG being one of them and agents being another very common one, which we'll probably talk about later. They're going to give you a lot of these building blocks that make it easier for you to do these types of things.

John Capobianco: 10:45
You nailed it. If anyone's built anything with Django or PyATS or any Python framework-type thing, it really tries to abstract as much of the complexity as possible, where you can just '.' into things and use classes and functions. In 15 lines of Python, as a practical example, you can build a retrieval-augmented generation solution to chat with a PDF. You source the PDF, it goes through a 6 or 7 stage process, and we can talk about that in a bit, but at the end you end up with a database of vectors, the vectorized, mathematical representation of the characters from the PDF, that the LLM can retrieve from with a semantic search. So: give me the ten closest values to this prompt. It does the mathematical, semantic lookup, and then it augments the original prompt with those vectors.

John Capobianco: 11:51
Now when the user says, "What is my default route?", it's not going to wildly guess, or maybe just explain what a default route is or how to find your default route. It's going to say, "Based on this vector data that was attached to the prompt, your default route is 10.10.20.254 using outgoing interface Gi1," or whatever.
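
[Show note: a minimal sketch of the roughly-15-line RAG flow John describes, assuming a recent LangChain release; the file names, chunk sizes, and model names are illustrative, and your package layout may differ slightly.]

```python
# Load a document, split it, embed it into a vector store, retrieve the
# closest chunks, and attach them to the prompt.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

docs = PyPDFLoader("routing_table.pdf").load()              # or any PDF
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})  # ten closest chunks

question = "What is my default route?"
context = "\n".join(d.page_content for d in retriever.invoke(question))

llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```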

Kirk Byers: 12:16
The reason for the vector creation is to sort of consolidate your input information. Is that what it's doing there? Is that why, with RAG, they're typically taking the input data and converting it into this vector format?

John Capobianco: 12:33
Yeah. I think it's to allow for the similarity search, and the LLM is very good with numbers. The mathematical representation is also fewer tokens; it's just a floating-point numeric representation.
Maybe to back up a step, the reason LangChain was so successful for me with PyATS, or other tools that can parse commands to JSON, is that it returned JSON. In LangChain, there is a loader system, and you can load a PDF. You can load a Word file. You can load a GitHub repository. You can even load this YouTube video.

John Capobianco: 13:09
They have a JSON loader that supports jq queries, and that was letting me tap into the network state information and put it into the vector store. At Selector AI, we just released something, and this is free, I built it, and an open-source version is available on my GitHub as well, that we call Packet Copilot, where you can upload a packet capture, we use Tshark to turn it into JSON, load it into LangChain, and let users quite literally chat with their packet captures. It's pretty neat, and it works in multiple languages. It'll do visualizations.
So, you know, not to dismiss Wireshark, it actually is an augmentation of it. If you can do a bit of filtering in Wireshark first and make it a nice, concise set of packets, the tool works better than with a bunch of noise, obviously, because of the vectors.
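
[Show note: a short sketch of the JSON-loading step John mentions, assuming LangChain's community JSONLoader; the file name and jq filter are illustrative, and the loader needs the jq package installed.]

```python
# Network state (pyATS or Tshark output) saved as JSON, pulled into
# LangChain documents with a jq expression, ready for the same
# vector-store flow shown above.
from langchain_community.document_loaders import JSONLoader

# e.g. json.dump(device.parse("show ip route"), open("route_table.json", "w"))
loader = JSONLoader(
    file_path="route_table.json",
    jq_schema=".",            # keep the whole structure; narrow this as needed
    text_content=False,
)
documents = loader.load()
print(documents[0].page_content[:200])
```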

Kirk Byers: 14:12
I think that's a general principle I have: a lot of these tools are augmentations to human beings. Meaning, if you're coding with AI, it's like you have a supercharged, really smart IDE, in the sense that it's a crazy empowerment of your skills and your capabilities. So instead of you having to memorize "what exactly is the packet flow of a real-time audio stream?", and who the hell knows or remembers that, it can start to tell you where that might be and things like that.

John Capobianco: 14:57
I've been kind of calling it "just in time" expertise. Like, at any moment in time, if I need to, oh jeez, how do I do a try/except to clean up this Python a little bit? I'm not doing any error handling. "Here's my code. Can you handle this with an except?"

Kirk Byers: 15:17
Yep.

John Capobianco: 15:18
It's really a force multiplying augmentation system. Now I want to talk a little bit about models and maybe LLMs and some other tools that we haven't really touched on, and how we can do these RAG approaches. So within that RAG system…

Kirk Byers: 15:38
Let's just step back on RAG for a second.

John Capobianco: 15:39
So sure.

Kirk Byers: 15:39
At a high level, what we're doing is we have some other information store. It can be a database, it can be a PDF, but some other information store, and we're probably preprocessing it into a form that can be more easily ingested by this large language model. Then, when we're asking questions, we feed in this additional information as part of what we're asking. Is that a reasonably correct framing of what we're doing?

John Capobianco: 16:10
Yes, think of it as adding your own domain-specific information through those external sources. The model hasn't been trained on anything other than human text, typically from the Internet. If you want to supplement its knowledge, you can use these retrieval approaches and point to, like you said, external sources, typically documentation. There's a SharePoint loader, for example, so you can chat with your SharePoint, that sort of thing.

Kirk Byers: 16:42
For example, one thing I was thinking of doing, and will probably add a to-do to my GitHub issues today, is to create a text format of the Netmiko documentation. Then I could potentially load that in as a RAG application. Then it would potentially be better informed, especially if I provide more examples than it would get from what's just on the Internet, when I start saying, "Hey, write Netmiko code to do x, y, z."

John Capobianco: 17:14
I think that would be pretty neat. You could actually have a natural language interface into the Netmiko documentation. You could build this in LangChain using Chroma DB for the vector store, a local embeddings model, and Llama 3 for the LLM. You could package up a natural language interface. You could even go a step further and do a RAG implementation that lets you chat with the Netmiko GitHub repository…

Kirk Byers: 17:45
There, the information on Netmiko is at least partially public, so maybe the LLM could know enough about it as is. But then say you added information about your specific network. The LLM might be good enough on information that's already public, but if I say I want it for my network, with these devices, this IP scheme, this way of building our configs, that's not information it's going to know about just from its training on the Internet.

John Capobianco: 18:21
You might be able to combine some things: its natural knowledge of Netmiko with, say, your NetBox via an API call, and say, "Build me some automation flows with Netmiko based on this real source-of-truth information."

Kirk Byers: 18:37
Yep. Now so are there other things you wanted to cover there on RAG? I know there's a lot of terms we haven't really been defining or talking about. Like, what do we mean by models and LLM, and there's other terms that we've been using as well.

John Capobianco: 18:53
On RAG, don't be intimidated by this. It is another abstracted flow in LangChain. Like I said, you can chat with a PDF in about 15 lines of code. Now you're going to need an embeddings model. Now when we talk about models, these are the pretrained models that do the inference.

John Capobianco: 19:13
A large language model is, you know, over 7 billion parameters, all the way into the tens of billions of parameters. There are small language models that are 2 to 3 billion parameters, or even smaller. An embeddings model specifically changes the text into those floating-point numbers. Now this kind of leads to: where do we get these tools? How much do they cost? Where do we pay for them? For all of these models there is typically a paid cloud service that you can pay for on demand, for embeddings or for inference with an LLM, and there are several of these now. I've paid for ChatGPT. I've paid for Claude, from Anthropic, to play with it.

Kirk Byers: 19:59
Yeah. I'm the same way. I have both ChatGPT and Claude, and I have API access for both of those as well. I know that probably adds up to something like $80 a month for all four of those things, but I am in no way sorry that I'm paying it. There are a lot of other things I'm paying $80 a month for that I would get rid of well before I'd get rid of those.

John Capobianco: 20:29
I'm in a very similar boat, but there are open-source, free versions and free mixes of these. LangChain doesn't care whether you point your LLM or your embeddings to the cloud or to a local Ollama instance. Now, I know there's the model, Llama 3.2 is what they're up to now from Meta, but there's also Ollama, which is a local server that you can run. It's very lightweight, and it will actually help maximize the GPU in your system if you have a gaming machine. I've seen Ollama run on a Raspberry Pi. I've installed Ollama on a Catalyst 9300 in the guest shell just to do it…

Kirk Byers: 21:14
Ollama is basically a model loader. I think it's a Python package, if I remember right. You can just specify, "Hey, which model do you want to use?" and it'll go get the glue code, the glue libraries, and make it very, or at least relatively, easy for you to use that thing.

John Capobianco: 21:42
Exactly. You just say "ollama run llama3.2", and maybe the parameter size if you want to run a small or a large one. Now you're running a local LLM with an API system. You can chat with it and just ask questions by saying "ollama run" and then the model you want to examine. Or, if you start it up as a server, it has a REST API. So within your LangChain code, or whatever code you've written, you can point to the Ollama API to act as your LLM. It's quite remarkable, and it's free, and it's open source. A lot of enterprises are looking at things like this because they don't necessarily want to enter into a cloud agreement to make sure their data isn't trained on, or there are governance or privacy concerns, or they just can't use a cloud service. But they don't want to pass up the opportunity of artificial intelligence.
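
[Show note: a sketch of pointing your own code at Ollama's local REST API, as John describes; the model name and prompt are illustrative, and the endpoint follows Ollama's published API.]

```python
# After "ollama run llama3.2" (or "ollama serve"), the local server listens
# on port 11434 and exposes a REST API, so your code can call it instead of
# a cloud LLM.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Is an interface with increasing output drops healthy?",
        "stream": False,          # return one JSON object instead of a stream
    },
    timeout=120,
)
print(response.json()["response"])
```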

Kirk Byers: 22:38
Yep.

John Capobianco: 22:38
So now they can roll their own: get their own hardware, host their own models, and these models are very comparable. NVIDIA just released a model called Nemotron, which is a derivative of the Llama 3 model. It's quite an exciting time.
One other thing we haven't really talked about, and I just mentioned that NVIDIA has a version of that Llama 3 model, is that you can do something known as fine-tuning. Now, the outcome with RAG and fine-tuning is sort of the same, but they're different approaches. With fine-tuning, you're actually mutating the model, training it, injecting the data into the model. Then, as a closed book, let's say you've added 2 or 3 chapters of new, domain-specific information, to stick with that analogy. Now, when it does inference, it doesn't need RAG, and it can answer those questions like "What is my default route?" So there are two different approaches, but what's funny is they come together in a technique called RAFT, retrieval-augmented fine-tuning, where you're using RAG to generate the dataset that you then use to fine-tune the model. So it's artificial intelligence…

Kirk Byers: 24:00
I heard you recently on a Packet Pushers podcast, and I think I heard you talking about RAFT in that context, mentioning that it's something Selector is doing. From reading about it a little bit, it's sort of this combination of some aspects of RAG and some aspects of fine-tuning.

John Capobianco: 24:25
That's exactly it. We like to describe it as the network language model, which is Llama 3, and then we fine-tune it with 2 different things. We fine-tune it with SQL-to-human-language, so you can just say, you know, "How is the WiFi in New York?", and that gets translated into a complex SQL query against the data. And then we fine-tune it with actual network state data that we use RAG to get from a vector store. The whole idea with fine-tuning is that you build what's called a JSONL file, a JSONL file of data that you want to inject into the model. Well, why not use RAG to generate that list of data in the JSONL file?
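
[Show note: a sketch of the JSONL fine-tuning file John describes, in OpenAI's chat-format layout. In a RAFT-style flow the question/answer pairs would be generated with RAG; the examples shown here are made up for illustration.]

```python
# One JSON object per line, each a small conversation to learn from.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a network assistant."},
        {"role": "user", "content": "What is my default route?"},
        {"role": "assistant",
         "content": "0.0.0.0/0 via 10.10.20.254, outgoing interface Gi1."},
    ]},
    # ...a few hundred more generated from your own network state...
]

with open("network_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```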

Kirk Byers: 25:11
I guess we might have to quantify how much data we're feeding in, but how long does the fine-tuning take? If you were trying to do something like RAFT and you had some amount of data, and I guess we'll have to define how much data, how long are we talking for this fine-tuning process?

John Capobianco: 25:34
I've tried it in the cloud. I'm not sure if everyone's aware of this, but if you go to platform.openai.com and you're a paid customer, there is a fine-tuning lab. Now, it costs more money to fine-tune a model, but if you ever want to play with fine-tuning, you can do it with your paid account, and then you actually get your own version of a model back.

John Capobianco: 25:58
Kirk, you could train a model on your Netmiko data. You could put Netmiko data, your README in a vector store, generate a few hundred questions about it with AI, get the answers using RAG, and then use that to fine tune Llama 3. Now you have a model that you could put on Hugging Face and people could actually download the Netmiko bot or the Netmiko chat interface or whatever. So I mentioned Hugging Face there…

Kirk Byers: 26:29
Another way you could potentially do the same app is to do exactly what you said, but then wrap it in a web interface with a chat window. Then, sort of replacing the places where people currently ask Netmiko questions, you'd basically build something where somebody could come in and ask Netmiko-specific questions, and the AI gives them an answer.

John Capobianco: 26:53
Have you started playing with Streamlit at all, Kirk?

Kirk Byers: 26:55
I have not played with that. What is that?

John Capobianco: 26:57
Streamlit is another Python utility; it's at streamlit.io. With 4 or 5 lines of Python, you can actually give your Python app a web interface. It's quite remarkable. You can do st.markdown and then write some markdown, st.code, and so on. It's very simple to use, and it interfaces with other tools like LangChain.

John Capobianco: 27:26
When I write that LangChain to do my RAG interface to chat with a PDF, I can add 4 or 5 lines of Streamlit code and give it a web interface with a chat interface and a send button and everything. I wanted to mention that tool for people. Once you have something that you want to chat with, you want to share it on a web interface. That's the next natural step.

John Capobianco: 27:49
Look into streamlit.io. I don't know if it's Apache or what it spins up under the hood, but when you do a "streamlit run" with your Python script, it starts it up with a listening web interface and all of the requirements you need to build a UI for it...
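
[Show note: a sketch of the handful of Streamlit lines John describes, wrapping a placeholder ask() function that would call your own RAG chain or Ollama endpoint; the page title and function are illustrative.]

```python
# Save as app.py and run with: streamlit run app.py
import streamlit as st

def ask(question: str) -> str:
    # Placeholder: call your RAG chain, Ollama endpoint, etc. here.
    return f"(answer for: {question})"

st.title("Chat with your packet capture")

if prompt := st.chat_input("Ask a question"):
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        st.markdown(ask(prompt))
```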

Kirk Byers: 28:08
Another thing I thought was pretty cool, and it starts to hint at some of the power of this, is how easy it was for me, using Claude Sonnet 3.5 and what they term 'projects', which is sort of a way of doing mini-RAG in that you upload content to Claude and then work with it, to build a Flask application that was a chat window querying some AI backend. I don't know Flask; I'm a Django person, so I could count on my hands how many times I've used Flask. Claude wrote the Flask app. It was writing most of the Python code to actually interface with OpenAI. Ironically, I was asking Claude, "Hey, I need to interface with OpenAI. Can you write the Python glue code for this?" It just blows you away how quickly you can have a useful chat window now. It's not productionized.

Kirk Byers: 29:16
This was fine running on a local system where I'm the one interfacing with it, and there's a lot I'd have to do if I wanted to release it to other end users, but still, it's pretty amazing that I can just get this up and running, query it, and start asking it questions and so forth.

John Capobianco: 29:34
It's incredible. I've started hearing a lot about AI agents as sort of the next thing after RAG, or the thing to supplement RAG, and it's not either-or; you can actually write an agent that is RAG-based. But with this idea of agentic AI, it's not like there's a book I could read or a course I could take on how to build agents or anything. I just used ChatGPT and Claude back and forth, iterated, and said, "Here's what I'd like to try to build," this idea of a ReAct agent. It's called a reasoning-and-acting agent, and I find the idea simple.

Kirk Byers: 30:15
If we step back for a second and we start talking about agents, what in this context is an agent? What is an agent here?

John Capobianco: 30:27
That sort of idea of, let's say, RAG, if I ask “what's my default route?” I have to write the LangChain code and all the framework. I have to basically do all of that work in the code to get the right JSON, make the right call, and all that stuff. The idea of an agent is with that same question. “What's my default route?”

John Capobianco: 30:53
Can I offload the reasoning and the thinking and the actual autonomous actions that someone would need to take, log in to the router, run the command, parse the command, return the default route, autonomously, with very little code, with more prompt engineering than Python code? Because, again, we're talking about a…

Kirk Byers: 31:17
Instead of us writing a bunch of ‘if-else’ statements where we're trying to parse the question that John puts in, we're in some way interfacing to an LLM in some form and we're saying, “Hey, you work out what to do from what we're asking you in some way.” Is that sort of reasonable?

John Capobianco: 31:41
Very reasonable. Here's an example, you still have to provide some of the framework, but you can decorate Python code, a Python function specifically, as a tool. So @tool with the LangChain framework. I give it some tools, such as how to connect to the device, a tool to run the ‘.parse’ command, some very minimal code for tools.

John Capobianco: 32:11
Then you basically tell the AI: here's the prompt, and here's a bunch of tools you have access to. You figure it out. You figure out what you need to run, what commands you need to run, what to connect to, and just come back with the answer. It's quite remarkable, in that I had a few of these agents working independently of each other, and then I put a parent agent in front of them. Now when I ask a question about, you know, routing tables, it routes it to the IOS XE agent.
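
[Show note: a sketch of the tool-decorated agent pattern John describes. The tool bodies are stubs, and the agent constructor uses LangGraph's prebuilt ReAct helper; your installed versions and John's actual code may differ.]

```python
# Small Python functions exposed as tools; the model decides which to call.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def run_show_command(command: str) -> str:
    """Run a show command on the router and return parsed JSON as text."""
    # Stub: in John's example this wraps pyATS connect/parse.
    return '{"default_route": "0.0.0.0/0 via 10.10.20.254"}'

@tool
def get_device_inventory() -> str:
    """Return the list of managed devices."""
    return '["dist-router-1", "core-switch-1"]'

agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"),
                           [run_show_command, get_device_inventory])

result = agent.invoke({"messages": [("user", "What's my default route?")]})
print(result["messages"][-1].content)
```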

John Capobianco: 32:43
If I ask it about tenants, it routes it to the ACI agent. Then those agents look up the API call they need to make or the command they need to run to answer the question, and they sort of do it on their own. What I found very interesting, Kirk, was that with, let's say, ACI, which is API-driven, I asked it to create something: "Could you add a tenant called 'John's tenant' with the description 'created with an AI agent'?" or something. It tried to post it, and it got a 400 response back. It knew that it failed based on the response, so it readjusted the body of the post and tried again until it got a 200. These agents can actually iterate and understand when they haven't yet met the conditions to answer the question.

Kirk Byers: 33:37
I know you have a video where you talk about agents and your IOS XE agent, because I remember either watching it or having Claude summarize the transcript of it, I don't remember which of the two I did, but I'll link it in the show notes because I know you did a whole YouTube video where you were talking about building your agents.

John Capobianco: 34:01
Thank you. If anyone is interested, I have two: one on a singular agent that I had built, and then a follow-up video on multiple agents. The audio is not great, so I might reshoot that video, to be honest with you; I've had some complaints about the audio on it. But what it opened my eyes to was, I think we're going to go through an almost Y2K-like global rewriting of code, where we're not throwing away code. We're adjusting it to the times, and we're modernizing the code. Everyone who already has something, let's say you've written a wonderful calculator tool in Python. You can throw in numbers, and it does amazing calculations. It's got a function and some classes. You don't have to throw that away. You can repurpose it with a simple decorator, call it a tool, and let the LLM have access to your calculator. Now, I pick a calculator because, traditionally, no one claims LLMs are super great at math. They can do some basic math, but their mathematical capabilities are still well below human capability... supplementing it…

Kirk Byers: 35:16
I remember seeing that early on where I just tried to have it do some simple addition. What it did, and this was probably 6 to 9 months ago, was basically punt the summing, the literal summing, to Python. It was elevations on a map, and I was asking it to do a plus or a minus, and it did not want to do that on its own. It punted it to Python and then said, "Okay, now run this Python code to do the math." I was like, "Oh, okay."

John Capobianco: 35:52
It's moving so fast. I'm not sure if you've had a chance to play with the search integration with ChatGPT now.

Kirk Byers: 35:59
I have, and I did see that. Another thing, and this sort of relates to this agents discussion, Anthropic has the computer use capability where you can, I mean, I would want to give it a separate workstation unto itself, a VM in AWS or something, and there's some underlying coding you still have to do here as far as telling it how to do various actions, but you can instruct it to do things. I listened to this podcast, and they were trying to get it to deliver a coffee using DoorDash or Uber Eats or something like that.

John Capobianco: 36:36
Oh, cool. That's a fun example.

Kirk Byers: 36:38
You'd see it opening the browser, the agent trying to find a coffee shop and working out how to do this. Obviously, a lot of this stuff is very early days, but you could see this question of, "Well, do we need to build APIs into everything, or, if the agent is smart enough and knows how to use a computer, can it just basically recreate what humans do, where it's trained to do some set of things and uses the browser as its interface, or whatever humans use?" And you can already see this with images. Right?

Kirk Byers: 37:22
One thing that I was blown away by at a certain point in time is where you just capture an image that has data on it. Take a screenshot of something that has tabular data, load it into Claude or OpenAI, and say, "Extract this data as this JSON data structure," and it just extracts it. This is a picture of data, and it captures it and gives you the content.
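
[Show note: a sketch of the screenshot-to-JSON trick Kirk describes, sending an image to a vision-capable model and asking for structured output; the file name, model, and prompt wording are illustrative.]

```python
# Send an image of tabular data and ask for a JSON structure back.
import base64
from openai import OpenAI

client = OpenAI()
image_b64 = base64.b64encode(
    open("arp_table_screenshot.png", "rb").read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract this data as a JSON list of {ip, mac, interface}."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```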

John Capobianco: 37:56
It's quite remarkable stuff. It really is. What was I going to mention?

Kirk Byers: 38:00
Yes. We were talking about agents.

John Capobianco: 38:03
On agents: if someone wants to play with an agent system that is totally free and open, it's on my GitHub, and it has nothing to do with networks; it's a Pokemon agent. The recipe, Kirk, that I found is that if I put a list of APIs in a JSON file, with the URL to the API and a human description, in this case, for Pokemon, the name of the Pokemon and the URL for that Pokemon's API, that becomes one of the things your agent can reference. Now I've overlaid this, but you can actually have them simulate battles.

John Capobianco: 38:41
You can ask about statistics. It's pretty incredible. It's a natural language interface into the world of Pokemon. The code is on my GitHub. It's totally easy to follow. You could start showing your kids how to write AI agents. I saw a wonderful conversation between the CEO of NVIDIA and the CEO of Salesforce, and they spent 3 or 4 minutes talking about agents. I think Salesforce is going to have something like 40,000 agents by the end of the year. They're all in on this agentic approach, and it's all Python.
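
[Show note: a small sketch of the "list of APIs in a JSON file" recipe John describes for the Pokemon agent, pairing a human description with a URL the agent can choose to call; the file layout is illustrative, though the PokeAPI URLs are real public endpoints.]

```python
# Build an API catalog the agent can reference when deciding what to fetch.
import json

api_catalog = [
    {"name": "pikachu",
     "description": "Stats, types, and moves for Pikachu",
     "url": "https://pokeapi.co/api/v2/pokemon/pikachu"},
    {"name": "charizard",
     "description": "Stats, types, and moves for Charizard",
     "url": "https://pokeapi.co/api/v2/pokemon/charizard"},
]

with open("pokemon_apis.json", "w") as f:
    json.dump(api_catalog, f, indent=2)

# An agent tool can then read this file, pick the matching URL for a
# question, fetch it, and reason over the JSON that comes back.
```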

Kirk Byers: 39:17
Let's talk a little bit about directions, things we see that network engineers and operations teams could potentially do today. I don't want to get too far out, because this world is changing so insanely rapidly that I'm sure if we go more than 6 or 12 months out, we're guaranteed to be wrong. But what are things we can do today, and what are things we can probably do in the near future, sometime in 2025, in the network engineering and network operations areas?

John Capobianco: 39:53
Well, I think there are a few different aspects to it. There's ramping up your skills to build and support this, if that's what your enterprise is going to be interested in doing. Now, it's a very small segment of hyperscalers that's training models from scratch. However, most organizations are going to want to do inference of some kind. So really start to understand the cloud offerings and the privacy: are they training on your data? Is it something you'd rather build on prem and support? Then consume it by getting access to the API and integrating it into your flows. The number one thing is privacy and security and data governance first.

John Capobianco: 40:37
Then, where are we going to host this? Because you're going to want it in the cloud or on prem. What are the skills needed for either of those scenarios? I think in your day-to-day life, try to get an approved way to use these tools to augment your code, network designs, configurations, you name it. Whatever it is you're doing, augment yourself if you haven't already started to adopt this, this being artificial intelligence, generally, broadly speaking. Make it part of your life. Make it part of the way you solve problems.

Kirk Byers: 41:10
Yeah. I would definitely sort of echo that. Personally, I think this is such a big thing that you should really be diving into this wholeheartedly, or you are going to be rapidly falling behind other people. In the coding space, I definitely will hear people say that the coding quality is not very good. That is really not consistent with what I'm seeing. Now, it might mean you need to play around with different tools, with different LLMs.

Kirk Byers: 41:46
But if you find the right combination of tools, which I don't think is that hard, because I've seen it both with ChatGPT 4.0 and with Claude Sonnet 3.5, and I'm sure it applies with others as well, the coding quality is actually really good. Then it's more a matter of you having a good programming workflow: you're making changes, you're linting, you're testing your changes, and you have some sort of feedback loop of "is it doing what I want it to be doing?" If you're doing those kinds of standard things, I'm blown away with the code it can generate, and it's incredibly useful. The bottom line is I would definitely be diving into this wholeheartedly. I think you're falling behind if you're not.

John Capobianco: 42:38
You know, I tend to agree. I've spent 2 years trying to really evangelize this, and it's really out of passion and the amplification of my own abilities. It's not going to replace you or be better than you; it's going to amplify your own capabilities. I have a full-time job. I've got a family. I've got other things that I've committed to, my dogs and things. It used to take me 6 or 8 weeks, weekends and evenings, wherever I could squeeze some time, to really come up with a proof of concept. I'm doing that sometimes in under an hour now, or within the 3-hour mark, because I am adopting and embracing this new capability.

John Capobianco: 43:27
That's just one example. What I think is neat is that, you know, infrastructure copilots are coming, from Selector and a few others, where you're going to have this natural language interface, just like software developers have Copilot from GitHub and Microsoft and things like that. Software developers have actually been doing this for 6 or 7 months now with copilots, and some of the numbers are astounding. Amazon, which has fully embraced this, has saved hundreds of human hours just by embracing AI.

John Capobianco: 44:03
So, Kirk, I can't wait to see you in Denver. I'm really excited, and thank you for all you've done for everyone around the world with your work. You've made such a mark. This was a real pleasure and an honor for me. I've been looking up to you and your work for a very long time; it was actually some of your early work with Ansible that got me inspired. So thank you for having me here today.

Kirk Byers: 44:29
Thank you very much, John. You've been at the forefront of this AI in networking space, and when I was thinking of who to have on the podcast, you were the first person that came to mind. I appreciate very much having you on, and hopefully this is helpful to other people, at least for things that might be possible, things to look into, directions to start heading, and reasons to get excited.

John Capobianco: 44:54
I wish you all the best, and I look forward to seeing you in Denver.

Kirk Byers: 44:57
Yep. See you in Denver.

John Capobianco: 44:58
Alright.