Podcast audio-only versions of weekly webcasts from Black Hills Information Security
Hello, everybody. Welcome to today's Black Hills Information Security webcast. Joining us today is Ethan and Derek, and they're gonna talk about learning to trust AI agents with automation. Because I think that's the big thing. Right?
Jason Blanchard:Like, because you feel like AI can do the thing, and you probably believe it can do the thing, but I don't know if you trust it to do the thing without you knowing that it's doing the thing, and doing the thing without you verifying that it's doing the thing. So I also have trust issues when it comes to AI. And so I'm glad you joined today's webcast. You've taken the research that you've developed here at Black Hills Information Security, using it practically over and over again in all the research that you've done. Also, you're a part of the AI Security Ops Podcast, which we'll drop the link to if you wanna continue to learn more about AI security and how they they coincide together or don't in many ways, join us for that podcast.
Jason Blanchard:But thank you so much for spending your time with us today. We got Ethan and Derek. I'm gonna turn it over to them. They're gonna talk for like the next fifty some minutes, then we'll do some q and a at the end, and then that will be it. Alright, Ethan.
Jason Blanchard:Derek, you ready?
Derek Banks:It's
Jason Blanchard:ready. Alright. I'm heading backstage. Good luck. Sweet.
Ethan Robish:Jason, I think you came up with a better title for Locast, AI trust issues.
Derek Banks:Well, I mean, information security folks generally have trouble trusting, well, anything. Right? Because I think the very nature of evaluating risk for a living makes you learn that maybe you shouldn't trust things. For sure. So why why are
Ethan Robish:we doing this webcast? So Derek and I both have spent quite a bit of time working with AI agents. Derek, much more than than me. I don't know about that. So my well, my journey really started in earnest probably around December.
Ethan Robish:That's when I, you know, installed Coding Agent locally and started playing with it and stuff. So, really, I mean, we're all we're all kinda new with this. I just realized that I had some hang ups when when I first started. Like, yeah, these these are the things, the trust issues that I have that that and not not trusts, just trust, but, like, there were several roadblocks I had getting over trouble getting over, and I just thought it would be a good topic for sharing because I if I have these issues, then maybe maybe other people have the same kind of thoughts about around AI and automation. So our primary experience has been with Claude through Black Hills, where Black Hills kinda standardized on using Claude code.
Ethan Robish:But I personally started when I first tried it out using OpenCode. Derek, I think maybe you've you've tried a couple others, but those are kinda your main ones too.
Derek Banks:Yeah. So I guess I like to so basically, how I so I teach a class on AI and information security, and the in the last teach, I kind of framed it this way. I think that, you know, in 2025, you know, folks are certainly getting more and more familiar with like a chatbot interface. And that's certainly the workflow that I had in 2025 up until about, I'd say, October or or November. So I don't have a whole lot more experience than you, but it's new for everybody.
Derek Banks:I think it was early in 2025 when Claude code first came out. So coding agents in in general are, you know, in terms of IT years are kinda new. Right? And so my workflow last year was mainly go to a chatbot, And I like Claude for coding to to ask it, you know, hey, you know, or to get it to give me code for like whatever project I was working on, I would copy and paste it from the chat interface into my own IDE. And and then I also like had an integrated like inline IDE and then I would do all of the like testing and debugging and then go back, you know, back and forth which there's nothing wrong with that.
Derek Banks:Right? But somewhere around October, I realized that, oh, you know, I can do all of this like with call it code. And then, you know, it it starts to handle a lot more of the workflow, for you. And then it was somewhere I kinda took a break from all technology over the holidays because I was working on remodeling a bathroom. And when I came back from from Christmas break, Clawed Code was all the rage because I think that what happened is
Jason Blanchard:Anthropic released a model, Opus four five or it's four four four five, I think. Somewhere in October, people didn't really have a chance to start playing with
Derek Banks:it until over the holidays and realized just how capable not only the model, but the we'll call it scaffolding code around the model was. And now, I think 2026 is gonna be, like, kind of the year of agents.
Ethan Robish:To that end, we have experience with the tools that we have used. There's a ton out there. Like Derek said, we're moving the the this space is moving so fast. We obviously don't have experience with everything. We'll talk about some of the things we know about, and then we'll talk about things we have direct experience with just to give you an idea of, you know, what what's what's out there.
Ethan Robish:So maybe to start, what what even is a coding agent? So Derek mentioned a chat GPT, going to the web interface, copy and paste stuff back and forth. Hey. That works great. Like, why do I need something else?
Ethan Robish:So I think Anthropic released this coding agent idea in their Cloud Code, and it's pretty powerful. It couples kind of that same workflow that you were doing in in ChatGPT. Instead instead of having to copy and paste back and forth, they're like, okay. You're gonna run a program on your local system that interfaces with the LLM, the, you know, the the AI. And that AI can respond back with specific instructions of, hey.
Ethan Robish:Run these commands on the system. So, basically, it gives you it gives it the ability to actually do real things on your system, like look at files, modify files. And, I mean, that that's kind of in essence there. And it alongside it is a a loop. So you can it it it can iterate and do things, like, over over and over.
Ethan Robish:And the the other term I've heard is kind of agent harness or agent decartus. Derek, do you have anything else to add on that?
Derek Banks:Yeah. So I've heard agent harness too. Yeah. I think the the big difference is, like, so you have a chatbot where you, you know, you go and you put in in a web interface, like your query, your question, your prompt into into the chatbot, and you've probably heard the term like prompt engineering. Right?
Derek Banks:Where you're you the better your prompt, the the better your output would be. And I mean, it's true with any LLM for sure. And so to get a really good coding results, I would have to have a pretty sophisticated prompt. And for example, I would have to say, here's your, you know, your identity and purpose. Here's the scope of the work we're doing.
Derek Banks:Here's step by step what I want you to do, here's the output that I expect, and, you know, that that kind of is kinda like an upfront software design document essentially that I'm giving to the chatbot. What makes the agents different is I can I can more loosely describe, the outcome that I want, and then the look at it as like that really specialized prompt is now baked into the loop and the code along with your prompt to give you better results to accomplish your goal? So instead of like having a really like, you know, detailed and structured prompt, now I can just describe the outcome that I want and the coding agent will go off and go through a loop and end up, either giving me the output or stopping and saying, hey, I have a clarifying question that I would like to ask. And so I think that's kind of like the big main difference in my opinion.
Ethan Robish:So it's it's basically able to iterate itself and improve its own, like, content Right. Then Yeah. Or prompt. Without you having to
Derek Banks:go back and forth with
Jason Blanchard:it.
Ethan Robish:Exactly. Because because you've given it extra abilities to query external sources like files or MCPs or you know, we'll we'll get into some of that more. But first off, so what what even is what, like, what what are some examples of coding engines? So we've already mentioned quad code. That's that was kinda the first one the first popular one for sure.
Ethan Robish:You Anthropic makes it available. You act you do need a subscription to utilize Cloud Code, so they've got different subscription tiers. That's actually, spoiler, one of one of the first hurdles that I had. I'm I'm kind of a frugal person, and I didn't really want another subscription. So we'll we'll go over some other options.
Ethan Robish:But Claude has a CLI interface, but then they've got a desktop application as well. And some of the other options, we've got OpenAI, obviously, the first big, know, company behind ChatGPT. They've also got a CLI that you can install. They also have a desktop application that you can use, which does largely the same thing. It's just, hey.
Ethan Robish:If you're not comfortable in in the CLI, it's just a little different interface. Google jumped on board with their Gemini CLI, and I think this is the first one that actually allows you to use it for free. So the other OpenAI, Cloud, and Anthropic, I think, required that some basic level of subscription, and I think Gemini bakes in, you know, some some free usage. So if you're comfortable with Google, that's that could be an option. GitHub actually has their own CLI.
Ethan Robish:And there's plenty of others. I'm just going over, like, the, you know, the the main the main ones that you might run into. But, yeah, GitHub, so backed by Microsoft, has their own coding agent as well. And then we get to OpenCode, which is kinda what we're gonna focus on mainly. So OpenCode, similarly, has a CLI, has a beta desktop application.
Ethan Robish:But because OpenCode is open source and not tied to any one provider, a lot of development has gone behind it, like, in the in the broader community ecosystem wise. So there's lots of plugins and wrappers around OpenCode that all all sorts of stuff. And we'll show you one of those next here. So OpenWork, this Emma play on OpenCode. They're not they're not related, but OpenWork does use OpenCode behind the scenes to to run its desktop application.
Ethan Robish:So Claude Anthropic released something called Claude Cowork, which is kind of a middle ground between what we're talking about the chat the the CLI agent coding interface and what Derek was talking about with ChatGPT. So Cloud Cowork is kind of a middle ground of, hey. You can actually do stuff, like interesting stuff, execute stuff on your system, but it's not quite as daunting to get started with if you're not familiar with with with the CLI. This brings us to our first tip. There's there's a lot of options out there.
Ethan Robish:There's more every single day.
Derek Banks:It can be overwhelming.
Ethan Robish:It's overwhelming. Just pick one and start. Like, if if you need a recommendation, start with OpenCode. If you're more comfortable with any of the others, by all means, just just pick one, start start learning it. A lot of the things that you learn for one could those same concepts can transfer over to to another as as you go.
Ethan Robish:And then I think, Derek, you mentioned an IDE earlier, but, like, your your your code editor. There's also code editors that have this AI stuff baked in, and those are those are also options if you're more comfortable with that, like Versus Code or Cursor. Those are kind of well known ones.
Derek Banks:If I had to give a recommendation, I there's ways to get started for free, and then there's ways to get started for cheap, and then there's ways to spend a crap ton of months.
Ethan Robish:Yeah. We're joking about your your runaway Yeah. As you said earlier.
Derek Banks:Well, I it I mean, I meant to do it, but yeah. Yeah. And even even so, you know, in my journey like, I started with Claude code because I was lucky enough to be able to to use it through work and learn the technology on Yeah. Work's dime instead of my dime. It's kind of a lucky thing, I know not everyone is in that scenario.
Derek Banks:And then I was using it with the API and then they, you know, we were basically just tying an API key to to call it code. And then they changed their subscription model, and so now, for Anthropic, it's like something like $25 a month, and you get a certain like level of of usage. And I don't know exactly what that level is. It's something I think like 50 queries in a five hour time or something like that. Yeah.
Derek Banks:And I think that's really great to get started.
Ethan Robish:Yeah. And there's that's that's actually that's a good point. There's kinda two paradigms with
Jason Blanchard:how you
Ethan Robish:can use AI beyond beyond the free tier. There's API based usage where they bill you based on the number of tokens that you're sending, like, kind of the raw current currency or the raw metric that LLMs use. And then there's the subscription models, which kind of smooths that over. You you just pay a certain set amount each month. Definitely.
Ethan Robish:But you get a certain a certain level of usage then.
Derek Banks:Yeah. So definitely with Anthropic, I think the API is more expensive than the subscription for what you get. Yeah. For sure if
Ethan Robish:you if you're using it
Derek Banks:Yeah. If you're using it, I would use the API for like, you know, business and work related, like, billable kind of stuff if if that's what you wanted to do. And then they have like a max subscription where it's $200 a month, which most people are like, holy crap, $200 a month. You don't really need that until you start simultaneously building multiple things and running multiple, we'll call them agents. Right?
Derek Banks:That's when it starts to get expensive. When you go from just like that account using this one coding agent, and now I've expanded past, well, while this coding agent is doing this, I'm gonna work on this, and I'm gonna work on this and this. And next thing you know, you have like an army of of of things going on. That's But that's down the road. More advanced.
Derek Banks:Yes. It's a more advanced. And so for most people, if you're not opposed to the $20.25 dollars a month, that's what I would suggest. I saw someone post in in chat as the memes were scrolling by that they were using Codex and really like it. That's what I hear.
Derek Banks:Right? I hear that Codex is really good. I just I haven't really, like, used it a lot because basically, it's one of those things where if something's working for you, you kinda keep using it, right, until it doesn't work or doesn't do something for you. But if you were trying to get started personally on the cheap, I think OpenCode is probably the best best
Ethan Robish:way Let's to go ahead and show how to do that. Right? So at the okay. I'm gonna I'm gonna do a couple things quick to I'm I'm gonna run this in a Docker container, but this is these first parts are optional. We'll we'll get to more sandboxing and Yeah.
Ethan Robish:More of
Derek Banks:why I would be paranoid and want to run this in a Docker container.
Ethan Robish:Yeah. We get to this point. Let me go back to OpenCode just so you can see where this comes from. So OpenCode, installation instructions, this command. So Derek, I think you would agree with this, but we we recommend using these coding agents either either through a desktop application, which probably takes care a lot of the stuff for you on the back end, or if you're gonna do it via CLI, you should either do it on a Mac in Linux or in WSL on Yeah.
Derek Banks:Think you'll have to actually.
Ethan Robish:Some some of them some of them I think can work on Windows natively, but Okay. It just it's you're you're kind of fighting against the street there.
Derek Banks:Your mileage may vary.
Ethan Robish:If so you got a couple options on Windows. And Docker is an option across all of these.
Derek Banks:Yeah. But WSL is one of them. So I think Mac users have an advantage at the moment. Right? And that's just because of the, you know, Apple's foresight, I guess, of having the m chips, the m one chips, or the the m series chips basically having onboard it's not GPU, like an AI inference chip.
Derek Banks:And so okay. So one thing about like AI technology is just kinda at a high level. You have training I'm
Ethan Robish:I'm gonna gonna run the OpenCode installer in the background while
Derek Banks:you're not just I don't wanna tell people what I'm doing here. So Alright. So Ethan installing OpenCode while we're doing this. So Yeah. You have training an LLM, training an AI model, and then you have running it, and that would be inference.
Derek Banks:If So you hear somebody say inference, you're actually running the model. The neat things about Macs is that the memory that you have on your your m series Mac is unified with the AI chip, the inference chip. It's the metal metal platform shader. And so that means like how much RAM you have in your system is how much you could use for running an AI model. And so if you have 64 gig of RAM on your Mac, you could run a local model and and not have to worry about, you know, using someone's remote model.
Derek Banks:And so when you say you're getting started for free with OpenCode, when Ethan launches OpenCode here, he's going to pick a model. Most of the time getting started, that's gonna be running somewhere in the cloud.
Ethan Robish:Yes. So when you this what OpenCode looks like when you launch it, and you can immediately just start talking to it, you know, sign up or anything. It they they are amazing because they have just some free usage built in. So if you wanna get started, it was like super low friction.
Derek Banks:Which is actually really cool because, I mean, it it you have to run this on GPUs, which means they're offering for free for people to get started the ability to run a certain level of usage. It looks like it said 6% usage for free to kinda, you know, get you get you kinda started. That means that they're running their free models up somewhere where just they're talking to it through an API.
Ethan Robish:And so after you ask your initial question, you get kind of this two pane interface here. So on the right, you can see a little bit about your context, how many tokens, you know, just metrics that have gone through. And then there's this little getting started box that even says, hey. OpenCode includes free models. You can start immediately, or you can connect a provider.
Ethan Robish:So if if we wanna look at what see what that looks like so if I if I do switch model, these are several models that come for free in in OpenCode. Now that being said, I don't I suspect that they're probably training on your data. So usually when stuff is free, that means Yeah. They're
Derek Banks:We haven't gotten to the privacy part yet.
Ethan Robish:Yeah. Yeah. So well, I mean, that's part of the trust.
Derek Banks:Yeah. Right? Exactly. How do you
Ethan Robish:trust the providers too? So
Derek Banks:Mhmm. That's actually We should all talk about that then.
Ethan Robish:Yeah. Just know if it's if it's being provided for free, your data's likely being used for training.
Derek Banks:Well And in the top providers cases, even if you're paying for them, they're going to use your data.
Ethan Robish:Yeah. It's usually toggle off, but if you're paying.
Derek Banks:Well, so for the way I understand their license agreement, and I swear these these license agreements are more difficult to understand sometimes than they should be. The way I understand Anthropic at the moment is that if you are paying as an individual 25 or $200 a month or whatever, they still say that they can use your data for training. Yeah. Where if you're a company on a business account, then, you know, so a team or an enterprise account, then they're not. For I understand OpenAI, if you're paying them, they're not using your data to train.
Derek Banks:If you're using it for free, they use it to train. So I mean, I think that's like the biggest thing that security professionals like question. Right? It's like, where is my data going? Yeah.
Ethan Robish:Here's here's free. Free for the price of your data. The built in providers are pretty pretty good here. So you've got OpenCode has a subscription that you can pay for it. I think that OpenCode Go is $5 a month.
Ethan Robish:So if you're if you're running out of free usage or I I guess I don't know if OpenCode Go uses your data for trading. Maybe maybe not. Maybe that would be another benefit of getting the subscription level, but that's that's an option to get started. OpenAI and GitHub Copilot, I know are kosher. They approve of using their subscriptions with with OpenCode with, like, this harness instead of you don't have to use their own CLI.
Ethan Robish:Anthropic, you can use with an API key. So we talked about the two two levels of subscription. So if you go with the, you know, the per month subscription, like, $20 a month, their their their pro plan, max plan, Anthropic does not like you using that with anything else other than Cloud Code. Yeah. And they would they will ban you.
Ethan Robish:But you can you can use it through an API key. And then Google, I think, is similar in you this probably sticks to API key. There are plugins that work around this for open code, but they also come with disclaimers that use this at your own risk because companies have different stances on whether they allowed you to use their subscription with the with the set quota with with other other CLIs. Alright. So what did I do here?
Ethan Robish:This is one of the tips later, but I just kinda asked it introspectively, like, hey. What can you do for me? Like, this is a little more specific, but I I know that because we talked about what an agent harness is or a coding agent is, these have, like, tools on the back end. They can I can actually do things? So I asked that, what tools do you have available?
Ethan Robish:And it it answered. So you can see it. It's got basic tools for working with files, read, write, edit, searching. It has the ability to interact with the web. So if you give it a URL and say, hey.
Ethan Robish:You know, go to this URL and use it for reference, it can go fetch that page, or it can just search the web for you. It has tasks and to dos so that that kind of if you give it a bunch of stuff to do, it can kind of internally manage that without worrying about running out of like, forgetting its tasks when it runs out of context or anything. Skills, we can talk about later. And then Bash is the probably the most interesting one because that gives it arbitrary command execution on your system, which sounds really scary. We'll talk about that in a bit, but that allows it to also be really, really useful.
Ethan Robish:So that's how you would get started. And if you want more resources or, like, hey, I really wanna stick on free models as long as I can, but I'm running out of usage out of, you know, the just the built in stuff. There are so there's this list of awesome free models. Like, there's providers that give you quotas for free just, again, for the price of your data. But you can sign up for some of these and and get a, you know, like, 50 queries an hour or, you know, something like that.
Ethan Robish:Like so though that is an option. And then Derek also mentioned oh, this one okay. So with those free ones, there's also, like, a bunch of people are on the free ones. So rate limiting applies and peak usage applies, so they might it might be slow at certain types of the day. So this is a tool that you can use.
Ethan Robish:You plug in your API for whatever free thing that you set up. Which ones are the fastest right now? So you can you can do a lot to work around paying money if you want to. Derek mentioned running stuff locally, running models locally. So if you have the hardware to do it, you can you can also do that.
Ethan Robish:And this is a tool that you can run that basically tells you what models, what open weight models will actually work, will fit on your your hardware. So it, like, profiles how much RAM you have, what kind of GPU you have, and it'll give you recommendations of does this actually fit, how well is it gonna run. And this is there's a tool you can just run locally to get that. Or even easier, go to this website, and it does the same thing using, like, a web technology to access your GPU, but it gives you the models and how fast they're likely to run.
Derek Banks:I mean, there was a a question in Discord, like, what local model would you recommend? Don't know. Yeah. So if we're talking about coding agents, the one right. If we're talking specifically coding agents, I think Quinn three coder next, I've had really good luck with, but I will also admit that I haven't really built anything as serious as I've done with Claude with an open weight model.
Derek Banks:Most of the open weight models at the moment are Chinese in origin, and I mean, that might be a concern for some folks. I think if you're I'm not talking about going off to, like, using a service that's hosted in China with these models, but if you're downloading the open weight model through like Olama or, you know, LM Studio or whatever you're using locally, I use Olama because it just seems to work really well for what I wanna do. I've had I've had luck with, Ollama and and Quinn three five coater next, and it is on my list to actually try and kick the tires a lot more like I do with, open coat or with, sorry, with Claude. But again, it's a it's a being productive with a tool that you know versus trying out new thing kind of kind of thing. But I would say that the Quinn three five would be a good choice.
Ethan Robish:Yeah. I would I would agree. With the caveat, it moves so fast. It's changing all the time. But Oh, yeah.
Ethan Robish:In the current state, I see people really impressed with QN three five, and there's different levels of parameters and quantizations quantizations.
Derek Banks:Yeah. And you definitely would you the LM fit thing is really really handy because I mean, you're lucky enough to have 64 gig of RAM on your Mac or just because you have GPUs or whatever, the the bigger model you can run, the better, in my opinion, in in most cases.
Ethan Robish:Alright. So we told you how to get started, what you know, how to get over that hurdle. Like, you know, it it's it seems daunting to even start. We've gone on over the pricing that, like, hey. I don't wanna pay for another subscription.
Ethan Robish:So some of you might still be thinking, well, why do I why do I even need this? I don't I don't code. Why do I need a coding agent? Well, turns out the coding agent is maybe too specific. I mentioned earlier, it has the tool access.
Ethan Robish:It has access to Bash and, you know, several other things, but it can do anything on your system. Like, it's beyond coding. I mean, that's the head of I don't know what else to say.
Derek Banks:Yeah. Here's a a good example. I we have an AI rig at work that has two, what, r six thousands in it, two two forty eight gig GPUs, ninety ninety ninety six gig of RAM, and it updated and the kernel module was broken again on the driver, so I couldn't load any models and it's such a pain to fix for anyone who's gone through that before. Like, it usually takes me a bit to remember, hey, what do you do? I'm gonna Google this.
Derek Banks:This time, last time it happened, I said, hey, Claude, use my SSH, access into this machine and go fix the driver, please. And like five minutes later, it's like, yep, everything's good to go, test it out, ready to go. I was like, yep. So systems administration, another, example recently or another example I use in my classes, I have some compromised, or compromised in some SSH compromised in some log files, and I actually just did this before coming in here. I used Quinn three five coder and then also one of the free, open code models and said, hey, look at these Linux logs and tell me if you see any signs of compromise, and both of those found the compromised account.
Derek Banks:So log analysis is another one, especially when you have like a lot amount of data. Here here's one that doesn't have anything to do with security that's kind of fun. My daughter's a competitive swimmer, she came into the wall and hit her thumb and kind of bent it back, and she thought she broke her thumb. And so we went to the doctor, they took x rays, And it didn't look broken, but it was the weekend, and they gave me a CD of the x rays and said take it to your orthopedic doctor. And and so I I got home and I wanted to look at the x rays, and I put it the only CD ROM I have left is in my Windows desktop.
Derek Banks:And I put it in there and it was just a like an installer program for some janky like viewer and DICOM files. So I copied all of those off and went to Clog code and said, hey, can you help me view these image files? And like, I swear, sixty seconds later, I had like a little web app to view the files. I could zoom in everything. And so you say you might not be a software developer, you don't need a coding agent.
Derek Banks:Well, like, the you you basically can have it be your software developer. Like, I want this thing. Have one on demand. Yeah. You have one on demand now.
Derek Banks:Right? And so that that's really the power is and, you know, like, for log analysis, you could say, hey, if anybody's ever looked at a m three sixty five log file and it's that CSV JSON nonsense. Right? If you have one of those, just have your coding agent go whip you up a script and start to analyze it for you, and then you can just do it over and over again every time you need to look at one.
Ethan Robish:So this isn't this isn't what exactly what Derek is talking about, but I saw it earlier today, and I thought it was such a cool demo. I put it in here. But some someone posted on Reddit, hey, I had Claude make me a molecular or protein, like, viewer in the terminal. And so that is I mean, you you can just have a custom bespoke software, like, made for you. Yeah.
Ethan Robish:It it it's pretty cool. So the other example that I have was I got an audio file or a video from someone, and I couldn't I couldn't hear anything. I, like, I didn't I tried, you know, I tried the usual things, tried different different systems, different programs, and I was like I I even asked the sender, like, hey. Does the audio work for you? And they're like, yeah.
Ethan Robish:It it works here. So was like, okay. I'm gonna I'm gonna get to the bottom of this. So I I prompted Claude, hey, here's here's my file. Can you analyze the audio and determine why I can't hear it?
Ethan Robish:And so this this is just an interesting thing to show, like, how Claude responds to this type of this is Claude in code. So at first, it decides, hey. I'm gonna I'm gonna try to inspect the file stream the audio stream on this file with a tool that I know about. Well, it so so then it then it runs Bash, the Bash tool we were talking about. So it tries to run FF Probe a couple times, different different usages, and it comes back, oh, it's not found.
Ethan Robish:So it just it just guessed that, like, oh, this tool would be available. So this this is maybe a common theme that you'll see, but, like, AIs can still hallucinate, can still get things wrong, can still just, like, assume things are there. And but but it gets back here here's where the loop comes in. Like, it gets the feedback. Oh, this command isn't found.
Ethan Robish:So it it decides, oh, I have to I have to change my approach. And so let me let me check what's what's available. And it it says it was gonna decide to install it, but I don't think it ended up actually installing it. So what it does do, I should mention, so every time we'll we'll get into this later with permissions, but every time it before it ran a bash command, Claude, by default, will prompt you, hey. Do you wanna do you wanna allow this command to run?
Ethan Robish:So I I was watching this because I I didn't actually want it to install anything on my system unless unless it did it a a certain way that, like, you know, that using the package manager I use and whatnot. But so it it looked for other other tools that it could maybe use to help me solve my problem. Found a couple, ran those, got results back, decided that wasn't enough. So it says, let me die dig deeper into the file structure. It wrote just a a bespoke Python script to to to unpack the the binary structure of the video file and analyze the audio streams.
Ethan Robish:Like, it it know knew enough from its training data, like, how an m p four file was structured, and it just went straight to binary unpack. Probably not the most efficient or correct way of doing this. Well, if you
Derek Banks:did let it install software.
Ethan Robish:Exactly. Yeah. So it it will try to work around constraints that it encounters. I'm surprised it didn't, you know, try to install stuff, but maybe maybe it's because it saw that I didn't
Derek Banks:have Bruin. Yeah. And so what what you're seeing on the screen here is essentially that iterative loop that we described before. Like Yep. It's basically going through and looping through and trying to solve the problem, the task that you gave it.
Ethan Robish:So eventually, it decided that it was it had done enough that it was gonna respond to me with with its results. So it says, hey, there there doesn't seem to be audio. Like, I I did some further analysis later, but, like, basically, the audio track was at zero. Like, there's there's no variation in in the data there. So, like, whatever however I got the file, like, didn't didn't have the audio.
Ethan Robish:But then I gave recommendations like, hey. To fix this, you could try in installing the thing. So I didn't even Derek, like you said, you we we were talking about earlier, but sometimes it will just install stuff for you or prompt to install stuff for you. Yeah. And for whatever reason, this time it didn't, probably, again, because I don't have brew installed, it didn't know how to do it.
Ethan Robish:But it it gave me the commands to run, and so so then I tried those. It didn't work. We went on we we went another round just to to see. But anyway, that's the question. And this Cloud action is running on a local rig, and I think
Derek Banks:what Ethan was showing was probably just Cloud code and using the subscription.
Ethan Robish:Would Yeah. Idea. It was a it was a Cloud subscription. The what what I was showing was just, like, the transcript. So, like, I wasn't doing it live.
Ethan Robish:It was just an export of what we had done together.
Derek Banks:It was an HTML version. Like, it was either a plug in or you just said, hey, make HTML out of this transcript. So Yeah. Either way would work, probably.
Ethan Robish:It was a tool. Yeah. So this is this is Openwork, which I mentioned earlier. It's it's kinda like Claude actually, I'll show you both. So we've got Claude code or Claude desktop, I should say.
Ethan Robish:The naming schemes are just particular, but this this is the chat interface, and then this this would be like if you're in web the web GUI, you know, chatting with Claus. This is the chat. I need to
Derek Banks:this is the new I need to Google something. Right?
Ethan Robish:Yeah. I need
Derek Banks:to Yeah. Yeah. The AI. Like, I don't need It's like a one shot thing I'm gonna ask.
Ethan Robish:They have a code tab, which is similar to if you open it up in your in your terminal and you can interact with it that way. And then they've got co work, which is, I guess, more of a nontechnical friendly interface, but I think largely does the same thing as
Derek Banks:Yeah. So I think the so the fun fact about Cowork is Anthropic says that they coded it over the course of ten days using Cloud Code almost exclusively, which is kind of fun, and now it's like a flagship product. Yeah. The idea was is that it essentially would bring Cloud Code to folks who were afraid of the terminal or the command line interface. And I mean, I mean, it's not there's no worries about, you know, if you don't like the command line and you like the point and click, more power to you.
Derek Banks:I think this is
Ethan Robish:And our coworkers prefer the
Jason Blanchard:Yeah.
Derek Banks:Absolutely. Like, I'm not being like, I prefer the command line just because that's what I've been doing all my life, is the command line. Right? And so and so it's just a preference thing, but the the power is like for like quote normal office workers, like, hey, take these 10 spreadsheets and give me, you know, 10 insights out of the spreadsheets kind of thing. And so it'll go off and do that the iterative loop in the background, and then give you what you asked.
Ethan Robish:Yeah. And just to show off, I I already pre prompted this, but this is OpenWork, which I mentioned earlier, uses OpenCode on the back end. So like this is this is Claude that I was just showing, Claude desktop, is tied to my Work subscription. But OpenWork here, I just downloaded it. It opened it, and there's no subscription.
Ethan Robish:I didn't have to sign up for anything. It's using the free OpenCode usage. So, I mean, there's limits, obviously. You won't be able to automate everything in your life, but it it's enough to get started and to show proof of value of, like, real work. So I just showed I I wanted to see, like, hey.
Ethan Robish:Can you actually execute stuff on my system? I said, can you list the files in my home directory? And it does. Like, it these are these are what's in your home directory. And oh, yeah.
Ethan Robish:So I I did this the other day. So I'll I'll run this again. I told it run run Apple maybe I'll just I'm curious. If I just say open the calculator, we'll see what it does. It might still use AppleScript because this is in our our context history.
Ethan Robish:Yeah. But, yeah, you can see it just opened up calculator on my system. So that's it it can do stuff. Right? That's the whole point.
Ethan Robish:Alright. So you've got something installed. You've got you've got it in front of you. Great. Yeah.
Ethan Robish:I'm in I'm in the desktop GUI or I'm in the the command line, and you're staring at a blank screen. Like, what how do I start? What do I even do? Ask the agent. Start start using it as a partner.
Ethan Robish:Like, someone who knows how to do something, and you can ask them, like, okay. How how do I start? Like, here's here's what I wanna do, or what what can you do for me? Like, just go back and forth and learn kinda what what it can do, and you'll you'll slowly start to realize what it can do is just broaden broader and broader.
Derek Banks:Yeah. I mean, I I think that it basically can operate on your computer. And the more access you give it, the more it can do for you in a lot of respects. But I think that when you wanna make changes, like let's say that you wanted to say, I don't want you to ever read any of my SSH keys in my dot SSH directory. I would actually just tell the coding agent, go make that change, like Claude, go make this change, and it'll go modify the JSON for you.
Derek Banks:I mean, you can hand modify it, but anyone who's, you know, if you've not had experience hand modifying JSON and adding something, that might be problematic cause it has to be pretty specific. Right? We're jumping Jumping ahead to our permissions slide. Oh, yeah. That's a good point.
Derek Banks:Right? So but I mean, the idea is is that you can essentially just use it as your, you know, your assistant to do whatever it is you're doing on your computer. And what I found to be very useful is be specific in what you ask. Right? Like to say, I want to do the specific thing.
Derek Banks:So for example, you have a bunch of PHP source code, just don't say, hey, go analyze this source code and find vulnerabilities in it. Instead, what you would wanna say is, in this directory, I have a PHP source code, go off in the internet and do research and come up with a plan to to do to to find vulnerabilities in the source code. Still not like a real, like, prompt, a well thought out like, you know, prompt, but being very clear and specific in what you're, you know, asking to do.
Ethan Robish:Yeah. So this Derek just touched on a whole bunch of stuff that's coming up, so setting the stage. The next hurdle, trust issue that we're we're encountering is like, hey, how do I how do I even trust believe what the AI is telling me? Like, I this isn't true. Everyone using.
Ethan Robish:Yeah. Yeah. Who has used the chatbot has had it hallucinate or confidently answer your question, and turns out, like, the information it gave you is just completely wrong. Like, incorrect. So so how do you combat that?
Ethan Robish:How do you recognize that? And that part of it comes with experience. One thing for for coding agents in particular, because keep in mind that the LLM that they're trained on was that point in time, and stuff is changing constantly. Right? So one of the tools we we showed earlier is it has access to go search the web or pull out pull recent documentation.
Ethan Robish:So you can you could tell it to do that. Like, if you don't tell it to do that, maybe it'll decide to, or maybe it'll just it it convince itself that it knows enough to give you an answer, but it'll be an outdated answer. So if if you know, hey. The latest, greatest documentation is here, give it a link to that and be like, hey. Go read this documentation, and give them that.
Ethan Robish:Tell me how to do this thing.
Derek Banks:Yeah. I have an exact real world example that happened here recently at Black Hills for this. Like, so we also some of our folks historically have used Copilot, not not GitHub Copilot, but the Microsoft Copilot that is Naming things. You know, right? That is the, like the base Copilot, like chatbot offering that you get with your Microsoft service.
Derek Banks:And without I won't be mean, I'll just say that that that co pilot offering is limited and not very good. Because of like, you know, limitations that Microsoft has put on it and lots of different reasons that we don't have to worry about, but they were asking it to they're trying to troubleshoot TCP four forty five outbound from Digital Ocean and we're having an issue and Copilot kept telling them that it had to be the firewall and and and was telling it wrong information. I did exactly what Ethan has on the screen with Claude. I said, go off and get the latest Digital Ocean documentation and tell me if they started blocking four four five four five outbound because they didn't in the past for sure. And sure enough, Claude went off and, you know, baked and percolated for a couple minutes and came back and said, no, it's definitely not blocked outbound.
Derek Banks:And so the copilot was confidently right in its wrong answer, where Quad went off and and did a bunch of testing like and went and, you know, like research rather, and went and like read the doc, so to speak, and found out, you know, no, it's gotta be something else. So
Ethan Robish:So one issue that you might have is you get in a conversation. You you and you probably experience with the chatbot too. Like, you it's it's giving you wrong answers, and you keep telling it, no, you're wrong, and this is why. And they'll say, oh, absolutely. You're you're absolutely correct.
Ethan Robish:And then and then it'll go off and tell you the exact same thing again, or it'll get get it wrong again later in the conversation. So one kind of mental model analogy I like to think about is Plinko. So if you ever seen the Price is Right or it it's basically a pegboard where you drop a a token and it bounces back and forth to the bottom. So if you think about kind of that as your conversation, so at the bottom, you've got these cups that some of them are desirable. Like, they're worth more points.
Ethan Robish:Like, this is this is a desirable outcome. And and maybe this group is, like, all high point values. But then over here, like, this is zero, or, like, reset your points or whatever. Like, bad outcomes. So where you start where you where you start by placing your Plinko chip is kinda like how you start your conversation.
Ethan Robish:And every subsequent message that you send to AI so, like, in LLM, like, when when you're inter interacting with, like, human chatbot, it seems like it's a whole conversation and you're going back and forth. But from the LLM side, it it's all one shot. Like, it's it it does in inference pass through the LLM given your your conversation history. So that's that's how it keeps its state is because it sends your entire conversation history all over again. So as you're getting a longer and longer conversation, you're moving down and you're like, that's you're you're placing your clinker chip lower and lower to try to, like, aim towards your goal, but you're constantly getting wrong answers, going lower over the wrong answers isn't gonna help you.
Ethan Robish:Right? So that's where it might be better to back up, start over. Like, start fresh with a new context. Like, no conversation history. Let's take a different approach.
Ethan Robish:Let's, you know, do do something differently, like, based on the lessons learned from and and you can, again, you can even ask the AI in your previous conversation, like, where everything is wrong, say, hey. You keep getting this wrong. Tell me tell me why. Tell where did we go wrong? Like, tell me how to start a better conversation.
Derek Banks:Yeah. I mean, I've definitely used AI to give me prompts to make my prompt better and gotten better results. But yeah, what Ethan's describing is all happening in what's known as the context window. And so, it's basically look at that as like you have the entirety of your conversation in that session with the the large language model. And so, if you're if you're getting bad results, it's always a good idea to try and start over and and get better results.
Derek Banks:So there's a there's a
Ethan Robish:lot to learn. Derek just mentioned context. There's a lot of different terms that go into coding agents and environments. I kinda go ahead. I was gonna say this
Derek Banks:could almost be a four hour workshop. Mean, it's forty eight forty eight minutes in, and you're like Yeah. Trying to, like, scratch the surface a little.
Ethan Robish:I think I've got, like yeah. More than we can get through. Probably. That's okay. Some demos, but We can
Derek Banks:ask everybody if they would like a four hour workshop. Yeah. Go for a workshop, please. Alright. So there we go.
Derek Banks:Well, Lisa, do you
Ethan Robish:These concepts, I kinda grouped into different categories how it made sense for me. Like, you your agent has access to tools, and whether those be the the local tools that we're I was showing earlier, or you can connect things called MCP servers that let it interact with, like, remote APIs. Those are those are tools. Those are connections, like, that it can use to to do real real work and interacts and integrate between different different areas. And then you got your context.
Ethan Robish:So I mentioned your chat history. That's part of your context. There's something called agents dot markdown or in the club world, it's claud.markdown..md. That's like instructions that are are sent every single time. You've got agents and sub agents.
Ethan Robish:They they are all just like terms that you'll hear and get see thrown out. There's there's skills. There's slash commands. All of these fall into ways of constructing your context. So if you if you invoke an agent, it's just a markdown file with instructions that, okay, now now this loop has these instructions to start with.
Ethan Robish:Skills, just markdown files with instructions in it, and the agent decide like, knows of the existence of skills, and it can decide when to use them. It can decide when to go read that skill file and pull in and follow those instructions. Slash commands are very similar. It's just a prefilled prompt. Like, I mean, honestly, I think I think Anthropic is kinda combined skills and slash commands in some way, but very, very similar.
Ethan Robish:And then you've got your config. So this is the actual configuration for the the agent harness, the program that you're running. So this isn't so much getting the AI to do a certain thing. It's like how it's like setting boundaries, setting preferences and configuration around the the actual tooling, the harness. And this is where you can have hard security boundaries, like like like setting permissions or putting in hooks to verify or check or deny things that the AI attempts to get the agent harness to do.
Ethan Robish:Alright. So context usage, I think I'll show that real quick. But that's back to the open code example demo I showed earlier. You you can see it it's talking about context and how many how many tokens are used, and it's saying 6%. What does that mean?
Ethan Robish:Well, context window is limited. Like, on the back end, there's a hard limit that the provider sets of how many tokens you can send at once. That's your that's your context window. So if we go to we go to Claude, I'm gonna run context just because it has a nice visualization, and you'll see a bunch of stuff that I have installed. But if I scroll up, I I so this is a brand new conversation.
Ethan Robish:I haven't sent it anything other than flash context. And you can see, like, part here is the system prompt. So the coding agent, your harness, has a system prompt that it just comes baked with that that takes up context, that every single concert conversation starts with that. There are ways you can, like, clear that or replace it or add to it with the the command line flags, but that's just that's there by default. So custom agents.
Ethan Robish:So I mentioned agents, just a markdown file. Right? Well, those get those at least the existence of those, Claude has to know about them because it has to know when to when to use an agent, when to use a skill. So those definitions of agents and skills are also in in your context. Like, it's it's something that's just taking up room.
Ethan Robish:And then you've got messages, which, again, I there there'll be more messages if I was going back and forth in an existing con conversation. And then and then it's a bunch of free space. So I started this claud specifically with a 200,000 token limit. There there's models with a million token limits as well. So Yeah.
Ethan Robish:This is
Derek Banks:just the a comment it seems like the LLM providers will be able to manage context automatically. Well, they kinda are. Right? Because the the the toe the 200 k limitation that you see here for SONNET four six is is is a a limitation on the provider side. Yeah.
Derek Banks:That means that for your session, like, they'll and it has to do with how much compute that they have. And because when you have a conversation with an LLM, that LLM is running, but there's like a layer, the embedding layer of turning your words into numbers that is also running on the GPU. And so like the ability, their limit to the context, because they have to provide this for all their users, so it has to run on a GPU. And so it's only been here recently that Anthropix had to compute where OPUS 4.6 now has a million tokens. And so real quick backing up, a token just means a word or a stem of a word.
Derek Banks:Like, could be typically, it's gonna be a word or if it's compound word, the parts of that word instead of like a character. Right? And And so we measure this in tokens. And then I think that both Grok and Gemini are up to a 2,000,000 token window at the at this point.
Ethan Robish:So nice. Thank you for that. By the way, Derek teaches a class on AI security. Yeah. But yeah.
Ethan Robish:So to your point about managing the context, so the the agent harness does, you know, manage a little bit. There's plug ins that can help you, like, manage your context better. But notice notice these x's here, and the the key says auto compact buffer. Well, this is still within our our limit, but once once our context usage crosses this, your harness mini harnesses will go into what's called auto compaction, where it will take all of your context so far, send it to the LLM, and say, summarize this for me. Like, summarize this text.
Ethan Robish:Summarize this conversation. And how exactly they do it is gonna vary from from harness to harness, but the whole point is it's it's a lossy operation. So that if you ever experience using a coding agent and it forgets things, this is likely why. It's it's dropped out of the the context. And there are so many projects and so many people trying to solve this problem right now.
Ethan Robish:If you go searching for memory, that you're gonna find so many options. So it is it is an open problem, but there's tons of solutions out there. No one's Emerge is the best in my opinion.
Derek Banks:Yeah. I mean, I I think my best advice at this point in time is if you can afford it, use one that has a million a million token context window. Because ever since Opus has made the switch, like, I have noticed a dramatic difference. It it has been a lot better. Also, kinda keep an eye on it, and then, like, try and gate, like like, time your work to where, okay, here's a good place to, like, go ahead.
Derek Banks:Like, I'll often say write a checkpoint file, and then I'll compact manually instead of letting it auto compact, and then I'll Yeah. There's different strategies. Yeah. There's different strategies. And like Ethan was saying, this is like you know, if you're feeling like this is a lot and this is overwhelming, like, it's okay.
Derek Banks:We are all, like, basically just now getting started. And I think somebody just, like, had Daniel Measler's personal AI infrastructure build bio in the chat. Right? Daniel Measler actually had a a quote here recently on X that I really, like, liked. He basically paraphrased, like, look, if you feel like that you're behind and and, like, you know, all these people are ahead of you, we're really not.
Derek Banks:Right? Like, we're all now starting at this new agentic wave, like, at the beginning. And at best, people have a couple months on you. And so the best advice I get is don't don't panic, and just dive in and start using it and and get familiar with it because it'll really make you, like, a more productive kind of thing. Just don't don't feel like you're behind.
Derek Banks:You're right on time. So
Ethan Robish:So this this next section is more best practices of how to use an agent to, like, do meaningful things or to work on a project, like, maybe a more than just, you know, hey. Go do this isolated task. Like, it's it's something that's gonna take a lot of steps. It requires a lot of planning. But you may have heard us mention planning.
Ethan Robish:So you saying, hey, go go do this big long project, and you just tell the AI, like, go do that with very, you know, one prompt, just go, is is kinda like giving it a signpost. Like, go that direction. Like, head there. Maybe it'll get there, but likely it's gonna get lost or go off the off the trail along the way. So a better way is what's called plan mode.
Ethan Robish:So most most of the agents are gonna have some sort of plan plan mode that you can enter into where it's not gonna take action. It's purely about going back and forth chatting with you until the idea is cemented. Like, you're you've designed a specification or something. And once you have that specification, you review it. The like, you you agree, like, hey, this is the best way.
Ethan Robish:This is what we wanna do. That's kinda like giving it a map. So both both it's got a direction, and it's got a map to know, like, how how to get there. One step further is, especially if you're working in software, is the ability to allow it to to check itself, to correct itself, like, to know where where it is. That's kinda like giving it a compass.
Ethan Robish:Like, if it can run a command against to to verify that the work that it's done is correct, it can it can know where it's at, like, in addition to, you know, where where to go next.
Derek Banks:And that plan mode, if you look at it, that's essentially, like, making that, like, big complicated prompt that you're putting into a chatbot. It's basically doing it's and and it's all about providing the LLM more context for solving the problem.
Ethan Robish:So I've seen a lot of people talk about spend most of your time in planning and then the least least amount of time in execution, and that that will get you the best results. And so, Derek, this I think this is from your class, but No. It's it's bad. Yeah. No.
Ethan Robish:That's alright. It's a wait. We're talking about we removed the background and it kinda made the text all and wonky. Not not the not the concept. The concept's good.
Ethan Robish:But it's it's it's basically a loop. Like, you're you're iterating. You're iterating or iterating with the agent. The agent is iterating on its own instructions. So it it's a continuous continuous improvement.
Ethan Robish:Alright. I think we're out of time.
Derek Banks:But oh, man. That made it That's a the way through.
Ethan Robish:Yeah. We I okay. So because the last section is another scary thing, you're telling me to give AI command execution on my system. Like, how can I trust that? I'll just give you the the TLDR.
Ethan Robish:There's there's a bunch of ways to do it. You saw me run Docker. Just put it in Docker, start it in a virtual machine. Or if you don't have any of those and you're willing to pay money, like, go go to someone else. Have have DigitalOcean, spin up a droplet, or go to AWS or whatever.
Ethan Robish:Like, run run it on someone else's system, and those are all ways of sandboxing. If you want it closer to, like, your system, that's when you get into, like, fine grained permissions and setting up hooks and maybe even running, like, Cloud has a native sandbox feature on certain systems. But Yes. So If anybody asks
Derek Banks:to know what I'm currently doing for most of my work, that would be sandbox and like running locally with it on my machine, in a sandbox with specific security hooks that, Daniel Measler so graciously provided in PAI.
Ethan Robish:Yep.
Derek Banks:So and and I've actually seen it, like the security hooks actually stop stuff before. So, I mean, it's a good strategy, but you know, everything has some risks trade offs. Right? If I was doing something, like if I was pulling information like for an incident response off of like some unknown website, I would use Docker and use the coding agent in Docker because I really think that moving forward into the future, like prompt injection is going to be a big thing with these systems and
Ethan Robish:Yeah. You tell it to go out and research the latest documentation on the web. Yeah. We can mostly trust the web, especially if you're sending it to reputable documentation sources. But, I mean, what what's what's it called?
Ethan Robish:Supply chain risk? Like Yeah. Programming or, like, libraries and packages are getting backdoored all the time. How long before people are backdooing documentation websites by putting prompt injection in
Derek Banks:that That it's already happening. It's like, really. An air gapped island. Probably not that much. I mean, I do think that Docker and virtual machines are a real good way to and if Docker is something new to you, I would definitely encourage you to learn more Docker because it's very That's handy.
Ethan Robish:Have the AI help you.
Derek Banks:Have the AI Start asking about Docker. Exactly. Alright. Well, we almost made it through all that. So I
Ethan Robish:do have We've we've the call.
Derek Banks:Ability to make a workshop for sure.
Ethan Robish:The I think the the entire slide deck was shared in Discord, so you can check out the the few things that we didn't cover in-depth.
Jason Blanchard:I didn't wanna stop you because I had a feeling people would be like, let them talk. Let them talk. Let them and Maybe
Deb Wigley:it's like two minutes over.
Jason Blanchard:Yeah.
Deb Wigley:It's not bad. Well done, guys. It's fascinating. Looks like you're planning a workshop, I
Derek Banks:think I would be in a downlist doing one for sure. Yeah. And it seemed like other people would come come at least check it out. So Yeah. Alright.
Derek Banks:I don't mean to sign you up for work in front of everybody, Ethan. Know, we
Ethan Robish:can Not get too late to get tooth them.
Deb Wigley:So It's true.
Jason Blanchard:So I'm gonna have you do a final thoughts. Ethan, you'll go first. Derek, you'll probably have the same final thoughts as Ethan. Anyway, so Ethan, if you could sum up everything that you talked about today and one final thought, what would it be?
Ethan Robish:Well, I would address the the fears that I had. So you can get started for free. You don't have to know the CLI. There's desktop applications. There are ways to make the output more trustworthy.
Ethan Robish:If you if you approach it the right way, there's frameworks and approach and techniques to to do that. And there's ways to sandbox or isolate so that you can run safely on your system if you're worried about there's there's horror stories of Claude decided to r m dash r f my home directory. Like Mhmm. Or you see people having Quad, like, remove a production database, whatever. That's that's that's where guardrails comes in.
Ethan Robish:Like, you don't give it production access to the database. You have it write scripts that can manage your production database, and you verify the scripts. But anyway. But so there there's ways to address all of the the hurdles that that I had by getting started.
Derek Banks:Alright. Yeah. I think that I'll take a kind of like a a step back at a higher view and say, if you're feeling overwhelmed and, you know, and insecure and you're in this, you know, like, oh my gosh, this is gonna take my job. Well, okay. So take a step back, take a deep breath, that's probably not what's going to to happen.
Derek Banks:And I think that, you know, it's it you know, we've survived lots of other technology waves and, you know, the the cloud was gonna get rid of all traditional, like, servers. Well, it didn't happen. Right? So I think that, you know, I give the best advice I can give to you is, like, oh, feeling overwhelmed is is normal. I think that's part of information security.
Derek Banks:We we kinda live in that, like, kind of like, chaos and uncertainty. There will definitely be chaos and uncertainty, but the the best advice would be just to to learn the new tools and technologies and what they can do for you, and don't don't get too worried about it.
Jason Blanchard:Thank you both. Thank you for sharing your knowledge today. Thanks for doing the research and I want to tell to other people. I always tell, like, when we're coming up with ideas, like, hey, take something that you know today that you wish you would've known six months ago, so that way, people can get up to speed where you're at today. So you had a good day.
Ethan Robish:I took that to heart. Thanks for that advice, Jason.
Derek Banks:I think that's the what happened here. Alright,
Jason Blanchard:everybody. We're gonna stick around for post show banter. We'll do a little bit of q and a. We have some questions. But for now, if you gotta go, go.
Jason Blanchard:And then if you'd like to stick around even longer, Tom Smith is here to talk about what it's like to do business with Black Hills. Every once in while, people are like, you guys offer services? I thought you were like a comic book webcast company.
Derek Banks:Oh, I thought you did. Yeah. I I thought we were a video production company.
Deb Wigley:It made blank. Alright. Okay.
Jason Blanchard:For Gen z, there was a thing called Wings World. They Oh,
Deb Wigley:no. Oh, no. There are people who've done burger as well. Yeah. That's
Jason Blanchard:Wondering that I quit being a teacher because I like, I was talking about how Top Gun influenced, like, me as a child to wanna get into filmmaking, and I talked about the movie, and someone was like, spoilers.
Ethan Robish:And
Jason Blanchard:I was like, raise your hand if you've never seen Top Gun, like 60% of the boys there. I was like, I quit. Did these 20
Deb Wigley:plus told on GooseDine.
Derek Banks:Spoilers. That's exactly what I was gonna say,
Jason Blanchard:Yeah. Goose Eyed.
Deb Wigley:Goose Eyed. What?
Jason Blanchard:To be a seven year old in the theater and watch, like, Goose, you know, It's
Deb Wigley:something I will say is young to watch some movie, Jason.
Jason Blanchard:Hey. Good. Just now. On the way out. Okay.
Jason Blanchard:Alright. I did put a link in the chat at the beginning of the pre show banter. We posted this link. We had it yesterday for the Sox Summit. It's two free hours of lab time each week with step by step labs that we're gonna continue to add to.
Jason Blanchard:We were doing the CTFs with the webcast, and we felt like this was a better way of doing it. So every week you get two additional hours of lab time where you can go and get that. Now if you sign up and you're in The United States, that part's important, Deb is going to have something mailed to your house. It's the new SOC survival guide that we just finished, just published, just made available, and we're gonna mail it to your house. If you give us your address.
Jason Blanchard:You don't wanna give us your address? No worries. You still have at three lab time.
Deb Wigley:Yeah. Alright. If you're not in The United States, I'm gonna post the digital version, so you can grab it there. There you go.
Jason Blanchard:Alright. So someone here's one of the first questions, Ethan and Derek, and you talked a little bit about tokens, you talked about like budgets a little bit. But if you're if you're trying to like figure out how many tokens do you need, is it 700,000? Is it 4,500,000? How do you figure this out ahead of time?
Ethan Robish:So it's that that almost sounds like coming from the the idea of, hey, I'm gonna go buy tokens. Like, I'm gonna I'm gonna prepurchase a token pack. And that's not it's not really how it works. So in the terms of context management, that's that's a whole different animal. But it sounds like they're they're worried about, like, cost, like, cost management.
Ethan Robish:Mhmm. And one way to deal with cost management is start with start with a subscription plan. So then it's just a set price every every month, and you you go until you hit your quota, whatever whatever that plan allows. And then you don't have to worry about, like, runaway costs. If you're using the API and and you can use do pay, like, per million tokens, they usually charge I think what for most providers, you can set you can set limits.
Ethan Robish:You can set say, hey. Don't spend more than Derek Wigley. We've got a thousand dollar limit on our work account. Right? Like, so that way if if something does go wild and run off, then you you're not spending more than what you're comfortable with.
Ethan Robish:And for your workloads over time, you learn, like, okay. This is kinda how much token usage that this takes. Contacts management is a different animal where you've got 200 k tokens for your contacts window or million tokens for your contacts window. The main way to avoid hit like, hitting your contacts, the the limit of your contacts window and triggering compaction because it this isn't a price concern. This is a my agent will forget what it's doing halfway through what it's doing.
Ethan Robish:The main way to handle that is to spin off sub agents. So I we touched on agents or sub agents, but, like, most most of the harnesses are smart enough to do it now anyway. Like, if there's discrete tasks, it will spin off a separate agent which has its own block of 200 k context. And so that way, you're not using up your your main 200 k context to to do this other task. Like, it goes and does that reports back, and it's it's kind of a way of, like, getting more context window for isolated tasks.
Derek Banks:And when you're on a subscription and you hit that limit, basically, the agent will pause and say, hey, you're out of your allotment time. Would you like to wait till 5PM eastern to continue? And generally, you're like, sad face okay. Fine. Then Guess you have to go Yeah.
Derek Banks:Right. I'll go outside and touch grass right now. I'll be back at 05:00. Or you pay them more money, or you come up with something else. I mean, it's just kind of the nature of how it works at the moment.
Jason Blanchard:All that sounds there there's a bug being killed in the office.
Deb Wigley:What's happening?
Jason Blanchard:It's a bug. Okay. My and the team is killing a bug somewhere in the office.
Deb Wigley:Get that, Maddie.
Jason Blanchard:I hope. I hope it's not. Them killing lot. They're still killing this bug.
Deb Wigley:Is it a bird flying?
Derek Banks:Or like you need
Derek Banks:to like move your webcam around and let us see.
Deb Wigley:Yeah. Are they still going out? Anyway, there you go. Do you need me to take over?
Jason Blanchard:I've got
Ethan Robish:a question for Will.
Deb Wigley:Thank you. Yes. That's great.
Ethan Robish:So hey. Hey, Will. Hey. Hey. I so I I we kinda geared this towards people who haven't been using Coding engines before, but you have.
Ethan Robish:Like so I'm curious if you took anything out of this or what you took out of it.
William Corbin:I mean, you you pretty much talked about a lot of the stuff that I had to learn over the past couple months
Derek Banks:and what I've Why didn't you give it earlier? Like,
William Corbin:it's okay. This happened, like, six months ago. No. I thought I thought you guys did a really good opening intro into it and and covered a lot of things people were probably gonna struggle with. And my best piece of advice to anybody out there is, honestly, the best way to learn this stuff is just play with it.
William Corbin:Get yourself one of the cheap cheapy accounts and just have it start doing random tasks for you.
Derek Banks:Yeah. That's actually that's great advice, Will.
William Corbin:Use it use it even in like I first started playing with it, I started using it to look crap up at video games. Yeah. Played with it, like, there a
Derek Banks:little bit. Actually, somebody who said that in in chat a little bit ago, they said something along the lines of like, I always sit down and play with it, and I don't know what to do. Well, what are you interested in? Like, what do
William Corbin:you want to do? Have it go do that. Right? And just get like, just start using it And like work related, one of our supervisors said it best to me, it's think of
Derek Banks:it like you just got yourself an intern. And I really don't wanna look through these spreadsheets and and summarize these for my boss. So can you go do that?
William Corbin:These monotonous tasks over and over and over again, especially when it's stuff like data collection and having to read through like hundreds of lines of things. Why not just go, hey, Claude or hey, chat GPT. Do me a favor. Read this and give me the key points.
Derek Banks:Because that's what large language models are really good at. Right? And and so this isn't like a person in this, like, we anthropomorphize AI because we don't have words to really describe what's happening. But basically, large language models are really sophisticated pattern matching machines, and they're very good at taking a lot of data and making sense out of it. And so those are the kinds of tasks that you're really gonna excel at is looking at at day with a with a coding agent, is looking at that data and making sense of it.
William Corbin:Right. I use it mostly in my everyday work to basically just get me the key information I need. Yep. And if anybody's ever seen computer logs, they can be a nightmare to log through to get the key information sometimes.
Jason Blanchard:Yeah.
William Corbin:So it's and like, it saves me a good couple minutes on like everything I do, and that time adds up over the day.
Jason Blanchard:I I think I have an additional question about the token thing. Also, they got the bug. So so you you mentioned that, like, in the middle of the automation, it just stops. It forgets or just doesn't keep going. Okay.
William Corbin:It there's two ways it's gonna stop because the the the session there there's like section context, and then there's like the total context you're using overall for the entire thing. The total context is what's linked to your account. And that's the thing that's always gonna have the heart shut off on it and go, hey, you're out of usage at this point. Yeah. There's also a session context.
William Corbin:So a session is basically like each task you're trying to do should be its own individual session and what you're working on. That's what's limited by with Copilot or not Copilot, but Claude is 1,000,000 context for most things now with with the opus and the sonnet. Once you hit that 1,000,000 context, it's not a hard stop it'll do. The way Claude handles it is it'll actually summarize everything you've done in that current session, restart a session, and then give the new session that summary.
Derek Banks:And and it might not to say it always forgets isn't correct. Like, I've had auto I've had auto compact happen and I didn't anticipate it happening, and I just wasn't paying attention and it auto compacts, But then it still did what I wanted it to do. Right? It's There's just the possibility that it might lose track of something. It's essentially that's not hallucination, that's called drift.
Derek Banks:Right? And so, it's like drifting away from the original task that I gave it. And so for this is actually a problem now, but as we get more and more and more compute and context windows get bigger and bigger, it's eventually probably not even gonna be a problem.
Ethan Robish:But
William Corbin:Right. And even with the the compacting and it going into a new session, At most, usually, all I've had to do is give it tiny reminders.
Jason Blanchard:I have one last question because we are running pretty long, and then I'm gonna bring in Tom Smith to talk about how we can do AI security assessments for your organization, if that is something that you're interested in, that if you're thinking about implementing AI and you wanna have a security assessment of your implementation of AI, Derek and the team can help with that. So here's the last question, then we'll move into the sales. So is sandboxing plus the additional folks Derek mentioned limiting the blast radius of supply chain risks?
Ethan Robish:I would say, yes. That's not yeah. Yeah. That's not specific to AI. Like, if we're talking about supply chain risks, like,
Derek Banks:Well, so I think taking a supply chain risk is being it went off and it got like a prompt injection to take all of my private keys and post them to this website or something. Then, yes, sandboxing and hooks are a way to prevent that, but it's not a, oh, I set like some like, I'd have to go make sure I had the right stuff in place. Like, I I guess one way that I'm looking at it at the moment is isolation is not like, in in client layer coding agent security is not something that's at an industry level completely solved at the moment.
Ethan Robish:Right. Yeah. I I was mainly saying, if we're talking about supply chain risk, you can think about how would you limit the blast radius of traditional supply chain risk. Like, assume a bad package got installed because it was a dependency. Like, okay.
Ethan Robish:Well, now that wherever that's running is compromised, What what can it reach? Does it have access to your secrets? Can it delete critical files? Can it exfiltrate data to the the Internet? It's all the same concerns for sandboxing AI.
Derek Banks:Yeah. This actually just happened too. And so, like and and Ethan's right. So this is really no different than I'm pip installing beautiful soup, and I misspelled it, and I got a malicious package. Like, that's this it's the stain No l m.
Derek Banks:L l m. L l m. Yeah. Light l l m.
Ethan Robish:It's just got compromised like this last week.
Derek Banks:Yeah. Two like, a couple days ago, Light LLM got compromised and was pulled in as part of software building packages by, you know, coding agents. And it was exfiltrating like AWS keys, API keys, SSH keys. And so this is a real risk and it's just going to get worse and worse. So agent security is definitely something.
Derek Banks:So sandboxing and Docker, NVIDIA actually just came out with NVIDIA Shell, think, which is probably Docker under the hood or or or something similar to it. Mhmm. So there's definitely there's a And I recognize. What's that? It's Kubernetes.
Derek Banks:It's Kubernetes. Oh, okay. So, yeah. And so, this is a problem that we're gonna have really quickly industry wide, I think. Well, it's probably already a problem.
Jason Blanchard:Okay. So we're gonna bring Tom in. Thank you so much for sticking around. If you wanna know what it's like to do business with Black Hills Information Security, please continue to stay here. Thank you so much for giving us so much time so far.
Jason Blanchard:So Tom, as I listened to this, I joined, you know, I entered in the cybersecurity industry eleven years ago, and when I got into, I had no idea what a pentest meant or metasploit or a shell or any other words that I was hearing. And this past year of being a part of all these AI security, AI implementation webcast remind me reminds me of that eleven years ago. So when Yeah. When comes it to Tom, you know, people were coming in and they're like, hey, we have this new thing and we're trying to see if it's secure or not. Like, how do you even help people and potential clients who don't even know if what they're doing makes any sense yet, and if they need to secure Yeah.
Tom Smith:Well, I mean You know, there's the old sort of look for there's the old saying about, you know, a bunch of blind men in a room with an elephant, and everybody can feel a little piece of it. Like, at a sports, this feels like lag or this feels like the you know? But no way you can really get their arms around the whole thing and understand it. That's kinda how I think AI security can feel right now. I mean, especially, probably for me personally, more so the year ago perhaps.
Tom Smith:But I mean, you know, so that's a normal thing. I mean, we're kind of the phase where everybody's figuring it out and, you know, the experts are not expert yet, and the newbies are really new, and the people who are intermediate levels of knowledge are really rookies, you know. So I guess this is why it's saying that knowledge of of how this stuff works is not widespread. And and so consequently, everybody kinda feels that way. So so to really answer your question, Jason, like, do we figure it out?
Tom Smith:What we can do for people who come to us asking for help is I mean, it takes a while a lot of times. People people come to us and say, like, hey, we have this new AI tool that we're gonna be rolling out, and, you know, it it you know, we'd like you guys to try to pen test your word. And so it oftentimes it takes it takes a lot it takes, you know, probably five times as long to figure out exactly what it is that we need to do for them, than it does for something that's, you know, sort of an industry standard known quantity. Like, if someone comes to us and says, okay, we need an external network pen test. Like, we know what that is.
Tom Smith:The client knows what that is. Everybody knows what that is. Right? Auditors know what that is. But somebody comes and say, hey, we did an AI test.
Tom Smith:Well, what does that really mean? You know? A lot of I mean, it's free. We get that request. And then it turns out that really what they're what what what's gonna deliver the level of comfort that the client needs is oftentimes gonna boil down to like, check the web application that we've created to operationalize this this AI tool that we have.
Tom Smith:Take a look at that. Look at our guardrails and see if you can get any hallucinations to happen in the chatbot feature. Right? And and it it really can be, you know, that simple a lot of times. So, I mean, you know, and that that I mean, that's probably the vast majority of the AI stuff that people bring to us.
Tom Smith:It's along those lines on the on the, you know, sort of offensive security side. William could probably tell you a little bit more about the defensive cybersecurity side, things we could do. But but yeah. That's that's so I don't know, Jason. That's a really roundabout way in answering your question.
Tom Smith:Yeah.
Ethan Robish:Got to
Tom Smith:what you wanted there.
Jason Blanchard:So if someone wanted to do business with us, what's the first step?
Tom Smith:I mean, just reach out. I mean, if somebody wants to reach out to us, like, for a pen test, general consulting work, or they just wanna, like, talk to somebody with more expertise in a particular area than they have, just reach out. They can email they can email me, you know, if you want directly. I don't care. My email's on the website.
Tom Smith:You can email you can hit the contact us form on the Blackfields infosec page, and then that'll alarm that they should email to, like, the team that I work on. And then we'll invite you to schedule a call, and then we'll get on the phone. You know? So I'm showing major.
Ethan Robish:We'll get on a a video chat, you know. We'll still email it a phone. Yeah.
Tom Smith:Yeah. I know. Yeah. Right. I know.
Ethan Robish:Yeah. It's I still take the
Tom Smith:I don't think of it as like a thing with a currently core coming out straight under
Derek Banks:There's actually a question in chat. Do want me to answer it? Yeah. Alright. So for an AI pen test, how do you handle consult customers about potential costs associated with fuzzing and increased token usage?
Derek Banks:So at like, during the scope like the, you know, the if if there was a question question about that, we would discuss it with you. We're not there's actually an OWASP top 10 about unbounded consumption and resource exhaustion. We're we're well aware that if we're going to do something automated against a a large language model that there is a cost associated with it depending on how things are set up. So we would definitely work with you to figure out. We're not gonna stick you with a $20,000 AWS bill and be like, oh, that sucks to be you.
Derek Banks:Like, yeah. No. That's not how that's gonna work at all. Like, if I was the one doing the the test, which would be maybe be likely anyway, like, we're gonna look at the architecture and and, like, if we thought that you had, like, a scenario where, like, hey, somebody could really spend a lot of money here if you're not careful, we would probably talk to you about that. So and what guardrails you should put in place to prevent something like that.
Jason Blanchard:I hadn't even thought of that. That someone it like, that's the you could just exhaust financial resources of a company by exploiting their AI.
Derek Banks:Yeah. Whatever. Unlike other open source things, like open source and AI, like, just gluing a bunch of open source together and and and doing a thing is not always the best idea in the AI world. Mhmm.
Jason Blanchard:Alright. So we do this work, we now let you know we do this work, and if you want to reach out to us to talk about what that work might look like, you know where to find us. Thank you for sticking around for this part. We're keeping it kinda brief today because we've already gone pretty long. Deb, what are your final thoughts though?
Deb Wigley:Final thoughts. We've thank you guys for hanging out and for showing up. Yesterday was crazy, so we're a little low energy today. But we just really appreciate you guys being here and listening to Ethan and Derek share their smarts. And be nice humans, be nice to each other, be kind, and help people that are just couple steps behind you.
Deb Wigley:Help them help them. Bring them on. Bring them bring them on. We love each other.
Ethan Robish:Or else the protector of happiness and kindness will
Jason Blanchard:fight you. Protector.
Deb Wigley:The defender of happy.
Ethan Robish:Neither happy nor kind.
Deb Wigley:But somehow still happy and fit.
Jason Blanchard:Yes. We're the the people that would do the work if you need the work, and we appreciate you being here. We'll see you all next time.
Derek Banks:Bye bye.
Jason Blanchard:Ryan. Hi. Kill it, Ryan. And then you can just Fire. You're done, Ryan.
Jason Blanchard:Have to
Derek Banks:I have.
Deb Wigley:Done. You're you're done. You've done a lot.