{
  "version": "1.0.0",
  "segments": [
    {
      "speaker": "Brian Fehrman",
      "startTime": "0.08",
      "endTime": "27.064999",
      "body": "Hey, everyone, and welcome to this week's episode of AI Security Ops. So AI security tools are amazing. That is until your monthly bill shows up and you find out that your AI AI triage assistant is suddenly your third highest paid analyst on your team. Why is AI and security uniquely expensive? Well, we have some techniques that can potentially save you some money when you're using AI in your buy in your environment."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "27.064999",
      "endTime": "58.325",
      "body": "But first, let's talk about Black Hills information security, who brings you this episode. If you or your company are in need of any security service that you can think of, external, internal testing, web apps, physical pen testing, social engineering, mobile apps, embedded testing, wireless testing, whatever kind of services you might need. We also offer a security operation center type services, threat hunting services. So, whatever you might need, reach out to us, and we can probably help you out. Blackhillsinfosec.com."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "58.7",
      "endTime": "97.29",
      "body": "Additionally, we have a training branch at antisiphontraining.com where our analysts and practitioners who are doing these tasks day in and day out, take the time to package up their knowledge in a very digestible and affordable form before you to consume and hopefully, help you out in your, current work role or help you work towards that work role that you are dreaming of. So check them out at antisyphanttraining.com. So let's get down to business here, gentlemen. I am joined today by Derek Banks and Ethan Robish. Hey, guys."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "97.85",
      "endTime": "116.170006",
      "body": "Hey. How's it going? Yeah. So let's talk about some of the ways that people can save money, when they're trying to use AI, for, you know, whatever work tasks they they might have. You know, the a lot of you know, a lot of people now are jumping on the bandwagon."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "116.25",
      "endTime": "153.905",
      "body": "They're integrating AI into daily flows. They're trying to automate a lot of a lot of different tasks with AI. But there are a couple, I think, areas that that people can try to save on cost because it's easy to run up a giant bill if you're just throwing a bunch of data into AI, without really thinking about, how you can potentially, you know, minimize minimize the impact on your budget. So let's talk about one of the first ones right off the bat, the right size model. What are your guys' thoughts on this of of what kind of model you should use and and when and how it kinda makes a difference."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "156.14499",
      "endTime": "166.79001",
      "body": "Oh, I I have some thoughts. This is very, very timely topic. So this is literally I mean, in in our notes, it talks about, like, the sock. Right? It's literally what I'm working on."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "167.03",
      "endTime": "169.75",
      "body": "That's why I was late to this podcast. Right?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "169.75",
      "endTime": "171.51",
      "body": "That's why we waited on you, Ethan."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "171.51",
      "endTime": "200.77",
      "body": "Yeah. So, yeah, some of the points being, hey, suck gets a lot of alerts. You're running the model, like, very frequently, so you can't you can't just throw, you know, the the biggest tool in your in your box at something that's happening, like, thousands or hundreds or tens of thousands, what whatever. Like, tons of times because it's gonna run your bill up. Right?"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "201.65",
      "endTime": "225.73",
      "body": "So approaches. I I was toying with, hey. Stay till the end, and I'll give you one bonus tip, but I I don't think I can hold it in. So here's here's up front. Take take what you are take what you would do, your your process, and use the AI agent to help you code that process, like, turn it into code."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "225.73",
      "endTime": "274.885",
      "body": "I know we've talked about this in on the podcast before, but, like, a lot of what we're doing at in in the SOC is, deterministic, like, grouping various alerts together. So the the key of dealing with tens of thousands of alerts is you don't deal with tens of thousands of alerts with AI. You you you group them all together into into cases, and then, like, once you have enough signal and you've got a case compiled, that's that's where we're bringing in AI. So we're getting, like, tens of cases instead of hundreds or thousands of things. So so even then, getting back onto the topic of right size the model, so I'm I'm using I I have the option of, you know, several different either OpenAI or or Anthropic models."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "275.24",
      "endTime": "307.51",
      "body": "So we'll just focus on Anthropic because they've got neat little, like, Haiku is the lightest model, and then you got Sonnet in the middle, and then Opus is, like, the best, the top of the line. So for the application of, like, hey, do do an initial pass on this case, like, this case just got created, like, do a do a quick triage. I'm absolutely going with Haiku. And I actually started with Haiku. I know one of one of the strategies we we talk about is, like, hey."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "307.51",
      "endTime": "324.625",
      "body": "Maybe, run run your models side by side. Like, try the same prompt or the same strategies and see, like, which one performs better and see if it's acceptable to to do the the lower model. I took a little different approach. I mean, that that approach is totally fine. So the approach I took was, hey."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "324.625",
      "endTime": "345.785",
      "body": "I I I need to use Haiku for this. I know I need to use it, so I need to make it work. And I've gone through, like, tons of iterations on, you know, what I I I think the biggest lever I had to pull was the context. What context does it have? But I'm gonna tell you how I how I kind of approach the the project."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "345.785",
      "endTime": "371.765",
      "body": "So, basically, we I took a bunch of past cases. I took all the details from those cases. I took, know, you what was the result? It's it's kind of like a classic, data science or machine learning type problem. Like, you you get your labeled data, and then I I took that, and I I'm working with, like, Claude Opis this whole time, like, designing this this experiment, we'll say."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "371.925",
      "endTime": "420.53",
      "body": "And how how we have it set up is we're using so Claude dash p to, like, send in a custom prompt so that we're we've got a triage prompt that we're testing, and we send in the context and the the prompts and we've got the output that we want specified in the in the coming coming out. So that that goes into Haiku. So we send in input, get output, and then we've gotta judge. So I I pit another model, the smarter model, OPUS model, to judge, like, okay. Based on this input and this output and our expected results, which, you know, the the Haiku model doesn't know, the triage model doesn't know what the expected result is, did it did it come to the right conclusion?"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "421.25",
      "endTime": "449.65",
      "body": "And then I've I've run that across a ton of cases and, you know, stop stop at different points to tweak the prompt, like and work with work with Opus to be like, hey, what are the failure modes here? Like, what patterns are we seeing? What is it what is it constantly failing on to to to tweak the prompt, or is this a context issue? Like, it's could it even have, like, come to the right answer with the data that we gave it? And if not, like, okay."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "449.65",
      "endTime": "460.165",
      "body": "Well, let's go back to the drawing board. Like, how can we get this data? So that was a very long answer, but it's just exactly what I'm working on right now. So I wanted to to to share it. So"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "460.64502",
      "endTime": "474.69998",
      "body": "Yeah. I think that also, like, it depends on how deep the the download the rabbit hole of coding and customization that you're going. Right? Because like one of the examples that's in the show notes is like a classification problem. Like, is this fishing?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "474.94",
      "endTime": "489.005",
      "body": "A small fast model is fine. In fact, you don't even really need a transformer to detect fishing. Like, this is something that's like near and dear to my heart. I've done a lot of detecting phishing. Now, in albeit in, you know, curated data sets that are pre labeled."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "489.005",
      "endTime": "509.4",
      "body": "Right? And so, you know, you can use a a and you can also train a transformer to be a binary, you know, phishing detector. Is this phishing or not? And so the same thing that can be OPUS 4.7 can also be a small the same technology can be a small classification model. It's all in how you like train the neural network."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "509.71997",
      "endTime": "540.135",
      "body": "And so yeah, it's right that a small model is fine for that, but it depends on your ecosystem. Right? Because in in like what Ethan is describing, the ecosystem of quad doesn't offer, at least to my knowledge, like a small classification model. So you'd have to like, and this is where I think like the academic and like the practical, there's not a good like path all the way like between those sometimes. And so, yeah, if your classification if you have a classification problem, there's cheap ways of dealing with it."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "540.135",
      "endTime": "557.29004",
      "body": "But it depends on your capabilities. Right? And so and, you know, I I think that it's a great example. But I think that for most of us, practically speaking, it depends on your inference provider and like how you're set up to do. Now, I'm on the other side of the fence than Ethan these days, which is kind of fun."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "557.45",
      "endTime": "584.11",
      "body": "Because Ethan is the first like pen tester that we had like at a job that I used to work at a long time ago when I was at the SOC and Ethan came in as we hired Block Hills as a penetration tester. Ethan and I have known each other for a long time. Right? And now, like, he's working on the SOC here and I'm doing more like offensive work. Well, I've been doing a lot of external penetration testing type things with AI agents, like agentic external penetration testing."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "584.11",
      "endTime": "624.19",
      "body": "When I was I started using Opus four point I think it was six at the time, right when they came out with the million context window on Bedrock, and it was really expensive. Like it was costing, you know, up to, like, 500 and even a thousand dollars, like, per run for the some of our, like, larger customers that I was testing with. And so I did exactly what Ethan did. I went and looked for a a cheaper model. And now, you know, using a combination of of models, using some open weight models, and then, like Ethan was saying, for the harder task, the heavier lift, use the the the bigger model, the right size model."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "624.19",
      "endTime": "645.05",
      "body": "So if we're making like, I have some judges in there for like hallucinations and stuff, that's using Opus rather than g l m five one, which is like 90% cheaper than Opus. So we went to from, you know, $500 a run to $50 a run. And that way, I don't cost John Strand $10 each month in bedrock inference. So"
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "647.69",
      "endTime": "675.46",
      "body": "Yeah. And and just bringing up one of the one of the earlier points from Ethan too before we move on to the the next portion of talking about, the deterministic, you know, portions and and having a, basically output code where it can, for there are multiple benefits to that beyond just the cost benefit. You have repeatability, which is always great. Like, you know that you run it multiple times in a row. You're going to get the same result with the same input."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "675.7",
      "endTime": "688.46497",
      "body": "You don't have to worry about it going off the rails, hallucinations, or, you know, it's you get good repeatable results as as as much as you can throughout the pipeline, then let the AI do the AI stuff."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "688.46497",
      "endTime": "716.435",
      "body": "I mean, I think that's just, like, good general advice for working with coding agent or working with agentic stuff insecurity in general. Right? I gave that exact advice to two different people this morning before this podcast, and kind of the same way as like what I would do is instead of making like it determine what you're doing and figure out because it can totally go and figure it out. Right? Instead, have it write a script that will go do the thing, and then have it run the script, and then make improvements as you go."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "716.435",
      "endTime": "722.27496",
      "body": "Right? Then If you're And then also copy the script off and put it in GitLab, please. Sure."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "723.15497",
      "endTime": "730.85004",
      "body": "Yeah. If your goal, your process is deterministic and you want, like, repeatable results"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "731.17",
      "endTime": "733.25",
      "body": "Which is like 95% of the time."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "733.41003",
      "endTime": "763.555",
      "body": "Yeah. Then AI couldn't you you can use AI to try to do that, but you're not gonna get the 100% repeatable results, which is the whole point of of AI. So yeah. Like, use the right tool for the job, just like use the right model for the job. The other thing that I'm doing is scoping, like, what the prompt what the agent is trying to do very tightly rather than say, oh, hey."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "763.555",
      "endTime": "783.43",
      "body": "Do all this stuff, like, altogether and then try to fit that Yeah. Into a small model. Like, a large model, you know, the top of the line frontier models, like, they they might be able to handle something like that, but also you're gonna pay for it versus, hey, let's just call a smaller model to do this task, and then this task, and then this task, instead of a larger model to try to do all of them."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "783.43",
      "endTime": "797.685",
      "body": "I actually just fixed a bug in my code that was essentially doing that for a portion of it. I was like, why is this inconsistent? And when I dug into it, it was because I was essentially letting the model decide. And I now there's more framework around what the model should decide. So"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "798.245",
      "endTime": "826.08496",
      "body": "Another another benefit of deterministic, like, outputting code is you can write tests for code, and those tests aren't gonna be flaky because it's deterministic. So you can you can try to write tests and guardrails and have other judge models, like, around your AI project, your your engineering. Yeah. But it's it's it's not always gonna work because it can just, you know, sometimes decide to do something else."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "826.805",
      "endTime": "838.89996",
      "body": "That actually kinda leads into the next one. Right, Brian? I mean, the next one is tip. Well, it's probably tip number four or five at this point, but tip number two in the show notes is prompt caching. Prompt caching is important."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "838.89996",
      "endTime": "844.01996",
      "body": "It's a big lever you can pull. And so do you want do you wanna explain what prompt caching is?"
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "845.375",
      "endTime": "875.925",
      "body": "Yeah. So prompt caching is the idea that oftentimes when you are performing tasks, you likely are using the same if if you're alright. Let me take a step back. Oftentimes, if you're if you're gonna be performing tasks in something like a sock or, pen testing or you're using, you've put together a suite of AI tools to perform different tasks, there's likely prompts that you are reusing throughout that process. You're just changing the context that gets sent in with those prompts."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "875.925",
      "endTime": "886.165",
      "body": "So The system prompt. Yeah. The system prompt. Yeah. So, you know, things like rules on how to analyze, a a potential issue and generate an alert for it."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "886.29",
      "endTime": "915.32",
      "body": "Those instructions for doing that are likely going to remain the same. It's just the context or data that you're sending in along with that that changes. And so the idea of prompt caching is rather than having the model, basically, see your prompt for the first time every time you send it in, it can cache that, on on the back end, and it it does two things. One, it reduces lag time because it's already cached. It's already done a certain level processing upfront."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "915.48004",
      "endTime": "948.94006",
      "body": "But in the spirit of this episode, it can also save you a lot of money. It's somewhere around potentially, like, 90% of cost in terms of processing that section of text. Now any of the new text you send in, obviously, that has to get processed as it were seen for the first time because it because it basically is. But you're at least I mean, you know, you run you run these, these queries, over enough enough times over a length of time, every little bit adds up, especially if you're using the higher power models for a lot of this stuff. So I think that's the idea."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "950.06006",
      "endTime": "950.54004",
      "body": "Think"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "950.54004",
      "endTime": "974.11",
      "body": "I think there's a key key point here that I wanna stress or call out. So when we think about, like, request caching in, like, the web world, we think, oh, you make the same request or databases. Like, you just make the same query. The results are cached so they can be returned faster, but that only works when it's, like, the same. But here with with prompt caching, it's actually like a spectrum."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "974.11",
      "endTime": "982.75",
      "body": "Right? It's like from the beginning, how much have I seen before? Oh, here, this is the first new token. Okay. We can cache everything up till here."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "982.75",
      "endTime": "1006.26",
      "body": "So changing your data later, it doesn't bust the entire cache. Like, it it will bust it up until or, like, until until the point where it changes. So so that's why you can actually take advantage of this rather than thinking, oh, well, I never said in the same prompt to the AI. Like, how could I even get any benefit of caching? Like, I'm it's always doing something different."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1006.66",
      "endTime": "1028.0399",
      "body": "But the the point is, like and and harnesses take advantage of this too. They they put, like, the system prompt that doesn't change up front. They put the tool calls that don't change up front not the calls, but the tool definitions, like, the skills that those are all loaded in the order of, like, this is least likely to change so that it can take advantage of the the processing, the the caching. Yeah."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1028.12",
      "endTime": "1056.78",
      "body": "And and if that sounds like foreign to you, come take my class on how large language models like work, or Brian and I's class in in August or sorry, October of this year. Because knowing how these things work when you're engineering solutions around them is very important. Like, why would I wanna because, like, why would that even matter? It's because how LLMs, like like, how they process token by token, and, like, how they predict things. And so your explanation, and I think was spot on there, Ethan."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1057.1",
      "endTime": "1081.6649",
      "body": "Thanks. I actually made this mistake not too long ago, when I was using I I was sending a prompt, so it had a system prompt and a user prompt to the agent. And I thought when I first started that, oh, it would be a really good idea to customize the system prompt. So I put a variable in the system prompt, like, oh, hey. This is a you're you're looking at this type of entity, and he oh, here's the host name."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1081.985",
      "endTime": "1099.205",
      "body": "Just as as the example, I inserted the actual host name into the example that it was supposed to use. And I realized later, oh, I am busting the cache every single time doing this. Maybe I better stop doing that and rework my prompts so it can just stay static and yeah."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1100.165",
      "endTime": "1126.1951",
      "body": "Well, that's why and I've said this many times, when you get into coding around large language models and trying to solve, like, problems and do, you know, especially, like, enterprise level problems, I've not found a bottom to where this all ends. Like, I learn new stuff every day, and it's really fun. Alright. So the next one, tip seven or three. Batch API for anything that's not urgent."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1126.1951",
      "endTime": "1148.4299",
      "body": "And this is actually one that I've not personally used. I haven't tried batch processing, but the idea is is that you you have you know, you want to get output for your input. And so, like, with the quad's API, you can, like, batch stuff, and basically, it'll work on it overnight. So that way, outside of peak hours, they give you a reduced rate on the inference. Right?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1148.43",
      "endTime": "1149.07",
      "body": "So"
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1149.795",
      "endTime": "1161.2351",
      "body": "Yeah. Like, some I'm assuming similar to, like, electric companies that, they might charge a different amount for when everyone's home with the air conditioners firing versus, you know, during the workday or whatever."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1161.75",
      "endTime": "1166.07",
      "body": "Wait. Are you saying that AI access is is a utility, Brian?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1167.11",
      "endTime": "1194.9199",
      "body": "It won't be long until we look at it that way, I think. Know, you know, just as a side note, you my my dad lives in a pretty rural area. He just recently got, like, fiber, like, broadband. And up until this point, he was using LTE, and he had, you know, gigabit gigabyte caps and, you know, data caps and stuff. And so basically, that whole county is was in, like, they had a a municipal effort, like a, you know, a government effort to get broadband everywhere."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1194.9199",
      "endTime": "1214.405",
      "body": "And so I wonder how long it will be. I mean, this is really like an education thing. Like, the kids that were in the in the county, like, were really suffering with not having high speed Internet access in the modern world. Right? And and so I wonder how long it's gonna be until we look at AI and inference as the same thing."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1215.125",
      "endTime": "1223.8201",
      "body": "Right? Probably not very long. Alright. Here's, you know, tip number four or eight or whatever"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1223.8201",
      "endTime": "1226.06",
      "body": "number You're we're you're counting like an LLM."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1226.06",
      "endTime": "1230.4601",
      "body": "Yeah. Know. Right? Well, LMs are very bad at arithmetic. That's just sort of how they work."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1230.4601",
      "endTime": "1234.545",
      "body": "Like, they're good at math if it's word math. Right?"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1234.545",
      "endTime": "1235.5851",
      "body": "But Logic."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1235.665",
      "endTime": "1243.505",
      "body": "Logic. Yeah. Not so good at the actual arithmetic. You would want to this I've given this advice too. How come the LLM can't count?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1243.505",
      "endTime": "1257.69",
      "body": "Well, what you need to do is get it to write script account, then it'll count fine every single time. Right? Like, that's what you do. But this tip is stop dumping entire logs or entire sets of data and curate your data first. Right?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1257.69",
      "endTime": "1269.285",
      "body": "And so basically, here, the idea is is that you don't want to put the entire log in, you want to put in stuff that's relevant, which maybe is like security advice from the past anyway."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1270.82",
      "endTime": "1299.5399",
      "body": "I would yeah. I would say in addition to cutting costs, it also improves accuracy. Like, if you tailor your context to what is actually needed, and that goes back to my my testing of, like, hey, what context is actually needed to get the the result I want? That it makes it makes a huge difference. Like, you can you can fit more relevant, like, denser data in if you're if you're tailoring it, and you're not just, you know, dumping everything."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1299.5399",
      "endTime": "1311.4249",
      "body": "And then it it depicts on some irrelevant detail and anchors on that, and then your result is just, like, way out of whack because you gave it too much and it got confused. Confused."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1311.4249",
      "endTime": "1325.025",
      "body": "Yeah. It's like an inverse pyramid of pain, where you have like the raw data and and and that's coming in, and then like 95% of your data is like just stuff you've seen before. Like, we know this is fine. Right? Like, this is a log entry we don't care about."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1325.025",
      "endTime": "1342.0349",
      "body": "So those 95% of it's gone. Right? And then you take that other, like, 4% and you apply, you know, machine learning statistics, AI, or whatever. And then what comes out of that, a human applies domain knowledge to find the point o 1% that's malicious. Right?"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1342.355",
      "endTime": "1368.8501",
      "body": "Speaking speaking of human knowledge, domain knowledge, if if you're struggling to figure out, like, what context should I put in, what should I give it, think about what information you would need to solve this problem. Like, whatever problem you you wanting the AI to to do to whatever process you want it to take, what what would you need? What would a human need to to do that? Because the AI is gonna need the same information, and you you could even talk with it. Like, hey."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1368.8501",
      "endTime": "1384.98",
      "body": "What what are I missing? Like, what here's what I'm thinking for what is required to to accomplish this. Like, what don't you need out of this, and what am I missing? Like, talk to it. Talk talk about how to design the the workflow that it's gonna follow."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1385.62",
      "endTime": "1387.94",
      "body": "That's Yeah. Surprisingly successful."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1388.42",
      "endTime": "1402.255",
      "body": "Yeah. Well, yeah. I mean, it's, I mean, same concepts as with, like, traditional machine learning or traditional classifiers, if you will. Right? Like, more features isn't always necessarily better, and typically one of the steps is determining feature importance."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1402.255",
      "endTime": "1416.76",
      "body": "So you got a bunch of data with a bunch of features. You can you run it through your classifier. And then what With things like random forest classifier, it's built right in. With others, there's methods to pull out the same information, but you can get a measure of feature importance. Like, okay."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1417.0",
      "endTime": "1428.965",
      "body": "You a robot you arrived at this decision. Which of these features were actually relevant to arriving at that decision? Okay. Out of these thousand features, these 50 were important. I can get rid of the other 950 because they're not doing me any good."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1428.965",
      "endTime": "1434.6449",
      "body": "And, to Ethan's point on accuracy, it might even be making things worse. And in this case, they're just costing you extra money."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1434.6449",
      "endTime": "1442.5399",
      "body": "Oh, man. I'm having flashbacks to Statistics 501 and running relevancy tests and such. Oh, god."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1443.02",
      "endTime": "1459.205",
      "body": "So the awesome thing with LLMs is if you don't understand anything that Brian just said, just pick out some words and ask the LLM, hey, I want feature extraction, feature importance, like, for this problem. Help me with that."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1461.925",
      "endTime": "1479.13",
      "body": "Well, there's a lot of tips in here. So the next tip is RAG instead of stuffing. I've actually not used RAG with agents very much, but I get that the idea behind this is don't waste the whole library when you can just get the relevant page. Yeah. I"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1479.13",
      "endTime": "1501.21",
      "body": "would say this is pretty much like an extension of the last tip, where you're designing the context and, like, being very specific versus giving it everything. But instead of, like, raw data, raw events, raw logs, it's talking about, like, reference material, like, documentation."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1503.6101",
      "endTime": "1512.935",
      "body": "Alright. Then the next one, we've actually talked about this one already, and that's dedup and cache the responses. Your SOC sees the same alert pattern over and over. Don't ask the AI every single time. Yeah."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1512.935",
      "endTime": "1521.895",
      "body": "I mean, if you know that this is not likely going to be malicious, don't send it to the AI, or if it's not relevant to the context rather."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1525.42",
      "endTime": "1538.6749",
      "body": "Yeah. And I think that brings us to the final point here, which isn't necessarily a cost-saving tip so much as a safeguard, like, a help-you-stay-solvent type of thing."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1539.0751",
      "endTime": "1542.355",
      "body": "Not get angry phone calls from the people who sign your checks."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1542.675",
      "endTime": "1578.485",
      "body": "Yeah. Which is basically setting budgets and circuit breakers. So, you know, we know that Bedrock has the ability, and I'm sure, like, Azure and probably others have the ability, to set a cap on how much you're spending per month, which is a good idea to put into place. Because if you're not careful, especially if you have a large company or you're running a lot of complex tasks and not taking advantage of some of the tips that we mentioned earlier, it's very realistic to run up an eye-popping bill at the end of the month."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1578.9401",
      "endTime": "1590.0599",
      "body": "So putting in safeguards saying that, no, like, we are not going to spend more than this amount of money, cut it off if we hit this limit, is a good idea. And if you reach that limit, then you can"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1590.3",
      "endTime": "1604.945",
      "body": "reconsider. Yeah. Right? I mean, and it's also good, you know, in the unfortunate circumstance that the key somehow got leaked or stolen. Right? Which is definitely something that happens."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1604.9451",
      "endTime": "1614.6",
      "body": "Right? And, you know, we all like to think, oh, it'll never happen to me. Well, I don't know. Right? I think LiteLLM would have something to say about that."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1614.6",
      "endTime": "1614.92",
      "body": "Right?"
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1614.92",
      "endTime": "1627.015",
      "body": "A few episodes ago, we talked about, like, shadow API brokers for AI. And, I mean, I would not be surprised if a lot of those are being served by stolen keys."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1627.015",
      "endTime": "1637.51",
      "body": "Yeah. And so I think that, you know, setting caps and limits, like, that's a necessity. Like, it's an operational necessity. Yep. It's what we do."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1637.91",
      "endTime": "1638.95",
      "body": "Yeah. For sure. Absolutely."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1643.1901",
      "endTime": "1657.095",
      "body": "So rapid-fire recap: right-size the model, prompt caching, batch API, preprocess your logs, RAG over stuffing data into the LLM, cache verdicts, and instrument and budget."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1658.615",
      "endTime": "1661.1749",
      "body": "Yes, sir. Bonus tip. And"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1661.575",
      "endTime": "1662.455",
      "body": "a bonus tip."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1662.455",
      "endTime": "1674.49",
      "body": "Bonus tip. Convert your processes to, like, deterministic code. Like, have AI write the code, and then you pay for it once, and then you don't have to pay for the AI inference again."
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1675.05",
      "endTime": "1706.49",
      "body": "Yeah. I think that, in general, I try and limit the AI processing to things that I would have expected a human to do. Like, you know, investigate this and report back what you find kind of thing. And everything else, really, I mean, finding that, you know, spot might take trial and error. But I think that, you know, 95% of your harness or your code should be deterministic, and only like 5% of it should be AI, I think."
    },
    {
      "speaker": "Ethan Robish",
      "startTime": "1706.49",
      "endTime": "1727.8899",
      "body": "And there's nothing saying you can't start with a broader, like, AI solution. And then after you get some results and you get some data, you can make it a project to go look at and analyze that for patterns, and you can pull out those patterns and codify them, and reduce the amount of work that your overall general AI has to do."
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1731.25",
      "endTime": "1748.515",
      "body": "Great. So, yeah, I think these are all great tips and points, and I hope they help out some of the people out there, things that you can integrate into your daily workflow at your company to save some money. So hope everyone enjoyed that, and see you next time. And what's the line, Derek?"
    },
    {
      "speaker": "Derek Banks",
      "startTime": "1748.595",
      "endTime": "1758.83",
      "body": "Keep on prompting. Now it might be more like, keep on context engineering or keep on harness creating"
    },
    {
      "speaker": "Brian Fehrman",
      "startTime": "1758.83",
      "endTime": "1761.5499",
      "body": "Or just keep on doing the things."
    }
  ]
}
