AI Security Ops

In this episode of BHIS Presents: AI Security Ops, the team looks at what it actually means to own your AI stack.

Open-weight models and open-source harnesses are no longer just lab toys. They are becoming practical options for security teams that care about where their prompts, code, client data, findings, and tooling actually live.

The core question: when your work depends on AI, how much control are you willing to give away?

We dig into:
- What data sovereignty means for security teams
- Why token sovereignty matters in agentic workflows
- How provider terms can become a business risk
- Open-weight models vs. truly open-source AI
- Why harnesses like Hermes and OpenCode matter
- Where cloud providers may apply fewer restrictions
- The tradeoff between local control and hosted capability
- Supply chain risk in models, harnesses, and plugins
- Running local models with Ollama, VLLM, and similar tools
- Why “local” does not automatically mean “safe”
- How to start experimenting without buying expensive hardware
- The next risk frontier: local prompt injection

Owning your AI stack does not magically eliminate risk. It moves the risk. Hosted models create exposure around data, terms, pricing, and availability. Local models create exposure around maintenance, supply chain, permissions, and prompt injection. The security win is not blindly choosing local or cloud — it is knowing which layer you need to control, and why.

⸻

📚 Key Concepts & Topics

Data & Terms Risk
- Prompts can contain code, client data, findings, and operational context
- Hosted providers may inspect, retain, or restrict usage
- Terms changes can affect entire security workflows
- “Allowed yesterday” does not guarantee “allowed tomorrow”

Token Sovereignty
- Agentic workflows burn far more tokens than simple chat
- Rate limits, usage windows, and pricing changes become operational dependencies
- Local hardware shifts the constraint from API quota to compute capacity
- Cost control is part of architecture, not just procurement

Models vs. Harnesses
- Open-weight models provide downloadable weights, not always full training transparency
- Harnesses provide the tool loop, permissions, memory, and provider adapters
- Hermes, OpenCode, Claude Code, Codex, and similar tools shape what the model can actually do
- Risk often lives in the harness around the model

Local Stack Tradeoffs
- Local models improve control over sensitive data
- Self-hosting adds maintenance, patching, networking, and monitoring responsibilities
- Tools like Ollama, VLLM, and Llama.cpp lower the barrier to experimentation
- Expensive hardware helps, but it is not required to start learning

Supply Chain & Prompt Injection
- Model weights, plugins, skills, and MCP servers are all supply chain decisions
- Local agents with shell access can turn prompt injection into local impact
- “No provider guardrails” means you own the safety controls
- Permissions, sandboxing, and audit logs matter more as the stack gets more autonomous

Practical Starting Point
- Pick one harness and go deep before chasing every new tool
- Test real tasks, not toy demos
- Compare hosted and local workflows honestly
- Decide which layers you need to own before you need an emergency exit

#AISecurity #LLMSecurity #CyberSecurity #ArtificialIntelligence #OpenSourceAI #LocalLLM #AIAgents #SecOps #InfoSec #BHIS #AppSec #PromptInjection #SecurityArchitecture

----------------------------------------------------------------------------------------------
About Brian Fehrman - https://www.blackhillsinfosec.com/team/brian-fehrman/
About Bronwen Aker - https://www.blackhillsinfosec.com/team/bronwen-aker/
About Derek Banks - https://www.blackhillsinfosec.com/team/derek-banks/
About Ethan Robish - https://www.blackhillsinfosec.com/team/ethan-robish/
About Ben Bowman - https://www.blackhillsinfosec.com/team/ben-bowman/

(00:00) - Intro: Owning Your AI Stack

(01:43) - Data Sovereignty, Token Sovereignty & Terms Risk

(03:38) - Provider Inspection, Prompt Data & Business Exposure

(08:09) - Where the Guardrails Live: Model, Harness, or API

(12:12) - Open Weights, Frontier Providers & the Innovation Race

(14:53) - Local Models, Open Harnesses & Real Hardware Tradeoffs

(24:24) - Self-Hosting Reality: VLLM, Ollama, VPNs & Maintenance

(31:25) - Getting Started: Pick a Harness and Run Real Tasks

Click here to watch this episode on YouTube.

Brought to you by:

Black Hills Information Security

https://www.blackhillsinfosec.com

Antisyphon Training

https://www.antisyphontraining.com/

Active Countermeasures

https://www.activecountermeasures.com

Wild West Hackin Fest

https://wildwesthackinfest.com

🔗 Register for FREE Infosec Webcasts, Anti-casts & Summits
https://poweredbybhis.com

Creators and Guests

Host

Bronwen Aker

Bronwen Aker is a BHIS Technical Editor who joined full-time in 2022 after years of contract work, bringing decades of web development and technical training experience to her roles in editing pentest reports, enhancing QA/QC processes, and improving public websites, and who enjoys sci-fi/fantasy, Animal Crossing, and dogs outside of work.

Host

Derek Banks

Derek is a BHIS Security Consultant, Penetration Tester, and Red Teamer with advanced degrees, industry certifications, and broad experience across forensics, incident response, monitoring, and offensive security, who enjoys learning from colleagues, helping clients improve their security, and spending his free time with family, fitness, and playing bass guitar.

Guest

Ethan Robish

Ethan Robish has worked with Black Hills Information Security (BHIS) since 2008 — first as an intern and then as a full-time Security Consultant starting in 2012. In his current role as a Threat Hunter, Ethan is involved with customer engagement, research, working with Active Countermeasures’ AC-Hunter, as well as improving BHIS HTOC and SOC offerings. Previously, he implemented defensive security solutions for the Exchange Online security team as a Microsoft intern. While in college, he competed in the International Collegiate Programming Competition (ICPC) World Finals. In his time off, he enjoys cooking, playing the piano, and reading fantasy novels.

What is AI Security Ops?

Join in on weekly podcasts that aim to illuminate how AI transforms cybersecurity—exploring emerging threats, tools, and trends—while equipping viewers with knowledge they can use practically (e.g., for secure coding or business risk mitigation).

Bronwen Aker: 00:00

Welcome to AI Security Ops, the podcast where we cut through the hype and explore the real world intersection of artificial intelligence and cybersecurity. Each week, we examine how AI is reshaping both sides of the security landscape, the threats we're facing, and the defenses that we're building. I'm Bronwen Aker, and today, I'm joined by Ethan Robish and Derek Banks. In this episode, we are going to look at local open weight models and open source agent harnesses like Hermes and OpenCode. And we ask what it really means to own your own AI stack.

Bronwen Aker: 00:46

We'll dig into data sovereignty, token sovereignty, and a quieter risk. What happens when the provider reading your prompts decides your line of work is no longer covered by its terms of service. Yeah. That's a biggie. This show is brought to you by Black Hills information security and anti siphon training.

Bronwen Aker: 01:11

BHIS helps you helps organizations like you identify and close real world security gaps through penetration testing, adversary emulation, purplety engagements, and managed detection and response. Anti Siphon delivers hands on practitioner led training built around real attacks and real tools so you can apply what you learn immediately. Learn more at blackhillsinfosec.com and antisiphontraining.com. Alright. Derek, you've been doing lots of experimentation with this stuff.

Bronwen Aker: 01:50

I'm gonna hand this off to you with a question. What is what is all this about? Data sovereignty, token sovereignty? Seems like there's a lot of sovereigns in this room.

Derek Banks: 02:02

Sovereignty. Sovereignty. It reminds me of, you know, my my past time of reading Sounds like some magic spell or something. Token sovereignty sounds like, you know, how to conserve your mana or something. Right?

Derek Banks: 02:15

Anyway, sorry, I'm a nerd. So data sovereignty, the definition would be your data. Right? Your prompts, context code, client data, findings, and a report. Like, anything that you might put into input into an AI model is data, and the sovereignty would refer to where that that data is, who has access to it, who who who owns it.

Derek Banks: 02:46

And so, you know, I think that, you know, I've been doing AI and security work now for a couple of years at the intersection of this. And, you know, it's kind of less than it used to be, but, you know, the general theme from security practitioners and hackers is, you know, you kids get off my lawn when it comes to sending data to, you know, third party providers such as OpenAI and Anthropic. Maybe it's a little less now, but I I I do know that there are agreements in place, you know, with, you know sorry, the privacy policy that's stated by specifically those two companies, you know, says that they won't use your data to to train models, but that doesn't mean that they're not inspecting your data and and and looking at it and and storing it at all. Right? So it's kinda one something that happened to me here for the last couple of days, kind of made me want to do this episode.

Derek Banks: 03:46

And then token sovereignty would be the same kind of idea, as we move into the agentic era, I mean, I guess we're already here, everything is basically in the AI world revolving around tokens and how much tokens cost. You might have heard the term token maxing, like, the subscription, like, with Quad or or or with Codex, basically means using all your allocated tokens in the window of time to get the most out of it. Right? And so, there's a we're in an era of GPU, therefore, token shortage. And so I think that people are starting to realize that this stuff gets expensive the more tokens that we use, especially if you're paying API rates.

Derek Banks: 04:30

And I think that it won't be long before before, you know, folks start to realize or that, you know, inference providers start to jack up the rates a little bit, because, you know, they've been subsidized for some some time now. However, Anthropic just posted its first profitable quarter. And so, at least that's what I heard. So I I think that, you know, our you know, can you what what computer do you have access to? Like, if you have these business processes that are relying on, you know, a a certain cost of token, what happens if that gets jacked up?

Derek Banks: 05:08

Right? So token shortages, rate limits, usage windows, token cost inflation, I mean, your capacities, your either your API and, you know, your provider or your hardware, which is a whole another ball of wax. Right? And so and and then the third one is something that just kinda happened to me. Right?

Derek Banks: 05:26

Where, I guess, you know, Ethan mentioned in chat earlier that Anthropic mentioned that they're gonna start tightening, like, cybersecurity use cases in terms of services. And just, you know, last couple of days, I've had difficulty doing the exact same thing that I've been doing, you know, coding wise on a project that is cybersecurity related. It's offensive tooling, and I was hitting a lot of errors that I was now violating terms of use on Opus four eight. And yes, I know there's the cybersecurity verification program. We're in it.

Derek Banks: 06:03

I mean, I I We're we're in it, and we're

Bronwen Aker: 06:05

still getting the alerts.

Derek Banks: 06:06

Yes. And then, we'll say mean, look, I'm not being super critical because surprisingly, Anthropic has responded to us this morning, which, like, I I was actually kinda surprised. They're like, well, if it happens again, just like, you know, here's what you do. And I was like, I didn't know that. And so but I mean, I you know, it's it's, you know, not just cybersecurity.

Derek Banks: 06:28

I mean, found was it last summer or last last spring, actually, I think. I think it was Anthropic also did that, like, in the medical world. Like, you can't give out medical advice to some degree. Like, you have to be licensed. And so anyway, I'm just saying that they're reading your prompts, and so, you know, they can decide, you know what?

Derek Banks: 06:50

We're gonna open up a our own, you know, hedge fund, and we're not gonna let you to use this for financial transactions or financial trading anymore. I'm being arbitrary. Right? But I mean, that's the risk is there. That's what we do in information security.

Derek Banks: 07:02

We point out risk, and it's your ability your your, you know, your responsibility to interpret that risk as it applies to you. And so and then Yeah. Go ahead, Ethan.

Ethan Robish: 07:15

Oh, that it just seems like the common theme between these that ties it all together is risk. Like, you you just said it. Like, data sovereignty, token sovereignty, terms exposure, are all different types of business risk. You know, with with the service you're using and relying on, with the data that your customers, like, expect to have a maybe a legal obligation to protect. And then just yeah.

Ethan Robish: 07:37

I I do have a question about the kind of guardrails that you're talking about, like, you hit with Anthropic Mhmm. Does how much of that do you think is baked into the training on the model? And how much is it, like, protections they put around their API? So for instance, if you try to do the same thing using Anthropic's API versus using the Anthropic models in Bedrock, like, do you think you would hit fewer restrictions in Bedrock?

Derek Banks: 08:11

That has been my experience that it's kinda hard to tell tell sometimes from the error message that you get. I think that in this case, the same thing works with SONNET or, like, with what I was doing. So I've I've actually hit the two things that happened to me yesterday. I was trying to do a merge request on offensive tooling and and get lab through clause like, hey, go ahead and do, you know, the request. Right?

Derek Banks: 08:42

And that's what triggered it. Okay? So there must have been something in the code I didn't like, I guess. Right? And then I tried to go back, you know, hit escape twice and go back and redo it, and it still triggered the same thing.

Derek Banks: 08:54

And then another thing, I was having it try and clean up a disc and download nuclei results, which I mean, come on. That's pretty benign. I'm just and then and I don't know, maybe data exfiltration, but I was downloading it. Like, I I don't know. It just these were things that I had been doing a lot, and they stopped working with OPUS four eight.

Derek Banks: 09:14

So I think that if I had to give percentages, I'd say 60% in the harness and the API, and 40% in the model.

Ethan Robish: 09:24

And and when you say harness, you're you're talking primarily, like, the back end harness, like, whenever they're running on the server side.

Derek Banks: 09:32

Like, you know, like, there were, I think, in terms of, like, the protections all around, I think quad code, like, built in the system prompt or certain protections. And it was actually

Ethan Robish: 09:40

Okay.

Derek Banks: 09:41

You know, what, late last fall where they actually put in pen authorized pen testing and CTFs and to claw at system prompt as an authorized behavior. Right? And then I think in inside of that, after that, there are probably protections or there should be protections on the API, Right? Looking at your prompts and the input. And as a side note, I bet their telemetry is amazing.

Derek Banks: 10:07

Like, what's coming in from everywhere. Talk about a big beta problem. Same with open API or OpenAI. But then I also there are also guardrails in the model. So I think there is like three at least three, like Yeah.

Derek Banks: 10:20

Defense in-depth points, I guess.

Ethan Robish: 10:23

It's it's interesting that they would go to the trouble of putting that on the the the client side, the Harness side, because it it's kind of the same thing you get with web application security. Like, you can put all the logic and permission checks you want in the client side JavaScript. None of it matters if you're attacking The

Bronwen Aker: 10:41

longest server too.

Derek Banks: 10:43

You have

Ethan Robish: 10:43

to enforce it on the server side to to

Derek Banks: 10:45

Your make it actually point, like, your question, you know, like, you know, do you, you know, do you think that, you know, where is it? And what Bronwen said earlier, like, I've been doing a lot of agentic work. I've been using Bedrock, and I found it like Bedrock is like less restrictive in terms of like it's just like raw API calls to a model, and Amazon doesn't seem to be putting much in the way.

Bronwen Aker: 11:09

So Well, if they did, malicious actors wouldn't be watching as many attacks from AWS systems. I

Derek Banks: 11:16

maybe. I gotta say that they got those attackers need to have some deep pockets because that crap ain't cheap. Right, John?

Bronwen Aker: 11:23

No, it ain't.

Ethan Robish: 11:26

So how do we how do we make it cheaper, Derek? I think you have the answer sitting on your desk, don't you?

Derek Banks: 11:33

Yeah. It's actually under my desk, but so two things. One, like yeah, I'm not knocking the frontier providers. This is all uncharted territory. And and, you know, I think you had mentioned also, again in chat earlier, that you thought maybe this was tightening of guardrails to try and for them to figure out how to release mythos level capabilities out Yeah.

Derek Banks: 11:56

Into, you know, the world. You know, and I think they've gotta do something, and now, you know, I get there's a, you know, executive order where the government is wanting to review models for thirty days before they're released to the public, which

Ethan Robish: 12:13

I mean That's crazy. I hadn't heard that.

Bronwen Aker: 12:16

Oh, Yeah. It's hits a new executive order,

Ethan Robish: 12:19

which I don't I don't

Bronwen Aker: 12:21

Originally wanted ninety days, they got talked down to thirty days.

Derek Banks: 12:28

But And the and the Frontier providers were like, we don't wanna do any of that. So no one's happy in thirty days. Right?

Ethan Robish: 12:34

What good is it? I mean, it's like that that meme of the guy, you know, doing this to, you know, not even touching people when they're supposed to be patting them down, like What do you

Derek Banks: 12:47

mean that there's probably nobody in the government that's qualified to actually do, like, an effective, like, evaluation of the capability?

Ethan Robish: 12:53

I mean, I wouldn't say I wouldn't say

Derek Banks: 12:55

that, but like I bet they don't work in the executive branch. Yeah. Well, maybe they do. I don't know. They're probably are, but I so to me, like, I like your analogy.

Derek Banks: 13:05

Mine's also was it the old like, the Dutch boy putting the finger in the the dike? That's what I think of. I was like, there is no way at this point. The cat is so far out of the bag Yeah. That the the That's It seems like most everything that we're gonna do is just gonna stifle US innovation.

Derek Banks: 13:20

Like, we're in the race. But

Ethan Robish: 13:22

Yeah. One one of the speaking of the race, like, most of the open models that get released are coming from China. And one of our points here, spoiler, is like, hey. To have true data sovereignty, you run a local model. You can only do that with open weight models.

Ethan Robish: 13:37

And one of the key differentiators of, like, open weight versus frontier is, like, the frontier models are better. They're more, like bigger for for sure. Like, unless you have access to the data center yourself. But that that but the argument the counterargument is like, the open weight models are just as good as the frontier models were like several months ago.

Derek Banks: 14:02

It's And It's true. Well, there's a little nuance to it. Right? Because like what I can run here locally on my right now, have Quinn three six twenty seven billion parameter model running with a 200 k context window because that's the context window size, I think, not two fifty six. Either way, there's a portion of it that's the system prompt for the harness and it takes up x amount, and then you get I think it's something like I I was getting like a 80 k or a 100 k, like, for the the agent.

Derek Banks: 14:30

Maybe I mean, that's not the math maybe, but so I've got, you know, a smaller like window, and also the model is smaller. But also, there's open weight models that you can run-in the cloud. And so Yeah. And since we just sold this, I guess it's like an offering. We now offer AgenTic AI pen tests at Black Hills.

Derek Banks: 14:47

Apparently, I do. And and I'm really excited about the platform, and and so basically, it's a a a custom built, you know, harness that runs using AWS, you know, Bedrock. And I use GLM five one for the heavy lifting. And the reason why is because it was specifically designed, again, a Chinese model for a long range agentic tasks. And in testing, when I was using it was Opus four six in the time when I was doing development of it.

Derek Banks: 15:18

And Opus four six is really expensive, and I racked up a pretty big AWS bill. But it was worth it because the platform, like early in testing, actually discovered a critical for a customer that our humans had missed. Right? And that's happened now, like, a couple of times. But there is also the converse that the the the AI is sometimes wrong, and the human has to correct it.

Derek Banks: 15:42

So but but anyway, when I started

Bronwen Aker: 15:44

Well, good hits and bad misses. We we deal with false positives all the time.

Derek Banks: 15:47

Oh, yeah. Exactly. And so but anyway, GLM five one, let's just say that it's 85% as capable as OPUS four six. And then that I was getting, like, basically the same results in testing, like the same findings consistently between the models. And it was and and GLM five one's like 90% cheaper.

Derek Banks: 16:08

Like, we went from it being like a thousand dollars a run to like a $100 a run. And I also then use Opis to come through and now like re rate and reval revalidate, and then do reporting because it's a larger context window. So I'm using an ensemble of of different models, and there's so many out there. Like, now I'm in a position where we could start testing. We could try it like a new Chemie model just came out.

Derek Banks: 16:31

But to your point, yes, the Chinese definitely have the open weighted model. Like, they're they're doing better, and I, you know, I want to believe in Gemma four that came out recently, but I'm having issues with it. And the GPT stuff that came out, I never really got to do anything useful with. But I will say that the Quinn so I've been using Hermes with the Spark, and I had it do

Ethan Robish: 16:59

Explain a the DJI Spark quick.

Derek Banks: 17:01

Oh, yeah. That's a good whatever. The g d g x Spark.

Ethan Robish: 17:05

DJI.

Derek Banks: 17:06

Yeah. So NVIDIA basically has a small form factor computer that has a GPU in it or an AI chip. I mean, it's not you're not playing games on the thing. It's a a Blackwell chip, I think, that has a 128 gig of RAM, a two terabyte disk. I forgot what CPUs in it.

Derek Banks: 17:27

It's basically built to be like a little mini desktop like supercomputer. Right? And a 128 gig RAM's pretty decent for running, you know, these edge, like, you know, open weight models. And so I put v l l m instead of o l m on it because it's faster, and I'm getting pretty decent results. I just started kind of down the road.

Derek Banks: 17:52

Our idea is as inference calls start to rise, can we effectively do, like, you know, everyday pen tester tasks from the DGX Spark? And the answer might be yes. But but then, you know, that GLM model, for comparison, that GLM five one model, I believe, is a 400 and some billion parameter model to run it. I actually just I'm writing up stuff now because now the question's coming from management at Black Hills is, well, what if we wanted to run the a that whole AI platform not in Bedrock, but here locally, what would we need? And the answer is 16 h two hundreds.

Derek Banks: 18:36

And Ouch. And and and two, like, clustered systems, eight in each system. And because we need about 1.5 terabytes of of GPU.

Bronwen Aker: 18:50

You realize D Rock is gonna kill you because now he's got a great year

Derek Banks: 18:54

to make you come up. D Rock's asked me who asked me to size it. So Hey, D Rock. It isn't me asking for it. I'm on the fence of which should we do it.

Derek Banks: 19:05

And that's with me, like, getting to be, you know, part of, like, making it. But there's a couple things you have to keep in mind. It's not just the it's not just the model itself. Right? It's also the context window or the k v cache.

Derek Banks: 19:18

Right? If you talk about, you like, the attention mechanism, that has to be in GPU two. So as you're going through this, you know, autoregressive loop and the LLM predicting the next token, you wanna cache what you've computed already in in in the attention mechanism, so you have to do it every single time. Right? And so that takes up GPU two.

Derek Banks: 19:40

And so to run it at the full context window, and then also run up to 10 tests, so like to 10, like, engagements a week. So we have anywhere from five to 10 externals. The calculation ended up being 1.5 terabytes of GPU, which is like 16 h two hundreds.

Ethan Robish: 20:00

So So I think you've covered several different levels here that we could tie back into our our different sovereignties. So if we talk about, like, data sovereignty, what one level is, hey, anthropic or OpenAI. Like, you're sending all your stuff to this AI company, which you have agreements with, but maybe you don't trust them, or maybe you just, you know, don't wanna have to trust them. So the next level would be, okay, take an open weight model, which you can host yourself, give if you have the hardware. If but if you don't have the hardware, you can host it on Bedrock, or what's the let's see a jury one, like, foundation or something.

Ethan Robish: 20:40

Foundry.

Derek Banks: 20:41

Well, and there are other providers too. Right? Like, NVIDIA.

Ethan Robish: 20:45

Yeah. Someone else's data centers.

Bronwen Aker: 20:46

Someone else's you're still hosting and sending data to a third party.

Ethan Robish: 20:52

You're shifting who you trust. You're shifting from the OpenAI's, the Antarp, the Frontier Models to whoever is hosting your data center. And then the third level is you bring your own hardware, and the DGX Spark is a way to to do that, to start doing real work. But I I think so data sovereignty, I mean, you're obviously shifting, but then also, like, the the terms exposure or the the limits the the rate limits and stuff. So with the frontier models, you've got your rate limits, and there's really no way around that.

Ethan Robish: 21:26

Especially if you're on a subscription, you got your five hour blocks and your your week blocks. If you start hosting that yourself in, like, the AWS data Bedrock, that that all goes away. Right, Derek? There's no, like, rate limits to speak of? No.

Ethan Robish: 21:41

Like, you just

Derek Banks: 21:42

So I think there are some rate limits, but I I call in, like, you know, eight concurrent agents at a time, 16 concurrent agents, and I haven't really hit it. I think there are some rate limits like per model, but so far, no, I have not had an issue. Okay. And which by the way, Amazon Bedrock does have a pretty clear and concise privacy privacy policy policy where where they they basically say we're not saving any of that data for anything.

Ethan Robish: 22:09

Yeah. They're just not

Derek Banks: 22:10

I mean, it's basically an API call to them. Right? Yeah.

Ethan Robish: 22:14

So I imagine they might have some restrictions on terms of use because AWS, in general, just, you know, has terms of use. So if you're doing obviously, if you're doing completely illegal things, you're on, you know, Silk Road or whatever and just or you're spitting up hacking campaigns. Although, problem to your point earlier, maybe people are doing that anyway and getting away with it, but presumably, it's against Amazon's terms of use. And so to get around, like, if you're doing something shady or maybe contextually okay, morally, arguably okay, but companies don't wanna be a part of it, then you you're kinda forced to shift it even further to your own hardware.

Derek Banks: 23:01

Yeah. I mean, even with Amazon, there's nothing, you know, with it. Like, they it's just a company with the terms of use that's theoretically possible for them to change their terms and be able to now take a cornerstone of, like, what's gonna be, you know, like, our future business away. Now, are they gonna do it? I doubt it.

Derek Banks: 23:16

We're not doing anything illegal. Right? Yeah. So but I mean, the possibility exists. Again, like with looking at risk, there is the risk is not non zero.

Derek Banks: 23:26

Just like the risk of using a Chinese, like, open weight model with a coding, you know, a coding harness like OpenCode or Call and Code or Codex or whatever, you're using a Chinese open weight model. The the possibility exists that it could locally, like, code a backdoor and do something. Now, is that going to happen? You know, practically speaking, no. But it's not like, the the risk is not non zero.

Derek Banks: 23:53

I mean, it's but

Ethan Robish: 23:55

I I Same risk would be present in frontier models as well. It's just Yeah. You which which country do you trust more?

Derek Banks: 24:03

Which

Ethan Robish: 24:04

which companies and where they which country they reside in, I guess, you trust more?

Derek Banks: 24:09

But I mean, that's going down oh, sorry. Go ahead, Bronwen.

Bronwen Aker: 24:12

No. Well, there there's also the fact that I know in the reports, I've seen more and more supply chain attacks seem to be happening. I mean, supply chain has always been an issue. But now it's like, okay. If I'm using Anthropic, well, Anthropic is a big target.

Bronwen Aker: 24:35

They've already had source code leaked. I'm sorry. I'm just I'm babbling here. But the the the whole idea is that even if if we're trusting the the third party frontier models, their targets, they might be attacked directly. We go with a hosting service as the same kind of thing.

Bronwen Aker: 24:58

So for the truly paranoid, local may be the only way to go. The problem is that the more you're doing yourself, the more maintenance and upkeep. How much work are you doing, Derek, to to develop these new tools that are running locally on that spark?

Derek Banks: 25:18

Wait. Are you asking me, like, how much, like, work have I done, like, to get the spark up and running?

Bronwen Aker: 25:24

Well, I mean, it's not just getting it up and running. It's the fine tuning. It's the torquing. It's the Yeah.

Derek Banks: 25:32

So the the heaviest lift was because I chose to go with VLLM. If I would have went with oLama or llama CPP as an inference engine. Because if you're gonna get an open weight models, the first thing is choosing an inference engine. Probably the most popular is Ollama, and it's it's pretty easy to get installed. If you're going down the road Very easy.

Derek Banks: 25:54

Yeah. Exactly. So most of this is pretty easy to get up and running until you start saying, you know, start using things like VLLM, where it's more of like a kind of like a production ready. I wanted to learn that because I was being there were rumblings of being asked of, hey, what if we wanna like, you know, host all this ourselves? Like, oh, yeah, that's a good question.

Derek Banks: 26:16

What if we wanna do that? And so VLLM is what, you know, the the, you know, the kids on the street use to host like production models. Right? And so I went with that. That was a little bit of a heavy lift.

Derek Banks: 26:30

I had Claude helping me, honestly, figure it out. Now I have, you know, a Docker project that I can just switch between models, so it's pretty pretty easy getting that. I actually wrote a setup guide. I could actually publish it on my GitHub or something. I mean, was to polish it up a little bit.

Derek Banks: 26:47

And so now the Spark, I got VLLM up and running, a couple of models on there, and then I have a Hermes agent and a Kali VM, just because I wanted to put in a VM instead of because I I wanted to be able to use the Spark when I wasn't at home. Right? Because that's, you know, cloud versus local. And so I I put a tail scale network on the Spark, and then on the the the Kali VM because I didn't wanna put tail scale on my production computer. I didn't know if, you know, systems would be happy with me if I did that.

Derek Banks: 27:26

So I just did it in a VM, and it works pretty well. Like, was up at the, you know, my daughter's, swim practice at the pool yesterday, 30 miles from my house, and I could, you know, access the Spark like I was sitting next to it. So everything like but the I I'd say everything was pretty easy, standard open source install stuff except for getting v l l VLLM up and running. But now that it's working, I only have a couple complaints. One complaint I have about the Spark is it's not easy to encrypt the hard drive.

Derek Banks: 27:59

I haven't gone down that road yet. Brian Fehrman did. It was an exercise in frustration apparently. I think he's now there's actually a cheaper one from Asus. I can't remember the model name that, it's about $3,500 because, you know, like Ethan was saying, there's kinda different levels to this.

Derek Banks: 28:17

Right? You know, the first level is, you know, cheaper inference, bedrock, open weight models. It's kinda, you know, it's, you know, not free, but it's pretty cheap depending on you're you're you're doing usage paying. It's a through a 3,500 to $5 for local inference of a 128 gig of RAM. And then, yeah, if you're gonna if you're gonna host the big models, well, you're looking at, what, a quarter million dollars?

Ethan Robish: 28:45

I see.

Derek Banks: 28:45

Yeah. Which Something in that neighborhood.

Ethan Robish: 28:47

You know Maybe more. I think this entire conversation seems very similar to like, we could be hosting a self hosted podcast right now. Yeah. Like, it's it's the same argument of why you would wanna get away from big companies, soft like software as a service Mhmm. And host it host it yourself.

Derek Banks: 29:12

So are we all using OBS and VPNs? Like, I don't even know how that works.

Bronwen Aker: 29:17

Yeah. It's But well,

Derek Banks: 29:20

I mean, so I think that the the whole idea here is that every company and every individual has different needs and different risk postures, different like Absolutely. And so it depends on what you want to do. If you're listening to this podcast, you're just getting into AI and cybersecurity, what I would do would be to install like Hermes agent on a VPS or a VM, and go to Nvidia and try their free inference to get started. Because then, you know, and setting up Hermes is pretty easy. They have like a if you similar to OpenClaw in the sense that they have a like, kind of like a like a walk through, like, configure it this way.

Derek Banks: 30:07

So you really just need to go to Nvidia, create an account, and and get an API key. You could use Anthropic API keys, or OpenAI API keys. They have a lot of different, like, model providers or options, and then just start using it. I I actually had the thought earlier that maybe I should take a step back away from quad code because that's what I've been using for so long and just kinda I'm gonna test the waters with some other stuff. I haven't even really used codecs that much yet.

Derek Banks: 30:39

And so, you know, maybe it's time to kinda to branch out a little bit. So but I guess what I'm getting at is like pick pick a harness and go and kinda learn it and get to and get to use it and, do it as cheaply as you can.

Bronwen Aker: 30:54

I know one of the things that I've been seeing more and more and I've been also seeing in my own experimentation is that it's it's easier to go deep with a single set of whatever tools, whether you know, regardless of the the model you're using or the harness you're using. Pick one, go deep with those. After you've gotten a a deeper understanding, then go and do, like, what you're gonna do. Branch out, see what the other kids are doing, and expand from there. And what I've seen is a lot of the knowledge is transferable from one set to the other, but it's not until you get into

Derek Banks: 31:35

the weeds that you get into the nuances. Yeah. And if you're sitting here thinking, man, I don't even know, like, what to what to even start to do like for as a project. I mean, that's okay. It's it's sometimes hard to like get started.

Derek Banks: 31:51

What I would do is pick something that you do at work. Like if you're a a blue teamer and you're looking at log files, you take a set of log files and have, you know, your your harness, Hermes, OpenCode, you know, QuadCode, Codex. Gemini has one. Gemini name there is the same that they named their models. It's called Gemini.

Derek Banks: 32:15

Why do you do that, Google?

Ethan Robish: 32:16

Well, just also like Google, they have already canceled or, like, deprecated that. Oh, now wow. And now they've rolled it into AGY and their anti gravity CLI.

Bronwen Aker: 32:30

Good lord. I

Derek Banks: 32:33

can't even keep up. And I have a like, that's the I pay for Google Pro, like, $20 a month just to, like, be part of their ecosystem.

Bronwen Aker: 32:43

Oh god.

Derek Banks: 32:43

To to, like and I still can't keep up.

Bronwen Aker: 32:46

I'm reading through alerts and articles and updates at least two to three hours a day over and above, you know, just and I'm a speed reader, and I still can't keep up.

Derek Banks: 33:00

Yeah. But that's that's the challenge for anyone listening this week is pick a harness and the cheapest inference that you can if you don't have it through work, and and try and run some real tasks through it. You don't have to be creative. You don't have to have it, you know, create you, you know, your personal GitHub or something. Right?

Derek Banks: 33:21

Like, don't have to have it recode Just, a platform or like, get started with, like, everyday kind of tasks. Because that's that's where I think personal, like, local open weight, open harness agents are gonna shine, is helping you do things like, you know, draft, you know, reports. I use Claude, I'm like, I should probably start doing this with, you know, Hermes instead. I have it help me come up with my weekly stat do a weekly status report just so I know even if no one's reading them, like, what I've done over the weeks. Right?

Derek Banks: 33:58

Because I have teenagers and I forget. And and so, you know, just the things that you do that you have to do all the time that are repetitive, and and see if it'll help you out.

Ethan Robish: 34:12

So we've we've talked about, like, hosting your own models. Right? And you don't have to go out and buy a DGX Spark. You don't have to go buy buy the most fancy MacBook Pro or whatever. So there's I don't know if we have show notes or not, but there's a GitHub project, Alexis Jones.

Ethan Robish: 34:32

It's called LLM Fit. LLM Fit. Yeah. And if you download that, run it, it will give you a list of open weight models that will work and, like, which how how well it predicts that'll work on your hardware. Yeah.

Ethan Robish: 34:47

And there's there's actually a website too, and I can't I don't remember.

Bronwen Aker: 34:50

That's one of the features of Misty that I like a lot.

Derek Banks: 34:54

Okay.

Bronwen Aker: 34:55

I mean, Misty is primarily a GUI, but you do have the ability to download and run multiple models locally or via API key. And you can do side by side chat. So you you're sharing the same prompt. You're sharing the same, data stack. You're sharing, all of this stuff across, you know.

Bronwen Aker: 35:17

I think three is the maximum number of chats you can run simultaneously. But it's nice because when you're going to connect to Olama or Hugging Face, it shows you right away whether or not a specific model will perform well or poorly, and that is another nice feature. And, you know, if if you're focusing on learning the differences between the models as opposed to the harness, Misty, m s t y, is a is a nice tool.

Derek Banks: 35:52

Well, I think that's probably a good place to to leave it. You know, we're thirty six minutes in, and so, I think, yeah, I'm sure that, we'll have more to say over the next coming weeks to months about open weight models and open source harnesses, and this seems like it's a a roller coaster ride that's, going into a dark tunnel. But maybe there's light at the end of that tunnel.

Ethan Robish: 36:17

So we kinda focused on different risk and data sovereignty and whatever the three different pillars here. One thing that one type of risk that cover like, cuts across all three of them that we didn't cover, and maybe we can do in a future episode, is like, local, like, prompt injection. Like, what what if what risk does your your harness pose? Prompt injection, or even just, like, hallucination and, like, your model goes rogue and starts deleting stuff on your system. Like, it doesn't doesn't matter where you're hosting or what what model you're using.

Ethan Robish: 36:54

The risk is the same.

Derek Banks: 36:56

I actually have slides on that in my class, so we can just take from that and do now we know what next week episode will be.

Ethan Robish: 37:04

Very cool. Alright. Alright. Just stay tuned. Alright.

Derek Banks: 37:08

And with that, yeah, keep on prompting.

More episodes

Chapters

Creators and Guests

What is AI Security Ops?