Join in on weekly podcasts that aim to illuminate how AI transforms cybersecurity—exploring emerging threats, tools, and trends—while equipping viewers with knowledge they can use practically (e.g., for secure coding or business risk mitigation).
Welcome to another fun and exciting episode of AI Security Ops with your illustrious hosts, myself, Joth Theyer, Brian Fuhrman, doctor Brian Fuhrman, should I say, and doctor Derek Banks. In fact, we'll all be doctors today. Why not? I'm not a doctor.
Brian Fehrman:Doctors all around.
Joff Thyer:That's right. Doctors all around. This episode is brought to you by Black Hills Information Security. If you are interested in AI assessment work, please visit us at blackhillsinfosec.com. Click on the contact us page, and we'll see if we can help you out.
Joff Thyer:Having said that, today's episode is all about our listeners' questions and answers. Hopefully, we'll give you some answers that are enlightening, and, somehow somebody probably shared document, which I didn't read. So I'm actually going to allow my fine colleagues here to kick it off with the first question. And, actually, when you ask a question that you want me to answer, you can just, like, play pick on Joff. You'll just have to read the question because I don't have the document up.
Joff Thyer:My bad. Okay. Let's go, and we will start with doctor Brian Furman.
Brian Fehrman:Alrighty. Could someone extract training data through model inversion attacks, and how realistic is that today? The answer is yes because that's basically the definition of a model inversion attack in which you are, sending a bunch of queries, to the data or to the model and getting data back and using, that the response data to make inferences about the training data that was used. A lot of times the examples that you'll see this on are actually image classifiers, and so essentially what they do is they'll just start feeding in a bunch of random pixels and what they might get back is, let's say, a facial recognition system and, they'll get back maybe some kind of a probability of how well those random pixels match something within, the database. And so then they just keep modifying the pixels until they get closer and closer matches.
Brian Fehrman:Then And eventually, they might be able to extract out training data that's there. And so that's, you know, one way that you'll you'll typically see examples about it.
Joff Thyer:So, I'm curious, though, about one thing here. I think, Brian, fundamentally, you're right because you are extracting the predictions from the model, which is which is fantastic. This is what models do. But are you extracting the original training data? That's a slightly different question.
Joff Thyer:I guess that's not the actual question being asked.
Derek Banks:So I guess that's one of the things that, like, one of the fights that I try and fight is that, you're not really storing data inside of a machine learning or AI model, right? You're basically creating math that will predict the data, right? And basically, if you looked inside of a model file, it would just be huge matrix of numbers. That's what it would look like to you, right? Like this big, just blob of numbers.
Derek Banks:And so you're not really extracting any data as much as you're inferring and reconstructing the data, right? So if you look at like ChatGPT as a large language model, it's not condensing or it's not storing data from the internet. It's not like a database, right? It's kind of distilling and condensing it into a mathematical construction that can then re predict, like predict the data, predict the most likely output based on your input. So we use words like data exfiltration because those are the words that we've used all this time, right?
Derek Banks:But that's not I mean, if you're gonna extract data from a you know, not not training data, but other data from a large or from a model, it'd have to be connected to the data through some kind of, like, tool.
Joff Thyer:Right. But actually, you know, sort of going digging a little bit deeper, you know, these attacks are useful because do they enable you to reconstruct a facsimile of the model? Then the answer would be yes, they do because you're extracting those inferences. And we are welcoming Brumwin to the show. Brumwin, welcome.
Brian Fehrman:Hey, Brumwin.
Bronwen Aker:I'm sorry. I didn't realize you were already recording.
Derek Banks:That's okay. That's alright.
Joff Thyer:You can just you're like the Brady Bunch. You just jumped in, and we're we're we're we're about to break out in song story about a lovely lady. Exactly. Okay. Let's go with the next question.
Joff Thyer:Derek, do you have a question up in your question list?
Derek Banks:Actually, the one I picked out is actually kind of close to what Brian said. Not the same one, but a similar question, so I'm gonna skip that one. I'm gonna go with what kind of telemetry should be collected for detecting prompt abuse or API misuse?
Joff Thyer:That's an interesting Prompt so let's deconstruct that question a little bit. Prompt abuse. I'm not sure what the listener is talking about when they talk about prompting. Maybe we can make the assumption prompt injection.
Derek Banks:Right? Yeah. I think that's what they mean. It's like basically people are trying to get our models to do our model to do something unintended. How do we detect that?
Derek Banks:And I guess taking a step back, I'll just say that I've been doing information security for many, many, many, many years now, many moons. And it's really not the norm to see people logging application traffic, web traffic in a meaningful way ever? Sure, web servers have logs. Typically what we see is by the time we actually get to those logs because of some kind of incident, they've well rolled over, right? They're not being offloaded to some central location.
Derek Banks:So that my first thing is I would say just as a general logging rule, are you putting your key and critical application data into some kind of centralized location, and I would start with that.
Joff Thyer:Yeah. And the other issue that's slightly related here too, and I've asked I've asked some, customers this on occasion, you know, are logging around AI model use, especially if you have a chat interface to that model? And the answer normally is is no. And there are instances where logging can actually potentially get you into trouble. Right?
Joff Thyer:Yeah. I mean, we know that there is such thing as too much logging in our industry as I
Derek Banks:mean, it all depends on what we're talking about, right? Like, I don't think that ChatGPT should be logging things centrally for what I'm putting into it because I we pay them and they say they're not storing those things. If it was an application in an enterprise setting, maybe that's a different story. And so I was really talking about more of like an enterprise like setting type of thing. Like we have this critical chatbot thing or critical process that people are interacting with and we want to make sure that no one's, you know, how do we detect if someone's trying to abuse it?
Derek Banks:And so collecting the data is step one. That's my point.
Joff Thyer:Yeah, and I would also add to that question. It is often the case in most AI model deployments with regard to dangerous or, you know, even potentially, you know, malicious prompting, that guardrails are deployed around that model. And, you know, you as a, if you're providing a model in some sort of local sense that's backing some sort of application, you would probably be implementing similar guardrails around that model. And if you are, you should certainly enhance the logging on the guardrail aspect of the model. If it trips, make sure that you're logging that instance, and it would also give you a sense of how effective or not effective your guardrails are.
Derek Banks:That sort of leads into my second take on that is, okay, now we have all this, like data. How do we determine whether or not it's prompt injection? I guess my response to that would be, well, is like, well, my friend, now you have a natural language processing problem. You're not gonna write like, traditional SIM rules to get yourself out of that one. You're going to have to essentially do some kind of NLP.
Derek Banks:And to your point, there are already pre trained models that would take in input and then return a Boolean, whether or not that matched some kind of, like, prompt injection. Right?
Joff Thyer:Right. So what was the second part of that question? We we covered the prompt injection, I think, pretty well. The second part was?
Derek Banks:Or API misuse. So prompt injection or API misuse. And what I think they mean by API misuse is probably a little broader than prompt injection, and probably goes back to kind of our first, the first thing that Brian was talking about, about model inversion is that, I mean, without prompt injection, I could send theoretically, depending what kind of information I can get out of the model, I could send prompts and measure the response and the tokens out and do some data science y type stuff and essentially try and recreate the model. In fact, there is a suspicion that the Chinese did this with DeepSeek against OpenAI. Right.
Derek Banks:Because you could buy a $200 a month subscription and essentially get what, like unlimited queries, right? $200 a month is cheap if you're trying to, you know, it's cheap compared to what it took to to pre train the base model for ChatGPT, I bet.
Joff Thyer:Yeah. Oh, one of the other aspects of API misuse would be the denial of service aspect, of course. But if you throw a lot of traffic at an API that's connected to a back end AI model, that's gonna produce a tremendous amount of compute load because, you know, even though it's only inference, inference is still using the multiple processing units of a TPU or GPU. Right? So that's gonna run up electricity bills.
Joff Thyer:It's gonna run up compute use. And so throttling the API is probably the first step there.
Derek Banks:Yeah, I would throttle the API, but even, you know, let's say I was able to get right under the throttle threshold, the good news is is that, like, that kind of API misuse is a little bit easier problem than an NLP problem. That problem is basically looking at like, you know, standard deviations of normal baseline traffic, right? Like, why does this user have a million more queries an hour than this other, like the rest of our users?
Joff Thyer:Yeah, exactly. Actually, of the, this I'm I'm gonna give OpenAI a compliment, but I was looking into this the other day because for, coursework, I wanted to, put some constraints around how the API key was being used, and and OpenAI is actually doing a fairly reasonable job. They're putting they allow you to budget constraint. They allow you to throttle, and they also allow you, to dictate specifically which models that API key will answer to. So it's actually pretty cool.
Joff Thyer:So I'm I'm impressed by what they're doing, and I'm sure other providers have similar things. And if you're doing a local model or hosting a local model, you would have to look for similar mechanisms in your implementations. So alright. We ready to move on to the next question?
Brian Fehrman:Yeah. I think we have time for one more.
Joff Thyer:So I think we should throw this one at Bronwyn because she's been sitting there patiently just waiting and itching to answer a question. I can feel it. So somebody
Bronwen Aker:Oh my goodness.
Joff Thyer:And if Brumman has the questions document up, she can, ask the question.
Bronwen Aker:So the the question is, have you seen instances of anyone implementing secure phrases almost as safe words to protect against things like prompt injection and confusion attacks. So this concept of a safe word I haven't seen regarding prompt injection, but I have seen it come up in discussions about how to protect against deep fakes. So I'm actually not going to answer that question. I'm gonna pivot though to the fact that this concept of a safe word as a human centric defense against deep fake attacks is something that I am seeing show up much more often. And it's coming up in a variety of different conversations because at the end of the day, if we don't have a human to human means of verifying what, is sometimes called proof of life, then these deepfakes have gotten so good now that it's almost impossible to detect, at least without doing an in-depth, insane forensic analysis.
Bronwen Aker:It's getting that good.
Derek Banks:This is
Joff Thyer:something Give us a context.
Bronwen Aker:A context.
Joff Thyer:Yeah. Where would this apply?
Bronwen Aker:Okay. This would apply. You're you're working for a company, and in order to prevent a chief financial officer from releasing funds based on a phone call from someone, There would have been there would have had to have been an out of bounds previously set up safe word that you know so and so treasury officer or manager comes in and says, I need x thousands of dollars payable to this bank account. And the CFO turns around and goes, Okay, what is the code word of the day? And the person, if legitimate, would hopefully have said code word.
Bronwen Aker:And if not, they would fumble or flail or whatever. And hopefully they wouldn't have had presence in the network and been able to find out what that code word was. But that would be something that could be used. Again, it would have to be arranged in advance.
Derek Banks:And if I could pick a safe word, that's danger. Right?
Joff Thyer:Well, no. I could
Bronwen Aker:Stranger danger.
Derek Banks:My safe word. Has anybody seen Bert Kreischer's birth stand up?
Joff Thyer:Like I could certainly see a scenario, as Brahmin describes, where maybe a company every month, let's say, let's just pick a period, publishes, a random list of words that are common, that are indexed somehow, actually, in a paper mechanism and distributes it out.
Derek Banks:Aren't you call you're you're basically talking about a one time pad. Exactly. Exactly. So
Bronwen Aker:And and there's
Derek Banks:a lot of precedent new again.
Bronwen Aker:There's a lot of precedent for this. We've we've seen all of the the scenes of the movies
Derek Banks:where World two and so
Bronwen Aker:and so can't access this and such security system, can't press the the big red button until they enter the correct security code that's kept in an acrylic thing, and and it's on paper.
Derek Banks:Reminds me of the the Navajo called to code talkers. Right?
Joff Thyer:So so Brian Brian has a comment, and and I have one as well. The key there is out of band. Right? It's out of the communications channel. But but, Brian, what's your comment?
Brian Fehrman:Oh, yeah. Well, I'm just gonna say I know that at, the previous credit union we were at, we did there was actually a code word that we had. It didn't rotate, but it was the one that, like, you picked. And anytime we called, like, in addition to other account information we had to give, you had to give code word. It was actually yeah.
Brian Fehrman:It it was I don't know. It was Did you get the pick? I'm trying to remember it. And then, like, sometimes, like, we'd say it and they'd be like, I don't think so, but it's it was the word, they just pronounced it differently and I'm not gonna say what the code word was, But
Joff Thyer:English is not And But it's very sad that it didn't rotate.
Brian Fehrman:Pick one
Derek Banks:that's not safe for work.
Joff Thyer:Well, I you know what? I think we've covered the waterfront today. I think we all do need to run off and do other things, unless anybody wants to try for another question, which I don't think January's
Derek Banks:not supposed to be this busy. I I don't and it's not. Yeah. That was It's cold that it wasn't gonna be this busy.
Joff Thyer:Alright. Well, thanks again to my illustrious host, Brahmin, Derek. Everybody wave. Brian, doctor, doctor, doctor. I hope you've enjoyed listening to this episode of AI Security Ops, and we will be seeing you net next time.
Joff Thyer:Keep safe and keep prompting out there. See you.