Join in on weekly podcasts that aim to illuminate how AI transforms cybersecurity—exploring emerging threats, tools, and trends—while equipping viewers with knowledge they can use practically (e.g., for secure coding or business risk mitigation).
Brought to you by the experts at Black Hills Information Security
https://blackhillsinfosec.com
--------------------------------------------------
About Joff Thyer - https://blackhillsinfosec.com/team/joff-thyer/
About Derek Banks - https://blackhillsinfosec.com/team/derek-banks/
About Brian Fehrman - https://blackhillsinfosec.com/team/brian-fehrman/
About Bronwen Aker - https://blackhillsinfosec.com/team/bronwen-aker/
About Ben Bowman - https://blackhillsinfosec.com/team/ben-bowman/
Joff Thyer:Hello, and welcome to another episode of AI Security Ops with my illustrious colleagues, Brian Fehrman and Derek Banks. Welcome, gentlemen. As usual, this episode is brought to you by Black Hills Information Security. If you are interested in an AI assessment, whether for an AI application framework, a local model deployment, or whatever your context is, please feel free to reach out to us. You can find us by visiting the website blackhillsinfosec.com and clicking on the Contact Us form.
Joff Thyer:Today, we are going to talk about model evasion attacks in the realm of AI security. So I think the first question is: what exactly are model evasion attacks? How do attackers evade machine learning models used in security tools? And what exactly is going on here?
Joff Thyer:So maybe I'll let Dr. Brian Fehrman set the stage for us. What have you got, Brian?
Brian Fehrman:Sure. When we're talking about model evasion attacks, we're talking about models that are doing some kind of classification, whether that be traditional classifier models, you know, random forests, decision trees, support vector machines, or large language models being used for classification tasks. But basically, something where you're giving input and what comes out on the other side is some kind of classification, a determination being made about that input. In the context of cybersecurity, typically what we're gonna be looking at is malicious or not malicious, normal or anomalous, and maybe some kind of scale in between. And when we're looking at model evasion, what we're looking at is somebody who is intentionally trying to trick that classifier: they give it something that is doing something bad, something malicious, something out of the ordinary, but it ultimately ends up getting classified by the model as normal or benign or not bad, you know, whatever you wanna call it. Basically tricking it into misclassifying whatever behavior or malware sample you might be providing to it.
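A minimal sketch of the kind of classifier Brian describes, using a scikit-learn random forest over numeric feature vectors. The feature names and training data below are made-up stand-ins for illustration, not any real product's model:

```python
# Toy malicious/benign classifier: feature vectors in, verdict out.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features extracted from each sample, e.g.
# [section_entropy, import_count, packed_flag, network_calls, macro_count]
X_train = np.array([
    [7.9, 12, 1, 5, 3],   # malicious-looking samples
    [7.5,  8, 1, 4, 2],
    [7.2, 15, 1, 6, 4],
    [4.1, 40, 0, 0, 0],   # benign-looking samples
    [3.8, 55, 0, 1, 0],
    [4.5, 33, 0, 0, 1],
])
y_train = np.array([1, 1, 1, 0, 0, 0])   # 1 = malicious, 0 = benign

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

sample = np.array([[7.6, 10, 1, 5, 2]])  # a new sample to judge
verdict = "malicious" if clf.predict(sample)[0] == 1 else "benign"
print("verdict:", verdict)
print("probability malicious:", clf.predict_proba(sample)[0][1])
```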
Joff Thyer:Gotcha. So, you know, I think one of the ways people can really get a sense of this is in the space of visual image classifiers, because that's one of the really early success stories in AI models. Right? We did a lot of visual identification work as far back as a decade or more ago. And I've heard that, with an image you are pushing into a model for classification or identification of some sort, it doesn't take too much in the way of pixel modification to actually get that model to misclassify.
Joff Thyer:So thoughts on that, gents?
Derek Banks:I mean, I think that, from my experience, that's been pretty true, because when you talk about image classification, what's happening is that the image, and the same thing happens with text or audio, gets put into some kind of model. And, drawing the distinction again, in this ecosystem we have large language models, but there are all these other types of models out there. Right? And when you talk about image classification, those are typically machine learning or statistical learning models.
Derek Banks:And what they do is they take the image in and treat it as a grid of pixels, and those are the numerical vectors that get put into the image classifier. And often you can change the picture in a way the human eye can't tell, but to the computer, it looks like a different picture. And so, yeah, that's what I think of.
Derek Banks:I always think of Minority Report and, you know, the eyeball thing. Right? That's, like, the model evasion. But probably more practical would be getting past phishing or malware detection by introducing some kind of noise into the sample. It's the same kind of idea as with the picture.
Derek Banks:You're just introducing data that makes it not look like the thing. One of these things doesn't look like the other.
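A toy numpy illustration of the idea Derek describes: a hypothetical linear "image classifier" (the weights, bias, and labels are all invented for this sketch) whose verdict flips after a per-pixel change far too small for a human to notice:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=28 * 28)                 # made-up classifier weights
image = rng.uniform(0.4, 0.6, size=28 * 28)  # a flat gray "image", pixels in [0, 1]
b = 0.5 - image @ w                          # bias chosen so this image sits just
                                             # on the "cat" side of the boundary

def predict(x):
    return "cat" if x @ w + b > 0 else "dog"

print("original prediction:", predict(image))            # cat

# FGSM-style step: nudge every pixel slightly in the direction that lowers
# the score. For a linear model, the gradient of the score with respect to
# the input is just w, so the worst-case direction is -sign(w).
epsilon = 0.02
adversarial = np.clip(image - epsilon * np.sign(w), 0.0, 1.0)

print("largest per-pixel change:", np.abs(adversarial - image).max())  # 0.02
print("adversarial prediction:", predict(adversarial))   # dog
```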
Joff Thyer:Right? Yeah, absolutely. Okay, so that's the example of images.
Joff Thyer:And we all know that convolutional neural network technology is typically applied to image classification, from an algorithmic perspective of building the model architecture. But that kind of technology is really not very different when we're trying to classify other things. One of the classic applications of AI models out there is to perform classification on malware samples, for example. Right? So I think very much, because everything in data science and in AI and ML is largely turning the source data into vectors of numbers, the same kind of principle applies here: if you manipulate the malware just a little bit, you can find that sort of threshold where you can get this misclassification to occur.
Joff Thyer:Thoughts on that, Brian?
Brian Fehrman:Yeah. And so that's kind of the general approach. And I think a lot of people have probably been doing this for years now without really realizing what exactly they're doing, right? Where you're trying to develop a piece of malware that's ultimately going to get past some EDR that you might be going up against within your engagement. And so what do people typically do?
Brian Fehrman:So you'll run it, you'll see that it gets caught. Okay, now try changing some function in there, maybe how you're creating a process, or how you're actually putting the shellcode into memory, or how you're storing the shellcode within the program
Derek Banks:Or pulling it out from somewhere else. Or deleting the comments.
Brian Fehrman:Yes. Yeah. Exactly. And so you start trying all these things until you find the thing that allows you to get past whatever model you're going against. Now, what you're going against might not necessarily be a machine learning model per se, but it could be.
Brian Fehrman:And so really what you're doing is finding those decision boundaries. You're finding that, hey, when I have a program that has these specific traits inside of it, it's more likely to get triggered or flagged as malware than if I don't have them. And so you start playing around and figuring out what traits you can actually have in there, what pieces of code and functionality you can actually have in there, and still be allowed to get past the different classifiers you might be going up against, finding that exact boundary, basically.
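A rough sketch of that trial-and-error loop. The `classify` function below is only a stand-in for whatever verdict the real EDR or model hands back, and the trait names are invented for illustration:

```python
# Toggle one trait at a time against a black-box verdict and record which
# traits sit on the decision boundary.
TRAITS = [
    "calls_virtualalloc",
    "calls_createremotethread",
    "high_entropy_section",
    "embedded_base64_blob",
    "unsigned_binary",
    "has_comments",
]

def classify(traits: set) -> str:
    """Stand-in for the black-box classifier; a real engagement would
    rebuild the sample with these traits and submit it for a verdict."""
    risky = {"calls_createremotethread", "high_entropy_section",
             "embedded_base64_blob"}
    return "malicious" if len(traits & risky) >= 3 else "benign"

baseline = set(TRAITS)                  # the sample as written gets caught
assert classify(baseline) == "malicious"

boundary_traits = []
for trait in TRAITS:
    verdict = classify(baseline - {trait})
    print(f"without {trait:26s} -> {verdict}")
    if verdict == "benign":
        boundary_traits.append(trait)

print("traits that flip the verdict when removed:", boundary_traits)
```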
Joff Thyer:In a very real sense, what we're doing here is exploiting the feature classification design of some of these neural networks. Right? Because if we can pull on a string that isn't in the feature classification of the model and succeed, then we could potentially get a misclassification. So I think that's an important thing to realize. And then the other thing I think we need to point out, too, is that when you're dealing with model evasion, you're looking at a process that plays out when the model is doing inference against the sample data that's being supplied.
Joff Thyer:So typically, this is not something that is related to model training, and I think there's a lot of actual confusion out there. You know, this is one of Derek's favorite pet peeves, actually: people, when they look at LLMs especially these days, tend to think of their prompting as if it were training, and that's not the case when you're prompting on the fly.
Derek Banks:Yeah. It reminds me of the anthropomorphic fallacy. I still catch people doing this all the time, treating the AI like it's some giant knowledge thing in the sky that we are all talking to, like it's an entity. And that's not how it works.
Derek Banks:Right? Yeah.
Joff Thyer:But that's not to say that there isn't a possible attack vector on the supply chain side through data poisoning, which would be impacting the training data that is going into the training process, and that concerns me a lot. We don't hear much about it. I haven't heard of many public instances, or at least companies willing to go public with instances, where their training data has potentially been poisoned. But I do know that for a lot of data scientists, when they're developing these machine learning operational pipelines, their primary focus is not cybersecurity. Right?
Derek Banks:Yeah. Probably an easier, I wanna say easier, another mechanism that would lead to evasion would be model theft. Right? Like, if you get your hands on the model, you could reconstruct it in some kind of way.
Derek Banks:It would give you a better opportunity to then do your trial and error, not against the live thing. You know, a case I can think of, and I know I've brought it up on a previous podcast, is the folks who are now at Dreadnode, back in, I guess, 2018, I think it was, I can never remember the date, when they did the talk at DerbyCon about bypassing Palo Alto's spam filter. The way they did that was they were able to steal enough data to essentially recreate the classifier on the back end, and then were able to trial-and-error figure out what words would kind of fly by if they were in the data.
Derek Banks:Right? And so I think that sometimes sophisticated threat actors will chain things together, be it supply chain or model theft or something, to then get past other defenses.
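A rough sketch of that kind of chain: query the live model, keep its verdicts, train a local surrogate, and then do the trial-and-error offline against the copy. The `query_target` function and feature layout here are hypothetical stand-ins, not the system from the talk Derek mentions:

```python
# Model extraction via a surrogate: approximate the target from its own
# verdicts, then craft evasions against the local copy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def query_target(x):
    """Stand-in for the remote classifier (1 = blocked, 0 = allowed)."""
    return int(x[0] + 2 * x[3] > 1.5)    # hidden rule the attacker can't see

# 1. Collect labeled pairs by querying the target with varied inputs.
queries = rng.uniform(0, 1, size=(500, 8))
labels = np.array([query_target(q) for q in queries])

# 2. Fit a local surrogate that approximates the target's decision boundary.
surrogate = RandomForestClassifier(n_estimators=100, random_state=0)
surrogate.fit(queries, labels)

# 3. Trial-and-error offline against the surrogate instead of the live model.
candidate = np.array([0.9, 0.5, 0.5, 0.9, 0.5, 0.5, 0.5, 0.5])  # gets blocked
while surrogate.predict(candidate.reshape(1, -1))[0] == 1:
    candidate[[0, 3]] -= 0.05            # nudge the influential features down

print("surrogate now scores the candidate as allowed:", np.round(candidate, 2))
```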
Joff Thyer:Yeah, it's an interesting conversation. Right? But to sort of distinguish between the two: the data poisoning side is what I would think of as a supply chain attack, essentially, versus the actual inference-time style attacks. And I think you started to allude to this a little bit, Derek.
Joff Thyer:There's also the idea of modifying the binary model itself. I guess binary is not a good term for that, but we are aware of model ablation techniques where the weights get tweaked and parts of the model get sort of pulled out. That's an attack on the actual model objects themselves, or the vectors of numbers, if you like, because essentially all a model is is a bunch of vectors of numbers stored in a data structure. So that's a different entity as well. So I think we've kind of talked around how the attackers do it, but is there anything else on the, like, how-we-attack side that we need to bring forward here?
Brian Fehrman:I think we covered most of it. And it kind of ties into what Derek was talking about with model extraction attacks: if you throw enough samples at a classifier, you're going to eventually learn enough information that you can probably make some inferences of your own about how your target classifier is deciding whether or not something is good or bad, coming up with what features you think it's looking at, and then finding ways to kind of play around and mess with those features to try to get past it. Research work I did for my dissertation, which I brought up in a previous episode, was about VBA malware classifiers, and that's essentially what it did: you create a bunch of VBA samples, throw them at the classifier, get classifications back, good or bad, add new features, modify existing features, and eventually you essentially come up with a good list and a bad list of thresholds: hey, if you have these feature characteristics, it's probably gonna be flagged as bad; otherwise, it's probably gonna be considered good.
Brian Fehrman:And then from there, it's just a matter of modifying your malware to kind of fit those, which is easier said than done. Because with images, as we talked about early on, you're just modifying pixels, and at the end you're still gonna have an image. Right? But when we're talking about modifying executable programs, you're not necessarily still gonna have a working program if you make the wrong modifications.
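Building on the probing sketch above, a rough sense of how those results get rolled up into a good list and a bad list. The `classify` stub and feature names only mimic the shape of the VBA-classifier work Brian describes:

```python
# Probe feature pairs against a black-box verdict and tally which features
# tend to flip a benign sample to malicious.
from collections import Counter

FEATURES = ["autoopen_macro", "shell_call", "base64_payload",
            "long_string_obfuscation", "document_vars", "cell_formatting"]

def classify(features: frozenset) -> str:
    """Hypothetical black-box verdict on a generated VBA sample."""
    risky = {"shell_call", "base64_payload", "long_string_obfuscation"}
    return "malicious" if len(features & risky) >= 2 else "benign"

flip_counts = Counter()
# Start from each benign single-feature sample and record which added
# feature flips the verdict to malicious.
for base in FEATURES:
    if classify(frozenset({base})) != "benign":
        continue
    for extra in FEATURES:
        if extra == base:
            continue
        if classify(frozenset({base, extra})) == "malicious":
            flip_counts[base] += 1
            flip_counts[extra] += 1

bad_list = sorted(f for f in FEATURES if flip_counts[f] > 0)
good_list = sorted(f for f in FEATURES if flip_counts[f] == 0)
print("likely-flagged features:", bad_list)
print("likely-safe features:   ", good_list)
```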
Joff Thyer:Yeah. I was gonna say there are things you can do, like in the NT header, for example, in an EXE or a DLL, that are benign, and then there are things you can do that will totally destroy the functionality of the binary.
Derek Banks:Yeah. Of the other things we've talked about, images are probably the easiest. Phishing text? Okay, yeah.
Derek Banks:I mean, not easy, but easier than a binary that has to work.
Joff Thyer:Yes. Yeah. The other point I wanted to bring out, too, is something that I think people miss a lot of the time, and that is this concept that all AI models are data driven, and because they're all data driven, they're all probabilistic in the results that they produce. What that really amounts to is nondeterminism, whereas the traditional algorithms we use, deterministic technology, are almost everything in InfoSec up to around this point in time. And the reason I bring that up is that in the model evasion attack space, you might try an attack once and it doesn't work, and you might try the exact same attack another time, and it then works.
Joff Thyer:And that's due to the fact that these things don't necessarily follow a straight path, and they're probabilistic in nature. Yeah. So, myth busting, first of all. Right? AI stops zero days.
Joff Thyer:No. AI does not stop zero days.
Derek Banks:Well, by definition, nothing stops a zero day. It gets detected, and then people figure out how to stop it. Right? But that is a good one.
Derek Banks:Right? I was thinking when I read that, AI stops zero days? I kinda wish, although I'd probably be out of a job. But one of the most high-profile zero days recently was the Palo Alto vulnerability earlier this year where, from the outside, there was remote code execution and footholds on the Palo Altos. And do you all know how that got caught by Volexity, which was the company that found it?
Joff Thyer:I do not. Please enlighten me.
Derek Banks:Yeah. Network monitoring on the outside. Now, I suppose AI could have caught it, but it was also just curl commands originating from the firewall going off to pull down Python scripts, which typically isn't what you want your Palo Alto doing.
Joff Thyer:Yeah. Yeah.
Derek Banks:So, yeah, that's how it got caught.
Joff Thyer:It also comes down to the fact that when you add more functionality to a network infrastructure device, that functionality could be abused. Right? I will say, while AI can't necessarily stop zero days, I think it does have a better chance of detecting zero days or things like that, because it is probabilistic and not deterministic. Right?
Derek Banks:I think that's a big difference, just looking for anomalies in general. I mean, when it comes to looking at large patterns of data, AI models are better than humans are. Right?
Derek Banks:Yeah. It starts to blend in, statistics versus AI; there's a gray area somewhere in between.
Joff Thyer:So I think the other thing that's going on here, and the trend, is that threat actors are integrating adversarial behavior as a standard capability, and they're also integrating more adaptability. And the AI tech goes both ways. Right? The threat actors are using AI to generate more evasive entities to attack different networks.
Derek Banks:Or to do the whole attack themselves, as we'll talk about in a future episode.
Joff Thyer:And anybody who's tried to code, like, polymorphic malware, and I've done this, knows that it's a really difficult thing to do. But to code a malware sample up as a proof of concept, send it into an AI, and say, make this thing more polymorphic, that's not a difficult thing for an AI model to do. And it is definitely a trend that is out there in terms of leveraging AI as the attack animal. In fact, I did a slide recently on this.
Joff Thyer:I always picture AI used for defense and AI used for attack going into a sword battle. Right? Because we're definitely entering that kind of world as we're watching these trends evolve. And from a pentest perspective, we absolutely are in a situation where penetration tests and red teaming are happening against AI components directly, and we are all very familiar with that. So let's pivot to defensive strategies.
Joff Thyer:What can organizations do in the context of model evasion? Maybe Brian's turn to talk.
Brian Fehrman:Yeah, sure. So there are a couple of things that people can try, depending on how much control you have of the model that you're using for doing the detection. Obviously, if you don't have the ability to train and tune the model, then some of these things aren't really going to apply to you. But let's say you are a company who develops this kind of product, a model or classifier type product. Then one thing you can look at doing is what's called adversarial training, which is basically taking samples that have been slightly modified but are still malicious and feeding those in as part of the training process to kind of broaden the scope of what the classifier has seen.
Brian Fehrman:But there comes a point of diminishing returns. Right? Because you start to get into that gray area where you're going to push it to where basically everything is going to start looking malicious. Right? I mean, really, the holy grail in any cybersecurity aspect, especially within these EDR detection products, is basically catching everything that's bad and not alerting on anything that's good.
Brian Fehrman:Right? And so if you're trying to pump as many examples as you can into your training process, it's all gonna start to look the same to it. You're just gonna lose a certain sense of quality when it comes to your overall classification out of the model.
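A minimal sketch of the adversarial training idea Brian describes, assuming you own the training pipeline. The data here is synthetic; the "evasive" samples are just malicious vectors shifted toward the benign cluster while keeping their malicious label:

```python
# Adversarial training: augment the training set with perturbed malicious
# samples so the decision boundary covers the evasive region too.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Synthetic stand-in data: rows are feature vectors, label 1 = malicious.
X_benign = rng.normal(0.0, 1.0, size=(300, 10))
X_malicious = rng.normal(2.0, 1.0, size=(300, 10))
X = np.vstack([X_benign, X_malicious])
y = np.array([0] * 300 + [1] * 300)

# Adversarial augmentation: slightly shifted malicious samples that still
# carry the malicious label, broadening what the classifier has seen.
X_adv = X_malicious - 1.0 + rng.normal(0.0, 0.1, size=X_malicious.shape)
y_adv = np.ones(len(X_adv), dtype=int)

plain = RandomForestClassifier(random_state=0).fit(X, y)
hardened = RandomForestClassifier(random_state=0).fit(
    np.vstack([X, X_adv]), np.concatenate([y, y_adv]))

# Evasive test samples sitting between the two clusters.
X_evasive = rng.normal(1.0, 1.0, size=(200, 10))
print("plain model flags:   ", plain.predict(X_evasive).mean())
print("hardened model flags:", hardened.predict(X_evasive).mean())
# The trade-off Brian mentions: push this too far and genuinely benign
# samples start getting flagged as well.
```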
Joff Thyer:Yeah. I mean, I think that something that occurs to me a lot is that people think AI is gonna take over everything in terms of defensive capability. But I think the best things that we're actually seeing in terms of defense involve sprinkling in AI where it is most effective, but not for everything. Right?
Derek Banks:Mhmm.
Joff Thyer:Some deterministic technology is still really good for parts of your defense strategy.
Derek Banks:What I keep reminding people over and over again, and I just did it today, is just what you were saying: AI isn't gonna take over everything. It's a tool like anything else. And it seems like we're dividing into "I hate AI" and "I love AI" camps, and I think that's the wrong way to look at it, because if I'm gonna go fix an engine, I need some wrenches and a socket set. I don't need a table saw. Right?
Derek Banks:I need to pick the right tool for the job. And so to your point, you know, where the tool fits is where it should be used.
Joff Thyer:Yeah, 100%. I think I had a Twitter post the other day where I said, just because you have AI as a hammer, everything is not a nail. Right?
Joff Thyer:Everybody back off a little bit here. Right? And think about what you're doing. And, sadly, we've seen sort of a degradation in some of the security mechanisms that are in the scaffolding around AI applications. So things are not necessarily happening for the positive in some of this.
Joff Thyer:That's a tangent and not necessarily directly related, but it is important. For organizations who are thinking about acquiring some sort of AI-driven solution, for whatever their information security task is, what kind of questions should those organizations be asking these vendors who are sprinkling in ML and AI and all their buzzwords?
Derek Banks:Oh, it reminds me, again, of Craig Vincent telling me not too long ago, man, people are shoving these large language models into everything. Right. Well, I think my first piece of advice would be that AI is not magic secret sauce. It's a specific type of thing that covers a lot of technological areas.
Derek Banks:I would say, first of all, have somebody on staff, or send somebody to training, to learn about what AI is and what it is not and what it can and can't do. And then kind of determine what business problem you are trying to solve and pick the right tool for it. And maybe AI-driven technologies are the right kind of fit.
Joff Thyer:Yeah. Thoughts on that, Brian?
Brian Fehrman:Yeah, absolutely. I think it's just having those realistic expectations of what it can and can't do. Don't buy the whole line of, oh, it's gonna detect all these unseen threats. Like, well, not really.
Brian Fehrman:I mean, to a point. Right? It has to have seen at least some aspect of it before. Otherwise, it likely isn't going to detect it. It's just gonna look like benign behavior.
Brian Fehrman:Or it's
Derek Banks:It's either gonna tell you, I've seen this thing and it matches the pattern, right, or it's gonna say, this doesn't look like everything else, somebody should go take a look. Yeah.
Derek Banks:Those are the only two options. Right?
Brian Fehrman:So Yeah.
Joff Thyer:Right. And it is all in the training data. Right? How effective that model is actually gonna be at the task that's being put in front of it depends on very high quality, well-curated training data.
Joff Thyer:So I do think organizations should consider asking anybody that is providing some sort of AI-driven service: Is this a homegrown model? What is the source and quality of your training data? I mean, actually put those questions out there to see whether the vendor will respond. Now, in some cases, the vendor might invoke the cone of silence and say, no.
Joff Thyer:No, that's our special proprietary source, and we can't tell you about that. But it's still worth putting that question out there.
Derek Banks:It sounds like you don't want my money.
Joff Thyer:Sorry. No acquisition for you. But, alright. So what about the future outlook? Is the future bleak?
Joff Thyer:Is the future rosy and shiny?
Derek Banks:Specifically with model evasion, or are we just talking about the future?
Joff Thyer:Yeah. Well, I suppose we should keep it in context. Yeah.
Derek Banks:In the context of the model evasion stuff, I mean, hackers gonna hack. Right? I don't think it's going anywhere. Model evasion is gonna be a thing forever.
Joff Thyer:Okay. So, interestingly, the part that always fascinates me is the regulatory front. Right? We all know that Europe has championed being very proactive on this, whereas the United States generally is very reactive on the regulatory front. However, I think there is kind of a convergence out there of people wanting more transparency into how models are actually constructed.
Joff Thyer:Thoughts on that, Brian?
Brian Fehrman:Yeah. Basically, we're talking about the training aspect of them, or kind of where all this data is coming from. Yeah, I do think those are important questions for people to be asking. I mean, why is it making the decisions that it's making, and how well does it do against new things that it hasn't seen before?
Brian Fehrman:I mean, what kind of robustness testing are they doing against the model? Will they allow you to do robustness testing against it? Because some companies will collect telemetry and shut you down if they think you're actively testing their model to see how well it does. And I think being able to have that level of capability in your environment, to be able to test the products that are in your environment, is important, and you should have that discussion with the vendors.
Derek Banks:You say that like it's happened to you.
Brian Fehrman:Maybe. We'll not comment.
Joff Thyer:Well, also, a third thought occurs to me. You know, if you're dealing with a vendor that's proffering, I can't... what is that word? That is trying to sell you a solution. I'm, like, brain-farting on whatever word I was trying to come up with. You know, are they gonna publish their internal test results for you?
Joff Thyer:I mean, are they gonna give you something that gives you some assurance of the capability of that model? I think they should. And will we get regulatory backing for that? I just don't know.
Derek Banks:I think we're gonna have AGI before we have that.
Joff Thyer:Yeah. Possibly. Possibly. Alright. Let's have some closing thoughts.
Joff Thyer:Alright. So ML is powerful. Machine learning, that is. But with great power comes great responsibility, and evasion is a real and ongoing challenge. So, from the panel, any other closing thoughts?
Brian Fehrman:Yeah. I think that last one kind of sums it up, right? It's just going to be a continuing arms race. With every new defense that comes out, there's going to be a new attack tactic that comes out as well. Because, I mean, at some point, you have to let things happen in your environment.
Brian Fehrman:Right? Like, people have to be able to do things in your environment. And as long as people are able to do things on the computer, there's gonna be something they can do that's gonna get around some of these different detection products. And it's going to be a constant back and forth, but it really pushes each side to keep upping their game. And where the peak of that is, I don't know.
Brian Fehrman:But we're certainly gonna see improvements on both sides in the years to come.
Joff Thyer:Yeah. I think another really good point here is for organizations that are considering deploying applications that leverage AI, or that are even training their own models and embedding them in some of their own applications: don't forget you gotta defend the model itself. Right? We've all talked a lot in the past about prompt injection in the case of natural language processing and LLMs. You know, there's nothing we can do about that.
Joff Thyer:Right? That's a feature. It is not a defect, and I think it's a feature we're gonna be struggling with for years. And there are all kinds of guardrails people put around that stuff to try to address it. But with simpler models, defense is still required.
Joff Thyer:Even if you have a simple classification model, something as simple as API throttling on the requests coming at it, to prevent potential denial of service against computationally expensive inference, is really, really important. Any other closing thoughts? Derek Banks?
Derek Banks:No. I think that pretty much wraps it up for today.
Joff Thyer:Well, okay. Thanks again, everybody, for visiting and attending another exciting episode of AI Security Ops. We will see you next time. And remember, keep safe and stay prompting out there. Bye bye.