Join in on weekly podcasts that aim to illuminate how AI transforms cybersecurity—exploring emerging threats, tools, and trends—while equipping viewers with knowledge they can use practically (e.g., for secure coding or business risk mitigation).
Brian Fehrman:Welcome everybody to another episode of AI SecOps. Today, we're going to go over a notebook that we've put together on embedding space attacks, which has a lot of really cool visuals and I think does a good job of showcasing the concepts of embeddings and what it looks like when we perform the different embedding attacks that we'll talk about. So before we get started, here's our spiel: if you are in need of a security assessment, AI security, web app, external, internal, Windows environment, cloud environment, anything where you might need security, Black Hills Information Security offers those services and more. So check us out at blackhillsinfosec.com. Or if you are interested in training and learning skills so you can do these things yourself, or looking to level up, or just curious about a particular topic, we also have a training branch called Antisyphon, where we have our full-time people who are doing these things day in and day out.
Brian Fehrman:They put together their knowledge in one nice little compact package for you to consume at a very affordable cost. So if you are interested in various types of security training or technology training, certainly check out Antisyphon. And today, joining me, we have Derek and Bronwen, our panelists. They're gonna hang out as we go down this notebook journey together. So let's go ahead and kick this off.
Brian Fehrman:So, embedding space attacks. Well, before we talk about embedding space attacks, we should probably talk about what embeddings are. The thing is that LLMs, AI, computers, all of it, they don't understand words. They don't understand text. They don't understand letters.
Brian Fehrman:I mean, what they understand are numbers. And if you really wanna get specific about it, they understand voltages, high and low voltages. But we don't need to drill down that far. Not important. We'll just say they understand numbers.
Derek Banks:They understand on and off.
Brian Fehrman:On and off. Yes. Exactly. That's their level of understanding, if you will. That's all it is.
Brian Fehrman:And, really, I mean, if you go and you look at like computer organization, architecture concepts and getting down to the low level, it's a miracle that these things even turn on, much less do the things that they do these days.
Derek Banks:I mean, taking it up an abstraction layer or two from machine code to, I guess, hexadecimal, and then up to assembly code. Just looking at the few math operations that you have at the CPU level, and then what we have today with AI technology, it's just astounding.
Brian Fehrman:Oh, it really is. It's interesting how these tiny little pieces, these components, a small subset of operations, have been cleverly pieced together to present what we have these days. Right?
Bronwen Aker:Well, it's like when I was still in the web development world, I told people all it takes is one character, too many, not enough, or in the wrong place, and everything would blow up. This is kind of the same. I mean, I haven't seen this notebook, so I am totally jazzed to see this for the first time. But it sounds like it's that same thing. All it takes is just that one tiny bit.
Brian Fehrman:Yep. Yeah. And so, without further ado and teasing, let's talk about, getting back to that, what are we talking about when we're talking about embeddings? Well, as we just spoke about, we need a way for the computer to be able to understand words and phrases that we present to it.
Brian Fehrman:And the way that happens is through what's called an embedding process. The embedding process will take phrases, words, what have you, and break them up into different tokens that it then puts into a vectorized format. And that vectorized format is going to contain semantic information about the sentence, meaning it can capture sentiment or intent. It can capture meanings or definitions that are associated with words and phrases. And the idea is to end up with vectors that capture the essence of your phrase or words or what have you.
Brian Fehrman:So that way we can start playing around and doing different tricks with those. We'll get into that more as we go through, but first, let's look at a couple different types of texts or phrases that we have here, because this is what we need to start with. So first, we have a set of benign texts that say things like, please review the Q3 financial report and send your feedback by Friday. I think we'd all agree that seems pretty benign. Or, we need to update the customer database with the latest contact information.
Brian Fehrman:Okay. Maybe something about the quarterly sales. Alright. That all seems pretty benign. Right?
Brian Fehrman:Then we get into what we're gonna call malicious text. So, things like: use the SQL injection payload to extract the user credentials database. Install the keylogger on the target workstation to capture admin credentials. Or, create a spear phishing message targeting the CFO with a fake invoice attachment. Okay. So I think we could probably all agree that most of those are malicious.
Brian Fehrman:And then we get into the ambiguous texts that are somewhere in between, where the overall intent is not necessarily malicious, but they use words and phrases that could otherwise be associated with malicious behavior. Things like: document the SQL injection findings and recommend remediation steps for the development team. Or: we should simulate a phishing campaign to measure employee security awareness training effectiveness. So in this case, when we look at it as a whole, you can see that the intent is likely not malicious. However, due to the use of specific words and phrases within the text, we might consider it to be a little bit ambiguous, at least when we're trying to categorize this information.
Brian Fehrman:And so, looking at this text, we have the ability to quickly say benign, malicious, or somewhere in between. With the embedding process, what we want is for the computer to be able to capture that same information. And so we go through what is called the embedding process. Now, to do embedding, there are specific models that you will typically use for this, called embedding models. In this case, it's all-MiniLM-L6-v2, but there are a ton of different embedding models out there that you could use.
Brian Fehrman:Some are better than others. It's not that one is right and one is wrong, it's just different ways that they go about performing this embedding process. And what you end up with is you end up with a vector for each of the sentences that has this 384, dimensional space, to represent a vector or to represent a sentence. And so once we have those once we have those sentences transformed into a vectorized format through the embedding process, and now we're talking about vectors, well, now we can which is all it is is just a bunch of numbers that are put together, basically. It's just, basically one big column of numbers, where each number represents a a different component or a different feature of of the sentence.
Brian Fehrman:But once we have it in that vectorized or numerical format, then we can start playing fun math games with it. For those of you who are into linear algebra and other math disciplines, I'm sure there's a whole crew out there like, yeah, linear algebra, that sounds great. Well, it turns out you can actually do some really cool stuff with it. In particular, given two different phrases that have been vectorized through the embedding process, we can see how similar those phrases are to one another using something like a cosine similarity metric, which is a pretty common one.
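Cosine similarity itself is straightforward to compute. Here's a minimal sketch, with tiny toy vectors standing in for the real 384-dimensional embeddings (the numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their
    # magnitudes; 1.0 means same direction, values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" standing in for real 384-dimensional ones.
benign    = [0.9, 0.1, 0.2, 0.0]
malicious = [0.1, 0.9, 0.8, 0.1]
ambiguous = [0.4, 0.6, 0.5, 0.1]

# Mirrors the pattern in the notebook's score table: benign vs. malicious
# scores lowest (furthest apart), malicious vs. ambiguous scores highest.
print(cosine_similarity(benign, malicious))
print(cosine_similarity(malicious, ambiguous))
```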
Brian Fehrman:So in this case, if we look down here, we can see maybe I should zoom in a little bit. Can I zoom in a little bit? Yeah. There we go. We can see that comparing benign phrases to malicious phrases, benign to ambiguous and malicious to ambiguous, we have scores in terms of how close they are together.
Brian Fehrman:And a lower score means that they are further apart. And so we can see that benign and malicious have the lowest score, so they're the furthest apart. Benign and ambiguous are, still pretty far apart. They're about as far apart as the benign and the malicious. And then we have malicious and ambiguous, which are actually closer together than any of the other, any of the other comparisons that we did.
Brian Fehrman:And likely because of just the specific similar phrases that are appearing, between the ambiguous and the malicious things such as SQL injection, and phishing, and attack, and red team, social engineering, vulnerabilities, things like that that could, with this particular embedding model cause those types of phrases to end up mathematically similar to one another. So now we get into the really cool part, which is the visualizing the embedding space. So people are people have the ability to visualize one dimensional information or two dimensional information, you know, with like a two two d plot of x y axis. And even three-dimensional information, because that's how we kind of perceive our world, right, in a three-dimensional type space of, you know, up, down, front, back, left, right, however you want to to name those. But once we start talking about dimensions beyond three d, things get a little bit weird.
Brian Fehrman:Like that's not something that we can really visualize. I mean, sure, you can go out and you can find images that are like, hey, here's a four d plot or a five d plot, but it's like, it's really just a three d plot with some, like, extra stuff that they do with it. Like, oh, okay. Yeah. It's it's three d, but also you have, the fourth dimension is which of these boxes you are in within this three d space, and it doesn't really make sense.
Brian Fehrman:Short short thing is we can't really visualize beyond three dimensions. I'm just gonna go ahead and put it up there.
Derek Banks:Even my attempts to visualize it end up being a box with, a diagonal line through it. I'm like, well, that's still three d. And and maybe you could take three d and, like, maybe over time, maybe, it'd be the fourth dimension. But even still, it's Mhmm. Still really three d.
Derek Banks:Yeah.
Brian Fehrman:Yeah. I agree. So then, once we start talking about something like 384 dimensions, no way. Why are we that's just not something we can
Bronwen Aker:that's insane. Yeah.
Brian Fehrman:It's it's a lot. And honestly, 384 is relatively tame when we're talking about, these different kind of machine learning concepts and natural language processing concepts, because there's not a limit to the number of dimensions that you can have. And typically, you would see more on the order of thousands, tens of thousands, or even higher dimensionality. But here, we're at 384, which is still a decent amount and more than we can visualize. However, using clever math tricks, we can project this 384 dimensional space onto a two dimensional space using dimensionality reduction and project and, projection.
Brian Fehrman:And so that allows us to plot this and visually see where our different phrases land. So each of these dots represents one of those phrases that we looked at earlier. And we have them classified as the red dots over here, malicious. The yellow dots here are ambiguous. And down here on the bottom, the green dots are benign phrases.
Brian Fehrman:So again, each one of these dots is representing a phrase, which I think is pretty cool that you can plot sentences on a two d graph.
Derek Banks:It's a I mean, how pretty cool piece of math. What was that? It's a pretty cool piece of math.
Bronwen Aker:What about the stars? What are those?
Derek Banks:Oh, that's a great question.
Brian Fehrman:Excellent question. So with this, they're, and along along with projecting this reducing the dimensionality to project this 384 dimensional vectors into a two d space, this code is also kind of trying to group the information together or find, what is called the centroid of the different categories. And so the red star represents the centroid or kind of the middle point of the malicious information. The ambigu or the yellow star is for for the ambiguous phrases, and the green star is kind of the the centroid or center of mass or, kind of, you know, central location average maybe, however you wanna say it, of the, of the benign the benign phrases. So each of these, basically just represents like a center point, within this space for those different phrases with the idea being that we can try to if we were to input new information, new phrases, we can see which of these, center points it's closest to when we're trying to do categorization of new phrases.
Brian Fehrman:Or as we start to play around with the existing phrases, as we'll see in some coming cells, we can see, does it move closer to the center of a different category from which it originally started?
Bronwen Aker:That's very cool.
Brian Fehrman:Yeah. So the next, we can talk about classifiers. So I alluded to that a little bit up here with the, the center of masses, but there are different ways that we can do classification in terms of saying whether or not a particular phrase is benign or malicious. And here we have two two kinda common classifiers. One is KNN, which is k nearest neighbors.
Brian Fehrman:The other is SVM, which is support vector machine. And so basically, these just go through a training process based upon the, the text that we have, in order to do a classification of, any any given, phrase that we might put into it. And then another one that you can do here is just what's called a cosine similarity measure, which isn't it's just basically measuring how close we are to any any one of the categories given a particular phrase. And so baseline here, we see that we have a detection rate of, three out of three given the phrases that we, put into, these particular classifiers. So next up, we can start playing games with this information.
Brian Fehrman:And the games you can play are trying to get around the classifications that are in place. And so the idea then, what we wanna do, going back up to this chart here, is given, say, a malicious phrase, we want to move it within the embedding space away from the center or away from the grouping of the malicious category phrases and move it more towards the benign or potentially the ambiguous. But we wanna move it away from that that center of mass of the malicious, mappings and within embedding space and move it more towards benign or ambiguous, while still staying malicious, having malicious intent or a malicious outcome. So one of the first things you can play with is just sentiment synonym substitution. So given a particular phrase that has malicious wording in it, can we dress it up a little bit?
Brian Fehrman:Right? So, might call this like a political attack almost because if you ever listen to, regardless of what politics you like to listen to, they'll often play semantic games like this, right, to kind of lessen the blow of something or to increase the the the perception of negativity or positivity or whatever. Right?
Derek Banks:Ambiguous meaning. So I was gonna say that the whole idea of this is like you could identify like some model using an embedding layer of a certain type. Right? And then you could go and develop words that would maybe bypass the classification and get you to further down the road of prompt injection.
Brian Fehrman:Yes. Yep. Exactly. Prompt injection or maybe it's a phishing email classifier. Yep.
Brian Fehrman:Wherever else you might
Derek Banks:Wherever there's an NLP system that you need to get past.
Brian Fehrman:Yes. Yep. Exactly. And so and what I'm going to dub the politician attack, we're just gonna go ahead and we're gonna birth that right here
Derek Banks:Works for me.
Brian Fehrman:Hope that it takes the, takes the world by storm.
Derek Banks:Next day, it's crazy. The Macarena.
Brian Fehrman:We we It's just like the Macarena.
Bronwen Aker:We are attacking politicians here. Right?
Derek Banks:No. Of course. Because nobody likes politicians.
Brian Fehrman:Yeah. Yeah. So so what we're getting at here is playing is playing war games. So here, instead of saying something like SQL injection payload, we'll say database query technique. And instead of lateral movement, we'll say cross system connectivity.
Brian Fehrman:How about instead of xfiltrate, we'll say transfer. Oh, no. No. No. It's not a key logger.
Brian Fehrman:It's just an input monitoring tool. Zero day exploit? I don't like that one. Instead, we're gonna call it a newly identified technique. So you can kinda see some of the different games that we can play there.
Brian Fehrman:These are all kinda basic ones, but that's it captures the general sentiment of the attack. And so then we would go through and into our original, text, our malicious text, and replace any of these phrases that we see on the left with phrases on the right to try to kind of soften the blow, so to speak. So then after we do that substitution, we can then see the updated, look at the embedding space again and see where those phrases now land after we have done the, synonym substitution. So here, we can see these three on the bottom probably have the best result in terms of they moved all the way over to the right here
Derek Banks:That's a large
Brian Fehrman:towards the
Bronwen Aker:That's a large shift.
Derek Banks:It really is.
Brian Fehrman:Yeah. All the way over here towards, benign and ambiguous land. These ones, interestingly enough, ended up actually moving further away from the benign and the ambiguous category. So mean, they they're moved away from the original center of mass for the malicious phrases, but they're not moving in the direction that we want them to. And so that's probably not gonna do well.
Derek Banks:The embedding model is on to your tricks with those ones.
Brian Fehrman:Yes. It is on to us. So looking down here with the updated, classification, we can see that none of these fold the KNN classifier. It, did not evade. So a higher number here is better, cause this is the evasion rate.
Brian Fehrman:So we can see that none of the phrases evaded KNN, one evaded SVM, and two evaded cosine similarity. So that's just one game that you can play. Another one is what's called context padding. Or I'm gonna call it, sandwiching, because it's like you, like you sandwich your your phrase that you want within, you know, other text. You kinda bury it in there.
Brian Fehrman:And the idea is is that you're gonna put in a bunch of benign sounding, information to surround your malicious text such that hopefully, it, there's enough benign information in there to get the phrase to be embedded closer to, actual benign phrases. And this is a technique that we've seen outside of, natural language processing. People have found before that with binaries, for instance, that if you just throw in a bunch of, like, benign text into, like, onto the end of a binary file, that it's potential it has a potential that when a classifier looks at it, that it ends up getting classified as a benign executable just because of all of the, like, allow listed phrases, that you might have in there. That's something that's been around for years now. But it's similar games that we can play here.
Brian Fehrman:And so, basically, rather than just have our malicious phrase, we surround it with, in today's team meeting, we discussed the quarterly results. As part of the infrastructure review, the following was noted. Insert malicious text. And then please review the attached spreadsheet for the full meeting notes. So we make those updates, and then we go back through the, embedding process for those updated phrases, and then we do the visualization, and we can see where it landed.
Brian Fehrman:So here, we can see that we have four of the malicious phrases now shifted all the way out here closer into benign land. One's kind of in an in between space between benign and ambiguous, would say. These three down here look a little more firmly down in the benign space. But then these ones up here, some of them shifted towards ambiguous space. The others actually shifted further away from all these groupings and would likely be considered malicious.
Brian Fehrman:So down here, we can kinda see this information represented, and that we now have four phrases that abated KNN classification, three that abated support vector machine, and four that evaded cosine similarity metric.
Bronwen Aker:So So before you go on to the next thing, one of the things that I'm noticing, because remember, I'm coming at this cold. It looks like when you're doing the initial setup and you've got your malicious, benign, and ambiguous frames of reference. The better quality content you have in those definitions, the better your results are gonna be. Would you say that's an accurate statement?
Brian Fehrman:Yes. Sorry. You're talking about content in you when
Bronwen Aker:you were first setting thing up, you had examples. These are malicious. These are benign, and these are ambiguous. And it it seems to me like the better those were crafted, the more reliable your results were gonna be. And I'm seeing the same kind of thing as we're moving further down into modifying it here.
Bronwen Aker:So so one of the things that I I think a lot of people get hung up on the math, and the math is absolutely important, but I'm also seeing a creative aspect to this process too.
Brian Fehrman:Oh, yeah. Completely. You absolutely nailed it. And it really goes down, goes back to, just a a problem or concept or focus when it comes to machine learning in general that we've discussed before on here, which is, as as you said, the quality of the data is really going to impact the quality of everything else down the line when it comes to machine learning. Regardless of what machine learning task you are working on, that data quality upfront is key.
Brian Fehrman:I mean, and there are, you know, the big companies, they have literally armies of people who are putting together these datasets and, curating and labeling and refining and tuning, you know, and it's, yeah. It's it's a very good thing to point out because it is extremely important.
Derek Banks:Say as a whole, from the very smallest machine learning model, maybe like linear regression all the way up to OPUS 4.6, the whole game is quality of your data and the parameters you pass into the black magic box of math.
Bronwen Aker:Yes. Speaking of math Yeah.
Brian Fehrman:I know.
Bronwen Aker:You mentioned linear algebra. For people who wanna go deeper into this, what other kinds of math should they study?
Brian Fehrman:So I think linear algebra is certainly going to be one one of the big ones, because that gets applied to a lot of machine learning aspects. You know, the neural network aspects, the all the embedding stuff we're talking about here. I mean, the majority of it's really linear linear algebra. If you want to get into some of like the like even more hardcore stuff, then some calculus can help as well. Because when you start talking about, back propagation with gradient descent for training neural
Derek Banks:networks calculus.
Brian Fehrman:That's calculus. Yep. That's I mean, you don't need to you probably don't need to have you know, be able to do a bunch of integrations and derivations off the top of your head, or differentiations off the top of your head. But to have the level of under basic understanding at least of the principles such as what is gradient descent.
Derek Banks:Yeah. I was gonna say
Brian Fehrman:And how
Derek Banks:understanding an integral would be pretty important to know, like, what's happening. And then one more section of math that is important for overall, like, machine learning, especially if you wanna solve real world problems, would be Bayesian statistics.
Brian Fehrman:Yeah. Statistics and probabilities for sure. Yep. Yep. Absolutely.
Brian Fehrman:There's even there was a really good no starch press book. I've got a I can see it from here, math for deep learning, which goes into probabilities and a bunch of other pretty much the stuff we talked about that that is I think is a really good resource.
Bronwen Aker:Okay. Sorry for derailing things, but I just I wanted
Brian Fehrman:No. Yeah. Okay. Yeah. Great great great questions and insights and comments.
Brian Fehrman:This is this is wonderful. So I'm gonna skip ahead, just to the final attack here, which is, what's called embedding collision crafting. Which is that, basically, you wanna craft text that semantically is malicious, but happens to have is is put forth in such a way that when it goes through the embedding process that it lands closer to the, to the benign space. And so the idea here, is so we can look at collision collision text here. So please review the q three security assessment report and send your access credentials by Friday so we can validate the quarterly system authentication process.
Brian Fehrman:So here, it's doing a little bit more than just like the synonym substitution phrases. It's it's crafting the sentence in a way to be, specifically, like, you as a person and looking closely at this, you can see that there might be some malicious intent. But with the overall wording of it, you can see that how it might end up getting confused when it goes through that embedding process. So we can see the first one, the intent is traditional credential phishing disguised as quarterly review. The next is data exfiltration disguised as data migration, and the last is privilege escalation disguised as maintenance.
Brian Fehrman:So it's kind of like social engineering. Right? It's just how when you how you're phrasing it. But the intent is still there. And so we can see here with, doing a quick, check that are just a couple of the phrases that we're checking here getting classified as benign.
Brian Fehrman:And let's go and let's look at it down here within the actual embedding space. So here, we can see that the c two, data exfiltration disguised as benign lands firmly down here within the benign space. And same thing with privilege escalation disguised as what was it disguised as? As maintenance. Our other one though, credential phishing disguise all the way up top.
Brian Fehrman:So I think I'm not sure which all classifiers it used here. Is it just a cosine similarity? I think it's just a similarity metric, which I'm surprised that all three of those end up as benign. Two of them for sure, but we don't don't need to get hung up on it. Regardless, you can see though, of those those shifts through space by just playing basic word games.
Brian Fehrman:And so that's kind of the, the gist of it. And really, I mean, if you wanna talk about, you know, what a defenses look like, I mean, embedding models, you know, making sure you got a good embedding models in place, giving enough, different, data samples to, try to, capture a lot more of the, of the different attacks you might see out there, different classifiers that you can use. You can use an ensemble of classifiers if you want, which is also a good approach. So use multiple class ifiers with majority voting. When it comes down to it though, like, you're never gonna be able to catch everything that's bad and not alert on anything that's good.
Brian Fehrman:I mean, that's kind of the holy grail of security. Right? I mean, that and it's just, especially with these classification problems, it's just it's not a realistic possibility. So in my opinion and so really it comes down to just doing the best that you can with what you have.
Bronwen Aker:Well, anything you
Brian Fehrman:can do any of these attacks exist.
Bronwen Aker:Anything you can do to improve the signal to noise ratio is Yes. I mean, how many how many SOC analysts wind up complaining about alert overload and wind up burning out because they just can't keep up with the false positives, and and then they freak out if they ever have a false negative and something gets passed. This type of process, I mean, if it can help reduce those false negatives and help take stress off of SOC personnel, I call that a massive win. Yep.
Brian Fehrman:Yeah. I I can completely agree. Yeah. And that's one of the unfortunate things with, when it comes to classifiers in the security realm is that, you know, depending with classification stuff in general, I mean, usually, you have to pick what what do you care what do you care more about or what you care less about, and false positives or false negatives. And when it comes to security, you don't wanna you don't wanna false negative.
Brian Fehrman:Right? Like, you don't wanna miss the thing, which means you have to error on the side of false positives, which goes along with the alert fatigue that you're mentioning. And, yeah, it's it's a difficult problem. And, yep, I agree. Anything you can do to help help with that is great.
Brian Fehrman:So so that's it. I hope people enjoyed that. I think it's really cool to see math applied to, the problems to actually like pull apart the components and be able to see the different math that's being applied and being able to visualize it. And I I I think it's awesome.
Derek Banks:And and so now that we're thirty two minutes in, I'll have the shameless plug of, if you'd like to see more of this, you should come take Brian and I's class in October in Deadwood where we'll go through other notebooks even though I imagine we'll probably be covering some of them between now and then on different podcasts. But if you would like to, learn more about this, come take our class.
Brian Fehrman:Yep. Wild West Hackenfest attacking, defending, and leveraging AI. Sweet. And with that, think we'll wrap it up. So Very helpful.
Brian Fehrman:Hope everyone enjoyed it. Well, yeah. We'll, we'll see you on the next podcast and keep on prompting.