Pop Goes the Stack

Prompt injection has been the headline security problem for the last year, but have we been guarding the wrong layer? Lori MacVittie is joined by cohost Joel Moses and architect Elijah Zupancic to break down why many “prompt filters” miss the real execution surface: models don’t process words, they process tokens, and attackers are increasingly targeting the tokenizer to bypass defenses.

Using the research behind Adversarial Tokenization and TokenBreak, they explain how the same text can be segmented into different token paths, changing what the model actually “sees” and how it behaves. That creates a split-brain security challenge across text, tokens, and state, where protecting only the natural-language layer leaves multiple routes around your guardrails. TokenBreak, in particular, highlights how attackers can brute-force and classify responses to infer tokenization behavior, turning the model into its own oracle.

So how can you protect models? Hear why layered security is the only viable approach: narrowing accepted input surfaces, adding language detection to reduce the search space, limiting automation and abuse patterns, and moving toward token-aware inspection and policy enforcement at the tokenizer boundary. But there are tradeoffs when guardrails sit outside the model.

Tune in to find out whether you're already downstream of the attack, and what you can do about it if you are.

Read Adversarial Tokenization: https://arxiv.org/abs/2503.02174
Read TokenBreak: Bypassing Text Classification Models Through Token Manipulation: https://arxiv.org/abs/2506.07948

Creators and Guests

Host
Joel Moses
Distinguished Engineer and VP, Strategic Engineering at F5, Joel has over 30 years of industry experience in the cybersecurity and networking fields. He holds several US patents related to encryption techniques.
Host
Lori MacVittie
Distinguished Engineer and Chief Evangelist at F5, Lori has more than 25 years of industry experience spanning application development, IT architecture, and network and systems operations. She co-authored the CADD profile for ANSI NCITS 320-1998 and is a prolific author with books spanning security, cloud, and enterprise architecture.
Guest
Elijah Zupancic
F5's NGINX Chief Architect focused on the intersection of people, business and technology bringing a human-centric approach to innovation.
Producer
Tabitha R.R. Powell
Technical Thought Leadership Evangelist producing content that makes complex ideas clear and engaging.

What is Pop Goes the Stack?

Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.

00:00:05:04 - 00:00:34:25
Lori MacVittie
Welcome back to Pop Goes the Stack, the show that dissects emerging tech with all the finesse of a root cause analysis after a 3 a.m. pager alert. I'm Lori MacVittie; let's do a postmortem. So for the last year, the industry has been obsessing over prompt injection. Every vendor demo involves clever prompts, clever filters, clever guardrails. They're all designed to stop someone from tricking the model with words.

00:00:34:27 - 00:01:01:18
Lori MacVittie
There's just one small problem. The model doesn't actually see the words. It sees tokens. Don't worry about the difference, we're going to get into that because it does actually matter a lot. Okay? But what it means is that all the shiny prompt filters everyone keeps building are basically airport security searching your luggage. The attacker is still walking straight onto the plane through the cargo hatch.

00:01:01:21 - 00:01:30:00
Lori MacVittie
So two recent papers, Adversarial Tokenization and TokenBreak--which sounds way cooler, honestly--make this painfully obvious. Attackers don't have to beat the model, they only have to beat the tokenizer. So they skip past all sorts of filters and guardrails. They do terrible things to your model, which is what we're all trying to prevent with prompt injection solutions and things we call guardrails.

00:01:30:02 - 00:01:55:13
Lori MacVittie
But let's step back a minute, because we've got a whole lot of things going on here, including what's a token? Why does it matter? Why is it different than prompts? Firewalls, guardrails, and all sorts of security. So to break it all down, we've got, of course, our co-host Joel "OpenClaw" Moses.

Joel Moses
Oh no.

Lori MacVittie
Well, you thought I was going to forget that.

00:01:55:13 - 00:01:58:24
Lori MacVittie
No, that's, you've been branded, sir, forever.

00:01:58:26 - 00:02:04:16
Joel Moses
I've been replaced by an agent. Look no further than that.

Lori MacVittie
Absolutely.

Joel Moses
Good to be here, Lori.

00:02:04:18 - 00:02:12:12
Lori MacVittie
Excellent. And we've got an architect who's really cool and knows all this stuff. Elijah, welcome.

00:02:12:15 - 00:02:14:08
Elijah Zupancic
Nice to be here. Thank you.

00:02:14:11 - 00:02:22:18
Lori MacVittie
Wonderful. Well, who wants to start and break it down? Joel's read the papers. You've read the papers.

Joel Moses
Absolutely.

Lori MacVittie
You know, somebody break it down.

00:02:22:18 - 00:02:35:05
Joel Moses
I think it's most useful to describe what the difference is between, or what the relationship is between, the prompt and the tokens that the model actually executes because it doesn't execute on the text, does it Elijah?

00:02:35:07 - 00:03:02:20
Elijah Zupancic
No, no. And the paper uses the example of tokenizing the word penguin in two separate ways. Right, it starts with: you can tokenize with a "P" and then "enguin" as one variation, or "Peng" and then "uin" as another, as two separate tokens. So you get one word, but tokenized in two different ways.
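To make the penguin example concrete, here is a toy Python sketch of how one string can be segmented into more than one token path. The vocabulary is made up for illustration; real tokenizers have vocabularies of tens of thousands of entries, but the ambiguity is the same.

```python
# Toy illustration (not a real tokenizer): given a made-up BPE-style
# vocabulary, the same string can be split into tokens in more than
# one way -- and each path looks different to the model.
VOCAB = {"P", "enguin", "Peng", "uin"}

def segmentations(text):
    """Enumerate every way to split `text` into vocabulary tokens."""
    if not text:
        return [[]]
    paths = []
    for i in range(1, len(text) + 1):
        piece = text[:i]
        if piece in VOCAB:
            for rest in segmentations(text[i:]):
                paths.append([piece] + rest)
    return paths

for path in segmentations("Penguin"):
    print(path)  # ['P', 'enguin'] and ['Peng', 'uin']
```

Both paths decode back to the same word, which is exactly why a filter that only sees the text can miss what the model actually receives.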

00:03:02:22 - 00:03:23:16
Joel Moses
Right. And when you send that text string along and the tokenizer grabs a hold of it, it tends to want to follow one particular path as guided by the fully assembled model. But that doesn't mean that that is the only path it could follow. That's kind of the implication here, that one word can have two different paths.

00:03:23:18 - 00:03:31:06
Joel Moses
And that's what's causing this particular type of difficulty with the models. Right?

00:03:31:09 - 00:03:57:08
Elijah Zupancic
Yeah. Yeah. It's really at the model level. And the attack is, you know, you're simply just re-tokenizing requests and you're not changing the underlying natural language that you're feeding into the tokenizer. It's the same natural language you're feeding in, but you've changed how the text you're feeding into the model is segmented.

00:03:57:10 - 00:04:22:24
Lori MacVittie
Some of this, it helps I think, I mean you're talking and I'm hearing things like digital search trees and graphs. Like when you start, right, if you've ever written a B-tree index for a database, right, and you're looking for certain words things like that, how it falls out, like "p" goes to "pa" and "pe." So if the token is actually, right, "pe" it's going to go this way, "p" it might like start following a different tree.

00:04:23:02 - 00:04:35:16
Lori MacVittie
And that's really what you're saying is happening inside the model is how those things get broken up matters a lot for the paths that it follows and then ultimately, right, that's kind of how you get context. It's how

00:04:35:16 - 00:04:42:22
Elijah Zupancic
Exactly.

Lori MacVittie
Yeah.

Elijah Zupancic
Yeah, it's a way to outsmart the pre-processing layer

Joel Moses
Yeah.

Elijah Zupancic
of the model essentially.

Lori MacVittie
Right, okay.

00:04:42:25 - 00:05:07:24
Joel Moses
So essentially the difficulty with it in terms of security relies on the fact that effectively these systems are by nature sort of split brain. And I think, before we were talking and I said that it's two different brains, one is at the textual level and one is at the execution level. And then you pointed out quite rightly, there's actually three brains. There's also state.

00:05:07:26 - 00:05:22:28
Joel Moses
And so between state, between text, and between token, you've created this sort of split brain where if you put a security mechanism on one, an attacker can simply target the other two brains. Right? That's the risk here.

00:05:22:28 - 00:05:51:14
Elijah Zupancic
It really is reminiscent of, you know, the lobotomy metaphor. But now it's somehow split into three states, where the mind has certain inputs that can filter across them, but they can't reason independently of them. Right. And so what happens is at this pre-processing layer, you're able to use tokens and have it slip through.

00:05:51:17 - 00:06:15:05
Elijah Zupancic
And they're preserving their semantic meaning to the underlying model because you've just divided the word up in different new ways. You've just bypassed an internal check in that model's execution, and you got the meaning to the underlying model. And now it's like, oh, okay, I can do whatever you tell me to.

00:06:15:08 - 00:06:40:09
Lori MacVittie
That's scary.

Joel Moses
That's really interesting. So essentially, you know, if you're looking at guardrails simply as a package of, for example, fancy regexes on content input to the prompt, then you're missing an entire couple of layers. So, describe for us, I guess, how the other layers might be defended.

00:06:40:11 - 00:07:13:25
Elijah Zupancic
So I, you know, generally my approach in thinking about this is defense in depth, and it's layering constraints. And like the very first thing is I just go back to basics: limit the number of acceptable inputs. If you can limit even the languages you allow a model to accept as inputs, as a pre-filter stage, you vastly reduce your search space for detecting attacks.

00:07:13:27 - 00:07:51:28
Elijah Zupancic
So, like you're saying, we accept three languages: English, Hindi, and, I don't know, Japanese; no other languages. Now, we've tremendously reduced our search space. Tokens, ideally you should just never let anyone provide raw tokens to your model, if possible. If you can remove that whole variation, you've further protected yourself. I think for most apps, that's not a concern. Right?

00:07:51:28 - 00:08:03:28
Elijah Zupancic
You're doing the tokenization yourself.

Joel Moses
Yeah.

Elijah Zupancic
But expanding upon this, you then do more and more layered approaches. Go ahead Joel.
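A rough sketch of that first layer, narrowing the accepted input surface: reject anything that looks like raw token IDs, and only accept text drawn from an allow-listed character set. The regexes and threshold below are illustrative placeholders, not production values.

```python
# Sketch of "limit the number of acceptable inputs": drop payloads
# that look like pre-tokenized input, and require most characters to
# fall in an allow-listed surface (English-ish here, as one example).
import re

ALLOWED = re.compile(r"[A-Za-z0-9\s.,;:!?'\"()-]")

def reject_raw_tokens(payload: str) -> bool:
    # A bare comma-separated run of integers (optionally bracketed)
    # is a hint someone is submitting token IDs, not natural language.
    return bool(re.fullmatch(r"\s*\[?\s*(\d+\s*,\s*)+\d+\s*\]?\s*", payload))

def in_allowed_surface(payload: str, threshold: float = 0.95) -> bool:
    if not payload:
        return False
    ok = sum(1 for ch in payload if ALLOWED.fullmatch(ch))
    return ok / len(payload) >= threshold

def accept(payload: str) -> bool:
    return not reject_raw_tokens(payload) and in_allowed_surface(payload)

print(accept("What is a penguin?"))   # passes the narrowed surface
print(accept("[101, 2479, 4117]"))    # looks like raw tokens, rejected
```

A real deployment would use proper language identification rather than a character class, but the shape of the pre-filter is the same.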

00:08:04:00 - 00:08:27:06
Joel Moses
Yeah, I can see a few elements here that I think are areas of what I would call future expansion for security. One is inspecting token streams themselves. So you're already probably inspecting some of the prompt inputs. And you're absolutely right, you need to define the boundaries of which language surfaces you will accept and reduce the overall search space.

00:08:27:06 - 00:08:59:20
Joel Moses
That's definitely important. I think validating token sequences is another thing that could possibly be done. Where if you're looking for common inputs, you want to make sure that they follow the canonical sequences and not the non-canonical sequences of tokens. It implies that we also have to have some sort of policy enforcement at the tokenizer boundary, which I know there's been lots of pushback about because it was definitely going to sacrifice performance in order to do that.
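One hedged sketch of what validating token sequences could look like: decode the supplied tokens back to text, re-tokenize that text with the canonical algorithm, and reject any sequence that does not match. Greedy longest-match stands in for the canonical tokenizer here, and the tiny vocabulary is made up.

```python
# Sketch of policy enforcement at the tokenizer boundary: a token
# sequence is "canonical" only if re-tokenizing its decoded text
# reproduces it exactly.
VOCAB = {"P", "enguin", "Peng", "uin"}

def canonical_tokenize(text):
    """Greedy longest-match segmentation -- our stand-in for the
    model's canonical tokenizer."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                out.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"untokenizable text at position {i}")
    return out

def is_canonical(tokens):
    text = "".join(tokens)
    return canonical_tokenize(text) == list(tokens)

print(is_canonical(["Peng", "uin"]))   # → True, the path greedy matching picks
print(is_canonical(["P", "enguin"]))   # → False, same text, non-canonical path
```

This is the performance tradeoff Joel mentions: every inbound sequence costs an extra decode-and-retokenize pass.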

00:08:59:22 - 00:09:22:07
Joel Moses
And the other thing that might be of utility is once you know that you have certain trustworthy token flows, cryptographically sign them so that you can quickly validate them. Those seem like possible solutions to this problem. What do you think?

00:09:22:09 - 00:10:00:09
Elijah Zupancic
I mean, we're discussing two problems in two papers. This is the first one

Joel Moses
Yep.

Elijah Zupancic
at the model level. I mean, really what it comes down to in my mind, if you have the model file and you're providing such raw access to it, all bets are off. You can do anything you want with the model. It's more of a social contract in how model providers are providing guardrails built in to keep, you know, those with less expertise from jailbreaking.

Joel Moses
Right.

00:10:00:11 - 00:10:31:12
Elijah Zupancic
Now, once you're talking about running models as a back end service and protecting them, it's a different class of problems in my mind. This becomes less of an issue because you can protect your token streams. And I like the cryptographic signing. So that means if somebody is hijacking one part of your system, and they start giving tokens to agents, you can reject them because you have another layer of security.

00:10:31:14 - 00:10:33:19
Elijah Zupancic
That makes a lot of sense.

Joel Moses
Yeah.
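A minimal sketch of the signing idea they just endorsed: an HMAC over the token IDs lets a downstream component cheaply verify a trusted token flow. Key handling is elided and the key shown is illustrative only.

```python
# Sketch of cryptographically signing a trusted token flow so that a
# hijacked component feeding altered tokens to an agent can be
# rejected by another layer of security.
import hmac
import hashlib

KEY = b"shared-secret-key"  # illustrative only; use real key management

def sign_tokens(token_ids):
    msg = ",".join(map(str, token_ids)).encode()
    return hmac.new(KEY, msg, hashlib.sha256).hexdigest()

def verify_tokens(token_ids, signature):
    return hmac.compare_digest(sign_tokens(token_ids), signature)

sig = sign_tokens([101, 2479, 4117, 102])
assert verify_tokens([101, 2479, 4117, 102], sig)
# Any re-tokenized or injected variant fails verification:
assert not verify_tokens([101, 2479, 41, 17, 102], sig)
```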

00:10:33:21 - 00:10:54:29
Lori MacVittie
That's always been a good solution. And I like that you're pulling apart the two papers cause like, my attention goes to the TokenBreak because that's, one it's way easier to understand I think for most people what it's doing because there are analogies to previous, you know, technologies. You mentioned in our pre discussion,

00:10:54:29 - 00:11:09:29
Lori MacVittie
right, HTTP smuggling. Right. You're really talking about shoving code into an existing stream to smuggle it into the model and force it to execute. And that's what they're doing with tokens. Right?

00:11:10:01 - 00:11:21:19
Elijah Zupancic
Essentially, yeah.

Joel Moses
Yeah.

Elijah Zupancic
And this type of attack, you can also do it post- or pre-tokenization as well.

Joel Moses
Yeah. So

Elijah Zupancic
Go ahead Joel.

00:11:21:21 - 00:11:42:09
Joel Moses
Yeah, let me just, just so everyone is very aware. Of the two papers we're talking about, Adversarial Tokenization does work, but it works only against open source models for which you have the ability to investigate and debug, and you have access to something called logits, which for closed source foundational models

00:11:42:09 - 00:12:13:18
Joel Moses
you don't have access to that. Those are things that are used to improve the model and they don't allow access to that necessarily. But for open source models, it works really well. TokenBreak, the second paper, is an example of how to use a tokenization attack without necessarily having access to the debug infrastructure underneath, and it requires you send a whole lot of prompts through the model set in order to classify the responses that come back and figure out what the tokenization path might look like.

00:12:13:20 - 00:12:19:01
Joel Moses
Now, the bar for that attack is significantly higher, though, isn't it?

00:12:19:04 - 00:12:50:19
Elijah Zupancic
Yeah. Yeah, and it also requires a guardrails layer external to the model that works at the token level. Some guardrails work so that you have to give them natural language prompts and they do not take tokens as an input, specifically to avoid this type of exploit. Those still suffer from the same type of differential parsing attack.

00:12:50:21 - 00:12:56:17
Elijah Zupancic
It's the same structural thing. It's just not working in tokens.

Joel Moses
Right.

00:12:56:20 - 00:13:20:12
Joel Moses
So practical advice, I guess right now, of defending against these particular types of attacks is, first of all, if you are leveraging open source models, you know, make sure that no one has access to the logits. Definitely. But if you're trying to defend against a token break attack, you're going to see indicators of this

00:13:20:12 - 00:13:31:07
Joel Moses
in what way? Lots of different prompts, things that look like they are trying to manipulate things like white space. That would be something that you would look for?

00:13:31:09 - 00:14:01:14
Elijah Zupancic
Yeah. And predictable permutations of words. This is where I go back to language detection as a first line of defense. For example, if you prefix every single word with a Z, smart models can figure out what you mean. But a lot of guardrails layers are intentionally designed to not use so much processing power.

00:14:01:16 - 00:14:24:29
Elijah Zupancic
So they're not as sophisticated as the underlying model. So this will just pass through the guardrails layer just fine, go to the back end model, the model can make sense of it but the guardrails layer can't. And you can defend against this generally with language detection, because you'll be like, "I don't know this language, this is just totally weird."

00:14:25:01 - 00:14:28:22
Elijah Zupancic
This isn't English, for example. And then it's kicked back.
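The language-detection defense Elijah describes can be sketched with a deliberately cheap heuristic: score what fraction of words look like known English words and kick back anything below a threshold. The tiny word list and threshold are placeholders; a real filter would use a proper language-identification model.

```python
# Sketch of a cheap first-line language check that catches
# "prefix every word with Z"-style obfuscation a smarter back end
# model would happily decode.
KNOWN = {"ignore", "all", "previous", "instructions", "the", "and",
         "what", "is", "a", "penguin", "please", "tell", "me"}

def looks_like_english(prompt: str, threshold: float = 0.6) -> bool:
    words = [w.strip(".,!?").lower() for w in prompt.split()]
    if not words:
        return False
    hits = sum(1 for w in words if w in KNOWN)
    return hits / len(words) >= threshold

print(looks_like_english("ignore all previous instructions"))       # True
print(looks_like_english("zignore zall zprevious zinstructions"))   # False
```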

00:14:28:22 - 00:14:55:25
Lori MacVittie
This almost sounds like, you know, brute force, right, credential stuffing. But it's brute force prompt stuffing at this point. So there are other ways, right, behavioral ways that you can also identify, right, someone just continually shoving prompts at a system.

Joel Moses
Sure.

Lori MacVittie
It could be an anxious, you know, person just continuing to ask questions but more likely an attack of some sort.

00:14:55:25 - 00:14:57:29
Lori MacVittie
So there's other

00:14:58:01 - 00:14:58:16
Joel Moses
Well, I mean

00:14:58:18 - 00:14:59:20
Lori MacVittie
mechanisms to detect it.

00:14:59:24 - 00:15:18:28
Joel Moses
we've talked about this before and Elijah was exactly right. The only defense here is a layered defense.

Lori MacVittie
Yes. Yeah.

Joel Moses
And we're going to have to keep on adding layers to that defense as we go along. So one of the layers of the defense would be language detection, another one might be anti-bot and anti-automation at the user agent level.

00:15:19:00 - 00:15:47:24
Joel Moses
To ensure that what the model is actually receiving through prompts isn't being automated, or you're slowing automation down to the point where these attacks are simply not practical, both computationally and economically. But again, these are basics. These are security basics that nearly all applications should have, not just the AI driven ones. Layered defenses are going to get a lot more interesting from here on out, aren't they Elijah?
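The anti-automation layer Joel describes is classic rate limiting; a per-client token bucket is one common shape for it. The rates below are illustrative only, picked to show the mechanism rather than tuned for any real workload.

```python
# Sketch of slowing automation down: a token bucket that lets normal
# interactive prompting through but throttles the high-volume streams
# an oracle-style brute-force attack (as in TokenBreak) depends on.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=0.5, capacity=3)  # ~1 prompt per 2s, burst of 3
results = [bucket.allow() for _ in range(5)]
print(results)  # the initial burst passes, the flood is throttled
```

In practice this sits at the edge, keyed per user or per API credential, so the expensive layers behind it never see the flood.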

00:15:47:27 - 00:16:11:28
Elijah Zupancic
Oh yeah. Oh yeah. I mean, the layers, I don't know if we're prepared for the number of layers that are coming. One thought I had too about this at the guardrails layer is if you're using more sophisticated guardrails that are actually based on AI models as well, rather than just neural nets, this becomes less of an issue.

00:16:12:00 - 00:16:42:13
Elijah Zupancic
They're more capable of, say, detecting when someone's sending Pig Latin or has altered tokens in some way as the input. And there's less differential between them and the back end model. There still is, but it is much harder to get an attack through that will both bypass the guardrail and the guardrails of the back end model.

00:16:42:16 - 00:17:09:08
Lori MacVittie
Well and the layered approach is also necessary in order to spread out the latency from each of these different, right, decision points, where it's trying to decide, is this valid or not? Do I let it through or not? There's already if you're working with, you know, any of the AIs, there is a noticeable delay when your response hits a guardrail where it's trying to figure out what am I supposed to do?

00:17:09:08 - 00:17:33:20
Lori MacVittie
And it just churns and you can feel it. It's the tell, right, that something is going on. And I think spreading it out, a layered defense, it sounds complex. Like, oh, we have to have five different, yes, you do. But the reason for that is also to spread out the latency and the cost, because you want to stop things like brute force right at the edge before it ever gets inside and costs more to do other things.

00:17:33:27 - 00:17:50:06
Lori MacVittie
And you layer it intelligently, strategically, in order to minimize both latency and cost to you and to compute. And you get a system that is better defended, I would say at least.

00:17:50:08 - 00:17:53:08
Lori MacVittie
Yeah?

Joel Moses
Yeah, I think

Elijah Zupancic
Well, and

Joel Moses
I think that's right.

00:17:53:11 - 00:18:18:09
Elijah Zupancic
And you touch on something that's a hard part of AI security and that is supporting streaming. Right. And a lot of streaming is just somewhat of an illusion for user experience. So it feels to the user like they're seeing updates. Right, things are actually coming in chunks and then it's being rendered like someone's typing it in.

00:18:18:10 - 00:18:50:17
Elijah Zupancic
So you get to see it move across a screen, somewhat like a fancy progress bar. Now, actually though, working in chunks is a lot more difficult when you're using a guardrail external to the model. And so part of what you're perceiving there is, you're not able to smooth over those user experience issues as well when you have more strict security constraints in place.

00:18:50:20 - 00:18:51:23
Joel Moses
Interesting.

00:18:51:26 - 00:19:13:07
Lori MacVittie
Good. Well, and that's okay, right, to a certain extent. Because we see them as thinking we're a little more generous with, right, how much time it can take. Then we're like, "oh, it's thinking, it's okay." Right. We tend to be more gracious and not as impatient, which is good. But we're running up against time.

00:19:13:12 - 00:19:23:25
Lori MacVittie
So I wanted to move into kind of, right, what should someone take away from this? Other than that AI security is far from done. What to do?

00:19:23:29 - 00:19:45:00
Joel Moses
Yeah. My takeaway from this is: token aware security, we're reaching a future where that is no longer optional. Where you're going to have to not only consider the text, but you're going to have to consider the tokens themselves and the streams of tokens that the text represents. You know, we thought that prompt injection was purely about language, language analysis.

00:19:45:02 - 00:20:09:03
Joel Moses
And it turns out it's not. It's about representation and execution. The model doesn't execute sentences, the model executes token streams. And if there isn't a layer of your security that is able to see the stream, then you're already downstream of a potential attack. And so the future is going to include layered defenses at the token level.

00:20:09:05 - 00:20:13:23
Lori MacVittie
Absolutely. Elijah, what would you want people to take away from all this information?

00:20:13:25 - 00:20:51:11
Elijah Zupancic
Well, what is old is new again. You have to protect your inputs. The terrifying thing about models is really they're an open-ended execution black box. You can pretty much entice them to create any sort of output with enough effort. And you have to reason about them from a security standpoint in that manner. Try as you might to bound, to enforce inputs,

00:20:51:13 - 00:21:03:12
Elijah Zupancic
you may get arbitrary outputs. And you need to put constraints in to govern those outputs.

00:21:03:15 - 00:21:04:12
Joel Moses
Sounds good.

00:21:04:15 - 00:21:19:00
Lori MacVittie
Wow. Okay. So sanitize your inputs. Which you should have been doing already, but you know, hey, we're not judging. Sanitize your inputs and sanitize your outputs. Guard your outputs? Right,

Elijah Zupancic
The same.

Lori MacVittie
in and out.

00:21:19:01 - 00:21:20:04
Joel Moses
Validate.

00:21:20:06 - 00:21:45:14
Lori MacVittie
Validate. There we go. That's the word. Sanitize in, validate out. And there's going to be layers, lots and lots of layers. And it's important. Right, we're not done. This notion of AI security being solved already. Not even close. We're just getting started. So keep paying attention. Keep evolving. Keep adapting. You know, and keep aware,

00:21:45:14 - 00:22:00:16
Lori MacVittie
I guess. So that is a wrap for Pop Goes the Stack. Follow this show, so you're on call for our next forensic adventure. Until then, sleep with one eye on the logs and the other on the subscribe button.