Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.
Lori MacVittie (00:08.285)
Welcome back to Pop Goes the Stack, the podcast that treats hype cycles like load tests. Push them hard enough and something is always gonna crack. I'm Lori MacVittie. Let's find out what cracks today. Alright, we have, we got an awesome one today. We, you had a comment on that?
Joel Moses (00:29.353)
Oh, no, it's gonna be me that cracks, Lori, you know that.
Lori MacVittie (00:30.717)
Okay, all right. Well we have Joel here, so this ought to be this oughta be good because today we're gonna talk about poisoning the well, which is not theoretical. We've posited, you know, bad data in bad bad results. And it actually turns out that yeah, that happens.
Because some researchers decided that they were going to create a fake disease. What did they call it here? Bixonimini. It is
Dmitry Kit (01:03.298)
Mania.
Lori MacVittie
Mania. Yeah, thank you, Dmitry. He read the article. Right. So they create this and then they actually create a whole bunch of fake papers, authors, right? They do the whole thing. They get all the supporting evidence and then they basically let it go to see what's gonna happen.
Well, what happened is within weeks, major AI started picking it up, people start citing it, and suddenly they think it's real. Right. So, you know, the thought here is like you can't patch knowledge, right? You have this model, suddenly it has bad information. So, you know, what does rollback mean? Retraining? Hope you get it all, you found it all?
If it's external, maybe you can remove it or replace it, but only if you knew where it came from and how it spread. Right, this is pretty obvious. Well, they created this fake name, we can find the related information, but some stuff could be more subtle. So the question we wanted to talk to Dmitry about today is, you know, how do you deal with it if you find out that somebody poisoned your well, your model, either intentionally or accidentally?
And, maybe even better, how do you set up safeguards to make sure that it doesn't happen? Right? What can you do to say, let's protect against this happening because it's really expensive to fix it? So with that, let's dig in
Joel Moses (02:36.967)
Sure.
Lori MacVittie
and try to answer the question.
Lori MacVittie (02:41.957)
No one wants to
Joel Moses (02:42.097)
So what's
Lori MacVittie
answer? We have no, we have no answers. Thanks everyone.
Joel Moses
Well, I'll give it a start. Dmitry, what's at the root of this pro-, I mean, other than garbage in, garbage out, which is a constant presence in our lives and computing all the way through to AI days, it's not just as simple as excising the erroneous information from the model, right?
It creates a fingerprint, for lack of a better word, a kind of an invisible behavioral watermark that isn't obvious when it responds in single sentences, but over millions of examples you can detect that bad information was fed into the system, right?
Dmitry Kit (03:16.206)
Yeah, it's interesting because it's an old problem. AI, machine learning specifically, is data driven. So in traditional machine learning, which is where I got my start, you curated the data sets quite carefully and you assumed that some of the bad data could come out as noise.
And so the models would kind of learn the general idea about what the data represented versus the noise or biases that would push towards bad outcomes. Well, large language, not large language, but neural networks, deep neural networks came onto the scene and there was this promise that, well, you just need a lot of data.
You no longer need to do feature engineering or care about what your data sets are. And even with the early experimentation with computer vision, convolutional neural networks, when you made them generate random images from within its internal representations, you got a lot of images of dogs, cats and eyes.
So, that's not really our everyday experiences necessarily. But the thing is the things we take photos of tend to be of people's faces or our pets. And so already you're starting to see that maybe your data sets are not spanning the entire experience that we want it to do.
Now with large language models, we have the internet. And you would think, okay, well, that's great. We'll just throw it onto the internet and it will learn all those great things that the internet has to offer. But we all know that there are bad places on the internet that we probably do not want to learn from.
Google, actually this article also mentions this, but Google has spent a lot of time before neural networks and their search results to curate your experience. It identified sources that are potentially more trustworthy than other sources, Wikipedia being one of them. So sometimes the top results would have just been a summary of the articles on Wikipedia.
But what would happen if somebody wrote a bad article on Wikipedia? It's kind of an open platform and anybody can add articles there.
Joel Moses (05:31.033)
Sure.
Dmitry Kit
So like
Lori MacVittie (05:32.245)
I think that's like...
Dmitry Kit
the problem exists.
Dmitry Kit (05:38.67)
But what we're counting on is self-curation, right? So even when we collect data for our large language models, we tend to give more weight to some data sets than others. So we don't just go randomly on the internet, but we go to communities that do have some kind of moderation in place where we can more or less trust more their outputs.
However, when we start trusting those sources more, like the more we start trusting them, the more weight they have on how these large language models respond. And this is an incredibly good example of this, where a couple of articles published on a trustworthy source dominated when people ask about that particular condition of their, skin condition of their eye.
Joel Moses (06:41.145)
I see. Well that's interesting. It implies, I suppose, that although you may be able to remove that bad data, the weights behind the data still remain. Correct?
Dmitry Kit (06:54.318)
It is and it's kind of hard to even understand if this is poisoning or not because science is, like it's okay to publish, unknowingly publish bad results
Joel Moses (07:05.201)
Sure.
Dmitry Kit
because science corrects itself by reproducing those results, right? That's where the checks and balances are. But it takes a while for those checks to come in. So while that result is out there, these large language models are treating them as truth because it was published in those trustworthy sources.
Dmitry Kit (07:24.254)
So, this research uncovers a bunch of really a lot of complexity about what do we trust.
Joel Moses (07:31.259)
Mm-hmm.
Dmitry Kit
What sources can we trust, and to what degree?
Lori MacVittie (07:34.961)
None. We should trust nothing. I mean that's, isn't that the X Files like trust no one, trust nothing, like just...
Joel Moses (07:41.457)
Sure.
Lori MacVittie
But because right as you point out, right, there's a level of trust you assign to different, you know, different publications, different sites, different sources, even different experts, right, you may assign different levels of trust. And you know, if we don't if we don't know those, like how do I know?
Lori MacVittie (08:04.411)
Like what does Google's model or open AI's model, what levels of trust do they assign to different things? We don't know. So that's not transparent to me, right, when I'm asking a question. So you always have to you're always, you're a skeptic. Like nothing you, it is the X files out there. I don't, I don't trust you because I don't know.
Joel Moses (08:25.871)
Interesting.
Dmitry Kit (08:26.742)
It's a good point. And again, in the article, it was mentioned and this was in other sources as well, the answer of these large AI companies is not to trust them. I mean, that they have disclaimers literally
Lori MacVittie (08:38.365)
Yeah, don't trust us.
Dmitry Kit
below their responses
Joel Moses (08:40.709)
Yeah, yeah.
Dmitry Kit
is don't trust us. Because of the scale that they operate, they can't curate their sources
Lori MacVittie (08:47.196)
Yeah. Mm-hmm.
Dmitry Kit
to that degree.
Joel Moses (08:48.23)
Yeah.
Dmitry Kit
And if you're just trying to choose a restaurant, and maybe the restaurant is closed, or didn't exist when you get there,
Dmitry Kit (08:54.538)
maybe that's fine. But if you're looking for medical, you know, medical diagnoses, then you probably should talk to a doctor.
Joel Moses (09:02.543)
Yeah. Dmitry, it seems to me, just by reading some of the leaked system level prompts that some of these frame, these foundational AI models have disclosed, either erroneously or on purpose, that some of the system prompts that steer content analysis or steer some of the subject matters away from being discussed openly seem to acknowledge that there are problems with the information the models are built from, right?
Dmitry Kit (09:32.974)
I'm glad you brought this up. So the situation isn't hopeless.
Lori MacVittie
Ah.
Dmitry Kit
So when these problems arise, so yes, some of that information has been baked in and there are problems related to that we can go into a little bit later. But for any particular application, like latest ChatGPT or whatever, you have opportunities to steer the model to a correct answer on those topics. And so, ChatGPT is not just the model and the weights.
It's actually a huge ecosystem that's built on top of it to give you the experience that we all have come to expect. And so through system prompts, you can basically single out certain concepts and say, you know what, like we know that this is a problematic thing that people are now pointing out in the news and we do not like to be in the news for this kind of stuff. So let's say different things based on that.
Joel Moses (10:37.499)
I see. So, but it tends to imply though that they know that there are some problems with the information that they've built the model from. But it's interesting, you can kind of course correct it by correcting it in the system prompt, but it doesn't matter that the weights have changed, right? The weights are still gonna remember the old and bad information. And that's, I'm assuming, where some of the hallucination problems come in.
Dmitry Kit (11:02.368)
Yeah, so training is, for the sizes of models that these companies deal with, training is super expensive. Reinforcement with human feedback is, reinforcement learning with human feedback comes close to that. Basically, you're trying to reduce the probabilities that those weights come into play. But you're not removing it. You're not excising that information.
You're just saying like try to stay away from those topics. People can still guide the large language model to respond the way they do. But if you can't do it for every single example of that that comes in the news because it just becomes too prohibitive. So most of the initial steering, I would guess, is that would be through steering the system prompt.
Lori MacVittie (11:53.533)
Well, so, and right, I mean, that's, we don't, I guess I would say most enterprises aren't using like the real the big models for everything, right? A lot of the stuff that they're going to be doing, they're going to use either open source models that they've downloaded or more local models, small language models, perhaps. But they could still have the same problem.
So if they were trying to build this out, what do you do? Do you roll back? You know, is it now, oh, go back to the last version? How do you deal with that if you find out that the model you've built to help run your expense report processes and approval has bad data because it says Lori can buy anything and you're like, No, no, she can't. How do you fix that? Do you go back? What do you, what do you do?
Dmitry Kit (12:45.506)
I mean, if you can go back, sure. But this can be tens of thousands, if not a million dollars to retrain a model.
Joel Moses (12:50.109)
Yeah.
Dmitry Kity
So even one of these smaller models, sometimes you remove the information that obviously caused the bias in the first place. And then you add more information to overcome it, to basically say, this was shown to be false and like don't do that.
Dmitry Kit (13:15.262)
And you try to maybe correct it through like reinforcement learning through human feedback or just by fine tuning, moving forward. To all of your points, that information stays there. And it's so
Lori MacVittie
Okay.
Dmitry Kit
distributed throughout that network that you don't know what the unintentional consequences are
Joel Moses (13:35.279)
Mm.
Dmitry Kit
of removing that information. Because large language models, they have behaviors that
Dmitry Kit (13:43.918)
were not modeled into it, right? Like they just come out of the massive amounts of information that they learned. So they're making connection between phrases and tokens, but like phrases and words, and removing some of that can actually have catastrophic
Joel Moses (14:03.26)
Huh.
Dmitry Kit
effects on it. So if you had a process that kind of worked, but now it works a little bit worse, but some of the stuff works a little bit better,
Dmitry Kit (14:13.452)
by going to the previous version, you will lose both of those, you'll lose the benefit as well as the
Lori MacVittie (14:18.897)
Right. Right.
Joel Moses (14:19.365)
Wow.
Dmitry Kit
thing. So excising information from these networks is incredibly expensive.
Joel Moses (14:27.643)
Wow.
Dmitry Kit
I would also guess that some of these big players, they have secondary mechanisms for filtering and detecting when it's entering topics that they don't want to discuss, the guardrails, in which case they steered the response away from that,
Joel Moses (14:42.735)
It's interesting.
Dmitry Kit (14:43.382)
from those responses.
Joel Moses
Dmitry, the way that you're describing how this unfolds, excuse the term, is it makes it sound less like a software bug and more like the action of a prion. So like in mad cow disease, for example, it's a malformed pattern that's introduced into a system between neurons that introduces errors in folding.
Joel Moses (15:08.335)
And it causes creeping corruption over time. That's kind of how it sounds to me. Like it's not a bug that it's working like this. It's actually something that is bad information that other bad information can build around. And the process is not as simple as cutting something out.
It's actually the entire system becomes corrupt and the only way to get it back to good is to work around it through introducing other paths, which is fascinating and difficult.
Lori MacVittie (15:41.284)
Well, if one plus one is not two, that unravels a significant, no really, my oldest son wanted to me to not teach the youngest math because he wanted to introduce bad concepts to see what happened. And I said, No, you can't experiment on your brother like that. But the idea is real.
If one plus one is not two, like we believe, then the entire system of mathematics eventually falls apart. Some things would still be right or kinda right, but eventually, like everything we know just kind of falls apart. It sounds like what Joel is describing. Like it starts out, it's just one thing and then it's a couple, and pretty soon the entire system is corrupt. And,
Joel Moses (16:27.141)
Yeah. So,
Lori MacVittie
you know, so...
Joel Moses (16:27.141)
in order to correct for this behavior, you have other approaches like teacher behavioral teaching systems,
Lori MacVittie
Yeah.
Joel Moses
you know, where you're trying to get the correct response and so you're incorporating human feedback into it to course correct for the bad information that it's been given over time. But I mean, could future AI attacks not involve poisoning the data directly, but poisoning the behavior of these teacher systems? I mean, is that the thing that we're gonna see next?
Dmitry Kit (16:57.486)
Yes. Ha ha ha.
Joel Moses (16:58.629)
Ha ha ha.
Lori MacVittie
Ha ha ha. Yeah.
Dmitry Kity
If that's, you know, the goal. But like jailbreaking, right, and all these things, they already exist.
Dmitry Kit (17:08.824)
So LLMs know a lot about large topics, but they know very little about very specialized topics that very few people are talking about. And so you can always find a place in the LLM's knowledge space where a little bit of misinformation can go a long way because it's overwhelming the little information that's already there.
And so, and like, we don't know where that is because the large language model doesn't know when it's hallucinating.
Lori MacVittie (17:42.331)
Yeah.
Dmitry Kit
It's telling you everything as confidently as it would, regardless of whether it's in this low probability space or high probability space. And given the scales at which these large language models have been trained on, I mean, we can't identify that. And we can't identify why it chose not to represent some of these spaces as
Dmitry Kit (18:05.802)
efficiently as others, mostly because probably they're just not enough to give it that much weight. So even if we say, you know, so we have actually two problems. One is the problem of trusted sources, which is if you publish a thing that looks very much trustworthy, then it's going to get just a lot of weight, period, because we've made that decision in the first place.
And then there's also injecting information on topics that are very rarely discussed or very, very specific. And even though the source isn't as trustworthy, the LLM just has nothing else to talk about, right? So like it will pick it up.
Lori MacVittie (18:49.149)
It's, we're, I have a feeling we could talk about this for another hour at least, but since we're running up at time, it sounds like okay, you can't excise it. So if it gets in, that's it. All right. So maybe the better way to look at it and takeaway for listeners is so what can you do to minimize that risk of it getting poisoned in the first place?
So what kind of things can you put in place or do to minimize the risk?
Dmitry Kit (19:27.734)
It was actually mentioned in the article as well. And really it comes down to curating the data set. And large language models, they're great because they can solve all sorts of different problems, a single model. And that's wonderful, you know, it's great flexibility.
But when you start focusing on a specific application that you have or a specific problem you want to solve, oftentimes you do want tostart putting constraints and biases into the model that focuses on the thing you want to solve.
So in the article, they were mentioning that OpenAI is releasing some Chat health, ChatGPT health application. Very likely that one, the data set is incredibly like curated by humans and made sure to be correct. And no new data is added to that that they did not approve of.
And same thing for organizations that want a chatbot to talk about their particular products or respond with information from documentation where you basically say you limit its responses to the information that you're providing to it and you curated that information to the best effort.
How do you protect that information from being poisoned? It's LLMs all the way down. You would probably employ more LLMs to judge the thing you're adding and making sure that the thing you're adding is consistent with
Joel Moses (20:56.871)
Mm-hmm.
Dmitry Kit
everything else that you've had. So, you know, we went down a very dark path in this conversation. But there are solutions. The large players are taking them into account.
Joel Moses (21:09.905)
Mm-hmm.
Dmitry Kit
Obviously, anybody, any practitioner who wants to release something to production is also utilizing a lot of guardrails.
Joel Moses (21:18.791)
Yeah, I walk away from this discussion thinking the same thing. The importance of curation is not just for the constituent parts that make up an LLM or a foundational LLM. But I also think it also applies to the use of data through RAG resource pulls as well. Curation of the data that you use to affect the model weighting and temperature is also important.
And so you you probably need to look at the the sources of data that are being pulled into an enterprise RAG scenario, making sure that they're not out of date, making sure that they contain the the latest information, making sure that they contain accurate information is still something that you are duty bound to do because it is impacting the way the model is processing.
So there definitely is something that an enterprise can take away from this topic. Recognize that it is there and that it isn't a characteristic and that you do have to take steps to make sure that the data that you are applying is curated. I think Dmitry's point about making sure that it responds only in a narrow topic area, kind of applying a little bit of focus in order to get it to ignore elements where it may go off into data that it doesn't know about.
I think that that is also true. We've seen plenty of chatbots out there that are supposed to be telling you about burritos and are instead writing Python scripts.
Lori MacVittie (22:43.678)
Mm-hmm.
Joel Moses
That's just a matter of the model not being told to pay attention to what it needs to pay attention to. And you can defend yourself against this that way as well.
Lori MacVittie (22:53.938)
Yeah, I like those, right? I mean, effectively we're saying the same thing as, you know, we've been hearing out of the zero trust world, right? Assume breach, assume bad data at some point, and then work to prevent it as much as possible to mitigate externalized sources that change a lot.
Be careful with what kind of data you're feeding your models and just pay attention to the guardrails and system prompts you're using in order to tailor the experience and kind of constrain it as well because these things go off the rails all the time. It's in my feed every day.
Some system lost its cookies somewhere. So I'm gonna say though, that's a wrap for this episode of Pop Goes the Stack. If your backlog just got longer listening to this episode, you're not alone. Subscribe so you get the next one though.