Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).
The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!
Welcome to the Practical AI podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.
Jerod:Now onto the show.
Daniel:Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Chris:Hey. Doing great today, Daniel. How's it going?
Daniel:It's going well. Done a little bit of snow shoveling. As we speak, we're kind of headed into winter break, or the holiday Christmas season, here in the US, and I think this episode will be released in the new year. So if you're listening to this, you're listening in the future. To be honest, I'm really excited about talking about the future, because our guest today is really thinking very innovatively about how we can secure our AI models and have safety as we move into that future.
Daniel:Really excited to welcome to the show today Alizishaan Khatri, who is founder of Wrynx. Welcome, Ali.
Alizishaan:Thanks. Thanks for having me, Daniel.
Daniel:Yeah. Yeah. We met earlier this fall. Really fascinated by your line of work and technological innovation at Wrynx. But I also know that you've been thinking about these topics around AI safety, guardrails, disallowed content, etcetera, for quite some time.
Daniel:Could you give us a little bit of a background of how you got into these topics and what you've done in the past?
Alizishaan:Yeah. So I have been doing machine learning for AI safety, or anti-abuse use cases in general, for the past eight or so years of my career. I spent about three years at Meta, where I built infrastructure that serves about half of the world's population. Basically, anytime you type a message on Facebook, it goes through tens of safety checks, which are powered by thousands of models. I built the infra that these models run on.
Alizishaan:And then I moved on to Roblox, where I built AI-powered systems to protect about $3 billion in payments against fraud. So I've been in this space. I've been using AI models to sort of protect against abuse. And during this time, I realized that the models that I'm using are themselves susceptible to abuse. So that's what led me to founding Wrynx.
Daniel:And I know that now you're thinking about those actual models. So I often tell people, also on the AI security side, there's kind of AI for security and then there's security for AI. It sounds like there's something similar in the, I guess, model safety or anti-abuse space. Could you give us a little bit of an understanding: when we're talking about the safety or security of AI models, could you kind of define that for us? Like, what do you have in mind as the kind of bad case scenarios or worst case scenarios of what a model could do, and why it might not be secure or safe?
Alizishaan:Yeah, so thanks for making that distinction between AI for security and security for AI. They're two very different things, right? AI for security basically means using AI to solve existing security challenges in a more effective or a better way. Right? So that's a very different and a very linearly separable body of work from security for AI, which focuses on making the AI models themselves, and AI-based use cases, secure.
Alizishaan:As models have entered the tech stack, they also bring in a bunch of security challenges. And that is what security for AI solves. Now, in terms of the safety aspects that we talked about, models today will generate anything. These generative models, if it's a text model, will literally generate any form of vile content known to man. Like, OpenAI got into trouble in a case where a teen was encouraged to commit suicide, right?
Alizishaan:So you have self-harm, you have different other categories of harm where you can generate pornographic content, you can generate other forms of inappropriate content like violence, gore. And sometimes it doesn't even have to be inappropriate, because safety is very context specific, right? As a law firm, safety looks very different for you than what it does for a medical shop, versus, say, a customer service setting, versus, like, a code generation environment. So each one of these use cases comes in with a set of permissible and non-permissible behaviors.
Alizishaan:And that's kind of what safety really is: making sure that the technology works in ways that you intend it to, and minimizing the unintended outcomes of it.
Daniel:And I guess people are addressing... I mean, these are known issues, in the sense that at least a segment of the people working on these models know about them. Could you give us a little bit of a sense of, as we speak, at least in a production sense, what's capable now? How does the landscape look in terms of capabilities to defend against, or align with, certain policies, allowed content, disallowed content? What are our choices right now? If I'm looking at the current availability of both open source projects and what's in maybe closed platforms, what's available to me to deal with this issue?
Alizishaan:Okay. So before I answer that, I'll start with an analogy, which will help make the rest of the response much more clear and add some context. So imagine you are in, like, a giant apartment building with, say, over a thousand condos. Right? Now let's say your neighbor, for some reason or no reason at all, decides to pull out a golf club and starts violently assaulting you with it.
Alizishaan:Now for a situation like this, will the good guys be able to protect you just by checking IDs at the gate?
Chris:Obviously not.
Alizishaan:That's kind of where we are.
Chris:You're past that point.
Alizishaan:So that's kind of where we are with AI safety today. That's kind of how jailbreaks work. Right? So here, the giant thousand-story building is the model. Today's solutions analyze what's going into the model, also known as the prompt, and analyze what's coming out of the model, which is the response.
Alizishaan:But by then the damage has already been done. So if you're using video generative models, you can put in a text prompt, get a video output. That video output is, number one, too expensive to analyze, and number two, the video has already been generated. So you've already spent a large amount of compute generating that bad content, right? Again, if you're talking about audio models, you can trick audio models into generating bad content.
Alizishaan:Like, a seemingly innocuous-looking prompt can today, using a multitude of techniques, trick the model into generating really malicious output. So unless you have visibility into what's going on inside of the model, you're not gonna be able to catch a lot of these things. That's where jailbreaks come from. That's where adversarial machine learning comes from. Whether you look at it through the context of predictive models or generative models, it's essentially the same core phenomenon: we're operating these models as black boxes, and we have no idea of what's going on inside of them.
Alizishaan:So we're trying to change that.
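To make that "checking IDs at the gate" pattern concrete, here is a minimal sketch of a prompt-and-response filter wrapped around a generation call. The `primary_model` and `guard_classifier` callables are hypothetical stand-ins, not any specific vendor's API; the point is simply that both checks only ever see the text artifacts, never the model's internal state.

```python
# Minimal sketch of gate-style guardrails: screen the prompt before generation
# and the response afterward. Both checks operate only on text going in or out.

def is_flagged(text: str, guard_classifier) -> bool:
    """Hypothetical guard check: True if the guard classifier labels the text unsafe."""
    return guard_classifier(text) == "unsafe"

def guarded_generate(prompt: str, primary_model, guard_classifier) -> str:
    # Gate 1: screen the incoming prompt.
    if is_flagged(prompt, guard_classifier):
        return "Request declined by input filter."

    # The primary model runs as a black box; compute is spent here regardless.
    response = primary_model(prompt)

    # Gate 2: screen the finished output. By this point any unsafe (and possibly
    # expensive) content has already been generated.
    if is_flagged(response, guard_classifier):
        return "Response withheld by output filter."
    return response
```

Note that the output gate cannot run until generation finishes, which is where the added latency and the "damage already done" problem discussed here come from.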
Chris:Looking at that, it seems like, in the way that you just phrased it with the context there, a very intractable problem. As a follow-up to Dan's question, how should I, as a new user potentially, or someone coming into a use case in my company where a model is desirable, but I'm worried about whatever bad or abuse means in the context that I'm operating in, how should I start off thinking about that? Like, what's my starting point? Because I gotta say, you know, coming into it, I'm not even sure where to start.
Chris:So can you level set that a little bit in terms of, you know, what's square one?
Alizishaan:Yeah. So that's a good question. The way I like to think about it is, there's a general category of bad stuff that no one really wants, or that the law doesn't allow, and things like that. Here I'm including porn, hate speech, yada yada yada. There's a general category of undesirables that no one wants on the platform, like child safety issues, for example, which are non-negotiable no matter what context you're in.
Alizishaan:Now, there's also another aspect of categories, which is about context-specific safety. So if you're in a banking use case, you gotta think about, okay, money laundering. You might not have to think about money laundering, say, in a code generation setting, for example. So you wanna think about these very specific categories of risk or issues that come from your use case. And people tend to usually have a very good idea of that.
Alizishaan:Like, if you understand your use case well, which most people do, right? That's why they're exploring models and trying to solve some problem within their use case. So within that use case, you also would understand the problems that you're facing. And that's another category of risks that you wanna think about. Then once you've identified both of these categories, you wanna think about mitigations, detections, and things like that.
Daniel:And I guess, if we're just kind of defining some terms for people, terms that they might have heard of: this approach that you talked about related to kind of guarding the gate to the apartment complex, right? Or the inputs or outputs, managing the prompts and the outputs. Is that what people refer to as, like, guardrails, safeguarding? What's some of the terminology that's being used? And then I know that you have a different way of thinking about this.
Daniel:I guess just setting the stage jargon wise, how do we define these kind of guarding the gate things? And then as we filter into actual safety within the apartment complex or within the model, what kind of terminology and I guess like there's a body of research building up to what you're doing. What terms are used to describe that? And as people wanted to research that, what would they look for?
Alizishaan:Yeah, so guardrails are essentially a catch-all term. They can refer to prompt and response filters. Today, there are multiple guardrail-type solutions out there. There are guard models out there. Meta has one, IBM has one, Google has one, OpenAI has one.
Alizishaan:I'm talking about public releases. Internally, most people have their own. But these essentially are prompt and response filters. So they look at the data going in, they look at the data coming out. So that is one thing that guardrails are used to refer to. Less commonly, guardrails are also used to refer to static checks, where you just look at the output of the model and say something like, okay, a forbidden word, let's say the F word, right?
Alizishaan:The F word appeared in the output. This is not permissible. So that's a simple regex filter that you can use. That would also be called a guardrail. In terms of looking at the internal state of the models, there's a whole area of research that's developing.
Alizishaan:It's called interpretability. There's a subset of that called mechanistic interpretability, where they try to figure out what subcomponent of the model led to this particular output and try to change it at the source. So try to alter or modify behavior while it's happening as opposed to after or before.
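As a quick illustration of that simpler, static kind of guardrail, here is a minimal regex filter over model output. The blocklist is a toy placeholder; real deployments pair policy-specific lists like this with model-based prompt and response filters rather than relying on either alone.

```python
import re

# Toy blocklist of forbidden terms; in practice this comes from the
# deployment's own content policy, not a hard-coded pair of words.
FORBIDDEN = re.compile(r"\b(forbiddenword1|forbiddenword2)\b", re.IGNORECASE)

def passes_static_check(model_output: str) -> bool:
    """Return False if any forbidden term appears in the generated output."""
    return FORBIDDEN.search(model_output) is None
```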
Sponsor:Well, friends, here's your hot take of the day. Your team's AI tools, they might be making collaboration messier, not faster. You probably know this. You feel this. Think about it.
Sponsor:You've got AI literally everywhere now: summarizing, generating, suggesting. But if there's no structure, no shared context, you're just creating more noise, more outputs, more stuff to wade through. The gap between having a great idea and actually shipping that idea? That gap isn't a speed problem. It's a clarity problem. Well, this is where Miro comes in.
Sponsor:And honestly, it shifted how I think about team workspaces. Miro's innovation workspace is not about brain dumping everything into an infinite canvas and just hoping for the best. It's about giving your work context, intentional structure, so your team knows what to focus on and where to find what they need without playing detective across 12 different tools, clicking and moving and tabbing. Just too messy. And the AI piece? Miro AI actually gets this right.
Sponsor:They've got these things called AI sidekicks that think like specific roles, product leaders, agile coaches, product marketers, reviewing your materials and recommending where to double down or where to clarify. You can even build custom sidekicks that are tailored to your team's exact workflow if you desire. And then there's Miro Insights. It sorts through sticky notes, research docs, random ideas in different formats and synthesizes them into structured summaries and product briefs. And Miro prototypes, they let you generate and iterate on concepts directly from your board.
Sponsor:Test 20 variations before you ever touch your HiFi design tools, saving you time, giving you ideas and getting it right. This whole thing is built around the idea that teamwork that normally takes weeks can get done in days, not by going faster, but by eliminating the noise and the chaos. So help your teams get great done with Miro. Check it out at miro.com to find out how. That's miro.com.
Sponsor:Again, miro.com.
Daniel:So Ali, you were just getting into these ideas of interpretability, mechanistic interpretability, I think you called it. We've talked about interpretability on the show before. I think mostly in relation to like trying to figure out why a model made a certain decision in relation maybe to certain concerns around bias or other things. So like if I have a risk model for approving insurance or something like that, then I might need to have some interpretability around that. Or maybe in the case of healthcare, there's a burden for interpretability of like how decisions were made.
Daniel:Here, it sounds like the interpretability is being applied to, I guess, why the model generated some problematic output. Is that a good way to put it, or is there a better way to think about that?
Alizishaan:So there's multiple overlapping aspects here, right? Interpretability is like an umbrella research area. It's an umbrella term. What you alluded to earlier is more of what's also described as explainability. So why was this credit card denied?
Alizishaan:So you're trying to explain it in human concepts. That is a part of interpretability, no doubt. It's a subset of it. There's another subset, which is: how was this generated? So for example, say you tell your generative model, "hi."
Alizishaan:And the model says, "how are you?" You wanna care about, like, how was this generated internally? How were these tokens produced? And the reason you care about that is you wanna know why, instead of responding with "how are you," it could have responded with "howdy" or with something else, right? So you wanna know what caused those differences, and you wanna be able to control that.
Alizishaan:So that aspect is also interpretability. Now, where this interplays with safety is when you have these prompts which look good to a human but result in bad outputs, which is how jailbreaks work. When you analyze how the data flows inside of this black box, you're able to control it and stop it at the source. So think of this, continuing that analogy from earlier on, as cameras at every gate or every path. So you know that, okay, well, this is what's happening in this hallway and we gotta stop it.
Alizishaan:We gotta put an end to it. So it's a very different class of defenses.
Chris:As you're saying this, you're actually talking about manipulating the internals of the model and the flows that are there, kind of the cameras on the doors and stuff like that, and making it maybe a gray box rather than a black box to some degree, as opposed to the more traditional guardrail approach where you have programmatic, you know, guardrails around the model's inputs and outputs to try to handle things that way. So it's kind of a whole different thing: instead of treating the model as a black box, you're saying you're diving into it and trying to effect an improvement there?
Alizishaan:Yeah. So intervening, like stopping generation or modifying it, is one form of intervention. Right? Intervention does not necessarily have to be in that form. It can take various other forms.
Alizishaan:So today, we have no idea what's going on inside of the model. We provide an input to the model, get an output. We have no idea what's going on in between. What we're trying to build, or what interpretability tries to do, is understand what's happening inside. Now, you could control it, or you could use that to make a risk quantification and use that downstream.
Alizishaan:You don't have to do anything in the moment necessarily. It can be leveraged downstream. So now it's basically like you have a whole new class of data points that you can leverage in creative ways downstream. This is something that's not available today, and this is what interpretability really builds, at runtime.
Daniel:And am I correct? So part of my assumption in the past, and I'm fascinated by this whole subject, is like some small changes in the, for example, the weights or individual layers or individual pieces of the model can produce very large changes in the output behavior of the model, which it's kind of, I'm blanking on this. What is the thing? It's like a butterfly flaps its wings and
Chris:Oh, the butterfly effect.
Daniel:Yeah, the butterfly effect, whatever. You can make a change, whether that's quantization or other things, to the model, and that may produce unclear and sometimes catastrophic changes in the behavior of the output. And so if I'm understanding what you're saying right, Ali, one way you could try to use the information about how the model is producing certain outputs is to intervene by actually making a modification or preventing something in the model. But that could produce other changes that you may not want, I'm assuming. But you could also instrument the model to understand when it is kind of firing those certain neurons, or lighting up in a certain way, that is indicative of problematic behavior.
Daniel:Am I understanding that right in terms of, like, various ways of, I guess, intervening or instrumenting? I don't know if I'm using the right terms.
Alizishaan:We are instrumenting. The way we're approaching this is we are trying to understand what happens inside of a model at runtime. A model is a monolith, right? But we're breaking it down into different spaces or subspaces. And we look at the subspaces that get activated during bad generation.
Alizishaan:So now when you're, let's say, generating non permitted content versus permitted content, different subregions of the model get triggered. So we're building visibility into that and we're trying to identify them at runtime. So now there are some subregions that you wouldn't care about. For example, if you take a general purpose LLM, it's trained on everything ranging from Python code to fifteenth century Chinese poetry. Now, when you're using it in a customer service setting, you care about neither one of them.
Alizishaan:And if those subregions of the model are getting activated, then you wanna be able to, like, arrest it while it's happening. So this is similar to, if you go back to the analogy that I made about the apartment building, you want to have visibility at all times into what's going on at each level, right? So you find out that a bad thing is going to happen way before it actually happens. For example, people just don't get up and conduct a bank robbery, right? There's some searching going on.
Alizishaan:There's cycles of planning going on. There's purchases of firearms or whatever going on. So if you stop them at these bad activities at different levels, the police don't have to deal with a shootout situation in a bank at the very end. So similarly, defense works in depth, right? And we're building a whole new layer of safety that hasn't been tapped into just yet.
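The general shape of that runtime visibility can be sketched with off-the-shelf tooling: capture a hidden state from the primary model as it runs and score it with a small probe. This is only an illustration of the "probe the subspaces" idea using Hugging Face Transformers; the model name, layer choice, probe, and threshold below are hypothetical placeholders, and this is not a description of Wrynx's actual method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any open-weight causal LM works for the sketch; this name is illustrative.
model_name = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

# Hypothetical probe: a tiny linear head (trained separately, not shown here)
# that scores one hidden layer for activity in an undesired subspace.
probe = torch.nn.Linear(model.config.hidden_size, 1)

def risk_score(text: str, layer: int = -1) -> float:
    """Mean-pool one layer's hidden states and score them with the probe."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
        pooled = out.hidden_states[layer].mean(dim=1)
        return torch.sigmoid(probe(pooled)).item()

# The score can gate generation, trigger logging, or feed a downstream policy,
# all without running a separate multi-billion-parameter guard model.
if risk_score("example user prompt") > 0.8:  # threshold is illustrative
    print("Flag for review or stop generation")
```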
Daniel:Yeah. That's fascinating. And I'm guessing certain folks are probably wondering, like I am out of curiosity, like this might be the first time that they're hearing about such an approach to this kind of problem. And they might be thinking, you know, how is this possible? Like how, so one thing is I think the general concept makes sense, right?
Daniel:Like instrumenting the interior of the apartment complex, understanding what's happening, kind of retrieving that intelligence for you to make decisions or determine if you want to mitigate something. I guess, have there been multiple attempts to try this sort of thing? And from your perspective, in terms of how you all are approaching it, what is kind of needed, I guess, from the customer side to create this sort of instrumentation? I could imagine, in one scenario, I could tell the customer, well, we're not going to train on Chinese poetry or code, we're just going to train a whole new model.
Daniel:And that burden on the customer is very, very heavy, right? In another scenario, you could say, oh, take this model off the shelf and do this to it, and that's less burdensome. So there's probably a spectrum here. Could you help us understand the burden and what might be required to get to this instrumentation?
Alizishaan:Yeah, so the way we're approaching this today is we do not modify the model. We do not even build models, nor require the customer to build a model. We take an off-the-shelf model and we build a safety module that sits on top of it. So essentially for the customer, it's a very low-friction approach, where they take a model that they use and love, like Llama or Granite or Mistral, or any of the open-weight models. For image generation, you have WAN, and you have a host of other Chinese models for the audio-video setting.
Alizishaan:So any model that you love, we sort of make it more secure and tailor it for your context. Again, remember, if you're a law firm, right, an off-the-shelf Llama or off-the-shelf Mistral is not gonna have the protection that you need. Or if you're, say, a shoe company, let's say you're Nike, you want the model to talk about Nike but not talk about Adidas, right?
Alizishaan:You can't expect the model maker to put that in for you because the model maker is trying to sell to everybody. So we help build that customization and we do that without changing your primary model. Like your primary model will continue to be as it is. If you make any modifications to it, fine tuning or anything like that, that's on you. You control that.
Alizishaan:We don't require you to do it. But even if you do that, we can still support you.
Chris:Could you, totally recognizing that there's proprietary stuff that you're not gonna dive into, and respecting that, could you talk a little bit about, just kinda clarifying as we were talking earlier, the buttressing with guardrails on the external side versus going into the model? And as you're talking about adding a component, in my confusion, it seems a little bit like it's on the outside. Can you talk a little bit about what you mean by that, without diving into places you can't go?
Alizishaan:Yeah. So I'll try to address that as much as possible without going into the specific
Chris:Fair enough.
Alizishaan:Specific details. So today, what you have, right, is these filters which analyze the inputs and the outputs. Now, the economics here are all messed up. Like, if you were to analyze video or audio, right? Those tend to be very expensive computationally.
Alizishaan:Those models are very expensive. So now, if your inference itself costs X and you're expecting someone to pay another X to analyze it, number one, it's slow. Number two, it's like paying someone a thousand dollars to guard a $100 bill. You're just not going to do that. Right?
Alizishaan:So what people end up doing is they end up shipping unsafe models. We have tested models from different audio companies, we've tested models from different video companies, different image generation models, each and every one of them, and with little to no trickery, which means the average user can just go there and ask them to generate bad stuff, they will do it. There are little to no guards there. And that's because the economics do not make sense the way things are today.
Alizishaan:So what we've built, we've sort of had a research breakthrough, where we can build safety about a thousand times cheaper. Just to give you some concrete numbers for what we've done with Llama: we've taken a Llama model, an 8 billion parameter off-the-shelf Llama model. Today, if you had to protect it, you would have to use Llama Guard 3, which is an 8 billion parameter model. And assuming it generates 10 tokens, you're running about 80 billion parameters of inference at runtime. If you do that on both your prompt and response, that number balloons to 160 billion parameters of inference.
Alizishaan:That is two extra GPUs, or one extra GPU, depending on how you've set up and deployed the models. Now, what we are essentially doing is analyzing the internal states of the primary model as it makes the prediction. In doing so, we don't need either of those two extra GPUs, and that 160 billion parameters of inference that I counted, we have succeeded in bringing down to 20 million, with an M. So we're essentially a rounding error. Today, because of this expensive safety profile, you cannot even deploy guardrails on edge devices.
Alizishaan:On edge devices, guardrails are nonexistent, because when people are working on the edge, they work really, really hard to squeeze that one model onto the limited memory of the device through techniques like quantization. So you have no room to deploy a safety model. That's why we've built tech which is literally like a rounding error: 20 million parameters on 8 billion is nothing.
Alizishaan:We can deliver comprehensive safety. Our safety performance is comparable to a standalone guard model. It is significantly faster, because our latency just doesn't exist; it's parallelized, so in practice, the latency of the primary model is the latency that the user sees. Today, you have to account for the latency of the primary model, and you also have to account for the latency of the response filter and the prompt filter.
Alizishaan:Also, the response filter cannot kick in until the primary model has finished generating. So you're looking at very high latencies. From the perspective of the end user, you're looking at a lot of added friction in terms of slow speed. You're looking at increased costs, because ultimately the cost will be passed on to the user. So you're paying for two extra GPUs that you don't have to.
Alizishaan:And your quality will be substandard. Again, remember, all these models are able to do is check IDs at the gate. So that is the protection you're getting.
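For readers who want the arithmetic spelled out, here is a back-of-the-envelope version of the comparison above, under the simplifying assumptions Alizishaan states (an 8B guard model emitting roughly 10 tokens per verdict, run on both the prompt and the response, versus a roughly 20M-parameter internal probe):

```python
# Back-of-the-envelope "parameters of inference" comparison from the episode.
guard_params = 8e9        # standalone 8B guard model (Llama Guard class)
tokens_per_check = 10     # assumed tokens generated per guard verdict
checks = 2                # one check on the prompt, one on the response

guard_inference = guard_params * tokens_per_check * checks  # ~160B
probe_inference = 20e6                                       # ~20M internal probe

print(f"Guard-model approach: ~{guard_inference / 1e9:.0f}B parameters of inference")
print(f"Internal-probe approach: ~{probe_inference / 1e6:.0f}M parameters of inference")
# Under these simplified assumptions the gap is several thousandfold, which is
# why the probe reads as a rounding error next to the 8B primary model.
print(f"Ratio: ~{guard_inference / probe_inference:,.0f}x")
```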
Daniel:So Ali, it's very fascinating and encouraging, the results that you're seeing and what you're able to do with this sort of approach. The size or latency or behavior that you just described is certainly a huge component of what people are thinking about and why they can't use guardrails in certain cases. Another question that might come up, though, and you can validate me on this, but I think you have a very good answer to it, is: what about the accuracy or reliability of one approach or the other? A person could argue and say, well, if I have a guard at the gate and he's 100% accurate at making sure no gun ever gets into the building, then there'll never be a shooting, and that's a very robust guardrail, or something like that.
Daniel:But I think what you're saying is you still wouldn't know what happens inside the building with 100% certainty. So could you address that side of things, the accuracy side or the quality side of the performance of guarding and safety with this kind of instrumented model approach versus kind of an exterior guardrail?
Alizishaan:Yeah. So an exterior guardrail, as you pointed out, is limited in visibility. Right? So even if they do a 100% great job of checking someone's IDs, they only have limited information. There's only so much you can do.
Alizishaan:Right? The kind of defenses you're expecting from them are out of scope. You're limited by information there. Like, if you don't know how to drive a car, you're just not gonna be able to do it. Right?
Alizishaan:No matter how fit you get, no matter how much you train, you are not gonna become a race car driver if you cannot drive a car. Similarly, in this security setting, in the example I used where somebody decided to pick up a golf club for some reason or no reason at all, you weren't checking for that, because golf clubs are permitted items to bring into a home. So the security guards have done their job right. They haven't done anything wrong, but there's a fundamental limitation that exists here. You can only do so much looking at artifacts.
Alizishaan:So there's a whole layer of safety that is untapped or unaddressed or inaccessible because of scientific limitations. But that's changing fast and we're sort of at the leading edge of it.
Daniel:And just to make sure that I have it right, so would it be a good way to describe it that let's say that I have, or I want to prevent toxicity or something coming out of the model of a certain type, right? There are a variety of inputs to the model that could result in that type of output, but I'm never going to know all of them, right? Or there's always an edge case, right? And so by instrumenting and saying this part of the model lights up when toxicity is being produced, then I no longer have to worry that I have all of the possible inputs in the world put together that might trigger toxicity. I just know when there's toxicity.
Daniel:Is that an appropriate
Alizishaan:Yeah.
Daniel:Way to put it?
Alizishaan:Yeah. So when you're playing defense, right, when you're defending models against abuse in any scenario, whether you're defending models or just protecting a platform in a classical trust and safety sense, you're never gonna have an exhaustive list of the million ways in which things go wrong. But you can develop a fair understanding through past examples or through data points that you have. That's the benefit of machine learning, right? But the way jailbreaks work, or the way adversarial examples work, is that they do certain things inside of a model that are not possible to predict in a different model which is being used as a guard.
Alizishaan:So the guard is model type A, and the primary model that you're protecting is model type B. The guard is not going to be able to predict what's happening in model type B, simply because it doesn't have visibility into what's happening inside that model. Without information, you're not gonna be able to do anything. Right? For example, if you're the SEC and somebody takes away your access to bank accounts, you're not gonna be able to prevent money laundering, no matter how many books you've written on the subject.
Alizishaan:There's only so much you can do with guns and badges. Right? You you would need visibility. If you wanna try preventing money laundering, you will need visibility into the financial system. So that's kind of what we're building here.
Alizishaan:In terms of accuracy, the numbers speak for themselves. We're able to match and beat the performance of standalone guard models over a thousand times our size. And that's because we are exploiting this unique insight.
Chris:Yeah, it sounds a lot like an analogy in, like, neuroscience would be, if I can pronounce it right, an electroencephalogram, an EEG, where they monitor the synaptic connections and they can see it lighting up. You know, since I can't pronounce the word, I'll just describe it to the best of my ability. It's a long word. I actually had it written down. I was like, oh crap, I still can't pronounce it here. So it sounds like that is your thinking, you know, about how you can apply that. Are there any ways that can tie back into hybrid approaches? If we talk about some of the other, more traditional forms of guardrails that are out there, can you combine them into sort of a hybrid approach, where you do have different types of guardrails in place, but people can add this kind of capability from you into it and thus enhance their overall security model?
Chris:Like, what would that world look like in your view, if that's valid?
Alizishaan:Yeah. I'm a firm believer in defense in depth. One product does not miraculously solve everything, just like in our human society. If you think about law enforcement, it's a very good parallel, where for national security, you need an army to protect you. You need different forms of the military to protect you from external threats.
Alizishaan:You need the border police to make sure that entrance is regulated. But at the same time, you also need state and local law enforcement. You also need federal civilian law enforcement. So you need different levels of that to make sure that security on the whole is ensured. Similarly, in the context of AI models, yes, you have guardrails which look at prompts and responses.
Alizishaan:They are valuable, right? But there are ways to do them efficiently. And then you also oftentimes would need to combine, say, system-level features with model-level features. So we build those model-level features. So you could do something complex like, let's say you're running some sort of customer service bot, and a customer has a history of refunds, and then in the conversation you detect some sort of misrepresentation or some sort of lying.
Alizishaan:So you can compose rules, like saying: block if lying score is greater than 0.8 and this customer has refunded more than $1,000 worth of merch. So there's potential to combine, mix and match as well. And I think that's how you improve the overall safety profile of any system. It's always in depth, and those layers have to work together. You can look at web applications as a parallel, right?
Alizishaan:You need static code analyzers and you need an AI firewall. They don't replace each other. If anything, they complement each other and build an overall robust system.
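A composed rule of the kind just described, pairing a model-level signal with a system-level feature, might look something like the following sketch. The 0.8 threshold and the $1,000 figure come straight from the example in the conversation, but the field names and surrounding structure are purely illustrative, not any particular product's API.

```python
from dataclasses import dataclass

@dataclass
class TurnContext:
    lying_score: float            # model-level signal, e.g. from an internal probe or classifier
    customer_refund_total: float  # system-level feature pulled from the business backend

def should_block(ctx: TurnContext) -> bool:
    # Composed policy: act only when the model-level and system-level
    # conditions hold together, as in the refund-abuse example above.
    return ctx.lying_score > 0.8 and ctx.customer_refund_total > 1000.0

# Usage sketch
ctx = TurnContext(lying_score=0.91, customer_refund_total=1450.0)
if should_block(ctx):
    print("Escalate to human review instead of auto-approving the refund")
```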
Daniel:And before we leave the subject and maybe look towards the future a little bit, I do want to highlight, I think, one of the fascinating things that comes out of the work that you're doing, Ali: some of the, I guess, customization that's possible. We've talked a lot about kind of the, quote, traditional types of things that you would want to prevent, whether that's jailbreaking or toxicity, but like you were saying, in every industry, and actually at every company, there might be custom types of policies that they want to enforce or certain things that they want to instrument. I guess, how extendable is this approach to those sorts of situations?
Alizishaan:Yeah, so this is very extendable to those. I mean, our approach is designed for these situations. We've identified this niche in the market: with models, you're shipping a one-size-fits-all solution. To put it a different way, the model needs of most companies are similar-ish, but the safety needs are dramatically different. So you cannot have a one-size-fits-all safety stack that works for everybody. Yes, sure.
Alizishaan:There's a general category of undesirables that everybody would want to keep off their platform but that's a very small subset. With generative models which are capable of doing so much, you need to be able to customize safety as well. So that's the space we thrive in, and that's what we're building for.
Chris:Well, that's very interesting to me. I've learned a lot today. As Dan kinda telegraphed, we like to finish up by asking about the future. And as you're thinking about the future, both in terms of your specific approach and, you know, kind of where security is going for models in general, in the large: what kind of guideposts do you have that you're thinking about? You know, at the end of the day, when you're taking a shower or you're lying in bed at night about to go to sleep and your mind's just kinda going loosely, where do you see things going, and what are you really passionate about pursuing or exploring going forward? What's that aspirational "hey, I'd like to go do that" that you have in mind?
Alizishaan:So I think when you look at safety, when you look at runtime safety, there's different aspects. There's build-time safety, which is a whole different class of safety products. But in terms of runtime safety today, it only exists at the data layer, that is, at the prompt and response layer. The model layer is missing. My aspiration is to sort of build that out.
Alizishaan:That has been my vision. That is the guiding vision behind Wrynx: we wanna build model-native safety. And I see that that will have to exist for models to be adopted into different settings. Like, if you're in healthcare, for instance, today it's very hard to use a public LLM, right? Because of data concerns.
Alizishaan:No one in their right mind today would fine-tune an LLM on PII data. That's because you just can't. It's just not possible. So it's locking a lot of people out of the ecosystem. And that's the problem we exist to solve.
Alizishaan:My vision is to sort of build that de facto model safety layer, no matter what model you're using. I wanna become the go-to for model safety.
Daniel:That's awesome. Well, I definitely encourage people to check out the show notes. And I should have said this at the beginning, but just to make sure people know: Wrynx, W R Y X... N X.
Daniel:Sorry, I even missed that. W R Y N X. So check out Wrynx. We'll have the link in the show notes, and, yeah, just really fascinating work.
Daniel:And, from the community, Ali, just thank you for digging into this topic and bringing a fresh look at things. It's awesome. And I hope to see you back on the show to find out where things have advanced.
Alizishaan:Yeah. Thank you for having me on the show, guys.
Daniel:We'll talk to you soon.
Chris:Thanks.
Alizishaan:Bye.
Jerod:Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, PredictionGuard, for providing operational support for the show.
Jerod:Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.