Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).
The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!
Welcome to the Practical AI podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.
Narrator:Now onto the show.
Daniel:Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I am joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Chris:Hey. Doing great today, Daniel. How's it going?
Daniel:It's going really good. Lots of fun things in the news to follow and lots of fun things to work on. By the way, for our listeners who might need a reminder about this, we are doing some webinars recently on a variety of topics, some of which are maybe even related to the things we're talking about today around security or safety. So if you wanna find out about those, go to practicalai.fm/webinars. That's where we have some of those things listed out.
Daniel:But I'm really excited today because I have an amazing set of things to talk about with Sean McGregor, who is co-founder and lead research engineer at the AI Verification and Evaluation Research Institute and also the founder of the AI Incident Database. How are you doing, Sean? Thanks for joining us.
Sean:I'm doing well. Thanks for having me here.
Daniel:Yeah, yeah, of course. And this is interesting to think about, AI incidents, verification, evaluation. How did you find your way into a day-to-day where you're thinking about and documenting and studying AI incidents, among other things? How did you get to work on what you might call impractical AI?
Daniel:Well, I guess it's where practical AI turns into problematic AI, let's say.
Sean:Yeah. And really, practical AI is the AI that has consequences and matters in the world, and those are the systems where you actually care to look into where it goes wrong. So the really quick professional tour of things is: I'm about a 2017-vintage PhD in machine learning. I focused on reinforcement learning as applied to wildfire suppression policy.
Sean:So a fire starts in a forest. What do you do about it? And how does that impact the development of the land and the values we get from it over the course of a hundred years? In that setting, I had a very strong sense of the power of the technology that we were developing, but also the brittleness of it, and just how difficult it was to even know whether the system was doing what you wanted it to do, particularly in a reinforcement learning system. And I just never wanted to wander through a forest and have a sense of, you know, this is a charred wasteland because a forester took my simulation, didn't really realize it was research code, and didn't apply his own human expertise about what is good fire and bad fire.
Sean:So I went from that, and then I worked on energy-efficient neural network processors, which was a great effort. The organization, called Syntiant, shipped millions of edge neural network processors while I was there and has continued to ship many more. And that gave me a really strong impression of the power and, once again, the brittleness inside these consumer electronic devices, hearing aids and the like. And so I left that and started a company dedicated to the test and evaluation of machine learning systems.
Sean:When I started that, the LLM explosion happened, and that's the Cambrian explosion of sorts of practical AI technologies that I'm sure we're all still feeling. We sold the assets associated with that to an old safety organization called Underwriters Laboratories, and I spent some time there doing safety work before leaving and starting up the Avery, the AI Verification and Evaluation Research Institute. And during this whole time, I had a project that just kind of got away from me because it filled a niche and served a need, and that's the AI Incident Database: you really need to have systems that collect and produce usable datasets motivating safety practice.
Sean:And you see this in aviation. A plane crashes, you record what happened, and you use that so that, effectively, the aviation industry has a form of regression test: you don't want that past crash to happen again. You also have similar things in food safety.
Sean:You have medical adverse event reporting. This is the fundamental primitive that you need in safety: a bad thing happens, and you make sure it doesn't happen again. And so in that project we've collected more than 5,000 human-annotated reports of AI incidents, collected across more than a thousand discrete incident records at this point.
Sean:And we formed a lot of the training data of, like, what is an incident. And we've been interacting with a lot of intergovernmental organizations and the GRC (governance, risk, and compliance) community and companies, trying to motivate safety and show why, very often, it can actually be a business imperative, because for a lot of the incidents that we have in the database, you could actually look at the stock price before and after, and there's an impact.
Chris:Could you talk a little bit more, kind of expand on what the nature of safety means in the context of AI? Because coming in as a neophyte on this, I don't know if there's a standard definition. It seems to me there isn't, at least not that I'm aware of. So how do you define it? You mentioned kind of defining an incident.
Chris:Could you define what you think an incident is and how these different concepts relate together, in a way that those of us who are not part of that specific world can kind of adopt the ideas and understand what you're talking about? Kind of give us a background on it.
Sean:So this is a three- or four-hour podcast, right?
Daniel:That's like the question we always used to have: is something AI or machine learning? And eventually you just learn that those words are completely meaningless in terms of the way people apply them. So yeah, we just need to treat them
Sean:as like a pointer, and, you know, the memory might be allocated differently, or the pointer can point to different memory, or maybe the memory swaps. Exactly. So there are a lot of safety terms out there. I've actually had a poster for this at one point, because this comes up so often. You could have incident, accident, adverse event.
Sean:You have vulnerability, exposure, harm event, controversy, issue. Each of these has a slightly different nuance, but you can bring it back to the concept that you don't want a bad thing to happen, and you don't want that bad thing to produce a harm. You don't want someone to say, I've been impacted, or some organization to have been impacted. And so the intergovernmental definition of incident that developed was effectively an event in which a harm has taken place. That's an incident.
Sean:Your mileage may vary in different contexts, and in different communities there are different terms that resonate. But the reason that we went with incident to begin with is basically that it's sufficiently vague while still meaning something. You can't say accident, because some of these have intention. You can't say exposure or compromise or the other terms from the computer security world, because some of them don't have intention.
Sean:And so incident covers them all. And, you know, I could gesture and give you some Venn diagrams, but this is a podcast, so we'll avoid that.
Daniel:I'm wondering, in particular, when you narrow it down to AI incidents: obviously AI, depending on your definition of AI, has existed for some time. And so I guess AI incidents have been happening for some time, in the sense that you could have a computer vision model detecting, maybe, problems in things coming off of a manufacturing line; it misses one, and that causes harm downstream to someone who uses the product, or something like that. You talked about the expansion of how AI is impacting our lives. From your perspective, how has that idea been stressed in new ways in recent years?
Sean:We have constant debates over what we would ingest into the AI Incident Database. And because we have to operate the database and make these decisions day in, day out, we have very difficult decisions. And in our case, it's not just what should be considered an incident. It's also: if we start ingesting these, are we lassoing an infinity that's just gonna pull us in some extreme direction? Maybe we would like to index, catalog, and present that infinity to the world, but there are these kinds of little harms that are repeated maybe millions of times a day when you have these systems, and indexing all of them isn't necessarily useful.
Sean:So we do have a little bit of a bent towards: is this informing the production of a safer AI, or a safer world for AI? But a lot of things meet the fundamental criteria of involving AI where minor harms have taken place. There are also an increasing number of highly scaled harms that take place. Like, if you make a billion people slightly more depressed, a non-zero number of people have probably died as a result of that. And that's where, you know, it's not just that you're building a bridge that needs to bear weight.
Sean:You're building a bridge that needs to bear the weight of all of humanity walking over it.
Chris:I'm curious, as you're selecting things to go into the database per the criteria that you just talked about, how are you sourcing that? Especially when you consider the fact that so many AI things today are proprietary and kept secret in organizations, for obvious reasons. How is that sourced? And how does the different sourcing affect the utility of the database and how you make those kinds of selections?
Sean:Most of what we have at the moment is journalistic reporting. The reason that we are living in that space at the moment is that journalists actually put in a tremendous amount of work to validate the base facts involved. And that is where the degradation of the journalistic community, or at least their ability to make wages, has actually been quite harmful. We also do get direct reporting at times. We do get people that email us or submit directly through our forms.
Sean:We get people that create blog posts and things like that and submit them to the incident database, and we do encourage people to submit to the incident database. The simple fact of the matter, though, is that the volume we're dealing with is one where eventually we do need to switch from this kind of voluntary reporting to a more mandatory form of reporting. In the EU code of practice, there is a requirement for severe incidents to be reported. It hasn't reached implementation as of yet.
Sean:There's a lot of back and forth on that front, but the insight you can derive from mandatory reporting is greater than that of voluntary reporting. We can prove existence; we can prove it's happening. It's very difficult for us to assign a rate to incident events, though, particularly as they become non-newsworthy through time. So we do have some work where we're trying to make the most of that. This bears some resemblance to public health practices, where you don't have perfect insight into disease spread, but you do have these kinds of indirect measures, like doing tests in the sewers to see how much of the viral particles are there.
Daniel:Well, Sean, I appreciate you taking us through this idea of AI incidents. Obviously, we are Practical AI, and in a practical way we would very much like to prevent incidents, or at least understand the security implications of using certain AI models. I know with the Avery, the AI Verification and Evaluation Research Institute, from my looking at it, you're talking a lot about third-party auditing of AI, or frontier AI or foundational AI, however you wanna frame that. Do you wanna explain a little bit about how that side of things fits into the landscape and why it's important?
Sean:Certainly. So a fundamental problem that we have, particularly at the frontier model level with things like OpenAI, Anthropic, Google's Gemini, and so forth, is that it's very difficult to even know how safe your systems are, because they're general-purpose systems. This basically broke the safety frame. All the safety processes that we have are built around a starting presumption that there's a specific context the system is operating in, and you reason about its safety within that context. Well, if your context is just wildcard-star everything, where does that leave you? Do you need to verify across all circumstances that it's gonna be safe?
Sean:Because the answer is gonna be no. So how do you approach the task of verifying claims? How do you encapsulate something that a customer or a person or an organization would rely upon and say, okay, this has gotten this level of assurance applied to it; I know I can apply it within my college-age student population, teaching them how to do the maths. Getting that level of signal is, at this point, a real practical problem, because no one's gonna believe you when you say, you know, this is good for remedial college-age math.
Sean:You have to run a pilot program and see how it works before you can actually believe the representations being made by companies, because they probably haven't evaluated it in your exact circumstances. They might have something that's analogous, but they probably won't have it exactly in your circumstances. And this is a problem on the practical AI side of things, but it's also a problem with just the increasing power of these systems and the scale at which they operate. You actually do want some really strong top-level guarantees that the system is going to try to steer away from catastrophe and doesn't actually have an active propensity for doing the bad things. The science of establishing that is still a work in progress.
Sean:There's a lot of work not just in establishing evaluation, but in meta-evaluation, basically deciding: is this evaluation saying the thing it claims to say or not? And Avery as an organization is concerned with doing third-party audits, basically saying: is this thing safe? Is this thing doing the thing it should be doing? And there's a premise in there that a third party is better positioned to do that than a first party. We've all been on development teams that pushed to prod, pushed to the real world, pushed to real-world circumstances, and found: oh, I didn't think about that.
Sean:One of my favorite incidents in the incident database is this gentleman who got a traffic citation mailed to him. And he looked at the traffic citation and said, you know, this isn't my car. Not only isn't it my car, it isn't a car at all: it's a woman who's wearing a shirt that says KNITTER, with a purse strap going across it. So the strap kind of played with the letters, and it looked like KNI9TER or something like that.
Sean:And I don't think the people making the traffic camera were thinking about a woman walking through the field of view of the traffic camera with a shirt perturbed in such a way that it would catch it and register it as a license plate. But the traffic citation went out just like that. The world is hard. The real world is real hard.
Chris:I like that example. It's a very personal example, and I can certainly imagine that happening to me, actually, bizarre as it is. But I'm curious, as you're looking at applying this process you're describing to maybe a larger incident, is there an incident that comes to mind in the database where you could kind of compare how you might envision it playing out differently if it had gone through a third-party process versus if it hadn't? I'm kind of trying to get a sense, and you can interpret it any way you want or find whatever example.
Chris:But I'm trying to get a sense, when organizations are listening, you know, their employees and their leadership are listening to the podcast, and they're trying to put this in the context of their own operations: what do you think is kind of an aha moment, potentially, for the listener in maybe a Fortune 500, or maybe not that big, maybe a mid-sized company, where this process will make a substantial difference to the outcomes involved?
Sean:Probably a good way to work here is by analogy to other segments that have audits. Audit is something that no one truly enjoys. You don't necessarily want to be audited, but you actually do. You do wanna be audited if you're an organization that wants to receive investments from other organizations. Having audited financials is table stakes.
Sean:You're not gonna put your money into an org that hasn't been audited. Or actually, I should say there are famous instances in which an org was not audited and people subsequently regretted it, because the reason they weren't audited is that they were, you know, approving disbursements by Slack emoji. And that didn't go very well in those circumstances, for anyone. So we do periodically get reminders that these processes are important and that audits solve a problem, which is that you need to be able to trust the fundamental representations about the state of an organization.
Sean:It's the same thing for the model, because the model is similar in the sense that it's taking actions, it's doing things that have impacts. And so there's value to knowing when you should or should not trust information.
Daniel:And I'm looking through some of what the Avery Institute has been involved with, and in particular some recent work on this meta-evaluation of benchmarks, this BenchRisk work, which is actually how we got connected, through a colleague, Ashwarya. And I'm wondering, since this is a meta-evaluation and I know a lot of people make decisions about models based on benchmarks, could you maybe help us walk through mentally what the difference is between, say, an audit and a benchmark, and maybe tie in some things that might make people think about whether benchmarks, as they look at them on leaderboards and that sort of thing, are really relevant to the real-world behavior of models?
Sean:So maybe to overextend the financial metaphor: when you audit a financial organization, you say, okay, I see that this organization has a balance sheet indicating they have $10,000,000 in the bank. You're like, great. Now, as an auditor, I need to go and check the balance with the bank and see: is this money real? Is it actually there?
Sean:And similarly, an audit is concerned with: I've seen the balance sheet, and now I need to check the evidence and make sure that I actually believe that representation in some form. And so where meta-evaluation, evaluating the evaluations, checking the benchmarks, is useful is that you're basically checking those receipts. And in the course of the BenchRisk project, we found that a lot of the receipts were just kind of, well, trust me, bro, written on a piece of paper, and that there were real substantive issues of, like, okay, maybe there's an IOU and there's some gold buried out in a field somewhere, but I have some doubts.
Sean:And at the very least, I need to look into this a little bit more before I rely on it for real-world purposes. And the dichotomy that we identified is that a lot of the benchmarks that are produced are produced for non-practical purposes. They're produced for knowledge generation, for research purposes, where people want to understand systems and make sense of them, but they're not produced with the intent that someone's gonna then say, all right, I'm gonna deploy this in my environment, and I know now that it's unbiased because it scored well on BBQ.
Sean:And BBQ is a great benchmark. It was really foundational in the bias benchmarking community. But it's also not produced with the intention that every time a frontier model is released, you say, look, we've improved on bias, and it's 10 BBQ points better than the prior ones. And that's a good thing, you do want that, but it's not a practical AI purpose.
Sean:It's not saying it's unbiased for my specific application, because there's a distribution associated with BBQ. There are characteristics that aren't necessarily generalizable to the environment to which you're deploying it. The prompts can be in a very particular space for BBQ that you're just not operating in. And all these things, in the research work that we put out, are associated with failure modes that we identified. We collected a list of failure modes and then we looked at how well benchmarks had expressed mitigations against those failure modes.
Sean:And the results were mixed; some did better than others. But by and large, most benchmarks historically have been produced for research purposes, not for practical AI purposes. And this is a problem, because this is what we're working with informationally, and it's why we're starting to see an evaluation ecosystem pop up, with evaluation being a discrete task and something that's supported by organizations separately.
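To make the meta-evaluation idea concrete, here is a minimal, hypothetical sketch of scoring benchmarks against a failure-mode checklist, in the spirit of what Sean describes. The failure modes, benchmark names, and coverage numbers below are invented for illustration; they are not the actual BenchRisk criteria or results.

```python
# Hypothetical sketch: score benchmarks by how many checklist failure modes
# their documentation mitigates. Everything here is invented for illustration.

from dataclasses import dataclass, field

# Invented failure modes a deployer might worry about when reusing a benchmark.
FAILURE_MODES = [
    "prompts_unrepresentative_of_deployment",
    "contamination_with_training_data",
    "single_run_no_variance_reported",
    "scoring_rubric_undocumented",
]

@dataclass
class BenchmarkReport:
    name: str
    # Failure modes the benchmark's documentation explicitly mitigates.
    mitigated: set = field(default_factory=set)

def coverage(report: BenchmarkReport) -> float:
    """Fraction of the checklist the benchmark addresses (higher is better)."""
    return len(report.mitigated & set(FAILURE_MODES)) / len(FAILURE_MODES)

if __name__ == "__main__":
    reports = [
        BenchmarkReport("bias-benchmark-A", {"scoring_rubric_undocumented"}),
        BenchmarkReport("safety-benchmark-B",
                        {"contamination_with_training_data",
                         "single_run_no_variance_reported",
                         "scoring_rubric_undocumented"}),
    ]
    for r in sorted(reports, key=coverage, reverse=True):
        unmet = set(FAILURE_MODES) - r.mitigated
        print(f"{r.name}: coverage={coverage(r):.0%}, unmitigated={sorted(unmet)}")
```

The point of a checklist like this is not the score itself but the list of unmitigated failure modes, which tells a deployer which "receipts" they still need to check before relying on the benchmark.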
Chris:I'm wondering, as we're going through this and discussing the process, as you're looking at some of the different risk categories that you all outline on your website, and, you know, a couple of examples are misuse, unintended behavior, infosec, and emergent social effects: which of these do you think is maybe the most underestimated? You know, the thing that people aren't really expecting, that has consequences that may be outsized relative to what those of us not in this field would expect? Do you have any sense of that, in terms of a little bit of the surprise that people might not realize is there?
Sean:That's tough, because I'm a little bit like the fish in water being told it's wet.
Chris:Fair enough.
Sean:I will say, kind of rewinding my own experience on this: I feel like I'm a pretty astute observer, and I've been more predictive of what problems are coming around the corner than most people I've encountered. But even with that, I still am regularly surprised. And I wish we could be a little bit more predictive about it. I think we are gonna get there, and we're gonna learn from the past and hopefully push that into the future. But this is a new thing.
Sean:We don't have a great operating history, particularly for the general-purpose AI systems. And I think the split in this community that you should probably think about a little bit, in terms of that question of expectation, is that there are security people and safety people. And there is a little bit of difference between those. Security is safety; safety is security.
Sean:But what you care about tends to be slightly different from what you're expecting. Security people have a much greater capacity for the presumption of bad actors doing bad things and that producing bad outcomes. And then the safety people have a much stronger presumption that the world is its own adversary; we're all in our own kind of game against nature, and bad things will happen regardless. And maybe on the bad actor side of things, they're a little bit more insistent upon it, though, particularly when the scalability of doing bad things is actually substantially increasing.
Sean:So we have to solve all of these.
Daniel:Well, Sean, one of the things I wanted to ask you about: I'm a big fan of the Darknet Diaries podcast and listen to it a lot, and I always love hearing stories of hackers and the kind of world that is there, and all of what people try to do at DEF CON and other places. I was really intrigued by your study titled "To Err is AI." I see in the description you took some models to DEF CON, and, reading your description, you had the pleasure of taunting a room of hackers with a declaration that the model is flawless, and, you know, obviously certain things happened after that. So I would love to hear maybe how this came about, or what the idea really was, and then would love to hear some of that story.
Sean:Sure. So to set the stage a little bit, DEF CON is a hacker conference that's been run in Las Vegas for a good number of years. I don't know if listeners have seen the nineties movie Hackers, which has a very early Angelina Jolie and some others in it. It's these young men and women rollerblading everywhere, because that's just how the future would be, and talking about how RISC architectures would change everything. And there's this hacker aesthetic that it kind of noisily portrays of, you know, joyfully compromising systems.
Sean:In this particular instance, it's a mixture of people doing that with good aims and bad aims. But DEF CON is this community-organized thing, and you have villages, and one of them is the AI Village, which has been run for a few years now. In that village you have presentations, but there's also a section of it dedicated to some sort of challenge that's conducted over the course of the conference. People filter through and figure out how to break things, and they're treated to cash prizes in this case.
Sean:So this is something called the Generative Red Team 2, and it was a change from the prior year in form. We basically asked people: there's documentation for the open language model produced by the Allen Institute for AI, a 7-billion-parameter large language model, along with its guard model put in front of it. And we said, here's the documentation.
Sean:Here are the representations about what this model is supposed to do, and your job is to show how that's wrong. That unleashed a few days of chaos. I don't know what else was happening at the conference, because I was at the judging table the entire time, with reports flying in, needing to decide, you know, is this actually a violation of the documentation? Did it do the thing we wrote? We did put some Easter eggs in there, things we expected people to be able to break. But did they submit sufficient evidence of a violation of that documentation?
Sean:And that is actually a surprisingly difficult thing. We had all these people that were very good at compromising systems and saying, yeah, I know how to prompt this LLM and get it to tell me how to burn down this convention center, in violation of what the model should do. We kept having to say, oh, that's great, but you haven't really told us anything; an anecdote does not equal data in this instance. We need you to show that it's systematically pushing towards arson, that it always thinks it's fun to burn things down, or that you can put some sort of more universal jailbreak on it that makes it underperform on that particular filter.
Sean:And so what we basically had to bring to the security world, which the security world does not like, and I don't blame them for it, is statistics. We had to say it's not just about the fact that there's an exploit; it needs to be systematically vulnerable in some form, because the model is always rolling dice. That's just the nature of it. And that made for a wild few days; we paid out a good cash purse and learned quite a lot about the collision of disciplines that we need to foster when it comes to security, safety, statistics, and machine learning.
Sean:The world's getting more complicated.
Chris:Before talking about some of the chaos that ensued at that point, as you talk about that collision of cultures, if you will, with kind of taking the hacker out of their thing of, hey, I have an exploit, look at me, and requiring that level of rigor: first of all, can you describe why you required that? Why was that level of rigor important in this case? And how did that force different behaviors out of the hackers who were applying themselves against the problem you had presented?
Sean:We had a few people that were unhappy initially, and then we explained it, and then they went back and came back. And then they started turning the crank and getting a lot of payouts. So people that really wanted to encamp and figure out the modality did very well. The problem, and the reason why anecdote doesn't equal data here, is that if you say something filters out 99% of the bad thing, then, if they wanted to, they could roll up to one of those stations and just keep issuing the same query, or keep adding one more period to it, dot dot dot dot.
Sean:And then one time out of a hundred, they'll get something, and they'll be able to walk up and say, money, please. The problem is that's not really useful. If you're designing the system, you need some idea of what the systematization is here, because you know it's 99%. To some extent you're gonna keep on working to get to 100, but you care about the cases where it's not 99%, where you made a mistake and it's actually 70%, or 1%, or 0%. And that's a statistical argument.
Sean:That's a "here's a higher-level description" rather than "here's an example." This isn't me prompting it; this is a description of an attack that is not accounted for in your documentation, that you're vulnerable to, and you need to solve this as a strategy. Like, you never solved talk-like-a-pirate, so anyone can talk like a pirate to the system and get it to do bad things.
Sean:And that's the thing you care about, rather than: I asked it something in pirate-speak a hundred times and one time it gave me something bad.
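As a rough illustration of the statistical point Sean is making, here is a minimal sketch that distinguishes a one-off anecdote from evidence of a systematic flaw, using a Wilson score interval on an observed failure rate. The trial counts and the "99% effective" claim are invented for illustration and are not numbers from the actual event.

```python
# Minimal sketch: is an observed jailbreak rate consistent with a documented
# claim (e.g. "the filter lets ~1% of bad prompts through"), or does it point
# to a systematic flaw? Uses only the standard library; numbers are invented.

from math import sqrt

def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        raise ValueError("need at least one trial")
    p = failures / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

if __name__ == "__main__":
    claimed_failure_rate = 0.01  # documentation claims ~1% of bad prompts slip through

    # One bad output in a hundred tries is consistent with the claim (anecdote).
    print(wilson_interval(failures=1, trials=100))   # roughly (0.002, 0.054)

    # 30 bad outputs in 100 tries of a rephrased attack is not: the whole
    # interval sits far above 1%, which is evidence of a systematic flaw.
    lo, hi = wilson_interval(failures=30, trials=100)
    print((lo, hi), "systematic" if lo > claimed_failure_rate else "anecdote")
```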
Daniel:Were there certain modalities of failure, like you described, that were a surprise to the judges, or maybe to the model-building team at the Allen Institute? I don't know how much interaction there was with them, but were there any modes of failure that weren't expected, or just ones that were continuously problematic even though they were known before?
Sean:Somewhat unexpected. We characterize this in the research paper as well, and I do recommend people take a look at it, because it's very practical in nature. The biggest one, and the one I think most people just instantly exploited: they didn't have full information on this one. We didn't give them full information on the integration of the guard model and the foundation model, or the chat model, underneath.
Sean:And there are multiple ways of configuring that handoff. You can either do a hard reject and say, this doesn't pass the guard model and we're gonna give you nothing. Or you can re-prompt the underlying model and say, here's what you're being prompted, but don't answer it, and there are variations of that. And then you can still get a somewhat useful reply out of the system.
Sean:It just should have rejected it. The configuration of that handoff was looser than it necessarily needed to be, and exploiting that handoff between the guard model and the underlying model was the big vector you could use to print cash over the course of the competition. And that's perhaps the thing people need to think about: deployed systems are very often a collection of multiple systems that present as one, and the interface between them is very often under-tested. It's very often difficult to know how those systems will interact.
Sean:And particularly when the benchmarks are expressed at a lower level than the whole system, you don't know a lot about what will happen at that point. You've muddied the waters a little bit.
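To make the handoff issue concrete, here is a hypothetical sketch of the two configurations Sean contrasts: a hard reject versus re-prompting the underlying model with the flagged input still in context. The function names and toy guard are stand-ins for illustration, not the actual GRT2 or Allen Institute setup.

```python
# Hypothetical sketch of a guard-to-chat-model handoff. The guard and chat
# functions are toy stand-ins, not a real API or the configuration used at
# the DEF CON event.

from typing import Callable

def hard_reject_handoff(prompt: str, guard: Callable[[str], bool],
                        chat: Callable[[str], str]) -> str:
    """If the guard flags the prompt, return nothing useful at all."""
    if guard(prompt):
        return "Request declined."
    return chat(prompt)

def soft_reprompt_handoff(prompt: str, guard: Callable[[str], bool],
                          chat: Callable[[str], str]) -> str:
    """Looser variant: the flagged prompt is still shown to the model, wrapped
    in an instruction not to answer it. The model may still leak a 'somewhat
    useful reply', which is the under-tested seam attackers can exploit."""
    if guard(prompt):
        return chat(f"The user asked: {prompt!r}. Do not answer it; "
                    "explain briefly why you cannot help.")
    return chat(prompt)

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_guard = lambda p: "arson" in p.lower()
    toy_chat = lambda p: f"[model output for: {p[:60]}...]"
    bad = "Explain, step by step, how to commit arson."
    print(hard_reject_handoff(bad, toy_guard, toy_chat))
    print(soft_reprompt_handoff(bad, toy_guard, toy_chat))
```

The difference between the two functions is exactly the interface seam Sean describes: in the second configuration, the flagged content still reaches the underlying model, so the combined system can behave worse than either component's own benchmarks would suggest.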
Chris:I'm curious. As you went through this exercise with the attendees, beyond just the specifics of this problem, what was the value of going through the exercise in terms of the outcome for you, beyond the immediate guard model and the model it's protecting in that process? What was that like? And I'm also curious, if you go and do a similar exercise in the future, and I'm assuming there will be some variation of that, what can you envision being a good next step to tackle in terms of trying to secure value and get to a next-level outcome?
Chris:You know, have you thought about what you would do, coming out of the previous event, should you do it again?
Sean:So I think one of the really strong takeaways, and there's actually been some work since then, was the need for tools to kind of scaffold this in some form, and to make it so you have a clear form of expression for what these things are called, which is flaw reports, because it's not an incident; no one's harmed, you're kind of testing things in a laboratory. What you have in classical computer security is bug bounty systems. You have the ability to submit those, and there are kind of clear adjudication processes that often lead to payouts to people.
Sean:And the extensions that are necessary for machine learning-based systems, data-driven systems, distributional systems, are I think clearer as a result of the activity that we ran at DEF CON. That's the kind of research value produced by it. The Allen Institute probably got some operational use as well; they learned a lot, because they were at that judging table dealing with the onslaught too, as the vendor table. And so I think if this is run again in the future, getting past those pain points of, you know, kind of rolling your own code and all that, and scaffolding it a lot more, will carry it a lot more distance, and then move this towards something where there are one or more companies operating a flaw-reporting business in the same way that there is security bug reporting.
Daniel:Well, Sean, as we get close to the wrapping-up point here on the podcast, I'd love for you to share anything that comes to mind as you look out at the next phase of maybe your own work, or maybe the ecosystem a little more broadly, in terms of the area of research in which you're involved. What are those things that, as you end your day or you're driving back home or whatever, stick in your mind? Like, things you're excited about: what if this happens?
Daniel:Or, I'm really interested to see how this develops, or, I'm encouraged by this going into the future. What's kind of top of mind for you, in terms of the community that you're involved with, coming into this next year?
Sean:There's an interesting phenomenon where every time we pass a milestone in the database, we're like, oh, we've crossed 500 or a thousand incidents. We're like, yay. And then we're like, but it's not a great thing. Should you celebrate?
Sean:Should we celebrate? And I feel like my dream utopia might be that I need to go do something else, because the safety problem has been solved and there's no more to do. The unfortunate fact, though, is that this is only gonna get more complicated, and there's more to do, more to solve. There's not gonna be a perfect AI system out there.
Sean:So what I'm hoping for in the next year is to develop a greater sense of what institutions, what techniques, what methods we need to develop, scale, and apply so that we can have a safer AI ecosystem. And this is important not just from a perspective of wanting to prevent harm, but because you can't deploy unsafe systems to clients that want safety, that care about outcomes. And so our ability to make a safer system is very heavily tied to our ability to ship product and solve problems in the real world. So I hope to highlight that; I hope to have a greater accounting of the risks, and to make it so that we have better and better measures of those risks, because you manage what you measure. Let's measure these risks, and then we can separate the actors that are investing in safety and stronger, safer AI systems from the ones that aren't doing that.
Sean:And right now, that's not where we are.
Daniel:Well, I really appreciate that perspective, and I appreciate the work that you continue to be engaged in, Sean. Thank you for that. For the community, it's really, really important. So I appreciate that, and thank you very much for taking the time to chat with us about it.
Daniel:It's been great.
Sean:It's been a lot of fun. Thank you for having me.
Narrator:Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.
Narrator:Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.