[00:02:34] Sean McGregor: Hi, I'm Sean McGregor, joining you from deep underneath the Harvard Law School. I am a machine learning safety guy and, uh, cover a lot of ground, and I'm looking forward to this conversation with you. [00:02:47] Jacob Haimes: Awesome. And in one sentence, what's broken about how we measure and evaluate AI systems, and how do we fix it? [00:02:56] Sean McGregor: In one sentence, huh? We'll— [00:02:57] Jacob Haimes: One sentence. [00:02:58] Sean McGregor: We'll just say everything. More finitely, I would say: we have spent much of the last 20-ish years optimizing machine learning systems from data, and have so thoroughly destroyed any notion of measurement that we have powerful systems that we don't have the capacity to characterize. And that is a great big problem for everyone, because we need the ability to scientifically understand these systems. [00:03:32] Jacob Haimes: Alright. Let's take a step back, maybe, and talk about how you got to where you are. You did a PhD in machine learning for public policy, and before everyone goes "ah, yes, that makes sense, perfect" — this was in 2010, so— [00:03:49] Sean McGregor: Yeah. [00:03:49] Jacob Haimes: —it was a bit before the current AI summer, so to speak. How did you find this? How did you get interested in it? Why machine learning for public policy? [00:05:18] Sean McGregor: So I guess, taking one step back from the decision that resulted in me even going to grad school for a PhD in the topic: I had finished my undergraduate degree in 2008, which was a great time to graduate from undergrad. No, it wasn't— [00:05:35] Jacob Haimes: Yeah. [00:05:36] Sean McGregor: The headlines weren't particularly happy in 2008, but I always knew I wanted to go to grad school anyways. I wasn't done with schooling at that point; I was only like one-quarter baked as a person thinking about things. And the undergraduate topic that I had been really obsessing over, and my undergraduate thesis, was one titled something like "Building the List of Life." It was about how you go about enabling citizen science to model the distribution of wildlife around the world. And I was really heavily inspired by something called the Encyclopedia of Life that was announced at that time — trying to have these central data resources that everyone pours their data into and that help make sense of the world. So I went from that project to doing a Research Experience for Undergraduates, a summer that I spent in a forest in Oregon. Literally needing to walk around deer to get to the laundry room, and no cell phone reception, but it had high-speed internet and a really great computer lab. I could just do field work during the day and then go into this kind of overheated lab environment and hack on the moth distribution data we were collecting, figuring out how we could use topic modeling and other things for different moths feeding on different food plants. I thought that was just so cool. It was a combination of interests: I could actually be out in the world doing something, and then come back and work on the data and make sense of it. I found that really exciting, and I finished that and hadn't applied to any PhD programs at that point. So I worked for a little bit in what work I could find in the 2008–2009 timeframe, and I applied to three different PhD programs. Only three, but they were in three different disciplines: one was computer science, one was a department of natural resources, and one was geography.
More or less at that time I concluded: geography is interesting — I really like geographic problems, this is something I've been working on — but I also don't really wanna explain to everyone that geography's still a thing, that we haven't actually figured out all of geography yet. The department of natural resources was also interesting, but I felt like my core practice and discipline was something that would benefit from computing, from approaching these problems from a space of computing, because I felt like I could pick up a lot of the other elements more informally. I could, as a matter of practice, live in natural resources, live in these public policy domains, and the thing that would make me distinctive and make me powerful, in a good way, was approaching it from computing: understanding how systems are built, how to build systems, and how to operate in those high-skill, high-speed environments. And this is something that very often doesn't come into the public policy space — you don't often hybridize these two elements of how you do policy and how you do computing for policy. So I started my first week in my grad program at Oregon State, which has an excellent forestry program, and I had a really excellent, very senior machine learning faculty member who was really foundational in building a lot of the stuff in the field. He founded some of the journals and base publications that have made it work, and he's done a lot in anomaly detection, sensor modeling, all these things of real-world impact. In my first week talking with him, he said, I have these two projects. One of them, I think, was species distribution modeling related, so it was something I was interested in. The other one was wildfire suppression related. It was: a fire starts in a forest. What do you do? And how do you go about answering that— [00:09:27] Jacob Haimes: Put it out. [00:09:28] Sean McGregor: —computationally? You're from a fire-prone state. You're saying, put that fire out, I don't want my home to burn down here. And I was from San Diego — San Diego did not have snow days, but we did have a fire week. [00:09:42] Jacob Haimes: Yes. Yeah. [00:09:44] Sean McGregor: And so this was immediately an interesting problem to me. And this is one where, with the full kind of computing bravado that you often get, particularly in AI circles, it's: we're gonna bring computational intelligence to it and we're gonna make it better. We're gonna figure out how to do this problem better. [00:09:59] Jacob Haimes: Yeah. We're gonna solve cancer— [00:10:02] Sean McGregor: Yes, why not? Go after that. And that's like the exciting part about a PhD, really. [00:10:06] Jacob Haimes: Right? [00:10:07] Sean McGregor: It should be a problem where maybe you arrive at a solution in the next four to ten years, or maybe you just incrementally move in the direction of it, but you are supposed to aim big. And so, having a sense that there's a problem in how we're managing our lands with respect to fire — let's really figure that out. Let's— [00:10:28] Jacob Haimes: Yeah. [00:10:29] Sean McGregor: —build a simulator. Let's simulate this public land. Let's not do it for just "here's a fire, it spreads, decide what to do with respect to this fire." Let's look at what happens over the course of a hundred years: this land is gonna be there, the vegetation's gonna be growing.
You're going to have changes in land cover from fires. You're gonna have these really severe events where the whole landscape might burn, or you're gonna have smaller fires that create this patchwork of aged trees, which is really good ecologically. Let's make sense of what to do in response to wildfire. And you said suppress it. And— [00:11:04] Jacob Haimes: That's not the answer though, right? [00:11:05] Sean McGregor: It might be. And flashing forward to the conclusion of the PhD: it's complicated. It's not one of those things where there is a right answer to this particular fire. There's different preferences on risk. There's different preferences on the values that you realize from the land, whether that's timber values or breathing clean air. [00:11:29] Jacob Haimes: You're saying there's trade-offs? [00:11:30] Sean McGregor: There's trade-offs. [00:11:32] Jacob Haimes: Unacceptable. [00:11:32] Sean McGregor: Unacceptable. [00:11:33] Jacob Haimes: I need one correct answer. Thank you. [00:11:36] Sean McGregor: Okay, then probably suppress the fire. I guess you're right. [00:11:39] Jacob Haimes: No, my girlfriend actually comes from Kansas. They do a lot of controlled burns, and so she grew up going to prairie burns. [00:11:47] Sean McGregor: Drip— [00:11:47] Jacob Haimes: And, sorry, go ahead. [00:11:49] Sean McGregor: Using the drip torch and walking along, and— [00:11:51] Jacob Haimes: Well, she didn't do those herself, but they have big parties around those — they do a cookout. It sounds horrible to me 'cause my eyes are super sensitive to ash, but that's a thing that they do. And yeah, I think it's pretty interesting how different approaches could have resulted in much better outcomes for fire, like the current state of wildfires. And I think we can move towards that with the work, basically, that you started doing there. [00:12:21] Sean McGregor: So the final output of this work stream, of my PhD, was building a visual analytic interface that allows you to actually change the values that you associate with the landscape. So if you don't care about smoke inhalation days, you can zero that out, and that very often changes what policy would be optimized in response to the development of the landscape subject to that policy. That was not exactly the answer you look to have at the start of a PhD — it is nice if you have a headline saying "I've solved wildfire," and instead I had this. And I just had this sense that the technology I was developing is immensely powerful and can lead to a really great kind of optimization of the way that society is run. But as soon as you bring that over to implementation, subject to the complexities of the real world, subject to the complexities of someone needing to implement it and actually do it, it just felt like it was gonna break. And I always experienced this thing where, when I would explain my research to people, they would repeat back how they intended to use it, and I'm like, oh, you could do that, but you're missing the point, or you're not using it in a manner that's consistent with the kind of spaghetti code I know is sitting underneath this thing, 'cause we were just figuring out how to even make it work initially. I never wanted to have a sense of: a land manager used my wildfire assistant, and now this is a charred landscape, and the fact that that squirrel no longer has a home is your fault.
And I always felt a tremendous sense of responsibility for what it was I was building there. While we could build and optimize these things, that handoff, that transition — how to make it valuable for society — is something that takes a lot more time and attention than a single PhD allows for. [00:14:27] Jacob Haimes: Gotcha. And so then, towards the end of your PhD, when you're having these thoughts of "maybe this isn't necessarily the direction that I want to keep going," you moved to Orange County, which I looked up— [00:14:42] Sean McGregor: By accident. Yeah. [00:14:43] Jacob Haimes: Yeah, I looked it up, and it's one of the counties with the highest cost of living in the US, at 60% above the national average. So you decided to move there, and you actually wrote a blog post about it — it ended up being an accident. Can you just give a little bit of color to that story real quick? [00:15:04] Sean McGregor: Sure. [00:15:05] Jacob Haimes: It is, yeah, pretty interesting. [00:15:07] Sean McGregor: So I guess to set up the why for this whole thing, first I should say I'm from San Diego and I love my hometown. And despite having recently shifted from Orange County to Boston, and being presently located in the cold northeast, I would like to spend as much time in places like San Diego as possible. It's beautiful, I love the beaches, and it's my vibe and how I exist. Orange County is notably not San Diego. Growing up you would say, oh yeah, LA, and then people from Orange County would say, no, we're Orange County, completely different. So naturally I ended up moving to Orange County, and yes, Orange County is completely different from Los Angeles, it's not the same place. I moved there because it was close to San Diego and I got an opportunity for an internship that looked interesting to me. So I arranged housing, I signed a lease, I moved there — and this was before I defended, but not too long before. It was gonna be me seeing, is this a place where I want to actually start my postdoctoral career? And I walked up the first day of the internship, and this was in January, so everyone was returning from the holidays, and I get greeted by the HR person outside the building and he brings me inside. I'm the first one really there to report. He sits me down in a room and he says, so, I have some bad news: you don't have a position here. And this was a bit of a shock, and I found out subsequently it was because they were cutting all contingent positions — it was mostly financial maneuvering by the organization. I found out it had nothing to do with me; it had nothing to do with what they wanted me to do. The people I was gonna work for and report to reached out to me after I left the building that day — after I got my one day of pay, because I reported, and they have to, per California law, pay me for it. They reached out to me and were very apologetic and were trying to buy me all the meals and whatnot. So I had effectively accidentally moved to Orange County. I didn't have a job there. I had a lease and was paying thousands of dollars a month for the place. And that's when I had to scramble to figure out what I was doing, because I'd taken a leave of absence and was not
enrolled in grad school that particular term — I was just gonna finish up my dissertation and then enroll the following term, for the minimum number of credits, to do the defense. So— [00:17:30] Jacob Haimes: So you're now in Orange County, you have a lease, so you're committed at that point. How did you go about making that not a disaster? [00:17:44] Sean McGregor: It feels like it. And my advice to listeners of the podcast, those looking to make it in AI safety — perhaps more than other fields — is this: if you're good at what you do, if you know something, and you proceed through the community with integrity and capacity, then usually when you're on the job market and you become available, particularly if there's a good story behind it, of "hey, I accidentally live here now, what's up?", people want to help you. Most people are good people. They might not be able to give you a job, but they might know someone who can, or they'll say, actually, you would be perfect for X, Y, and Z thing. And that's more or less what I did there. I started doing some random teaching, I started just hitting the ground and talking to everyone, and actually it worked out way better than if that internship had come through. I found something that more closely reflected my experience and capacities and drives, and it really led me to where I am today. I feel quite lucky to not have had that internship. [00:18:58] Jacob Haimes: Yeah, that makes sense, and I would totally agree. I think really you just need to start doing things. I've said this before, but I really feel like you need to choose to do something where no one else can tell you that you can't do it — quit asking for permission, basically, with regards to your own career at least. So yeah, that resonates with me a lot. And speaking of where you currently are, you've done a lot in the seven years since then. Can you give us the highlights? 'Cause it is quite storied. Uh, we don't have time to go in depth into all of it, but it'd be good to get a little bit of an idea. [00:19:39] Sean McGregor: Sure. So, taking the express route on this: the thing I found when I was suddenly situated in Orange County was actually just doing consulting work. And I think this path is probably one we're gonna see a lot more commonly — the economy has this mentality of people needing things done, and you can move from thing to thing as a result. So the first thing I did was serve as the technical lead consultant for the IBM Watson AI XPRIZE. And then the next piece of work was starting up something called the AI Incident Database, which was just something that needed to exist. From my perspective, it's a collection of harms produced by AI systems in the real world, inspired by some of our databases in aviation, food safety, product safety, computer security: a bad thing happens, you record it, and you collectively work to make it less likely to happen again. This is, I think, a very critical modality for effective safety cultures — you might not be able to prevent every bad thing from happening once, but you should work to make it not happen again. The Santayana line, "those who cannot remember the past are condemned to repeat it," I think is very important for the safety community.
So I started up that project, which is the first of this list of things that I'm still working on. And then, just to fast-forward through time to today: I eventually left the startup I was working at and started my own startup dedicated to the test and evaluation of machine learning systems, before LLMs were a broadly known thing. And— [00:21:23] Jacob Haimes: Mm-hmm. [00:21:23] Sean McGregor: —it was aimed at a variety of different algorithmic decision-making systems and computer vision systems and whatnot, but the same technology worked for large language models. That startup, or at least the intellectual property associated with it, was bought by an old safety organization from 1894 called Underwriters Laboratories, UL, and went into their nonprofit organization, UL Research Institutes, which pursues a public mission across a variety of impact domains but is centered on digital safety and how to make a safer digital ecosystem. I left that position in March of this year, 2025, and have been working on a combination of incident databasing as well as — actually, since we're up to the current day and I think you'll be asking me about those items, I'll let you— [00:22:13] Jacob Haimes: Yeah. [00:22:14] Sean McGregor: —prompt those. But that brings us to today, carrying a few titles or wearing a few hats that are all thematically aligned to trying to make AI safer, or make society safer with AI. [00:22:25] Jacob Haimes: Gotcha. And one thing you mentioned in our pre-show discussion, as something you were thinking about recently, is how you approached contributing towards safe AI. And the way that went for you has had you almost straddling two different sides: the corporate-based, for-profit safety side, and the nonprofit, maybe more policy-oriented work. Can you talk a little bit about how you see this distinction? Would you have done the same thing again if you were given that opportunity? And what would you suggest to people who are currently looking at this space and considering different ways to engage? [00:23:21] Sean McGregor: Sure. And I think the way of capturing this dichotomy is saying that when you're proceeding to a career in the safety of AI, or the safety of society with respect to AI, you really need to operate along one of two modes. You can try and operate down both of them, as I have, but that makes it more interesting than I think most people would look to pursue. Those two modes are a market mode or a regulatory mode. The market mode is one where you are aligning yourself to profits — people wanna pay for things, and businesses and the people working for them are supported by people paying for things. The other side of things is the regulatory mode, and this is very often looking to rein in the excesses of capitalism, for lack of a better term, and make it so that the market mode doesn't leave us living in a world we don't actually wanna live in. Eventually there are various forms of market failure, or unsafe things can sometimes be profitable. Addictive drugs are incredibly profitable, and you could make a great amount of money making highly addictive drugs, but we decide we don't want addictive drugs as a society, that they lead us to bad places, and so we try and stop their production and distribution. And when you are a person new to AI safety, you really need to decide which of these two teams you're playing on.
I try and sit in the middle. This is, I think, particularly important for things like the AI Incident Database, which is meant to just be an independent collection of almost indifferent data that helps inform doing better in both of these spaces. It's not profitable to have your customers stop buying your things because they're unsafe and they don't trust your systems; it's also something where, in democratic countries, you create regulatory institutions to stop that from happening. And that's where incident data is useful to both. But if you're at a company and you find greater affinity with, and your job description is aligned to, regulation, one of two things is gonna happen to you. If you're in a company that's profit-motive based, you are either going to find yourself soon out of a job, or you are gonna find yourself losing sleep, because the purpose to which you're being called and embedded within the organization is not actually being upheld — because the invisible hand of capitalism, which I kind of picture in this particular instance as being the hand from Super Smash Brothers, is going to guide and shape you in a way that you don't find good. And you can do a lot of good in the corporate context, you can navigate that adeptly, but you just need to recognize that at the end of the day, if you want the company to make less money because that's safer and better for society, and you live in a democracy, it's best to advocate for that in public policy, and try and quietly— [00:26:42] Jacob Haimes: Because doing so within the company will result in you being shuffled off and them getting someone else who won't put up a fuss. [00:26:52] Sean McGregor: The companies have really great immune systems for ensuring that the profit motive is adhered to. [00:26:58] Jacob Haimes: Yeah. [00:26:59] Sean McGregor: I— [00:27:00] Jacob Haimes: That was the nice way to say what I said. [00:27:01] Sean McGregor: Yes. [00:27:02] Jacob Haimes: Yeah. [00:27:03] Sean McGregor: And I say this as someone who actually does have an affinity for the capitalistic operating system. I think the job of regulation is to figure out how to correct for the excesses that exist, because capitalism is just so efficient at doing what it does. And a thing you learn in AI safety is that overfitting leads you to bad and weird places, and capitalism will overfit its own objective function and lead us to bad places. And regulation can bring you back. Yeah. [00:27:38] Jacob Haimes: Yeah. I guess at least my mental model of capitalism and how it works is that the thing it's being optimized for is profit, and it does a great job at optimizing for profit. Anything else is secondary, meaning when it really comes down to it, it doesn't matter — and that includes any sort of sense of morality or anything like that. And that's not to say anything about the people in those organizations, but the companies themselves, like you said, have a good immune system. They have been created to be systems that optimize profit extremely efficiently. [00:28:15] Sean McGregor: Yeah, and the mental model that's best to proceed with, if you want to not get greatly frustrated by it, is to think of it as amoral — not in a moralistic sense, but in the sense that it's orthogonal, an independent vector from morality. And the things that have the greatest power and capacity to change things, from my perspective, are figuring out ways to apply those levers of profit to advance the cause of safety.
And that is a challenging, at times frustrating proposition, but one that has profound downstream consequences. Coming back to the question of why we don't see a big federal database doing what the AI Incident Database is doing: I think we are likely to see a federation, a decentralized collection of incident databases, that are not scoped around AI, and then there's gonna continue being the AI Incident Database as a catch-all that networks these things, makes the data shareable, and lets you learn across domains about what kinds of risk exist out there. I would really like to see the databases we already have in aviation and food and drug and all these areas — which do in fact index AI incidents — joined by a lot more specialized databases from communities of practice that know the application domain and the data that can be generated, so those can all be collected and indexed towards their own purposes and then shared outside their individual context. In the US there's a saying that the states are the laboratories of democracy — states each have their own laws and regulations and so forth. And the different contexts that AI operates in are the laboratories of AI safety. Systems don't very often generalize to other contexts particularly well, so when you end up deploying one in a context that's completely novel or unexpected, or just not part of the data collection that went into the engineering of the system, you do learn something. And that's where the sharing of safety data between contexts is quite important. [00:30:38] Jacob Haimes: Yeah, I guess that's probably a key reason here as well, why there are so many issues. I've probably said this ad nauseam to anyone who is a continuous listener of the show, but a very significant problem is that these systems are being created and put out there, and the idea that they could do anything is part of the selling point. And that just can't be true. [00:31:08] Sean McGregor: We haven't solved the everything problem. It'll be an interesting day for humanity if and when we do solve the everything problem. But at the end of the day — and this goes back to the PhD side of things — eventually you wind up at values. You have to reconcile different interests and decide how to safeguard the wellbeing and flourishing of sapient people in that world. And we don't seem well prepared with a philosophical foundation for what it means to have increasingly capable machines operating in tandem with society and people, making decisions on things. We haven't figured it out with exclusively people, and we're embedding all the same problems in the machines. But at least we can engineer the machines, so there's a capacity to solve problems in a way. [00:32:13] Jacob Haimes: Yeah. But then, speaking of problems: one thing that I care a lot about in terms of my research areas and what I've looked into is the benchmarking crisis — which, I don't think people are necessarily using this term, but I do think it's applicable here. Because last year at NeurIPS there were at least 10, probably more — a couple that I didn't see — papers that were all just saying, essentially, here's how we could do better in benchmarking and in evaluating our systems. And the same is true at many conferences in the machine learning space. And the same is true this year as well at NeurIPS: you have a paper where BenchRisk is sort of the main title.
And it's also the same thing, about how we can improve benchmarking, the fact that we need to do this. And so many people have been saying this for a while now — it's been a year, which means at least two years, because some of the papers at last year's NeurIPS were started long before then — people saying let's reform, let's make this better, and we're still not there. How did we get here? Why is this the case? What are the critiques? Why is this such a problem? [00:33:27] Sean McGregor: Taking two steps back and looking at the kind of sociological factors and how we got here: we're drunk on progress. We have the ability to greatly expand the capabilities of these systems in a way that just fundamentally destroys measurement — Goodhart's law and all these rules you read in the first chapter of any textbook on machine learning, which tell you not to do things that are now widespread in the highest levels of proceedings. And the reason for that is it's just working. We're producing more and more capable systems, but we're more and more blind to the characterization of those systems. We won't see a real solution implemented, and the cultural norms shift, until we actually have to. Other fields have had great crises in scientific understanding. Psychology, I think, was one that was very famous for having a replication crisis, where they found that — I think it might've been the majority of — psychology studies did not replicate. They ran the same procedures, they did a new sampling, and they just found no statistical significance: if there's an effect, it's not a strong one, we're not able to find it, what you found in your confidence interval is not present in this one, and maybe we take a few more samplings and find that you were right 5% of the time. Which means, hey, you just had an artifact of statistics and you didn't actually build a basis of knowledge. And that's real bad. Machine learning is dealing with something similar: these benchmarks are very much associated with producing systems capable of doing the things associated with the benchmarks, but then that benchmark's not useful for measurement. You trained against that thing; it's now in distribution. The way this is typically handled in machine learning is you have a training/test split, but that's just not enough at this point. [00:35:39] Jacob Haimes: Part of the reason it's not enough is because everything is in the training split. [00:35:44] Sean McGregor: Certainly for the "let's vacuum up all the data everywhere and put it into the model" approach, that's the case. It's not as bad for smaller models. [00:35:52] Jacob Haimes: That's true. [00:35:52] Sean McGregor: For smaller models you can assure it. But even in that case, even where it's a small model and you can verify — hey, I produced data and then I shuffled some off into the test set — you're still in distribution. You're still evaluating the system on data that was sampled consistent with the data that went into training. And the problem we're most often faced with in safety is the generalization problem: can this handle the variation of the real world? The world changes through time, and the data that's available to the person building the system very often isn't gonna be the exact data environment of the user. We don't measure that in a direct manner. [00:37:26] Jacob Haimes: But it feels like we're not even trying.
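To make the in-distribution point concrete, here is a minimal sketch (mine, not from the episode) of the gap between a random train/test split and a distribution-shifted evaluation. The toy dataset, the model choice, and the particular drift are all invented for illustration, assuming NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# "Development-time" data: two features centered on the origin, with a
# simple linear rule deciding the label.
X = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# The standard train/test split: the held-out test set is drawn from the
# same distribution as the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
print("in-distribution test accuracy:", model.score(X_test, y_test))

# "Deployment-time" data: the world has drifted -- the inputs have shifted
# and the relationship behind the labels has changed too.
X_shift = rng.normal(loc=2.0, scale=1.5, size=(500, 2))
y_shift = (X_shift[:, 0] - 0.5 * X_shift[:, 1] > 2.0).astype(int)
print("shifted-distribution accuracy:", model.score(X_shift, y_shift))
```

The held-out score looks excellent because the test set was sampled the same way as the training set; the shifted score falls apart, which is the generalization question being pointed at here.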
[00:37:28] Sean McGregor: We're drunk on progress. It's working for the capability side of things, and it's only going to stop working when — coming back to the modes — one of two things happens: people stop buying the technology because it's unsafe, it doesn't generalize, it's very brittle; or regulation comes in and says, you don't have access to the market until you solve this problem. [00:37:50] Jacob Haimes: And I think that's, to me, what really — nope, I just can't even formulate the idea here. [00:38:00] Sean McGregor: It's— [00:38:01] Jacob Haimes: It gets me really irritated. [00:38:02] Sean McGregor: Yes, it is an irritating thing. The number of times you just have a sense of: we all know this is here, and we just decided not to worry about it. [00:38:15] Jacob Haimes: Yeah. [00:38:16] Sean McGregor: It's among the more frustrating things about working in the safety space. And you just have to be around long enough to know when there's a moment that you can move methodologically to better — when you can figure out how to balance these aims in the world we live in. [00:38:33] Jacob Haimes: And so how does BenchRisk get us closer to that? What does it tell us, or how does it allow us to better think about at least the current state of things as is? Yeah. [00:38:45] Sean McGregor: To plug the thing: you can go to benchrisk.ai to see the table and whatnot. If you're at your computer while listening to this, you can click around. The way BenchRisk works is it uses risk management processes to actually look at ways that a benchmark could lead someone to be misinformed about the nature of — in this case — a large language model or a chatbot. [00:40:20] Jacob Haimes: Does it take into account a company using the benchmark to intentionally subvert someone's expectations? [00:40:28] Sean McGregor: I would say that's like a second-order dimension on this one— [00:40:31] Jacob Haimes: Okay. [00:40:32] Sean McGregor: —if the benchmark would lead there. So it's not looking at a company repackaging a benchmark and representing it as evidencing something that the benchmark authors were not claiming it to evidence. The— [00:40:46] Jacob Haimes: Okay. [00:40:47] Sean McGregor: —basis of the analysis is the documentation of the benchmark— [00:40:51] Jacob Haimes: Good faith, essentially. [00:40:53] Sean McGregor: —and it does presume good faith in the person writing the benchmark. Which I think you can presume a little bit stronger than good faith, because there is actual liability that arises from people relying on benchmarks to make decisions in the real world. And if you say, "we have the perfect benchmark, and this perfect benchmark shows the system is completely safe," you're really putting yourself out there. That's not a thing that— [00:41:13] Jacob Haimes: And that's why they won't say it. [00:41:14] Sean McGregor: Yes. And so listen carefully, when people are publishing benchmarks, to what claims they're prepared to make from them. You start from that claim. If it says this system will prevent people from talking about ways of committing arson— [00:41:34] Jacob Haimes: Okay. [00:41:34] Sean McGregor: —or this system will not talk about accelerants that you could put into buildings so as to burn them down — that is a specific thing you can benchmark. You can say: are arson questions being filtered out or not?
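As a rough illustration of the risk-management framing described here — reviewing what a benchmark claims against the ways its documentation and data could misinform a decision-maker — the following hypothetical sketch rolls a checklist review up into a single residual-risk number. The criteria, weights, and the arson example are invented for this illustration only; they are not the actual BenchRisk rubric or scoring method.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str        # one way the benchmark could misinform a decision-maker
    weight: float    # how badly getting this wrong undermines the stated claim
    satisfied: bool  # did the benchmark's documentation/data address it?

def residual_risk(criteria: list[Criterion]) -> float:
    """Fraction of total weight left unaddressed: 0.0 is best, 1.0 is worst."""
    total = sum(c.weight for c in criteria)
    unmet = sum(c.weight for c in criteria if not c.satisfied)
    return unmet / total if total else 1.0

# Example review of a made-up benchmark claiming "the model refuses arson advice."
arson_benchmark_review = [
    Criterion("claim is scoped and stated in the documentation", 3.0, True),
    Criterion("queries resemble what a real arsonist would ask", 3.0, False),
    Criterion("test items withheld from model developers", 2.0, False),
    Criterion("scoring rubric checked against human judgment", 2.0, True),
    Criterion("plan to refresh items as the benchmark ages", 1.0, False),
]

print(f"residual risk: {residual_risk(arson_benchmark_review):.2f}")
# Closer to 1.0 means the benchmark shouldn't drive a deployment decision on its own.
```

The point of the sketch is only the shape of the exercise: a stated claim, a list of failure modes, and an explicit judgment of how much of the claim is left unsupported.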
You can look methodologically at whether this benchmark is likely to get it wrong. So: is the benchmark communicated as being related to arson risk? Is the data associated with the queries that an arsonist would actually issue into the system? There's a long list of things that, if you get them wrong, mean it's just not a reliable benchmark, and you can't use it to decide whether to deploy this in a context where you're worried about arson. This is where, to actually have a practice of benchmarking and an ecosystem that generates reliable information about AI systems, you need to either have a strong longevity property for the benchmark — meaning you've mitigated most of the risk of it getting destroyed through time by the optimization — or you need to be prepared to evergreen it, so you're always generating data. And this is partly why monitoring is in vogue at the moment: monitoring is a form of benchmarking that you can potentially evergreen, continually bringing in live data to test with. Trade-offs abound there as well, though. We need it all. Defense in depth is another good term to go into, which I think you might have done a highlight on previously in one of your— [00:43:11] Jacob Haimes: Yeah. The episode with Lianne from BlueDot Impact goes hard into the defense-in-depth framing, which, yeah, I agree with. I guess one thing that is maybe a concern with benchmarks, or a critique of benchmarks, is that having these enables a sort of safety washing, a moral hazard for language model developers, because they can point to a safety evaluation that maybe doesn't have good longevity — who knows? You can point to it and say, oh, TruthfulQA says that my model is truthful, and if you actually look at what was claimed by the authors, that's not at all what's being said, and therefore the name is very problematic. But how do we square the fact that these artifacts we're creating can be used against us, essentially, in terms of our understanding? [00:44:09] Sean McGregor: So, going back to the market mode versus regulator mode: on the market side of things, people need reliable information about their systems. Eventually you'll get to an ecosystem that provides reliable safety marks, reliable markers of trust — whether a system is insurable. Something that landed in the last week is that a lot of insurers are starting to exclude AI risk from their policies, which is— [00:44:38] Jacob Haimes: Good. [00:45:43] Sean McGregor: Yeah, it's an excellent development for a change thesis embedded within the market modality. [00:45:49] Jacob Haimes: Yep. [00:45:50] Sean McGregor: The worst situation you can have with respect to risk is an uncertain assignment of that risk. [00:46:00] Jacob Haimes: Yep. [00:46:01] Sean McGregor: As that risk devolves to the organizations that have the greatest decision-making authority with respect to the production of risk in the real world, we have a greater capacity to adaptively respond to it and produce a safer and safer ecosystem. The other one is the regulatory side of things: there's a variety of conformity requirements with standards, and something that I would like to see a lot more of — we'll see what happens — is a lot more standards being written in code or in benchmarks.
And then hopefully having those be stewarded in a manner that corrects for a lot of the problems we identify in the BenchRisk paper. We'll see on that front. I'm hoping there's enough movement in enough different spaces that some of these outcomes will pan out. [00:46:57] Jacob Haimes: Yeah. I guess part of the issue, part of the reason that I'm interested in this space and care so much about it, is because whenever I think about it for too long I just get this fire, because I'm like, wait a second. We're saying, okay, let's do all this extra work, right? We're going to make sure that we're keeping the practices very explicit. We're specifying the purpose, we're specifying the scope and the assumptions and the information sources and how we're analyzing the model. And we're doing all of this to really make sure that what we're presenting is accurate, that we are giving appropriate caveats, and that we are making claims we can actually back up. And then we compare that to what's being done on the other side. We compare that to how language model developers are behaving — other systems too, but I'm singling out the big language model developers because they're easier to single out and are probably affecting people negatively on a larger scale. But couldn't they have just done this to begin with? Couldn't they just be doing this testing internally? Couldn't they have solved these issues without us putting in what is essentially free labor? Why does it make sense that we are doing this, basically? [00:48:23] Sean McGregor: I don't think it's a satisfying answer, but the answer I have to that question is, effectively, we are being guided by that invisible hand — thinking again about the hand in Super Smash Brothers. At every step along the way, when you're a startup — and it's weird to call them startups at this point, but they still have the startup operating mode — you're asking, do we have the luxury of doing this the way it should be done? They might not think about it explicitly in those terms, but you are always like a quarter away from irrelevancy and two quarters away from organizational death, and the principles of evolution and whatnot exist in the business space as well: the ones that did take that time — they're bankrupt now, they've gone away. And it's not a satisfying thing, but the thing that I really react to in your statement is: I just can't rely on everything being done the right way at all times. We have to build either market or regulatory solutions to all of these, because there are too many people out there and too much ability to charge ahead. So it's not a satisfying conclusion, but it's one where, if you find enough fire in it on that front, you probably need to go to the regulatory camp— [00:50:02] Jacob Haimes: Yeah. [00:50:03] Sean McGregor: —because there are clearer answers there. Versus if you're in a company, and you know you're employing two people or 2,000 — alright, here's my choice: do I do this and we stay alive, or do I not do this and I need to fire 2,000 people and it's all over? And it's all guided by that hand. [00:50:35] Jacob Haimes: My solution is just to punish them for not doing it. [00:50:39] Sean McGregor: Smite them. Smite them, Jacob? [00:50:41] Jacob Haimes: Yes — no, but actually, with liability, with meaningful liability. I think that the problem solves itself.
[00:50:48] Sean McGregor: Yes. And among the greatest sins in the policy space, in the regulatory space, is that we've chosen to eliminate liability for things that there would be a market capacity to respond to in the presence of liability. Section 230 makes it so that platforms are not liable for user content traded on them. Great in some respects — it actually has given us a great many positive things — but right now we're dealing with all the negatives— [00:51:22] Jacob Haimes: Yeah. [00:51:22] Sean McGregor: —on that front. And liability is among the most powerful forces in the universe. [00:53:55] Jacob Haimes: And then, related to those things: we need to put out these works, we need to be sharing these, right? They're not gonna do it, so we have to. But that also opposes the need for the information withholding that you were talking about for a good benchmark. So how do we square that? How do we balance the value of openness for scientific progress with the need for this sort of information security? And does that mirror your thoughts on the open-source versus closed-source debate for AI systems? [00:54:34] Sean McGregor: The market needs information about what is safe and unsafe, because people don't fly on airplanes if they think the airplane is likely to crash. That's why a lot of the safety ecosystem built up around aviation: you don't have a commercial air service if you don't have aviation safety in hand. We have a variety of highly impactful, highly sought-after market segments that AI wants to solve, but with more risk of planes crashing, so to speak, than we've really thought about in depth — legal aid being one example. The practice of law is something where, if people had greater access to it, we would correct one of the economic injustices we have in the country: there's a whole system of laws that's only accessible to people if they get hit by a car by someone who could be sued by an injury defense or injury plaintiff's lawyer. So you do have a market need for it. You do have a need to solve this problem of benchmarks being profoundly unreliable in a lot of instances. That's an element of my work, actually — working with people in industry to try and define what a reliable benchmark is, and to in fact build it and apply it across the whole industry. Because, coming back to the profit motive of companies, it doesn't make a whole lot of profit sense to invest in something you can't differentiate your product on. If you don't have a good benchmark out there saying "ours is safer" — like, we actually have a security operations center with the ability to respond to jailbreaks inside a week and close each of those — you can't market your company based on anything like that right now. You don't have reliable information about those activities. And so investing in and producing good, long-lived benchmarking techniques gives the market the ability to solve these problems. [00:56:46] Jacob Haimes: I did wanna ask one more thing about BenchRisk. In the paper, in what you present, you say that ideally what we would like to see is benchmark authors self-scoring assessments and including that when they share them. Knowing how many of these kinds of tools there are out there — Datasheets for Datasets is an example that's from 2021 or something like that—
And there are tons of different frameworks that significantly improve what we can say about the dataset or the benchmark being presented, and they're almost never used. So this seems a bit unrealistic to me, and I'm curious what your thoughts are and how your team went about thinking about this. [00:57:38] Sean McGregor: I think the most important impact we can have out of the BenchRisk work is to provide a means for benchmarks that do aspire to inform real-world decisions to indicate that somehow, and to be explicit that this is a higher bar to clear than simply documenting it and whatnot. Because most of the documentation standards for benchmarks that exist out there are just a useful way to present what it is you've produced. Those are valuable and you need to produce them — and in fact, I think you actually score less well on BenchRisk if you don't document things. But the truth of the matter is very few benchmarks are aspiring to evidence real-world decisions, and we're hoping to provide a guide towards that, a means to do that. And we're going to evaluate more benchmarks, we're going to score them, and we tried to prioritize a little bit based on what's in the release documentation of these large language models— [00:58:49] Jacob Haimes: Yeah, that makes sense. [00:58:50] Sean McGregor: —so you can know which ones to go after. And I'm in it for the long haul. This is not something we're gonna be done talking about a year from now. This is an element of an effort that's going to last for a hundred-plus years: figuring out how to measure and expose this information. And so this is a powerful tool in that effort. [00:59:14] Jacob Haimes: Gotcha. Yeah, you should do TruthfulQA, 'cause I bet it would score extremely poorly. But talking about impacting real-world decisions, real-world things: if someone does want to impact things, does want to make changes, there are also other ways they can do that. And one that you've decided recently to start — maybe not super recently, 'cause you've been working on it for a little while — is that you are starting a group, or working with a group that has just started, called Avery, which is a frontier auditing nonprofit. What is your reasoning for why you're doing this? Tell us a little bit about how this came about, why you're doing it, and what your mode of impact or your theory of impact is. [00:59:58] Sean McGregor: So Avery stands for the AI Verification and Evaluation Research Institute, and it's heavily inspired by a fairly long history of audit, which is most commonly known from the financial sector, where you need an organization to come in and say: do we trust that these financial statements are representative of reality? You can detect that this rhymes with a lot of things we've been talking about today: do we actually trust the forms of evidence? Do we trust the representations being made? And how do we move towards a greater and greater sense of assurance over the things being claimed about systems? This really does serve both the market and the regulator side of things. The audit in a market context, in financial statements, is something that lets people invest in organizations. There are some famous accounting scandals — Enron being a good example from years past — that actually resulted in more or less a death sentence for the auditor associated with that organization. And that was
something that — "they were more or less sacrificed" is the way people in accounting sometimes talk about it — because we have to trust the results of audits; the whole financial ecosystem is premised on trust, and this is an important element of it. So that serves the market side of things. There's also the really big risk side of things that government takes an interest in: you need organizations to go in and say, hey, this system is capable of very highly scaled cyberattacks and wreaking a lot of havoc across the digital ecosystem, and we would really like that not to be the case. And we also know it's super hard to arrive at a sense of assurance — it's close to impossible. So let's work with these outside evaluation and verification organizations towards that. So Avery is engaged in figuring out what an audit even is in this space, as a first principle, trying to get that standardized in some form where we use consistent language surrounding it, and then developing the methodological basis so that an activity like that can be carried out. We've had double-entry bookkeeping for centuries; we've had large language models for five years. So there's a lot for us to figure out here. And — no offense to the 700,000 CPAs in the United States — accounting is a little bit simpler than a system that can also do accounting, fly a plane, and— [01:02:45] Jacob Haimes: I wouldn't trust an LLM to fly a plane, but I do get what you're saying. [01:02:49] Sean McGregor: Don't put ideas in people's heads. You're gonna see something in the next month, and it's gonna be someone hooking an LLM into making flight control decisions, and we'll see whether they intervene before it hits the ground. But yeah— [01:03:05] Jacob Haimes: Yep. [01:03:05] Sean McGregor: —it's the everything problem. And our task, the curious task of AI safety, is to figure out how to go from the capability to specific risk, and how to map between those. [01:03:17] Jacob Haimes: Okay. And so this is a nonprofit — so how are you getting funded? Asking for a friend. [01:03:27] Sean McGregor: Primarily private individuals and philanthropies. That's the majority of the funding. There are debates on how such organizations are funded in the long term, and this is something we actually grapple with on the AI Incident Database as well: independence is an asset, but it's also a liability financially and whatnot. It's a little bit harder to proceed, but— [01:03:55] Jacob Haimes: Yeah. [01:03:56] Sean McGregor: —the reason for being a nonprofit is that your decisions, and your reason for doing that nonprofit, are pretty clear. Everyone involved in Avery could make more money by doing something that's not Avery. [01:04:09] Jacob Haimes: Yeah. [01:04:10] Sean McGregor: There are fortunes being won and lost out there at the moment. So you can have a greater sense of trust in the organization as a result of saying: we don't get the profit share. The thing that makes a nonprofit isn't that you don't get paid — although often that is the case— [01:04:25] Jacob Haimes: I was gonna say... I don't know about that. [01:04:27] Sean McGregor: It really feels like that. It's that you don't get to take more than a normal scheduled wage for it. All the excess funds you bring in need to be deployed to the purpose. You don't get to say, here's a special dividend to all the employees.
"You did very well last year and you're now a decamillionaire," or something. And that is very important when you're looking to serve the public interest: to have a sense that you cannot be corrupted by X amount of money coming in. Because if that money came in and it ultimately compromised the purpose — why are you doing this? Like— [01:05:12] Jacob Haimes: Yeah. [01:05:12] Sean McGregor: —just go make money. There are better ways to make money than to compromise the purpose you're engaged in. [01:05:18] Jacob Haimes: Yeah. So basically the reason you're a nonprofit is so you can pay your employees less. I'm just kidding. [01:05:26] Sean McGregor: It's not — it's so that you can't get them to forget about their purpose. And I think a good example of this is Juul, the e-cig— [01:05:37] Jacob Haimes: Oh, like the vape thing. [01:05:39] Sean McGregor: Look at their organizational history. It's really sad, because they started off legitimately — at least a good portion of them bought into it being a smoking cessation aid. And that is a true thing: this is a dual-use technology that can be addictive and unhealthy for people, and it can also help people stop being addicted and stop being unhealthy. But the problem is the addiction side of that equation is a lot more profitable than the smoking cessation side of things, because eventually you also stop using the smoking cessation aid. And— [01:06:11] Jacob Haimes: Yeah. [01:06:12] Sean McGregor: —eventually they were purchased by, I think, Philip Morris, and they all got really great payouts. But if you were there for the smoking cessation side of things, that's not gonna feel great. Being a nonprofit means you can't do that. [01:06:25] Jacob Haimes: No, I strongly believe in the importance of having organizations like that, and I'm trying to pull out more reasoning from you by saying that. But then, okay, so we've got this sense that we don't want to be, I guess, slaves to the profit incentive. But you can still be corporate friendly. And MLCommons, which is a group you've worked with, I know is very corporate friendly, I would say. And I'm curious — you'll have to be corporate friendly, right, if you're a frontier model auditing sort of group. But at the same time, how do you handle it: are people who are currently participating in, or currently employees of, frontier model companies allowed to be part of your working groups? How does that work, and how do you make sure it doesn't create a scenario where a private interest is trying to undercut research — or maybe not trying, but just is undercutting research? [01:07:26] Sean McGregor: So MLCommons is a business league. It's a 501(c)(6). It's also a nonprofit, but it's a different kind of nonprofit, and that distinction is quite important for understanding the orgs that are out there. I've been affiliated with 501(c)(3), (c)(4), (c)(6), (c)(9) — I'm trying to collect the whole 501(c) section of the tax code. [01:07:48] Jacob Haimes: Six-seven, basically. [01:07:50] Sean McGregor: I've done... seven is a union, right? I think— [01:07:53] Jacob Haimes: No, it's a dumb joke. Don't worry about it. Please continue. [01:07:56] Sean McGregor: So I almost have the whole thing.
And the reason I like MLCommons as an organization is that comment I made earlier: business needs reliable measurement in order for business to sell products. And— [01:08:09] Jacob Haimes: Okay. [01:08:09] Sean McGregor: —it is a bunch of engineers who care about measurement and wanna do it right. They get angry when suggestions come into play that would produce too much variance and fall outside of confidence intervals. And that's just something I regard as: oh yeah, these are my people, trying to chart the path inside industry. The org has to measure across the whole industry, it has to produce the best measurements it can, and it's interested in maintaining benchmarks for long periods of time as something that standardizes the practice of measurement in the space. So I just love the alignment of interest. I think we very much need orgs like this, and if you go back in history, in the development of other safety organizations, you'll see that very often there is this relationship between industry and the kind of NGO supervisory mechanisms that get stood up. [01:09:14] Jacob Haimes: But then there are also for-profit ones, right? There are auditors that are for-profits doing similar things to what you are proposing to do with Avery. And an issue, a conflict of interest there, is that as soon as there is meaningful legislation in place, the services they're offering skyrocket in value. So — maybe not Avery, because Avery is a nonprofit — but how do people in this space reckon with this conflict of interest? And do you think this should be resolved? If so, how? [01:09:51] Sean McGregor: The way this works in financial audits — 'cause financial audits are done by for-profit organizations that are very often partnerships — is that if you put out an audit where, let's say, you just don't show up and you issue the audit finding anyway, "oh, the financial statements are all good, you can put all your money in there, they're doing great," and then it subsequently comes out that there was no money in those bank accounts, that it didn't exist — as an auditor, you actually have tremendous liability for having put out a fraudulent audit. [01:10:23] Jacob Haimes: Okay. [01:10:24] Sean McGregor: So you have a carrot to be dishonest, and you have a stick to— [01:10:28] Jacob Haimes: Keep you in line. [01:10:29] Sean McGregor: —keep you in line. And that seems to work adequately in the financial space. There are cases where there are failings, but it carries you pretty far. Sorry, I'm checking on my— [01:10:41] Jacob Haimes: You're all good? [01:10:42] Sean McGregor: I am in the basement still. "Will wrap now, exclamation point." Sorry, back on. So there are ways of ensuring the integrity of a for-profit auditor. At this point in the development of audit for AI, [01:11:05] Jacob Haimes: Yeah. [01:11:06] Sean McGregor: we have not defined a profession of "AI auditor" to any adequate degree that it would make sense to engage in the for-profit auditing of systems. Speaking for myself at least, we want that to exist — we think it has to exist in some form, whether that activity is predominantly carried out by publicly funded nonprofit operators or for-profit ones, and I have a preference for one of those over the other. But neither exists right now beyond a collection of aspirants that I am hoping to be among and to enable.
And it does exist at this moment — it's just not nearly enough. [01:11:49] Jacob Haimes: Gotcha. Okay. And then, I know we gotta wrap up, but I want to ask at least the one question that I ask everyone, which is: what is your favorite part about what you do? [01:12:01] Sean McGregor: My favorite part of what I do is that it has a sense of: this is the question of our time, that we figure out how to navigate this problem, and that we're in this together — all of humanity. We're factious and there are issues, but broadly we're seeing more change more quickly than humanity's ever seen, and we're all in this ship together. And there is that sense of shared purpose. [01:12:30] Jacob Haimes: Awesome. Yeah. Sean, thank you so much for joining me. I'm really excited to get this episode together, and I really appreciate you joining me and taking the time. [01:12:40] Sean McGregor: It's been great talking with you, Jacob — great show, and really a fantastic beard. I have to say, I love your beard. [01:12:46] Jacob Haimes: Thank you. Yeah, when we first met, you mentioned something about how, if you're gonna make it super corporate, you might need to shave that, and I was like, not gonna happen. [01:12:57] Sean McGregor: And I love the conviction of the answer on that front. And yeah, I wish I could grow a beard like yours. I just can't — I don't have it in me. [01:13:06] Jacob Haimes: Yeah. It's definitely — genetics is a big portion of it. Awesome. Yeah, thank you so much, and I'll let you get to where you're going. Yep. [01:13:16] Sean McGregor: Thank you, Jacob. It's been a pleasure, and keep in touch. Hopefully I see you in San Diego.