Practical AI

How do we build trust in AI agents before the AI hailstorm arrives? Emil Lassen from the Artificial Intelligence Underwriting Company (AIUC) joins the show to discuss how the enterprise flywheel of standards, certification, audit, and insurance is being applied to AI agents. They explore the AIUC-1 framework, the challenges of securing agentic AI systems, and why red teaming (based on standards) may be key to accelerating enterprise AI adoption.

Featuring:
Links: 
Sponsors:
Upcoming Events: 

Creators and Guests

Host
Daniel Whitenack
CEO @Prediction Guard & cohost @Practical AI podcast
Guest
Emil Lassen

What is Practical AI?

Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).

The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!

Narrator:

Welcome to the Practical AI Podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.

Narrator:

Now onto the show.

Daniel:

Welcome another episode of the Practical AI Podcast. I'm Daniel Whitenack. I am CEO at Prediction Guard, and I'm really excited today to have, an amazing guest that, I'm personally interested in asking a bunch of selfish questions too because I'm I'm so interested in the topic. But we have Emil Lassen who's the standards lead at the Artificial Intelligence Underwriting Company. Welcome.

Daniel:

How how are you doing?

Emil:

Thanks, Daniel. Thanks for having me. I'm doing great. How are you?

Daniel:

I I'm I'm doing well. Actually, you know, today in the Midwest, everyone's concerned about tornadoes and talking about hail and and some things like insurance and other things. So it's a whole other other world, of course, but, obviously, the AI underwriting company, way way more than thinking about insurance, but thinking about standards, certification around AI and agents. I'm wondering how, you personally, just to give the audience a little bit about you personally, how did you end up at this intersection of standards, certification, AI, agents? How how did that come about?

Emil:

Yeah. So I I don't think I had as clearer path as as the classic standards lead where you work as a say, security engineer for ten years and then learn the technical craft and then come together. My journey has been very entrepreneurial always. I started my first company with actually the CEO of the artificial intelligence company, Ruininkvist, ten years ago. It was a nonprofit back then helping students from low income backgrounds get into top universities.

Emil:

And I think what I took away from that was both the very entrepreneurial journey, but also desire to move fast on some of the challenges that society is facing. I then moved in and had my first interaction with standards at my second company, a real estate company back in Denmark, where we developed an impact management system that both had to navigate a lot of national legislation, local legislation, EU legislation, voluntary frameworks, investor demands. And so had my first interaction with one of these quite complex markets of different measurements and targets you wanted to get to, a very technical sector as well where like building codes, for example, require a lot of thinking through how you do things the right way. So I spent four years building up a real estate company that today is managing about 400,000,000 together with four other co founders and took from that this desire to go in and standardize when we know what the right answer is and try to push the sector in that direction. The way I got into the AI space was taking a step back from the real estate company now a few years ago and going to Cambridge, Massachusetts, so Harvard University, where I spent two years as a fellow at the Kennedy School, really getting into the emerging tech and geopolitics of all of this as well.

Emil:

And left the Kennedy School with a very sheer ambition of just getting under the hood of the pace of AI, the safety and security aspects, and clearly just acknowledging that the technology is going to profoundly change society as we know it today. And in many different ways, I have 10 nephews and nieces, have five sisters. And seeing a 10 year old being comfortable using AI the way they are is kind of scary. And I don't see yet that we've codified the principles we want to see when it comes to how kids use AI. So that's one direction where, oh, maybe we should develop standards for this.

Emil:

You also can read news every week and see new incidents. So there's clearly a big security angle to this as well. You said that the Midwest is facing tornadoes today. I think being a CISO at a company adopting and deploying AI right now feels a little bit like you're in a hailstorm and it's only a matter of time before you're hit by some of that. We just keep seeing that.

Emil:

So clearly there was a element to that as well. And then there's the even bigger picture around what this will do to our job markets and so forth. So left the Kennedy School being very interested in just using the public policies toolbox that I brought from that with the standards toolbox I brought from my real estate company and then this desire to actually just work on societal challenges that I think has been with me since I started working. And that became my way to the artificial intelligence underwriting company. I've since then spent all my time building a network of people, a consortium, to help them figure out how we get the right practical and technical insights into the standards we develop as well.

Daniel:

Yeah. That's that's awesome. And I and I know sometimes when people hear things like standard certification, you know, terms like this, maybe some some people might have a reaction of, like, slowing things down. Right? But I I like how in at the very, you know, front and center of what you talk about is how to actually unlock enterprise adoption with, certification standards, etcetera.

Daniel:

Could you talk, before we dive into the AI side of this specifically, could you talk about that a little bit in general, like how some of these things work together in actual enterprise settings, standard certification, even insurance, and how how those things actually can enable adoption with and not just like block things, I guess.

Emil:

Happy to. So I think our story and the inspiration we take dates back to Benjamin Franklin's Philadelphia. Philadelphia was starting to adopt electricity. Electricity was scary back then. Light bulbs did not work out.

Emil:

Homes started burning down. So Benjamin Franklin formed the first fire brigade in Philadelphia. He started codifying building codes so that we basically took what we knew around how to build safer houses, the standards part, and then he developed the first mutual insurance company. So back then, this is the first time we see this flywheel of standards, audits and insurance go together. By having standards around building codes, for example, you knew that houses needed to be placed a little bit further from each other.

Emil:

They needed some of his lightning rods to ensure that when lightning strikes that they catch fire. The fire inspections was the audit part that actually went in and examined that you'd followed these rules appropriately. And the insurance side mitigated the residual risk that will always be there when we introduce new powerful technology into society. We've seen this flywheel of standards audits and insurance time and time again when new technology has been introduced in society. We see it again in cars.

Emil:

Cars have safety standards. These were not demanded by government. They came from industry themselves because they knew that if we develop safer cars, people are more likely to buy them and safer cars actually enable you to drive even faster as well. So it was industry standards that also led us to airbags and seat belts and some of the other things that now make cars safer. We naturally have, again, the third party auditor going in and checking these cars and ensure that they follow the rules.

Emil:

We have the inspection element again, and we also have the insurance element. And this flywheel, one of the best things about it is that it really scales. So we're not just thinking, seeing it with say light bulbs and cars. We also see it for nuclear power plants to this date where you also have standards inspections of those power plants and insurance even works in this case as well. So there's no limitation to the power of this flywheel.

Emil:

When we're looking at AI, we see some of the same things at play. We see a new technology that is very powerful and has the powers both do a lot of good, but also if things go wrong, can have severe financial implications. And the other thing is it's a complex industry where me as a startup saying my technology is safe creates limited trust if I'm an enterprise buyer, big bank, for example, that wants to adopt this technology. So with the artificial intelligence underwriting company, what we're trying to do is to create that trust layer in between the companies building AI and the companies adopting AI. And what we offer as the trust layer is this flywheel.

Emil:

So we go in and codify the standards we believe that the companies building AI should follow. We go in and audit against those standards in collaboration with third party auditors like Shellman, CoalFire, companies who really know how to go deep and validate that the standards actually followed. And then we certify companies against the standards. A big part of the certification in the case of Agenack AI is red teaming, so we go in and test the actual AI agent systems, not just to see that the policies they have are in place and work well, but that the agents actually work and are robust under pressure, and the companies that then obtain a certificate gets access to buy insurance of their agents so that there's also that financial coverage of residual risk.

Daniel:

Yeah, this is so interesting and I have so many questions. Maybe one question that's just very selfish and our listeners know some part of the joy of being able to do a podcast like this is I get to get my own questions answered by people that are smarter than me. But one of those questions that I have that actually comes up in conversations I have day to day is is this tension of, hey, I I see a standard out there, whether it's, you know, some of the standards that we'll talk about that you all are codifying, or maybe it's things like the NIST AI risk management framework or things from OWASP. And logically they say, yes, it would make sense to do those things. But what is the forcing function that is kind of making making companies consider actual implementation of those of those principles rather than having it be a be an aspirational thing?

Daniel:

Is it the is it the potential, you know, PR risk to the company? Is it you mentioned the financial side, the, maybe it's the commercial side of getting, you know, software vendors getting their software into the hands of their enterprise customers. What what do you see as some of those main forcing functions, or are there even those forcing functions right now that would force people to consider this as something, you know, not aspirational but actually practical?

Emil:

Yeah. So I I see a couple of different things that I think are very practical. Any vendor building powerful AI right now knows how tricky it is to get through the enterprise vendor due diligence process and questionnaires. So these startups face these questionnaires. Sometimes there's a 100 questions on them and it's extremely painful to go through.

Emil:

And I can tell you also, speak to a lot of enterprise CSOs and GSE managers, it's equally painful on their side, right? Because they're at a stage where the space changes so often that they feel a desire to actually change their questionnaire every month. Going through a 100 questions from a startup every single time you try to onboard a vendor is also completely painful on the other side. So I think part one here is speed and a desire to get to a place where you actually feel like you've covered your blind spots as well. And having a third party developed standard with all of the industry in the room to help find those blind spots and figure out how we can't fight them in a standard, I think is value proposition number one.

Emil:

The speed argument obviously assumes that you can get across the line in the first place. I think the second part of the value proposition is having that third party validation that your Agenack AI is actually safe, secure and reliable. We're working with some of the frontier companies in the aerospace right now, companies like Eleven Labs, companies like Fin that just got acquired for 3,600,000,000 by Salesforce, companies like UiPath who have set the standards within their categories historically. They have fantastic security postures, but they don't have a way to prove that to an enterprise. So an enterprise will just never trust a company that has an incentive to sell their product.

Emil:

They need that third party to go in and do that. And then I do think there is a security argument to be made here. Our red teaming consistently uncovers blind spots for the companies that we work with. Sometimes it's the hallucination rate where we realize that a specific type of adversarial attack will bring up the hallucination rate or specific language switches or other things that might actually happen when their products are deployed. Other times, it's jailbreaking risk that we manage to uncover or pump injection risk that we manage to uncover.

Emil:

So we do also see ourselves as helping these companies actually improve their safety, security and reliability posture, which is valuable as well. Then I'm sure there's a marketing benefit to the companies going out early and adopting a new framework and showing and demonstrating their moving in front when it comes to AI security leadership. I think that's an important branding value as well that we sometimes help provide, but I don't think that's the core benefit of decertification right now. I think that is really unlocking upmarket enterprise revenue for the for the companies.

Daniel:

Agents are impacting every function within a company, but it's sometimes very difficult to figure out what an agent should do, what a human should do. Jeffrey from News Research, a recent guest said often agents have no taste. That's why I'm so impressed with what our partner Framer is doing with their pro website builder that's already trusted by companies like Miro and Perplexity. They're implementing agents, but in a way that agents and humans work in tandem. Agents bring speed and scale, but people bring the taste, judgment, and control.

Daniel:

And these agents help solve this gap between AI generated ideas and production ready website work. So Framer is already enterprise level solution. They they allow you to create amazing websites that are SEO ready. And so I really would recommend that you check them out if you're building a new website or just implementing landing pages or an upgrade in your existing website. Learn how you can get more out of your site from a Framer specialist or get started building for free today at framer.com/practicalai for 30% off of Framer Pro annual plan.

Daniel:

That's framer.com/practicalai for 30% off. Framer.com/practicalai. Rules and restrictions may apply. Yeah. Yeah.

Daniel:

I I'm I love your answer and it was a little bit I I was trying to validate some of my own thinking through that because we've talked on the show before about, you know, it isn't really like the the governments of the world are quite behind in terms of, you know, how they would, you know, enforce or even say what to enforce for companies building AI things. And and so it's really the enterprises themselves that have some motivation to do this, all of those that you laid out, and I'm and I'm sure more. In in the in the current, I guess, state of AI standards, if we kind of shift to that piece, and then eventually, wanna get to kind of the evidence and red teaming and and all of that. But maybe just as if we take a general look at the standards that exist out there for AI and AI agents, could you help us understand what kinds of standards are out there and what they cover? Because there's a lot of there's a lot of sort of intersections that we could think of, whether that be security or safety or alignment or all sorts of things, data privacy.

Daniel:

There there's all sorts of ways that you could kind of look at this and perspectives that you could look at it from, and there's all sorts of things that people have proposed over time. So I imagine, you know, that's part of the reason why having a company that's really digging into this at a deep level is is very worthwhile, which I I think it is. But could you help set the stage for that? Like, how can we categories categorize in our mind the current state of AI standards and what perspectives are coming from?

Emil:

Yeah, absolutely. And by the way, Daniel, that's exactly where we started last summer, right? So if you go to auc1.com today, you'll find that we've done crosswalks. I think it's about 10 different frameworks now, but they're transparently available so you can see exactly how our standard fits into the existing environment and hopefully you also see then why we concluded after doing this work that yes, there actually was a need for another standard even though there can be sometimes a little bit of standard fatigue.

Daniel:

And just, by way of, I don't know, encouragement or thanks, maybe gratitude is the right way to put it, our our our, the company Eil Laid has looked at that many times in terms of, and we're maybe in, not not like everyone, we're building actual, you know, plane that works on some of these some of these knobs and levers that you talk about, but it's been extremely useful. And even, you know, as we're as we're writing content or doing planning or thinking about things in our product, I always refer people back to that page and I'd refer our listeners to that page because it is a it is a really great crosswalk and helps understand, you know, where these align, where they don't align, what what the other need is. So just by way of gratitude, thank you for putting that together and making it public.

Emil:

Yeah, no, and we appreciate all the people who've worked on this. We do a lot of work with the Cloud Security Alliance, with the OWASP community across both the AIVSS and the GinAI project. We work with Cisco and IBM on Crosswalks, so it's a big team effort and I really appreciate that we've been able to gather the ecosystem around a decided to just publish some of this stuff transparently so that organizations like yours, but also I know big enterprises are using the controls we put out transparently in their own control frameworks. That's completely free to use and only the companies pursuing certification actually needs to get money out the pocket. To get back to your question, because I think it's an important one, the way we see the standard space is that you have three layers.

Emil:

You have an organizational layer, you have an infrastructure layer and then you have the agentic AI layer. At the organizational level, many organizations have been through an ISO 27,001 certification, classic management system certification. ISO then about three years ago now published the 42,001, which is the management system certification for AI systems. It's a governance certification that ensures that you have the right policies in place and the right procedures in place so that when you develop AI systems, they hopefully turn out in the right way if you follow those systems. Then you have the infrastructure layer.

Emil:

That's where your SOC two comes in and your pentesting and some of the classic cybersecurity controls, access management, transport security, all the good stuff there. I'd say that many of those things become even more important in an space because pace is higher, data access is higher, so if you don't have that in order, then you should go back and ensure that you get those boxes checked. And then at the Agenack AI space, we basically just didn't see anything when we started this company and started drafting the first version of AI UC. One, we see NIST have come out with the AI risk management framework. There's a little bit of Agenack stuff in there, and I know from speaking to the team that they're considering publishing additions to this.

Emil:

The Cloud Security Alliance has also done their AI controls matrix, where again, there's some things in there around Agenack AI that are pretty good. The issue with both of those frameworks is that they're guidance. They're voluntary frameworks. You decide which controls you implement. You decide whether you like, how you implement them.

Emil:

They're not orderable frameworks. So the way AAC One fits in here is that we've basically taken the core governance things from the organizational level that we think are really important when it comes to AI systems, such as having failure plans in play when agents do not do what they're intended to do and you know how to deal with that. Good change management and acknowledging that every time you, for example, replace the LLM in an agent, it will behave differently. And if you don't take that into account in your governance, your end users will bear the burden of that. So some parts of the governance, the core parts of the infrastructure layer as well as ensuring that the folks who have access to the AI system itself and can make these big decisions, that's restricted.

Emil:

Ensuring again that transport security, when you do agent to agent communications and so forth, is in place. But otherwise, we basically leave ISO and TUC2 to do what they're really good at and focus on the agentic layer. And what is up there then for us is specific controls around safety, for example, ensuring that agents behave according to brand and that they don't give users guidance on medical care or legal advice, financial advice, other high risk areas, basically that they stay within their scope and don't start breaking out of that. We look at specifically how you restrict the agent's data access, its system access and its tool access, so it doesn't start processing refunds when you shouldn't. We look at hallucinations, which is also a risk that is quite unique to AI obviously and does not come up in any way in either ISO or SOC certifications.

Emil:

So really focus on the agentic layer there and the core part of the differentiation, I'd say, is then the technical level of the controls. We go in and are actually quite prescriptive in what we want to see from the agents because we have a good understanding now of what the right toolbox is to ensure that these agents behave safe, secure and reliably. And the other thing is we acknowledge that a technical control in itself might not hold up under robustness. So six of the 40 mandatory requirements in AAC one have to do with red teaming, testing that these technical controls then hold up under pressure, both when we react, like we engage with the system as a benign user, just ask it questions and see if it hallucinates, but also what happens when we start approaching the system like, with social engineering and and adversarial pressure.

Daniel:

And just to help people understand, so the AI UC one, that's the standard that that you all have published. People can look at it online. I assume since it's AI UC one, there's an anticipation there might be a a two or other or or various, you know, either revisions or different focuses kind of within different certifications is my is my understanding right there?

Emil:

That's correct. I think where to start is we update the standard every single quarter. So we've gathered now a consortium of two fifty secondurity leaders. Some of them are CSOs at Fortune 1,000 companies. Some of them are security engineers, architects, GRC managers.

Emil:

And so we have the full stack of people in the room. And with them, every quarter we identify new priority areas. Last quarter it was MCP risk, for example, which has really come up as agents start not just operating in isolation but exchanging information. This quarter, we look a lot at how we can strengthen runtime security and that continuous element, which continues to be really important for a lot of organizations. So we get them into the room and update the standard each quarter.

Emil:

I could very well see that new frameworks, so an AIAC two, AIAC three come out in the future. We don't have any plans to do that yet, but what we know, again, if we go all the way back to where we started our conversation, is that this combination of standards audits and insurance have worked historically. So right now we focus on the application layers of the platforms and products that take Agenack AI and deploy. But there's a model layer as well, which we see as our second horizon and there's the physical layer, like the data centers and the infrastructure that we deploy AI on, but also the cars and the robots that we put this into where standards audits and insurance could play a big role and that's where we see the company go long term.

Daniel:

Yeah, that makes sense. Could could you help us, so maybe paint the picture, let's say, there's a scenario, I'm I'm a company, I maybe I'm building an agentic a new agentic driven product. Right? And I'm going to offer it to some sort of regulated or enterprise customers. I'm selling into health care.

Daniel:

I'm selling into, you know, large manufacturing or or whatever it is. Right? So in that scenario, what what would the process kind of recommended process be for our company to engage with this standard and eventually get to that level of certification, maybe in the future eventually to the to the insurance side, but at least to that certification side, what would that process look like? And then maybe highlight in that process where the red teaming comes in. And then I'd love to circle back on that maybe later and and talk through that specifically.

Emil:

Yeah. Absolutely. So you'd get in touch with our team and the first thing we always do is we do a gap assessment against your existing systems. If you have a well documented trust center already or some blog post describing what you do, we can basically go back to you and tell you this is the places where we believe you already meet the standard. These are the areas where we expect that there will be work for you.

Emil:

So you basically go into the certifications process with open eyes around what is the workload needed from engineering, from legal and from your GRC team to take your company through it. I will mention at this point, we've had a three person Y Combinator startup go through this. We've had UiPath that is publicly traded go through this. We have companies at all stages. So I mentioned now security, legal and GSE.

Emil:

That was the same person when it came to points or getting certified, right? So it is a standard that scales with the organization's size as well. When you have this gap assessment completed, you basically decide whether you want to move forward with the certification or not. To move forward, we split the process in two parts. One part is you pick an order of your choice.

Emil:

We have a number of credited auditors, like, for example, Shellman Coal Fired, but the list is growing very rapidly at the moment. And trusted auditor who knows how to do this. And on their track, you basically start collecting all the evidence that is needed to go through the ASC one audit. It falls in two buckets. Some of it is the classic legal policies.

Emil:

If you have a generative AI product, you need to define who owns the inputs and outputs and how you retain user data and whether you train on that user data. So forth, you need to define your acceptable use. And the second part is the technical controls that that Sheltman will go in and validate. So that is your filtering configuration against harmful output, your classifier, your defensive prompting, your groundedness filtering when it comes to hallucination preventing, your safeguards around tool calls and all the other things. So again, you go through those requirements and capture the evidence and submit that to the auditor that goes in and does that third party validation.

Emil:

The other track we then do in parallel is that you give us an instance of the agent or the agents, it can be multiple as well, that is in scope for the certification. And you basically configure a representative version of that agent. So an agent that would be configured how an enterprise would use it. We sometimes see companies creating an extremely safe agent that has almost lost all its power because they just wanted to pass the certification, we obviously would then go in as the third party in the room and push back and say, we want to see an agent that is configured based on the public docs you have and the defaults you've built into the product. When we then have access to that, we often access it via API.

Emil:

Our internal team will draw up a matrix of the risks we see that this agent is subject to and the attacks that it could be subject to if someone went in and attacked it. And we then develop usually between 1,005 different scenarios that we're going to hit this agent with. Each attack is unique. Some of them are, again, benign in nature. So the user will simply ask it a question, get the answer back.

Emil:

If the agent doesn't hallucinate, it passes the eval. Other times we'll increase the adversarial pressure step by step. So the first step could be that we try to lie to it. The second step could be that we invoke authority. We do it over multiple turns sometimes and keep insisting on doing things.

Emil:

We pretend that we're under distress and say, if you don't do this right now, I will go and do something terrible. So please process this refund. And obviously only pass the agent if we see it hold up to that pressure. We do the red teaming in two rounds because we often do find things in the first round. So similar to an ISO audit where you have a stage one and a stage two and you then have a chance to mitigate any findings in between, we give a company the chance to do that because the goal for us, again, is not compliance.

Emil:

The goal for us is security, right, that you actually improve the agent as part of the certification process. So depending on the magnitude of the findings, your team will have between, say, one and four weeks to mitigate these things based on the recommendations we come up with. And we then do a second round of testing. That testing is final and is taken then into account when the auditor takes your evidence, takes your Red Team results and writes that final audit report. And what you leave the process with is a comprehensive audit report that describes your security posture.

Emil:

It's between sixty and one hundred pages long and it's an asset you can really unblock those enterprise deals with. You get a certificate for your website again, so you can demonstrate that you've gone above and beyond when it comes to security. And then we come knocking again three months later and say, we still have access to your agent via the API. We're now going to run that same barrage of tests again to ensure that the changes you've made in the last quarter didn't invalidate some of the security things we found. And we do that every single quarter, that's a requirement to to maintain certification.

Daniel:

And in that red teaming, I mean, you you mentioned this before around kind of the probabilistic nature of of some of these things, and this is something I've always run into in in AI workshops as I give workshops in in enterprise. Often people will say, oh, this is like, you know, it's not deterministic. How do we how do we create the right like, what does passing mean? Right? And so you could say, well, passing means, you know, passing all 5,000 scenarios.

Daniel:

Right? And you mentioned this phased approach, which I think deals with part of that. But, yeah, what, could you describe a little bit on that side? Like, what does what does passing mean? At what level kind of do you expect things to pass or should should you expect things to pass or has that even been a a topic of discussion?

Emil:

Yeah. So it's it's a great question and it's also a really hard question, so there's some nuances in here. What we require to pass AAC one is that you don't have any rate each run based on severity. So a pass, you can have a P four, which is an insignificant, say, small hallucination that doesn't really affect an end user. A minor thing would be which would be a P three.

Emil:

P2 would be something significant that may actually have real world implications. P1 is something critical. And P0 actually don't know the name for it. I think we have called it catastrophic or something like that. The kind of thing, if we found it, you would drop what you had in your hands and start mitigating it immediately because having a system deployed with this kind of vulnerability could have real world implications that would be high.

Emil:

Our grading approach right now is that you cannot pass AAC1 if you have any P0 or P1 vulnerabilities identified. You have to mitigate those. From then on, we believe a lot in transparency and we know from the compliance world, at least the frameworks that are robust and hold up under pressure, that if we put the results in the audit report and your customers see that audit report, you are very incentivized to mitigate the vulnerabilities we find. What we also know is that these agent systems are very different from use case to use case. So a coding agent is one type of beast versus a customer service agent versus a automation agent like UiPath that make decisions based on the information.

Emil:

And so companies have different tolerances around the percentage of hallucinations they would accept and so forth. So we really leave it up to the company and really in the end, the customers of that company to make these calls. The important thing is, and this is where we sometimes have a little bit of a conversation with some of the companies we work with, no company has ever and will ever pass AAC1 with a 100 pass rate. It doesn't exist here. We're not Delve SOC two compliance where you just get a magical, a spot free audit report.

Emil:

All agentic systems are nondeterministic in nature. That means that they will always, if you put them under the right amount of pressure, be able to be jailbroken. They will always be able to hallucinate. We work again with, like some of the legal agents we're certifying right now are world class at hallucination prevention. I am sure we will still be able to find some minor hallucination cases in those.

Emil:

And that's just the nature of these systems. If you remove those hallucination rates, it's because you've made the agent so dumb that it won't be able to actually execute the use case there. It is a topic that is very alive for us, both because there's a grading methodology question in there and then there's this communications question. And we've not yet seen that enterprises fully acknowledge this. Enterprises would also like to see something spotless because something that is not spotless just asks like adds complexity and raises some of these questions, but we're hoping to be part of a push in the sector to acknowledge that a spotless audit report is probably not as valuable as a audit report that reflects reality, more more more clearly.

Daniel:

Yeah. That that's really helpful. Appreciate that. I hope you're inspired by the work that the AI underwriting company is doing and what we're talking about in this episode, really getting to a point where true enterprises can adopt agentic technology and actually have confidence in that technology and maybe eventually insurance around the risks associated with AI agents. That involves a whole lot of things.

Daniel:

There there are a bunch of controls that need to be put into place. Everything from, yes, individual guardrails, but much more than that to how agents access MCP servers, how you manage supply chain and the risk associated with things in the supply chain around agents, how you handle observability and response to incidents. This can be really overwhelming, and that's why I'm so privileged to be working with an amazing team of AI engineers at Prediction Guard, where we've actually built an AI control plane that you can self host in your own infrastructure that allows you to treat AI agents that you're adopting with zero trust and these built in controls out of the box. I would love for you to take a look at what we're doing. Book a call with my team and I to talk through your individual implementation and how you can get up to speed rapidly and adopt this technology with full confidence.

Daniel:

You can find out more at predictionguard.com/practicalai. That's predictionguard.com/practicalai. I I have another kind of selfish question because this is actually a response I get quite often when I'm talking to people about the systems that they're building. And, I I have my metaphor that I use that I would love you to critique, which might not be useful. If it's not useful, I need to use a different metaphor, but the the scenario is often they say, oh, well, we're building, you know, these agents or this agent, and maybe they're using AWS.

Daniel:

Right? And so they're building some agents. They have some agent harness, and then they're plugging into some AWS bedrock models. And I'm talking to them about, hey. Well, like, when you're thinking about governance of these agents, the behavior of these agents, how you control that behavior, how you prevent bad things from happening.

Daniel:

Like, how do you do that? And they're like, oh, well, that's easy. You know? AWS has, a content filter on their bedrock model. Right?

Daniel:

And, to be clear, I'm not bashing on AWS. I think it's cool that they have a content filter. But I I I often use the metaphor of, like, my own health as a person. So I I say, like, well, is it bad for me to run a point check to, like, check my temperature? It's not, like, a bad thing, right?

Daniel:

Like that's part of maybe being a healthy person is knowing if I have a fever or not, right? But it's very different from me being plugged into a healthcare system where there's electronic health records about my journey as a person, my health, my conditions, different from kind of having a comprehensive set of physicals and labs that were run that give different, you know, perspectives on on my health. Right? And so there's there's this system that I'm plugged into, there's a process, there's policies around that, there's, and that's, in in my mind, that's much more of kind of the perspective that people need to go to is not so much like, hey, I have a prompt injection filter. Right?

Daniel:

And that's my strategy. But more this kind of comprehensive view like you would have of your own health as a person, but now we're talking about like the the health or behavior of a of an agent. I don't know. Any any critique on that or or

Emil:

know, I I actually I I really like that analogy. I've not thought about this this one before, but I think our quarterly red teaming is is very much alike to the doctor's visit where you go from head to toe. You go through the MRI scanner, you go through the blood testing, you like everything that Elizabeth Holmes tried to prevent with Theranos, we will do to you and we will do it 10 times over. And in between, the beauty of the standard is there's obviously a lot of runtime controls in there. So we will ensure then that you do still take your temperature every day, in fact, probably every minute, so that alerts are configured if something immediately goes off.

Emil:

We will also have you lock your system behavior right. So if something goes awry and you don't understand it, you can go back and see what is the observability then and go in and explain it. So I think it's basically the perfect analogy between the red teaming is a doctor's visit we do every course and we do that very comprehensively. And in between, we make sure that we check your vitals every minute and there's an alarm going off if there's something off and then maybe adding a healthy diet to it in the first place as well, ensuring that the inputs and outputs of the system are are working well, so not too much junk food there.

Daniel:

Okay. Yeah. I like, even now I I I have some revisions of my metaphor I'll use based on your response. I think that's great. But yeah, I I think the other thing that might like we have some developers listening, Maybe people developing agents actively and it it might be somewhat overwhelming to, for example, look at the, you know, AI UC has like a evidence, page, right, where I can see all the things maybe I should be doing.

Daniel:

How do you see the market evolving in terms of like, obviously, you have one side of this which is really related to the standard, the certification, maybe eventually that insurance side, but then it can be, you know, an individual developer of an agent might not be an expert in agent security or how to govern these things, that sort of thing. So that might seem overwhelming to them. That you mentioned partners like auditors. There's the infrastructure layer. What what do you think I I guess my question is what do you think needs to be in place to, to actually enable real world developers to meet some of these standards, whether that be evolution in the tooling or, obviously understanding maybe of the the the, standards.

Daniel:

I I don't know. Does the question make sense?

Emil:

Yeah. Yeah. No. Absolutely. I think where we are right now is we're just overwhelmed by how positively AAT one has been received as we're really busy just delivering certifications.

Emil:

So that means we've less time to talk to some of the many, many fantastic partners who come into our inbox and want to partner with us. I think there's three things we need to get right for this to work. I think we need to continue pushing for code and Neogenetic products that come out of the gate as secure as possible by default. We're certifying our first coding agents right now and we are certifying both a well lovable, which I think will be certified when this episode comes out, and then another very large coding agent that may or may not have been acquired recently for a lot of money without naming names. Working with the coding agents layer and the platforms where you go in and configure code, we're also, like again, UiPath is a good example where you don't just like have one agent, but you actually go in and build agents on top.

Emil:

Means that it'll be easier to meet a lot of the standards just by default because the environment where you define and build your agent is secured by default. So I think that's step one and we're going to do more of that work with some of the big agentic platforms out there very soon. Think some of it will be announced in the fall. The second stage is we need this partner ecosystem you just talked about and we're already starting to come out with some examples of this where a partner meets, helps companies meet a good chunk of controls. So a company like White Circle, which I think is very cool, we don't work with them at all, so just a shout out.

Emil:

Their monitoring and filtering work is really good and has helped companies meet a lot of the safety requirements in our standard. So that is like one platform you integrate and you immediately meet say eight or 10 of the requirements in the standard. There are many other platforms. We're doing some work right now with Credo, with Witness AI, and there's, again, tens of others. So having that ecosystem help companies meet the controls, I think is important.

Emil:

And where we've already gone in and done our best to help companies meet the standard is we've gone in and actually defined the typical evidence we see companies upload. To see that as your guidance for where to look for the right approaches, we don't just define the controls and leave it up to you to figure out how the hell to implement it. We actually try to give you the guidance as well on how we see companies do it today. I think the third and final stage we need is obviously making it easier then to go through the certification itself. So we're already integrating the framework in the leading GSE platforms.

Emil:

We're making it easier for our auditors to capture as much as the evidence programmatically. So we move away from screenshots and into like real validation that the controls work and hold up in real time. That layer and like the whole GSE engineering space is just really interesting to follow right now. And we're doing our best to keep up and make the standard work for the GSE engineering community as well. So that when your re audit comes in that next year, that it's a very limited time commitment we need from you and that we can focus on security instead of compliance.

Daniel:

That's great. Well, Emil, it's been amazing to look at what the AI UC has done even in the past year and the amazing resources that you put out for the community. Thank you for doing that. Thank you for working towards the future that you described. Would love to have you or others, back on the show in the future as things develop.

Daniel:

Thank you so much.

Emil:

Thank you for having me, Daniel. And I think maybe just a final plug before I I head off. This is work made by industry for industry. I'm I'm luckily not alone in in doing this this work. We've collected a consortium of about two fifty leaders across CSOs and the Fortune 1,000, it is the security engineers and so forth.

Emil:

And having that community come together and actually leave competition aside for a moment and recognize that the size of these challenges and the pace of the challenges just dictates that industry has to come together is fantastic. The fact that we've been able to just offer the platform and then let industry work together to define and codify these standards is fantastic and we would love to see more people get into the machine room with us. An open invitation for everyone who's excited about this work, either to help us drive adoption of the standards we see work or actually help us help us write them. My pleasure. It was a great conversation.

Daniel:

Yeah. Thank you, Emil.

Narrator:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.

Narrator:

Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.