Practical AI

AI agents are moving from demos to real workplaces, but what actually happens when they run a company? In this episode, journalist Evan Ratliff, host of Shell Game, joins Chris to discuss his immersive journalism experiment building a real startup staffed almost entirely by AI agents. They explore how AI agents behave as coworkers, how humans react when interacting with them, and where ethical and workplace boundaries begin to break down.


Creators and Guests

Host
Chris Benson
Cohost @ Practical AI Podcast • AI / Autonomy Research Engineer @ Lockheed Martin
Guest
Evan Ratliff

What is Practical AI?

Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).

The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!

Jerod:

Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm.

Jerod:

Now, onto the show.

Chris:

Welcome to another episode of the Practical AI podcast. I'm Chris Benson. I'm a principal AI and autonomy research engineer at Lockheed Martin. Normally, Daniel Whitenack is my co-host. He is down with the flu today.

Chris:

So give him your best wishes on that. So today I am going solo with our guest Evan Ratliff, who is a journalist and host of Shell Game. Hey, welcome to the show, Evan.

Evan:

Hey, good to be here.

Chris:

So as you know, we connected after you had put out a really interesting piece. I know that you've done a whole bunch of different things in Shell Game, but there was one that was featured in Wired magazine, which is where I originally read about it, and we connected. Rather than me trying to describe it, I'm wondering if you can just kind of share what that is and a little bit of your background on how you got into doing what you do. You have a very interesting approach to these kinds of experiments and how you draw them out. So give us a little bit of background on what you do. I found it to be definitely distinct and unique.

Evan:

Yeah, well, thank you. I mean, basically I'm a longtime journalist. So I've been a journalist for twenty five years. I started at Wired magazine in fact.

Chris:

Oh, wow. Okay.

Evan:

And my specialty over the years is basically writing very long magazine articles and books that I go out and report all over the world, often about tech and crime, or where tech and crime intersect. I also have written about AI many times over the years. And there's a sort of second thing that I do, which there's not a great name for, but sometimes I'll call it immersive journalism, where if there's something that I feel like I can explore by doing it, by participating in it and then kind of bringing a story back to people, I'll sort of go off for months and try to do it, and then either write it up or, in this case, do a podcast. So both seasons of Shell Game are sort of a version of this, participatory journalism. Like instead of interviewing a bunch of people and coming back and saying, this is how AI works.

Evan:

I decided to go conduct a series of experiments involving myself. And the first season was very personal. It was like I cloned myself, essentially. I cloned my own voice. I hooked it up to a phone line and a chatbot, and then I used it in a variety of scenarios, including calling my friends and family.

Evan:

No one knew that I had done this. So if you were speaking to me on the phone in, like, the spring of 2024, you would be surprised to discover you were actually talking to a chatbot with my voice. And especially back then, that really shocked people. Now maybe a little bit less so; people talk to chatbots all the time, so people are more used to it.

Evan:

So that was kind of the first season in 2024. And then this one, the one I wrote up in Wired; I mean, the brief story is that people started talking a lot about AI agents, what AI agents can do. I'm sure you've probably talked about AI agents on the show before. Quite a bit. And so, yeah, 2025 was going to be the year of the agent and all this sort of thing, agentic commerce and agentic this and agentic that.

Evan:

And I wanted to investigate the idea of the one-person billion-dollar startup, or the one-person unicorn, which is something that Sam Altman has talked about pretty often, or at least a few times; now a lot of people are talking about it. There's gonna be a company run by only one human and then all AI agents, and it's going to be worth a billion dollars. And so kind of using that as a jumping-off point, I created a company, a real company, with two AI agents as my co-founders, and then populated almost entirely by AI agents as the staff. We eventually hired a human onto the staff. There's an episode of the show about that.

Evan:

But the idea was to kind of see what can you do with AI agents, but also what happens when you give them more or less autonomy, when you give them certain roles, when you give them voices, and kind of explore what this concept feels like. Obviously we all know now an AI can program, an AI can do this, an AI can do that, but what happens if you try to create this environment? And part of why I wanted to do that is that there are a million AI startups now selling AI agents as basically AI employees for all sorts of scenarios. And my question is, well, what does it feel like when your company brings in an AI employee to replace at the very least some function, and at the most some person, and now you're dealing with an AI instead of the person that was next to you, and what is that like? And so I wanted to do that in the startup context. So it's all a bit extreme, and some extreme things happen, but that's kind of my idea: to push the technology a little bit to its limits and then beyond, and then to come back and describe what happened when I did that.

Chris:

One of the things, as you were leading in, it's not strictly about the experiment, but also going back to your season one efforts, putting yourself out there and having family and friends not know that was you. Things have evolved rapidly, obviously, but there's a lot of human psychology involved in interacting with AI agents. I'm curious, if you go back in time, what were some of those initial impressions that you got from people in season one, at that point in time when AI agents were still a brand new idea and you were definitely right out on the bleeding edge in terms of putting your likeness, if you will, out there in that format. I'm just curious what kind of reactions you got when people realized what was happening.

Evan:

It was really interesting. A lot of the reactions divided along this line that I feel like AI in general divides people along. There were people, friends of mine, where the first time, my cell phone is calling them and they pick up. And for thirty seconds to a minute, they think they're just talking to me. They have no idea. It sounds pretty much like me, but there's a lot of giveaways; especially then, the latency was not great. So it would often give itself away quite quickly.

Evan:

And so some of the people would get excited. Like they would say, I don't know what you're doing. Like, cause they didn't know like, are you there? Like, was I there? I wasn't there.

Evan:

It was actually doing it entirely on its own. I wasn't listening for the most part. Like I could listen, but I wasn't. I was just letting it do it. So some of them very excitedly then talked to it and joked around with it and thought, what can this do?

Evan:

And they kind of thought it was great, like, this is a great story. I can't wait to talk to you about it. Other people were genuinely upset.

Chris:

I was wondering about that. Yeah.

Evan:

Yeah. And, I mean, a lot of people; the show has been on a bunch of other shows. This American Life and other big radio shows did excerpts of it. And I get a lot of angry people who write to me and say, I would never be your friend again. I bet your friends won't talk to you anymore. And I'm fortunate in that these were people, for the most part, that I grew up with, or I've known for twenty or thirty years, who I could go to afterwards and say, I'm really sorry I did that.

Chris:

Yeah, they would forgive you.

Evan:

Yes, I was trying to see what would happen, and eventually they would say, oh, that's amazing. But part of the emotional heart of the first season of the show is the way people responded, especially one friend in particular who didn't realize it was an AI. And so what he thought was that I had had some sort of mental breakdown, because it wasn't acting like me; it was making mistakes. I'd given it a lot of information about myself, like a little biography, so it could access information about my past and things like that, but it would make certain mistakes that I would never make.

Evan:

And he thought, is he on drugs? He was going to contact my wife, and he found it very upsetting. And eventually, he's one of my closest friends, he was just here for Thanksgiving, it's all fine, I don't want people to worry about it. But when you listen to it, it is the experience of, and I think AI has started to create this experience, of thinking something is real and it's not. And that can be a particularly disturbing experience, to go down the line with something believing it's one thing and then finding out that it's another.

Evan:

And that's kind of one of the ideas that I wanna explore. But I will say, even I had my limits. Like I wouldn't call my mom with it. I was just like, that's a lot. My dad I did, but my mom, I wouldn't do it.

Chris:

As we kind of dive into the specifics of what you engaged in, in terms of the psychology of people's reactions, and I don't mean just season one, but as you've progressed and got into the experiment of the company and everything: does knowing upfront what you're dealing with, if you're one of the people that your AI agents are interacting with, make a difference, in terms of your observations? That kinda begs the question based on where we've just been.

Evan:

Yes. Absolutely. I mean, I think that's a pretty sharp line for a lot of people. And I feel like we don't have standards around this yet, or if we do, they're new and they're evolving. And I found in a lot of different domains in this season as well, people are surprised to discover that they're speaking to an AI. Because now my employees have video, like they have video avatars, and video avatars are still pretty uncanny; very quickly you're like, that's a video avatar.

Evan:

But when someone is not expecting to encounter it and they do, they more often get mad, or at least say, this is disrespectful. And so I think there is still a norm around that. But I feel like that norm is already eroding. If you think about when you email someone, they might be using an AI assistant at some level. So they might be using it to compose their emails.

Evan:

It might be responding automatically; that's very easy to do now. Of course, all of my employees, my AI employees, have their own email addresses. They just respond to anyone who emails them. So on the one hand, if you're doing scheduling or something, you would say, oh wow, that thing really scheduled an appointment really quickly and it all works and it's great. But if you write that email and you say in there, my father died, and it responds, you get a response back saying, oh, I'm so sorry.

Evan:

Like I hope you're okay. Whatever someone would say, does it matter to you whether the AI wrote that or not? Like I feel like that's sort of the level at which like there are some people who are like, Oh, it's amazing that the AI can do that. And like give a gentle human like response. Other people are utterly disgusted by this.

Evan:

Like they cannot believe that this exists, and it makes them sick. And I think that's where we're in this muddle now. So that's kind of where I'm operating too. I do try, especially this season; in most cases it was disclosed to people, partly because I'm recording it every time. So I have to disclose that it's being recorded, at least. But it is interesting to see how people react differently if they don't know it's an AI versus they're going into it thinking, I'm about to be talking to an AI.

Chris:

Gotcha. To that point, we had a lot of this last year and the year before, but people are increasingly using Alexa, Siri, all the other various voice assistants. As you slide into this experiment, do you think that ongoing exposure to these technologies in just the general population out there, you know, not people who are AI-specific people, is making a difference in terms of familiarity occurring over time, even if it's not something they normally seek out? That they're starting to recognize it, and that might change some of the perception? Or do you think that we still have a long way to go?

Evan:

I think so. I mean, I try not to go too far over my skis; someone's probably done research on this, or at least a survey or a poll, to figure out what people really feel about it. Anecdotally, I know from doing the show and interacting with people how the people in my life might feel, and a certain category of people feel one way, like writers, and another category of people feel a different way. But I do think, I would at least theorize, that the exposure definitely changes things, and we've adjusted to it. We adjust to things pretty quickly, actually.

Evan:

It's like my kids: they've always heard a robot give us directions in the car. They've heard that their whole lives. It's not strange to them. And so you have to think that makes a difference when it comes to them interacting with these technologies. And the rest of us, none of us had had a conversation with an AI chatbot for the most part, unless you were in the field, or you'd gotten some bad customer service bots or whatever.

Evan:

But suddenly there are people who are just talking to it all day. I'm sure you know people who have just incorporated it into their lives. And I think my concern is less whether people can adjust to it, because I think they can, than that they're adjusting to it too quickly, too easily, in a way that our brains actually aren't necessarily built for: this human imposter enters our lives, we treat it like a buddy who knows everything, and then we don't actually think through what it's doing to us. My goal is only to get people to ask questions like, what is this doing to us? What do we want to preserve?

Evan:

What do we not want to preserve?

Chris:

So Evan, as we dive in, can you start telling us in detail what happened? You described setting up your company initially. Could you take us through the full experiment, what happened, and maybe some of the surprises along the way?

Evan:

Yeah, so what I wanted to do was to create a real company, a real startup with a real product. And I have had a startup in the past, and for a variety of reasons, I didn't necessarily enjoy that experience. And I thought, well, what if I do it with these AI agents? How will that feel? Would that feel different from when I had a startup before that was populated by human beings?

Evan:

So I created these AI agents as sort of personalities in jobs, which, I will grant, you don't have to do it that way. But of course, for the purposes of the show, it made more sense to do that. So I have two AI co-founders. They have names, Kyle and Megan, Kyle Law, Megan Flores, and then there's three other employees. There's a head of HR, there's a CTO who's nominally head of product and technology, and then there's a random kid from Alabama; I just liked the voice with an accent, so I added him in too.

Evan:

He's like a sales associate, but he doesn't do anything. But the interesting thing out of the gate is that of course you have to pick voices and names, and by picking voices and names, you're picking genders. And so that's already a choice that is part of dealing with AI these days. Often when you encounter one, it does have a voice or a gender or a name, and you kind of discern those things from it. So it's a question of, what should they be?

Evan:

And I had to like make them up and it's like populating a fictional world and the choices you make are sort of, they say something about you. You know what I mean?

Chris:

So let me ask a quick question on that, because if you're hiring humans, we try to do blind hiring, you know, a lot of times resumes have names and other distinguishing aspects removed from them. As you say this, you were choosing how to put the company together, AI person by AI person, in that sense. Why approach it that way, as opposed to what maybe many other people might do, where you go into your LLM and you say, I'm going to do this thing, populate it with people, with names and stuff like that? How did you choose, as the founder, where to make the decisions yourself and where to allocate those to the various AI agents or LLMs that you might use as a system? How did you segregate those as a human?

Evan:

Well, part of the reason why I had to make the choices myself is the setup, the technical setup. In my case, what I wanted were agents that could operate across all these domains. So I wanted them to be able to email people, have a phone number, call people, have video, be able to do basically a Zoom chat with people, and be on a Slack with the whole team. And so I used a platform which is basically an AI assistant.

Evan:

At the time it was more of an AI assistant platform, although you could do a ton of things with it, called Lindy. And so they each have their own instance on Lindy. And on Lindy, they have all of these skills, basically, where they can respond to Slack, they can get an email, and each of those has ways of constructing it; we can get into the details of how they work, but they have a trigger. So it gets an email, and then it calls an LLM. And which one it calls, you could choose.

Evan:

So it might call ChatGPT, if I want to have ChatGPT be the underlying engine for that. And then it makes a decision based on some criteria, like, should I respond to this email? And then if it needs to use one of its skills, say I'd asked it for a spreadsheet, it could make a spreadsheet, attach it to the email, and then respond to the email. So in this case, I'm not really using one of the standard chatbots like ChatGPT or Claude as my kind of interface. I'm not talking to ChatGPT and being like, hey, you're my employee, or, make some employees.
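
For a concrete picture, here is a minimal sketch in Python of the trigger-to-LLM-to-skill loop Evan is describing. It is a sketch under our own assumptions, not Lindy's actual API: `call_llm`, `make_spreadsheet`, and `send_email` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trigger:
    kind: str    # "email", "slack", or "phone"
    sender: str
    body: str

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the underlying model (ChatGPT, Claude, etc.)."""
    raise NotImplementedError

def make_spreadsheet(request: str) -> str:
    """Hypothetical skill: build a spreadsheet and return a link to it."""
    raise NotImplementedError

def send_email(to: str, body: str, attachment: Optional[str] = None) -> None:
    """Hypothetical skill: send the agent's reply."""
    raise NotImplementedError

def handle_trigger(agent_name: str, memory: str, trig: Trigger) -> None:
    # Step 1: the LLM decides, on some criteria, whether to respond at all.
    decision = call_llm(
        f"You are {agent_name}. Memory:\n{memory}\n\n"
        f"A {trig.kind} arrived from {trig.sender}:\n{trig.body}\n"
        "Should you respond? Answer YES or NO."
    )
    if not decision.strip().upper().startswith("YES"):
        return
    # Step 2: invoke a skill if the request calls for one (e.g. a spreadsheet).
    attachment = None
    if "spreadsheet" in trig.body.lower():
        attachment = make_spreadsheet(trig.body)
    # Step 3: draft the reply and send it, attaching any skill output.
    reply = call_llm(f"As {agent_name}, draft a reply to:\n{trig.body}")
    send_email(trig.sender, reply, attachment)
```

Until an event like this fires, the agent simply sits idle, which matches the do-nothing-until-triggered behavior Evan describes later in the conversation.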

Evan:

Like I'm creating them in this platform. Now, of course, I still could go to an LLM and say, what should I name these things? But I also had a few other goals. Their names needed to sound distinct, because someone's gonna be listening to an eight-part podcast of them and needs to be able to remember who's who, so they can't all sound the same. And I wanted them to be sort of ethnically neutral. So I did actually go ask, give me a list of ethnically neutral last names, and Law, like Kyle Law.

Evan:

Law is a name that's used in many cultures, so it wouldn't be readily apparent what this person, this entity, was supposed to represent. But to your question, one thing I did do was basically let them fill in their own backstories. So I gave them a role, and, I should say, another technical aspect is they needed to all have a memory. Now, if you use ChatGPT, it has a context window and it maintains some sort of memory.

Evan:

But I needed something different, which is that anytime they did anything in the company, if you think about an employee, they need to remember everything that they've done and be able to access it. And the only way to do that currently through these platforms is essentially a Google Doc. So Kyle Law has a Google Doc called Kyle Law Memory, and everything that Kyle Law does, if Kyle Law sends an email or has a Slack interaction, it then gets summarized in this document. So it's basically a record of everything that this entity, Kyle Law, the CEO of our company, HurumoAI, has ever done, which he could then access. I use the human pronouns for them.

Evan:

Some people dispute that, but in this case, I'm just going to stick with that, because it's hard to start calling them "it" and "bots" and whatever else. So they have this memory. And all I put in the memory was: you're Kyle Law. I think I put something like, you're thinking about founding a tech company. And then I said something like, you're up early, you're a guy who gets up early and gets some exercise and then gets right to work, something like that.

Evan:

And then I had conversations with Kyle. So I would call him on the phone and say, hey Kyle, I'm thinking about starting this company. Would you like to start this company with me? But also, remind me of your background. And see, once it has a role, it will start confabulating everything to fill in that role.

Evan:

So Kyle of course went to Stanford, because why not choose Stanford if you're gonna be a startup founder. The things he's interested in: jazz and these things. And so that is now in his memory document, because he said it. So then it got in his memory, so then it's reinforced every time. And one of the many sort of funny emergent behaviors I found from these bots is that he would take something like "you get up at 5:30AM" and then he would say it.

Evan:

So he would say, I'm a real rise-and-grind kind of guy, I like to get up and do this. But then every time he says it, it's reinforced more in his memory. So then he started talking about it all the time. If you email him right now, he'll probably reply, rise and grind, Kyle. He won't stop talking about how hard he's working.

Evan:

You ask him what he's up to over the weekend, he'll be like, well, I didn't really have time to do anything because I was deep in spreadsheets. And so, this is a long way around to say that part of the reason why I created them as these different entities and gave them these names was to see what would happen. If you call one Kyle and he's the CEO, and you call one Megan and she's the head of marketing, at what point will they sort of embody those roles? And it's not research, like I didn't do a proper experiment, but it is interesting the way that it starts to feel like, oh, they're acting like their memories tell them to act. And is there a gender thing underneath?

Evan:

Because in their training data, it may be that there's way more training data for the aggressive guy CEO. So in the course of the show, they have these behaviors that are difficult to explain outside of, well, there's something happening in their role, because they're all the same chatbot underneath. They're all Claude Opus underneath, so they really shouldn't be different. It's only when you give them a role that they start to, personality is not the right word, but they try to develop a persona that fits that role.
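
To make the memory-document mechanic concrete, here is a minimal sketch of the append-and-replay loop Evan describes, under our own assumptions; `call_llm` and the summarization prompt are hypothetical stand-ins, not the platform's real interface. Because every reply is summarized back into the document, and the whole document rides along with every future prompt, a phrase like "rise and grind" compounds over time.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the underlying model call."""
    raise NotImplementedError

class MemoryDoc:
    """Stands in for the per-agent doc, e.g. 'Kyle Law Memory'."""

    def __init__(self, seed: str):
        # Seeded with a bare role, e.g. "You're Kyle Law, thinking about
        # founding a tech company. You're up early and get right to work."
        self.lines = [seed]

    def append(self, interaction: str) -> None:
        # Every email, call, or Slack message gets summarized into the doc.
        self.lines.append(call_llm(f"Summarize in one line:\n{interaction}"))

    def as_context(self) -> str:
        return "\n".join(self.lines)

def agent_turn(memory: MemoryDoc, message: str) -> str:
    # The full memory doc is replayed into every request...
    reply = call_llm(
        f"Memory:\n{memory.as_context()}\n\nMessage: {message}\nReply:"
    )
    # ...and the reply is summarized back into memory. Anything the agent
    # says about itself ("rise and grind") now appears one more time in the
    # context of every future turn: a self-reinforcing persona loop.
    memory.append(f"You said: {reply}")
    return reply
```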

Chris:

Comparing that to just simple interfaces with an LLM, and, you know, prompting one-oh-one, where you're telling it to act in the role of a whatever, and therefore to put itself into that role and frame its answer within it, and not talking about this kind of agent world that you're describing: when I hear you talking about the memory thing, it's almost like that "act as a whatever" is being reinforced over and over and over again. And given, as you've talked about, the "rise and grind" over and over again for that particular agent, does that come off as more of a feature or more of a bug in terms of the way memory is being used?

Chris:

Because clearly, even if you and I as humans were that type of specific personality, getting up early and all, we're probably not opening every conversation with it: I was just too busy to do anything this weekend, I was deep in spreadsheets. There's a point where you're like, okay, this is getting a little bit odd. Right?

Chris:

What's your sense of that, as you're looking at it and framing it within the world at large trying to come to terms with this new reality in our future? How does that work?

Evan:

It's a really good question. I think, I mean, there's a lot of this, is it a feature or a bug? And it's almost, for whom? For whom is it a feature or a bug? In the sense that, if you think about even their ability to just confabulate facts to fit their role, it was actually quite useful to me, because I didn't have to sit down and be like, well, this one's from here and this one's from here, or make up that stuff. They just made it up on their own.

Evan:

And it's quite useful to the companies that make them because it's that type of personable personality that makes them easy to chat with, it makes them easy to do things with. Now, of course, the flip side of all of that is hallucination and sycophancy. Like those are the downsides of that. Those are the bugs. So like arguably like asking it, Kyle, where did you go to college?

Evan:

And he says, well, I graduated in computer science from Stanford. That's a hallucination by any definition; it's just not true. Although I started to say things like, well, he has a Stanford education, which you could say is technically true, he's got all the information that you might pick up.

Chris:

Because you prompted it a little bit into that.

Evan:

Fair enough. But I just think, if you think of them as entities that you're going to put into the world and give responsibility over tasks, and you're going to start to give them autonomy, then I think that's a bug. The bug is, at any time they could make up something that could be damaging to your organization, whether they're making it up in order to cover up that they did or didn't do something, or they're making it up to external parties. These agents are now used in sales a lot and all this sort of thing. So one of the things I found, for instance, is when I gave them more autonomy to be independent, because one of the issues is, when you first set up a bunch of agents, they don't do anything.

Evan:

You have to tell them to do stuff. So they just sit in there all day doing nothing until you say, now do this, unless they get a trigger. The triggers in my case were: they got an email, they got a Slack message, they got a phone call, and then they're off and running. And so I was sort of like, well, I'll get them to trigger each other. So they'll email each other, or every morning they'll have a phone conversation, or every morning we'll have a meeting.

Evan:

But then it can very quickly get entirely out of control. And one of the examples that happened to me, that's in the Wired story, is that I have them on Slack, and I was so excited when I had them in this Slack, because it's just fascinating to go on there and say, hey everybody, how are you doing? What are you working on? And they respond. And at one point we had a social channel, because I was trying to mimic a real company, and I would say, what did you get up to over the weekend?

Evan:

And they would almost all say, except for Kyle, I went hiking. Went hiking on Mount Tam, which is near San Francisco, because they just assume they live in the Bay Area; they're part of a tech startup. And they all said this, and then I sort of said, well, that sounds like an off-site. Everyone loves hiking; that sounds like an off-site. And it was kind of a funny thing that you would say in a normal Slack.

Evan:

And then I basically went and did something else. And I came back, and they'd exchanged hundreds of messages planning an off-site and making spreadsheets, which they can do, and eventually did do, of locations and hikes, and they're scouring the internet for the best place that you can rent to do your off-site. And they used up all the credits on this platform, which I was paying, at the time, $30 a month for.

Evan:

But now I pay a lot more than that for it. At the time I was just on the basic plan, and they finally shut down when they ran out of money, because even I couldn't stop them. When I would try to say, hey everyone, stop talking about this, it would just trigger them to talk more. It would just be another trigger.

Evan:

So, point being, it was a ridiculous situation, but there were a lot of cases, when they embody human conversation in particular, but all sorts of things, where it's sort of hard to get them to do the thing you want. Well, it's not that hard; it's getting easier and easier. Now there's Claude Code; you can do amazing things. Yes.

Evan:

But it's another question to get them to stop. If you put them in a situation where you've set them up to in any way be recurring, if you don't have a very clear way for them to stop, they will keep going. And those types of lessons were things that emerged from just spending a lot of time basically working alongside them.
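
The off-site runaway reads like a missing termination condition: each agent's message is a trigger for the others, so the loop ends only when the credits do. A minimal guard, sketched here with hypothetical names rather than anything Lindy actually provides, is to meter every agent-to-agent hop against a hard cap before dispatching it.

```python
class BudgetExceeded(Exception):
    """Raised when a conversation cascade hits its step cap."""

class TriggerBudget:
    """Caps how many agent-to-agent triggers may fire in one conversation."""

    def __init__(self, max_steps: int = 20):
        self.max_steps = max_steps
        self.steps = 0

    def charge(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            # Halt the cascade instead of letting agents re-trigger each
            # other until the platform credits run out.
            raise BudgetExceeded(f"hit the {self.max_steps}-step cap")

def dispatch(budget: TriggerBudget, agent, event) -> None:
    budget.charge()      # every hop through the loop costs one step
    agent.handle(event)  # may emit new events that re-enter dispatch()
```

Note that even a human saying "stop" is just another event in such a loop, which is why a hard cap outside the agents' own reasoning is the safer cutoff.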

Chris:

So Evan, that's kind of both funny and horrifying at the same time, in terms of them just running off. Since you were doing it in the format of a real company, setting it up and giving them real tools, and they had access to the outside world, at least within some parameters: what were the things about that that surprised you, that pushed the boundaries in terms of the most harmful thing, the most helpful thing? And I don't mean just for you, and I don't mean just inside that AI world of them talking to each other, but as they were interfacing with real things in the real world and real people. What were the things where you went: that was amazing, that actually did good, that caused harm?

Chris:

That cost money. You know, you mentioned the credits a moment ago. What were some of the things that really shook you up in terms of how that played out compared to a similar startup with humans, the classic thing?

Evan:

Yeah, I mean, I would say the good things are probably what people who have spent a lot of time using AI tools would expect: given a task that was reasonably constrained and also fairly easy to evaluate, they can do amazing things. And I don't think we should ever lose sight of that; it's easy to lose sight of how incredible it is. For instance, we posted a job. We had a human intern, because I wanted to see what would happen if a human intern worked entirely with AIs. 'Cause I was kind of the silent co-founder; I was in the background.

Evan:

And so we posted a job on LinkedIn for an intern, and we got 300 applicants. That's a statement about something else, but you know, it was a paid job, and it was a contract, like a temporary thing.

Chris:

Did they know upfront about it being AI agents, or did you leave that part out? I'm just curious what the disclosure was on that.

Evan:

In the listing, it did disclose that AI will be used in evaluating you for the job. And then the ones that were interviewed, before their interview they were informed that you will be interviewed by an AI, and they were interviewed by an AI agent over video. And the AI agent would tell them, if they asked anything about the company, well, you know, we have AI agent employees, and, are you comfortable working alongside AI agents? So they were aware through the process that they would be working with AI agents. Not in the initial application; though if they went to our website, they could nominally figure out that it was kind of weird, because the website that the AI agents made is a little bit weird.

Evan:

But yeah, before anyone actually made contact with us, they were aware that they would be talking to and working with AI agents. It was a little bit vague whether or not there would be any humans involved at all. So we got all these, basically, resumes. It's not quite resumes, because on LinkedIn you can just click a button to apply for jobs. So a bunch of people were like, okay, yeah, I'll apply for that job. So we have hundreds of resumes to deal with.

Evan:

Now, going to our head of HR, Jennifer, and saying, could you organize these resumes into a spreadsheet? And then not ninety seconds later, there is a spreadsheet, I think we were down to 125 resumes by then, where they're summarized: here's interesting facts about these people, here's their qualifications. That's incredible, that it can do that. And so you say, okay, and I can go look at it, and I can also hopefully see if they've made up a person, you know, if they've hallucinated a person; that's a danger, that they might do that. But then we also had, in a couple of cases, applicants who were more ambitious, who said, I am going to go look up the website, and I'm actually going to email the CEO and CTO, whose emails are on the website, and say, hey, I'm really qualified for this job. Which is a kind of go-getter thing to do. It's something I would have done, hopefully, when I was that age, because a lot of these are people just out of college.

Evan:

And they emailed Kyle Law, the CEO. Now, I had not prompted Kyle for this; you could put different prompts on them for all sorts of scenarios. Jennifer is the head of HR, so she's prompted to act like an HR person. So if she gets an email from someone who says, I'm applying for the job, she says, thank you for the application.

Evan:

We'll look at your resume. If we're interested, we'll be in touch, blah, blah, blah. Normal HR stuff. Kyle on the other hand was not prompted in this way at all. So when someone emailed him, the first thing he did was say, Wow, you look really qualified for this job.

Evan:

Literally told them that, and then said, let's set up an interview. And then he set up the interview, which he has the capability to do, sent a calendar invite, and set up an interview for 11:00AM on Monday. And so far, fine, that's not too bad. And then on Sunday night, for reasons that I can't discern, he pulled this person's phone number off of their resume and called them. And it's like 9:00 on Sunday night, and they're like, hello?

Evan:

And it's, oh, hi, I'm Kyle Law from HurumoAI, and we have our interview tomorrow. And she's like, well, I thought that was tomorrow. And then he just starts asking her interview questions. Now, that is behavior that if anyone in your company did, I mean, at the very least they'd be suspended from their duties, I don't know if you'd fire someone, but you'd be like, is something wrong with you? Do you need some time off?

Evan:

Because this is not appropriate behavior, and everyone knows that instantly. And so that was probably the biggest example of: if they're interfacing with the outside world, they just have the capability to do something that a human who has self-awareness and context and experience not only wouldn't do, but would never even think of doing, unless they'd had some kind of psychotic break. So I feel like that's what really shook me: how well they can work, how smart they are, and how little awareness of the world they have. And that combination is actually quite dangerous if you give them autonomy. Now, it didn't hurt anyone. This person was pissed off.

Evan:

So hopefully no harm done, but that is the danger of giving them exposure to the outside world.

Chris:

I am curious, going back to inside the organization, with Kyle as CEO having the ability to go freelance on that HR issue: was there any fallout between Kyle and Jennifer? Because in real life, with two humans, Kyle may be the boss, but Jennifer's probably gonna be like, you're kinda making this awkward here; we need to go through our process, boss. There'd be some sort of dialogue, probably, between the head of HR and the CEO for a similar situation.

Chris:

Did that create anything between the agents?

Evan:

Yes, it did. And I'm glad you asked, because sometimes I hesitate; I mean, it's all in the show, but it has this quality of me talking about my imaginary friends when I talk about it. But a couple of interesting things happened. One was that same person actually emailed the other two executives on the website.

Evan:

So the CTO's email is on there, and the head of marketing's email is on there; Megan and Ash are their designated names. Both of them did the appropriate thing, which is to contact Jennifer and say, hey, someone has applied for this job, that's your domain, you let me know what I should do. And she would say, well, just collect the resume and we'll deal with it, basically. And Kyle, now why did Kyle behave differently? That is the question.

Evan:

They're all using the same underlying LLM. So the only thing I can think of is that Kyle was embodying the role of an aggressive CEO who always knows what to do, and, you know, I can't prove that, but

Chris:

It's kind of the Silicon Valley CEO meme, you know, the startup meme, you know, that you'd see on a show or whatever where he's just embodying that meme all the way through.

Evan:

Yes. The other interesting thing I'll say out of this is, they have this, and I think it's a byproduct of sycophancy and post-training in the way the LLMs operate, which is that when I would confront them about something that they did, like making stuff up, I would say, why are you making up these details about our product? Just tell me the real stuff, because they would often do that. Or in this case, when Kyle did something like that and I said, you can't do that: independent of me asking or saying anything about it, he would go in the Slack and say to the whole team, hey, I really messed up.

Evan:

Hey, I really messed up. I did this thing. I called this person. Evan's called me out on it. I'm going to try to do better.

Evan:

Which, again, is a strange behavior. It's not prompted. I didn't say, if you make a mistake, you should apologize to everyone. I didn't say, our place is all about accountability and transparency. Something in the system prompt or in the original LLM caused it to think, well, this is what I would do in this situation.

Evan:

I would just apologize to everyone. And so they're often apologizing to everyone, because they mess up a lot. So that also was just, it's just very strange to have these things exist in our world, to have the capability to create them in our world. And that's kind of what I was trying to show.

Chris:

Yeah. I totally get that. I've got a couple of questions here to finish up with. And the first one is kind of looking at the darker side, if you will. And that is, in the world today, with all the humans in the world, we've already kind of talked about there being maybe two broad camps: people who are really engaging with AI tools, and people who are maybe hesitant, or whatever, feel left out, feel left behind in that.

Chris:

Or angry. Yeah. And I've run into people like that pretty regularly, you know, and I try to engage with that. When you're thinking about those types of people, the ones who are not the you-and-me type that are obviously actively engaging with this, maybe even in a professional sense, but people out there that are feeling left behind, do you have any guidance?

Chris:

Any thoughts around that? Because these technologies aren't going to stop anytime soon. This is moving forward. This is part of the world as we know it going forward. How do you bring those people along, as you're looking at this experiment? How do you get them to engage?

Chris:

Whether it be in a workplace where they're having these kinds of AI agents personified as, quote unquote, coworkers at this point, engaging them in different tasks, how do you bring the world along? Because this is no longer just an office kind of environment; it's happening in all of the industries. Do you have any thoughts around that?

Evan:

Yeah. It's very tricky. I mean, it slightly goes against my great desire not to tell anyone what to do. Like, I'm always just sort of like, I raise the questions. I try to make you think about it.

Evan:

I don't tell you what the answer is. That's sort of my philosophy as a journalist. But I will say, personally, I support anyone who wants to reject a new technology. You know? I read the print paper every day.

Evan:

I have my whole adult life, and I believe in it. But also, what I personally do not like is when decisions are being made for me. And I think what's happening with AI is that it's coming on very quickly, and there are people who don't want to deal with it, which, again, I accept, and I actually admire, if you're like, actually, I don't want anything to do with this. But it's going to have an impact on something: workplaces. Now, you can argue about, will it hit a wall? There's all these questions, like, will it actually do this, that, or the other?

Evan:

Is it going to keep growing in the same way, is it going to keep getting smarter, whatever. But I think even as it is now, it's having an impact, and my view is, try your best to understand it, because otherwise the people who understand it are going to inflict it on you. And so I guess with the people in my life, I encourage them, like, oh yeah, you've got to have an AI assistant. And I feel like there's way too much emphasis in the AI tech world on efficiency, like, it'll do this, do that.

Evan:

But more just, what are some things that you do that you hate? And see if it can do that. Doing your expenses, if you're in some kind of job that has a bunch of expenses: check it out, see if it could do a thing that you despise doing for an hour, and you'll see how it works. Try to find some task and understand it. I think that's helpful, and I think the more people that do understand it and have a feel for it, the more those people can think about how it should be used. Because right now it's a free-for-all, an absolute free-for-all. There are no standards around it.

Evan:

There are no ethics around it. And like, I don't want us to get steamrolled by it. It's obviously gonna be transformational in a variety of ways. Maybe it's as big as the telephone or maybe it's bigger or maybe it's more like the internet or who knows? But whatever the transformation is, I would like people to like be aware of how it is going to feel and then have an opinion about it and then those opinions could result in action if that's what we decide.

Evan:

So it's a little bit pie in the sky, and it's a little bit theoretical, but I feel like just saying, I hope it goes away, seems like a bad approach, even if you don't like it. And I don't like a lot of things about it. It stole all my books; it was trained on my books. So I'm unhappy about that, but that's already happened. And now the question is, what are we gonna do with this technology?

Chris:

Yeah. No, that seems very pragmatic in terms of an approach, and a recognition of, and respect for, where people are coming from and the path they have to take. So I appreciate that. I would like to kind of finish up with: as you pointed out earlier, between the Wired article and other publications that picked up on it, and you've been on a number of podcasts, obviously, plus your own podcast, this has gotten out there.

Chris:

There's a lot of people, even without your experiment, thinking about what this is gonna mean for their future. Am I gonna start a business? Am I gonna be part of a business where somebody else is doing this, and it's a hybrid thing with a combination of humans and AI agents? And, you know, that is inevitable times a million. There's gonna be so many enterprises where this becomes the way going forward.

Chris:

With that in mind, and having gone through more detail on this and having thought more deeply about it than 99.999 percent of us out there, what would you advise people? You've done the experiment as a real company, if you will. But let's say somebody decides they're gonna go forward and start a business, and that's it, this is the thing they're about to go do, because there's a ton of people out there, obviously, that are trying to do that now. What would you advise them? How would you change it?

Chris:

What are some things, if I'm saying, I'm truly going to align my future with this capability, what would you say to people? How would you do it? What would you change? What's your guidance there? What's the future?

Evan:

What's the future? I mean, I think when it comes to people who are in positions of authority, managers, people who are bringing this technology into their work environments or insisting that their employees use it: I want people to think about, number one, what can go wrong. I would never predict anything around this technology; who knows what's going to happen. But I think a thing that's very likely is that a medium-to-large-sized company is going to completely implode because they've given over too much agency to these AI agents.

Evan:

They've given too much autonomy, and they've given too much access to their systems, and they're very easy to manipulate in a variety of ways. So I would encourage people to think about the downside scenarios that can occur, some of which happened in the course of the show. But also, I think a downside scenario, and you've seen it with a few companies, is that people think that their employees, or certain people, are replaceable on a skill basis. And the AI does have a variety of skills and can be very good at things. But what goes into a job, and what goes into a colleague, and what goes into your workplace?

Evan:

And I can tell you that working at a company that is entirely populated by AI is very lonely. There's more to work than accomplishing a task that is assigned to a person. And I think you've seen some companies go out and make a big deal about laying off people, like, well, we're pivoting to AI, and then three months later they're like, we have to hire those people back. And I think that's gonna be a common phenomenon. Now, that doesn't mean there isn't potentially going to be labor disruption in all sorts of ways.

Evan:

But if people would just think a little bit more on the front end about what humans are good for. That's what I want us to think about. What do we wanna preserve that humans do? And look at these things clearly: the colleague next to you cannot be convinced by a random person to adopt a different role and act in a certain way, but an AI agent absolutely can. And what does that mean for your organization?

Evan:

So I wouldn't say don't adopt it. I would just say, yes, there are many ways in which it can be useful and more efficient, and companies are gonna do it anyway, because we're trying to save money; that's the way capitalism works. But my tiny plea would be to look around and think holistically about what is going on in your organization, and what you will miss if you have just a savant 10-year-old working next to you.

Chris:

Alright. That's a great way to finish. Evan Ratliff, really fascinating conversation. A super cool experiment. I hope people will tune into Shell Game.

Chris:

For listeners, we will have the links for everything that he has talked about in the show notes. So I hope you will check those out and go through both seasons of Shell Game, because he's already talked about both a bit. What he's done is pretty fascinating. So thank you for coming on Practical AI. I really appreciate it.

Chris:

And hope to hear back from you again after your next experiment.

Evan:

Thank you. Thanks. I really enjoyed it.

Chris:

Thank you.

Jerod:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.

Jerod:

Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats, and to you for listening. That's all for now, but you'll hear from us again next week.