[Dev]olution

Is your local machine lying to you?

Ben Potter, VP of Product at Coder, gets real about why AI agents aren’t living up to their potential without the right infrastructure. Developers and platform teams are dealing with bloated systems and inconsistent environments, but the real problem is a missing layer. In this episode, we uncover the key to making AI work: a development stack that’s secure, scalable, and reliable.

If you’re tired of wrestling with broken dev environments and want AI to make more sense in your workflows, this episode’s for you.

In this episode, you’ll learn:
The real reason AI agents aren’t living up to their hype
How to create a secure and scalable AI development stack
Why your current environment setup is holding you back from AI success

Things to listen for:
(00:00) Meet Ben Potter
(02:40) How AI is evolving from assistant to teammate in workflows
(05:03) The Waymo vs. Tesla analogy for agentic AI
(10:06) The risks of AI in enterprises
(17:11) Ephemeral environments are crucial for AI agents
(22:32) Why enterprises are spending over a million dollars on AI yearly
(28:07) How the right infrastructure lets developers use AI as a reliable tool
(33:07) Rapid prototyping with AI in product management
(37:12) Why “agents are hired, not built”
(44:45) AI agents working together: teamwork or chaos?
(46:07) How boundaries enable agents and AI to work together
(57:38) Will agentic AI ship code by 2026?

Resources:
Ben Potter's LinkedIn: https://www.linkedin.com/in/bpmct/
Coder website: https://coder.com/

What is [Dev]olution?

The development world is cluttered with buzzwords and distractions. Speed, focus, and freedom? Gone.
I’m Nicky Pike. And it’s time for a reset.

[Dev]olution is here to help you get back to what matters: creating, solving, and making an impact. No trend chasing, just asking better questions.

What do devs really want?

How can platform teams drive flow, not friction?

How does AI actually help?

Join me every two weeks for straight talk with the people shaping the future of dev.

This is the [Dev]olution.

Ben Potter (00:00:00):
Developers are always just writing more and more code and it becomes really hard to manage, and I think a lot of these folks are looking at agents as ways to help manage this sprawl as opposed to creating more of it. It's another good example of AI coding agents not being used necessarily for code generation, but in service of creating better quality code or even reducing code duplication. They see agents as a way to simplify those systems, or at least secure those systems, as opposed to creating more mess.

Nicky Pike (00:00:24):
This is Devolution, bringing development back to speed, back to focus, back to freedom. I'm Nicky Pike. Hey everybody. So everyone is talking about agentic AI, but here's the part that's missing. Your local machine is a lie. The code that you write there isn't where it runs. The dependencies you install on it aren't necessarily what production uses. And now enterprises are spending millions trying to implement AI agents into systems and processes that aren't even serving human developers that well. They're obsessing over models and prompts while missing the most important piece: the infrastructure that lets both humans and AI agents actually do their jobs. Joining me today is Ben Potter. He's the VP of Product at Coder, and he's here to help us understand what the heck we should all be paying attention to when it comes to agentic AI in enterprises. Coder is building the AI development infrastructure stack that makes AI agents actually work for the enterprise. It's that missing piece that we're talking about.

(00:01:23):
Today we're going to be digging into why AI agents need proper infrastructure, how Coder is solving this problem, and what the future holds when both humans and AI develop together. Before Ben and I get into that, here's the challenge on the table. How do you take something as powerful as AI, something that can write code, debug, and really take part in the entire SDLC, and safely integrate it into the enterprise? An enterprise that's already struggling with security, compliance, and, honestly, developers that can't even get their local environments to match production. Ben, before we get started, AI has been a part of dev for years, but it's always been a sidekick. What are you seeing change? Why are we suddenly seeing it stepping up as a coworker?

Ben Potter (00:02:05):
Yeah, absolutely, and thanks for having me. So we're seeing two patterns emerge in AI. The first is large language models are improving. Over the last few months, we're seeing new versions of models that are increasingly capable at coding on their own and running for very long periods of time. The second thing that we're seeing happen is we're taking these large language models and we're giving them tools and agency. There's a term going around, AI coding agents, which we'll get into a bit more, but this combination of the models improving and giving these LLMs agency to actually take actions is essentially what's making AI significantly more powerful.

Nicky Pike (00:02:41):
Interesting. So we're seeing AI where it's not just helping out and playing assistant anymore. People are looking at it to become a full-on teammate. What does that actually look like in the enterprise workflows? I mean, are we talking about just code generation or is it able to do heavy lifting in other areas as well?

Ben Potter (00:02:58):
Yeah, so the way that we're seeing AI incorporated is actually a lot more boring than you would think, and I think this is where this teammate analogy is coming from. We're seeing AI being used to label issues submitted into Jira, do research in a spike to figure out where certain pieces of code are, and run investigations when an incident happens. We're seeing AI used throughout the software development lifecycle to do the small, kind of trivial tasks that a coworker would have to do maybe dozens of times each day, automated away for them. So these are the first pieces that we're seeing incorporated outside of just generating code on behalf of the developer.

Nicky Pike (00:03:32):
Okay. Well, you keep saying the word agents, so what the hell are they? I mean, why should we care? We're seeing this everywhere. People are throwing around this term. Everybody should know what it means, but I suspect that there's many out there still trying to figure out exactly what agents are and how they work.

Ben Potter (00:03:47):
Yeah, an agent is a large language model with tools and a feedback loop, meaning the large language model is able to do things such as write files, read files, execute commands, and it's given a feedback loop where once it writes a file, it can read its contents to make sure it made the right change. It can run a command such as a test suite, and in that feedback loop it's able to read what's happened and then adjust based on that. So an agent is essentially any large language model that's been given both tools to do actions and a feedback loop.
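
To make that definition concrete, here is a minimal sketch of an agent loop in Python. The `llm.next_action` call and its return fields are hypothetical placeholders for a generic model client, not any particular vendor's API; the point is the tools plus the feedback loop Ben describes.

```python
# Minimal sketch of an agent: an LLM given tools and a feedback loop.
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def run_command(cmd: str) -> str:
    # e.g. run the test suite so the agent can see whether its change worked
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read_file": read_file, "write_file": write_file, "run_command": run_command}

def agent_loop(llm, task: str, max_steps: int = 20):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm.next_action(history, tools=list(TOOLS))  # hypothetical client call
        if action.kind == "done":
            return action.summary
        output = TOOLS[action.tool](**action.arguments)        # execute the chosen tool...
        history.append({"role": "tool", "content": output})    # ...and feed the result back
```

The important part is the loop: every tool result goes back into the model's context, which is what lets it check its own work and adjust.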

Nicky Pike (00:04:14):
We've had AI in things like Copilot and Cursor for a while. Are agents different from what we're seeing in those?

Ben Potter (00:04:21):
Yeah, I use the term assistant versus agent. An assistant is something that's more on the side. You can chat with it and it'll chat with you back, whereas an agent really has agency to then make changes in your editor. So after Copilot, two very popular editors were released, first Cursor and then Windsurf. These are both more agentic editors where there is that chat on the sidebar, but it can also start to write files, read files, run commands on your behalf.

Nicky Pike (00:04:47):
Gotcha. Well, and we've talked to Rob Whiteley before who's the CEO of Coder. He had this great analogy about Waymo versus Tesla. Break that down for me. How does that analogy work within the world of agentic AI and what's Coder's place in that analogy?

Ben Potter (00:05:03):
Yeah, that makes a lot of sense. I look at a Waymo as kind of like a Tesla, but it actually has much more agency, meaning there's no driver at all. It drives itself, but it has boundaries. With a Waymo, you can't take it on the highway. You can't go outside of a city onto dirt roads, whereas you can with the Tesla, but there's more of a human in the loop. A Waymo has less of a human in the loop. There's still a human in the car and they have this button that they can use to eject, but it's operating much more independently. Not eject, sorry, stop the car. I don't think you want to be ejected from a rolling Waymo. That aside, it's definitely the boundary component as well as a high level of autonomy. And how that plays into Coder is you can think of Coder as the way to set boundaries for your agents and keep them running effectively, with more autonomy, in a safe environment.

Nicky Pike (00:05:47):
So when I think about that, when I think about Tesla, yes, it can drive for me, but I've got to keep my hand on the wheel. It's looking at my eyeballs and saying, Hey, am I watching the road? And if I'm not doing those things, it basically stops working, right? It's going to pull over and say we're done. Whereas Waymo, which I'll be honest, I got in my first Waymo not long ago in Austin, and it was a different experience. I'm sitting in the back, I'm not having to pay attention to anything. I can work on something else while it's driving me around, but I'm kind of interested in this whole eject feature that you're talking about. That would be kind of neat. If you're doing the wrong thing in Waymo, it just kicks you out.

Ben Potter (00:06:23):
Yeah, definitely. Even in agentic situations, it's really important to have good safeguards in place.

Nicky Pike (00:06:31):
Right? Yeah, I can see that, especially when we're talking about code and how this comes into the enterprises. I mean, we're seeing right now that AI is becoming more of a peer or coworker, and I do want you to dive in because you think you have a little bit different take on that, but enterprises are freaked out about AI. They're worried about how it's going to screw things up. I mean, we've all read that recent story where AI went in and it deleted a whole production database because the test wasn't working. How do you plan on keeping this AI beast on a leash? What are you doing to help kind of assuage some of those fears?

Ben Potter (00:07:02):
Yeah, and I'll admit, I spent a lot of time getting pretty worked up about this as well, around how do you secure AI effectively? And then we work with Anthropic pretty closely, and we did a webinar with them recently, and they gave a pretty clear explanation of how they're securing agents. They told us effectively that when they first started working with agents, the first thing they did was give them essentially no internet access, so they can't reach out, publish things that they shouldn't, or access information that they shouldn't, and they were only given read-only permissions to read internal data that they had. So it's not super intimidating. There's very little that the agent can actually do wrong, but it can provide value in doing things like research or helping scan through logs and stuff like that. I think when you're first deploying agents in the enterprise, you should do something very similar: only give it read-only permissions and limit its internet access, and that's a start to really prevent the agent from doing anything wrong.

(00:07:55):
For example, I'm not really scared of my Roomba. I can put my Roomba in my house. The worst thing it can do is maybe scare my cat, but that is AI working with agency to accomplish a task. It's just given very unintimidating tools, such as a vacuum, and it's given a boundary inside of my apartment. If I gave my Roomba legs and a sword or something, and access to the internet, I'd be a lot more terrified of it. So I think when you're first deploying agents, find practical things that you can deploy them for that aren't dangerous, essentially, or aren't given access to dangerous things. There's also a lot of writing about this. Simon Willison is a well-known thought leader in this space, and he has this concept called the lethal trifecta, which effectively talks about how, as you're giving agents more capabilities, you should be thinking about preventing their ability to do something dangerous inside your organization.
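
As an illustration of that starting posture, here is a rough sketch of launching an agent with no network access and a read-only view of the code. The container image and agent command are hypothetical placeholders; the Docker flags themselves (`--network none`, `--read-only`, `:ro` mounts) are standard.

```python
# Sketch of the "no internet, read-only" starting posture for a first agent deployment.
import subprocess

def run_restricted_agent(repo_path: str, task: str) -> str:
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",            # no internet: nothing can be exfiltrated or fetched
        "--read-only",                  # container filesystem is read-only
        "-v", f"{repo_path}:/repo:ro",  # source mounted read-only for research/scanning tasks
        "my-agent-image",               # hypothetical agent image
        "agent", "--task", task,        # hypothetical agent command
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```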

Nicky Pike (00:08:44):
Yeah, well, and you said you're not afraid of your Roomba. I bet your cat would argue that point; your cat is probably terrified of the thing. But I think that's a good analogy, because a lot of people out there are looking at AI and they're kind of thinking, well, this is this huge sage, right? It's going to be able to do everything for me. But comparing it to a Roomba, that's a very good take, because your Roomba, it's only doing one thing, it's going out there to vacuum, but it's still getting stuck. We're all having to go pick up the Roomba and get it out from under the bed, or it's gotten stuck under a table. Are you seeing the same thing with agentic AI in the enterprise, working through some of the software development lifecycle things?

Ben Potter (00:09:19):
And that's where I both get really excited by the AI coworker analogy, but I also really struggle with it, in that I think the first applications of agents will be hyperspecialized in a specific part of the software development life cycle, but spread throughout the software development life cycle. So there'll be one that'll help audit documentation and make sure it's still up to date, one that will write an end-to-end test and then provide a video of it running the end-to-end test. So you can think of these little micro agents running at different stages, all of which are pretty restricted in what they can do, and there's a very low surface area. I do think down the road there's this concept of an AI coworker, and it's something that I'm really excited about and we should talk about, but I don't think enterprises should be thinking about how they deploy AI coworkers today. For starters, the technology isn't entirely there. The ROI is not there, and the risk isn't worth it.

Nicky Pike (00:10:06):
Okay. And when you say the risk isn't worth it, what specifically are you talking about? What should people be looking out for there?

Ben Potter (00:10:12):
The two things I think people are most concerned about are destruction of property and leaking intellectual property. The first would be the point you had around deleting a production database. The second would be around uploading all of our source code to the internet for other people to see. But there are a number of other things: opening up security vulnerabilities, racking up cloud costs.

Nicky Pike (00:10:35):
And I find that that's a neat little dichotomy that we see. We see a lot of people out there talking about that: well, it can go delete a production database, or there are studies out there that 40 to 48% of code that's created by AI has bugs in it, or 32% of it has security vulnerabilities, and they're getting concerned. They're saying, well, that right there says you shouldn't use it. But to me, I don't see a lot of difference between what the agent is doing and what we're really seeing in development teams. I mean, honestly, I don't know anybody out there that writes perfect code. Nobody writes code that doesn't have bugs in it. So why are we leaning harder on AI for the fact that it's getting bugs in its code or the vulnerabilities? Where do you see the difference there?

Ben Potter (00:11:17):
Yeah, it's a really interesting question. I think people set a pretty high bar for AI, and I think that people want instant gratification as well. So if you use it to try to write a feature and it doesn't write the feature the first time, or it didn't significantly impact your productivity in that one task, you'll write it off. Similar with bugs or security. Whereas I think a much better way to think about it is thinking about agents or AI as a tool or automation that you have to invest some time in learning, getting comfortable with, and building, and then it will continue to save you time, not just when you're actively using it, but it'll run in the background as an automation. So I think that's important both on the bug front and the security front. The other thing that we've seen leading companies do is they don't trust agent-generated code; they place trust in the humans that are submitting or proposing agent-generated code. So I'm a product manager at Coder. I used to be a software engineer. If I was to submit AI-generated code to my team, there are a number of checks and balances in place as a human that would have to occur before this code essentially gets shipped.

(00:12:26):
And we have to do that as a company, and it's part of our compliance posture, and it's also just a good way to run. And larger and larger companies have a lot of these policies. Some of it's security theater, some of it's actually implemented. But I think the more you work through, okay, who's submitting the code? Who's vouched for it? Did the human review this, or is the expectation for someone else to review it? I think as you start to think through those workflows, it feels a lot more manageable, in that humans are still submitting code that agents helped them with, and humans are responsible for reviewing and trusting that work.

Nicky Pike (00:12:58):
I think what you just said there was a great point. When I look at this, it's almost like there's this unfair expectation of AI, and we're, what's the word I'm looking for? We're projecting onto AI things that we do ourselves. So going back, like you said, we create code with bugs, we create code with vulnerabilities, but we see it differently when AI does it. It's almost an unfair comparison. But when it comes to, okay, we are going to use AI, the accountability is on the engineer that reviews it. They've got to make sure that what the AI is putting out is fair. But that's not a lot different than what we do in development teams with PR code reviews today. Where do you see the difference? What do you think is going to make the change where people quit putting an unfair expectation on AI and start looking at it more like a coworker, treating it like a junior developer?

Ben Potter (00:13:50):
I think about a year ago, people who were previously strong AI skeptics began to find clear ways that they use it and get value out of it in their own workflows. And that just happened to be ChatGPT helped me write something and I copied and pasted it in, or Copilot chat taught me something that I didn't know and I began incorporating it. So I think there are still AI skeptics out there who don't think it's valuable for any part of the workflow. I think that's now a minority, and I think we're going to see the same thing with agents. Agents are being completely overhyped right now. I've seen posts on LinkedIn saying they fired their team and they have a bunch of AI coworkers instead. And I also see people saying agents aren't able to take any tasks to completion. I think what's going to happen is a next generation of products is going to come out that help engineers and engineering adjacent roles see that there's a certain set of things that agents are very capable at, and it'll be so ingrained into a workflow that developers aren't even going to have to think about it, or developers aren't even going to have to figure it out themselves.

(00:14:49):
Someone else is going to figure it out for them. On every team, I've found there's usually one or two AI-obsessed engineers. I see the change shifting from single-player developers using AI in the chat to one AI-obsessed engineer setting up an automation for the system and making the rest of the engineering team, or the repo, productive. So I think agents really help us get into that world, assuming you have a solid infrastructure and governance model to run these agents in the background.

Nicky Pike (00:15:15):
Yeah, I love your statement about the AI skeptics. I'll freely admit, I was one of those until I started looking at other things I could do. Now I spend, honestly, a large portion of my time with AI helping refine my ideas. The one thing that I find just hilarious is if you look out on LinkedIn, the people that I see sometimes being the biggest skeptics, you can tell that the post they're writing to talk bad about AI was written by AI. I find that hilarious. It's like, you're badmouthing AI, but I can show you, here are the tells on your post.

Ben Potter (00:15:48):
Yeah.

Nicky Pike (00:15:49):
So I find that funny. Well, okay, so you talked about enterprises having this fear of loss of IP. What are you doing within Coder to help prevent some of those nightmare scenarios they're worried about coming true?

Ben Potter (00:16:04):
I think the enterprises that we're seeing adopt this the most right now are investing in automation across the board. And the place that Coder helps is automated environments for agents to run in. Meaning, with a platform like Coder, an agent can run inside a secure ephemeral environment that is cost-managed and permission-managed, meaning the agents have a very limited set of permissions to do a specific task. Inside that environment they can do whatever they want, but outside the environment, not a lot can go wrong. Through Coder's centrally managed platform, you can prevent some of those additional risks, such as the agent racking up your cloud bill, or costing too much in tokens, or assuming the permissions of a developer and deleting a database. So what Coder does really well is automated environments, and everything that surrounds that, for agents to run in.

Nicky Pike (00:16:53):
When you mentioned that, so I've seen you on LinkedIn talk about this AI development infrastructure. When you were talking about what Coder does, you put a lot of emphasis on ephemeral environments. Why is that important within the infrastructure for AI? What makes ephemeral environments something that you would put that much emphasis on?

Ben Potter (00:17:13):
Each agent needs a place to run its tools. When an agent is doing research, that might just be reading files. If it's writing some code, it might be writing to files. If it's browsing the internet, it's browsing the internet. And an agent needs an environment to be able to execute those tools. And as agents continue to do more and more complex things, you'll find that the environment that you're creating is actually very similar to the environment that developers need to work in. And by automating that across the board, for agents, for humans, for whatever, you create this really nice model where you're no longer running into issues where people are saying it works on my machine, or agents are spinning up environments and just leaving them around forever, or committing secrets to environments. And by having a very consistent operating model across software engineers and agents and your test pipeline, your staging, your production, by creating consistency across the board, it reduces a lot of headaches.

(00:18:09):
Those are headaches people are going to have a lot later on in the process. So I think that's one. The second is, as a developer, I don't think you get a ton of value just running one of these agents on your local machine. There's some; it can help you walk through a code base, it can write a feature for you, but you can really only do one of these at a time. And where I think this is going to go in the future is a lot of these agents running in the background, working on tasks essentially while you're asleep, whether it's running a scale test, triaging bugs, doing all of the kind of toil that happens throughout the software development life cycle, and freeing up software engineers, product managers, QA engineers to do the more creative or more operational parts of their job, getting something done as opposed to just kind of triaging. So I think that agents need somewhere to run in this world where they're running everywhere, and I think it's pretty hard to deny that that's the future of these kinds of automations happening in the background. And to do that, you need ephemeral environments.
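
A rough sketch of the ephemeral-environment pattern being described, assuming a hypothetical `provisioner` interface (it could be backed by Kubernetes, a Coder template, or a sandbox service):

```python
# Sketch of the ephemeral-environment pattern: each agent task gets a fresh,
# disposable environment built from the same template humans use, destroyed afterwards.
from contextlib import contextmanager

@contextmanager
def ephemeral_environment(provisioner, template: str):
    env = provisioner.create(template=template)   # fresh, reproducible environment
    try:
        yield env
    finally:
        provisioner.destroy(env.id)               # nothing lingers: no stale secrets, no forgotten VMs

def run_task(provisioner, agent, task: str):
    with ephemeral_environment(provisioner, template="backend-dev") as env:
        return agent.run(task, workdir=env.workdir)
```

Because the environment is created from a shared template and destroyed when the task ends, a runaway agent can simply be thrown away and the task retried with an adjusted prompt.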

Nicky Pike (00:19:04):
Okay. Well, yeah, absolutely. It seems like we've all played with code and we've all played with AI generation, and agents are still, they're not there yet. Like you said, they're still maturing, they're still growing, and they're growing at a rapid pace. But I think every one of us has experienced where AI has kind of gone haywire, gone off the rails, gotten caught in a loop and added a couple thousand lines of code. To me, that seems like the benefit. I don't have to go back and reverse engineer or try to back out everything AI has done with that environment. I can just delete it and start over, maybe adjust my prompt, maybe adjust my instructions and try again, rather than having to spend days, hours, whatever it may be, trying to back that out.

Ben Potter (00:19:45):
You definitely don't want an agent messing up something on your computer, or reading your emails, or doing things that it shouldn't, but I think that's just going to happen; meaning not that it's going to happen, but that people are naturally going to want to run these agents more and more, and they're going to end up building environments for them. So I think it's almost an inevitability that agents are going to run in ephemeral environments.

Nicky Pike (00:20:06):
So we've got the ephemerality, and I guess this is all leading into what we've heard you talk about with that AI development infrastructure stack. Ephemerality is a portion of that. What else would be involved in that and where are you seeing enterprises getting it wrong today when they're trying to build infrastructure for AI?

Ben Potter (00:20:23):
So when building out a system for agents, or really any AI, we're seeing a lot of enterprises invest in a few different layers. The first is some type of LLM provider, and Amazon Bedrock is one of the main ones because the models aren't trained on your data. People are also using self-hosted LLMs such as Llama and stuff like that. And then they're also investing in an infrastructure layer for ephemeral environments. And there are a few really great ones out there. Kubernetes is great for running ephemeral environments; you can self-host Kubernetes, which is really cool. E2B is another one; they're doing cool sandboxes for agents. I think that where folks are getting it wrong right now is trying to build a separate developer environment for agents than the ones that software engineers are using. I think, especially as models and agents get more capable, you're going to want them to be able to do things that your software engineers can work on too. So I think it's really important to have reproducibility across these agent-specific environments as well as the ones software engineers are working in.

Nicky Pike (00:21:23):
That seems to make sense. I mean, consistency for AI is absolutely required. If you don't have consistency for AI, you're going to get inconsistent outcomes. And I think, again, going back to that kind of human partnership that we have, if we do have to go in and fix something for AI, being able to have that consistency of what we as humans are working in and what we see compared to what we're doing with AI, it kind of prevents that gap of knowledge that we may see between the two if we've got to go back and fix something.

Ben Potter (00:21:49):
Yeah. I mean, there's a model I think we're seeing a lot of people follow where they'll kick off an agent to start a task and then move into their editor to bring it to completion. So it's awesome: the agent does some of the not-so-fun stuff. It clones the repo, it checks out the branch, it reviews what area of the code it needs to work on. It makes an attempt, and then the human can come in and layer on their expertise. The only way you can do that is with an environment that both software developers and agents can work in together.

Nicky Pike (00:22:18):
Alright. Well, I saw this report that stated that 43% of all enterprises have AI budgets exceeding $1 million annually. So based on what we just talked about, are these enterprises lighting that money on fire right now? What are they getting for that investment that they're putting into AI?

Ben Potter (00:22:34):
Yeah, I really don't think people are lighting their money on fire. In most studies, people with even a basic assistant, so this is the pre-agentic world, are getting roughly eight to 10% productivity gains from these assistants in their day-to-day workflow. And that in itself, at an enterprise scale, begins to pay for itself just in terms of the amount of time that you've saved. However, with AI moving so quickly, I think it's really important that when people are making AI investments, they think about a durable, long-term infrastructure layer, as well as a durable, long-term layer for where they're going to be running and operationalizing AI. These IDE assistants are awesome and super useful, and I use them all the time, but you probably don't want to be spending most of your budget on an IDE-only assistant. You probably want to be thinking about ways that you can run maybe an open source assistant and self-host your LLM, and then invest in an infrastructure layer to support this current generation of software development as well as the next one. So I think the investment's worth it. It's just important to not invest too much in a model that will end up being less productive or less interesting a year or two from now.

Nicky Pike (00:23:43):
Well, and I think there's also the aspect of a lot of companies: are they following the hype, or are they really putting thought into ways that they want to use AI? AI goes way beyond just code generation. The productivity I think you can see from that goes into other parts of the software development life cycle. It's documentation, it's code tests. So I mean, do you have any advice, when people are setting this up and starting to prepare for AI, on the way they need to look at it? Don't just follow the hype, but have a plan for how you're going to do this, just like you would any other system?

Ben Potter (00:24:12):
Yeah, I don't think you should depend on developers to independently adopt and start using a tool. I think you should listen to developer feedback and procure tools and editors that developers want to use. But I think before something like Cursor became popular, for example, I don't think there were CISOs or CIOs or CTOs even being like, we need to invest in this. I think developers are able to identify which tools they need to use and will bring them to your attention. And then it's just a conversation of procurement and compliance. I think where leaders can really help, in terms of their organization's AI transformation, is more in training and enablement, and also in investing in infrastructure that will support a whole variety of AI tool sets. So I think the strategy component here is: have solid infrastructure, listen to your developers, and think about future workflows. And I think developers will pretty quickly identify ways that they can begin using AI in their workflow from what they've seen with their peers.

Nicky Pike (00:25:11):
Well, you brought up CISOs, so that reminds me of another report. When we're looking at enterprises that want to bring in AI, about 90% of all enterprises are looking to integrate AI agents by the end of this year, by 2025. But when they really dig into it, 57% of those cite security as their primary concern about AI. Is this a legitimate concern?

Ben Potter (00:25:35):
Yeah, absolutely. I think if Coder's a hundred-person company and we were to bring on 30 interns by the end of this year, security would be a concern for us, because at that scale, it's just something that we're not prepared to deal with. We've never had interns before at that scale, and there's a lot of new things that you need to think about. I think that leaning on some of the thought leaders in the space, like Simon Willison and what Anthropic's talking about, and to some extent leaning on best practices that have already existed around other models, will help. And then I think there are also very safe ways to deploy agents with minimal risk, such as disabling internet access and giving them read-only capabilities. So I think that it is a legitimate risk, and I think you need to be intentional about it.

(00:26:15):
I don't think it should hold people back. It's kind of funny, because I'm a product manager and part of my job is to interview CISOs and people who are really thinking about information security systematically, and I ask them what their main concerns are rolling out agents. And keep in mind, this is by no means a representation of every CISO out there, but a lot of them are actually a lot more optimistic about agents than I thought. And they viewed agents, no pun intended, as kind of like a change agent for how code is written in the enterprise, meaning an agent can assist the developer in writing better documentation or upgrading security vulnerabilities. Whereas I think a lot of times in an organization that large, it's really hard to share best practices with the developer population. And developers are always just writing more and more code, and it becomes really hard to manage.

(00:27:01):
And I think a lot of these folks are looking at agents as ways to help manage this sprawl as opposed to creating more of it. So I found that really enlightening because it's another good example of AI coding agents not being used necessarily for code generation, but in service of creating better quality code or even reducing code duplication. So I think for a lot of people who are managing really complex systems, they see agents as a way to simplify those systems or at least secure those systems as opposed to creating more mess. And I found that really interesting.

Nicky Pike (00:27:29):
Well, you said two things there I want to pick up on. The first one is that top-down approach of pushing AI. Again, AI to me is just another tool. It's another way that I can get better at my job, increase my productivity. When we look at the tools that developers use today, the IDEs, aliases, all the shortcuts, all the things that developers put into their flow, I imagine it's going to be just as variable with AI. The way you use AI in your development flow could be very different from mine. It doesn't mean that either way is wrong or right, it's just a different approach. And so how do you think providing infrastructure will allow that type of variability, where they can use AI, I think, the way it's intended, as another tool?

Ben Potter (00:28:10):
I think a good enterprise platform team is great at figuring out common patterns and building products, including AI products, that solve the 80% scenario, roughly. So how can I make a template, or how can I make a pattern, that developers can use for a number of different things, and it'll work for them 80% of the time, and for the other 20% of the time, developers can go off track and build their own thing? So I think applying a similar methodology to agents makes a ton of sense. If folks are early in their AI journey, the first actionable thing to do would be to procure the tools that developers are already asking for: Cursor, Windsurf, maybe some of these agentic tools like Claude Code. But beyond that, I think if you wanted to take more of a proactive approach, it would be building out clear examples of how you can start using agents in your issue tracker to label issues properly, how you can start using agents to run end-to-end tests, and providing those patterns, providing clear boundaries around them. Then different teams or different developers can opt into using them in their workflow without it feeling like anything particularly daunting.

Nicky Pike (00:29:13):
Right. Yeah, I see that. I mean, the one thing that AI is really good at is pattern recognition. Its ability to see things that we may not is huge.

(00:29:22):
And your issue tracker, I think that's a great example. You get a hundred bugs over the course of a week; maybe 15 of those are all related in some way, and we don't see the pattern because we're busy working on 'em. We're trying to get 'em fixed, so we don't see that there may be a commonality. I think that's a great case. The other point that I wanted to bring up: you mentioned all the thought leaders, but you're selling yourself short here, Ben, because Coder itself I'm going to put in that category of thought leader, because Coder's not just talking about how to use AI. You guys are actually using it within your business. I remember we talked to Elle Wolf, we interviewed her about the rebrand and the H1 launch that you were doing, and she made the statement, we couldn't have done what Rob asked us to do and bring that in within a month had we not used AI. So what are your experiences? I mean, you guys are actively using AI. There's Blink out there. What are you guys seeing in using AI as a coworker, as a tool?

Ben Potter (00:30:16):
Yeah, on the product management side, you were actually talking about pattern recognition, and this is a little bit separate from Coder the product, but we'll get there. We're doing a lot with transcripts. So in meetings that we're able to record, we'll take that transcript, and personally, after every meeting, I will interact with that transcript and ask it: was there a point where I talked over someone, or what were the key points of this, or how could we improve this meeting for next time? That's just how I like to think: how could this be better? But you can apply any approach to it, and there's so much data in there that I may not have recognized that I can use. On to the coding aspect: we've done this in the past where we spent an hour and a half talking about a design for a new feature that we want to make at Coder, we're calling this feature Bridge, and we've actually come a long way. I took that transcript of the hour-and-a-half-long call, I put it into Coder, and I asked it to prototype a design based on the entire transcript. It reviewed it, and I didn't even need to look at the mockups that we were making, because the mockups were pretty crappy, and it made something significantly better that was in the product, that we're not going to take to production anytime soon, nor are we even going to try.

(00:31:19):
But it gave me a great visual that I could then demo to engineers and demo to customers and work with, and it did it in a real code base, so we could see that it's possible and here's how it could look and feel. So I think there's this cool combination where agents are able to take the context of highly skilled people, not calling myself one, who've thought about a problem really deeply, and then represent it in a medium like a code base really well. So on the product management side, we're using it a lot for prototyping, exploring the code base, figuring out what the art of the possible is for what we make next in the product. In engineering, we're using it in a number of different ways. We're using it for docs generation. So a workflow that we are really excited about is, on pull requests, checking to make sure that our docs are still up to date based on the pull request, because sometimes we'll ship code and we'll forget to update the docs. And we even want to take it a step further, where maybe it can generate the screenshot itself with a browser and put it in.

(00:32:11):
Another one we're doing is around our automated test suite. Sometimes our test suite will flake, meaning it's supposed to produce a certain outcome, but it doesn't. And for every flake, we're spinning up an automated workflow to try to identify: how did this flake happen, where's the code that's involved, and who's an engineer that we can assign it to, to essentially fix that flake. Because otherwise we were doing it kind of on the honor system, where a flake happens and you should identify and report it. The agent's kind of an accountability partner, still asking the engineer to go and take the flake to completion. The next step of this is, can it fix it itself? But I think the fact that we've split it up in half and said, first, let's solve the core problem that we have, which is we're not fixing flakes or identifying flakes fast enough, and then let's see if it can do some engineering for us, is really helpful. Identifying a flake is not the coding part of the lifecycle; it's just detecting an event, doing some research, and logging it. So those are some use cases that we're really starting to operationalize.
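
As a sketch of the docs-on-pull-request idea, here is what such a check might look like as a small CI step. The `agent.run` interface and the prompt are hypothetical placeholders, not Coder's actual pipeline.

```python
# Sketch of a "keep docs in sync with every pull request" check, run as a CI step.
def check_docs_for_pr(agent, pr_diff: str) -> str:
    prompt = (
        "Here is the diff for a pull request:\n"
        f"{pr_diff}\n\n"
        "Identify any documentation pages under docs/ that this change makes stale, "
        "and propose the specific edits needed to bring them up to date."
    )
    report = agent.run(prompt)  # hypothetical agent call with read access to the repo
    return report               # the CI job would post this back to the PR as a review comment
```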

Nicky Pike (00:33:07):
All right. I'm smiling. You threw a lot of information at me at one time there. So we're going to step through a couple of those.

(00:33:12):
The first one, about how you used it to prototype AI Bridge. I think that's one of the things that not enough people are talking about. Great ideas don't always come from software developers and software engineers. You get great ideas from everybody. I think this ability for AI to do that rapid prototyping, for it to kind of serve as a new wireframe, is one of those unspoken gems that nobody's talking about. You basically just proved it: hey, it was able to give us something that was better than what we were originally thinking. It probably cut out weeks of going back and forth of your time. Now, you admit it's nothing that you're going to take to production. But did you find that the prototype it kicked out spurred new ideas within the engineering team about how it could work and what you could do with it, that maybe you weren't thinking of before AI kicked that out for you?

Ben Potter (00:33:59):
Yeah, it definitely makes ideation a lot cheaper, because the feedback loop between idea and something tangible is a lot shorter. So yeah, we looked at it and it created a couple of dashboards, and I was like, oh, I didn't even ask it for this, but it kind of picked up that this would be a helpful view for people. And I'm like, yeah, that's right, or, we should test that. So I think that it helps in that way. It's also going to help a lot as we continue to do research with our customers and we want to show them things, versus having to present them slides. So being able to have something that we can then take to subject matter experts in their field, in their industry, and show them: here's how we're thinking about the view, what do you think? And then be able to change it and adjust it in real time. It's going to be really valuable, and I know there are a number of ways to do it, but being able to do it with AI is really helpful from a product management standpoint.

Nicky Pike (00:34:42):
Yeah, it's taken rapid iteration to a whole new level, and that's going to be extremely valuable, because time to market, all of the things that come into getting a product done, this is improving upon that. The other thing that you stated was about documentation. Now, I talk to a lot of developers. I talk to a lot of customers. I have never found a single developer that tells me, Nicky, I love documenting my code. In fact, it's the exact opposite. It's one of those things that's necessary, but nobody likes to do. What you were stating about how you're doing this on pull requests, that's turning documentation on its head, because now it's living documentation; it's happening in real time. And even those developers that are really good about documenting their code, they get behind. They don't go back and update with new features. Now you're updating as code is coming out. I think that not only changes the game for the readability of code, but also the archeology of code. What happens when Ben leaves and we need to go figure out what Ben did? It's already automated for us. It's already documented. Are your developers leaning in on that within Coder? Is that something that they're truly embracing, because it's taking care of one of those brain-dead tasks that they don't really want to do anyway?

Ben Potter (00:35:46):
Yeah, I think software engineers, and anyone for that matter, are a lot more willing to use AI for something that they're not necessarily looking forward to doing in the first place. So I'll often use AI for things that I'm not necessarily looking forward to doing. So yeah, I'd say software engineers are definitely leaning in. I think it's important to always consider quality, both in code and documentation. Documentation is something that's really important to me. I know a lot of software engineers really value the structure of code, especially for the longevity of a project. Anything that's built in an automated process is often a lot lower quality than something that was made with care by a human. And the same is the case for AI-generated stuff, and that'll always be the case. I think the argument that I've seen that makes a lot of sense to me for continuing to use AI is: if we didn't have AI, this wouldn't have been done in the first place.

(00:36:35):
And I think that's really helpful from a docs standpoint. I think that's really helpful in using AI to help make a product higher quality, rounding a corner where it doesn't necessarily have to be rounded, taking pride in it. But it can also be used in the opposite way. I do believe that AI-generated code will always just be a little bit lower quality than the code of a human who really takes it seriously, similar to the way a chair that was made in a factory is going to be a lot lower quality than a chair that you buy from a local woodworker, for example.

Nicky Pike (00:37:03):
Right? I guess then it becomes a thought exercise of where does the IKEA furniture fit best versus the Amish furniture that's handmade.

Ben Potter (00:37:12):
There's a place for both

Nicky Pike (00:37:13):
In this world

Ben Potter (00:37:13):
For sure.

Nicky Pike (00:37:14):
I absolutely agree with you. Now, you've got this statement that I absolutely love. It's this great line: agents are hired, not built. That is a wild way to think about agentic AI. Break down that statement for me. What do you mean by that?

Ben Potter (00:37:28):
So this goes back to the AI coworker stuff. I do think there's a future where AI is a lot more capable than it is today, where it's not just confined to some workflow input/output, and it's something that you can talk with in Slack or even in a meeting, and it could be talking with us right now and telling us its thoughts. And I don't think every enterprise is going to build one of these themselves. I think they're probably going to buy one off the shelf and then tune it or train it or whatever for a given task. And we're seeing the same even for these very basic automation agents like Claude Code. A lot of the enterprises that we talked to were trying to build their own coding agents. Something like Claude Code was released, they took it, they deployed it, maybe they hooked it up to their LLM provider gateway, and learned pretty quickly that what Claude Code had was a lot better than the agent they were working on.

(00:38:16):
Nothing against all these teams trying to build their own agents, but I don't think it's every enterprise's skill set to make custom agents that essentially imitate humans, or part of what a human does on a day-to-day basis. So I just think it's a lot more common that general-purpose agents or agentic frameworks are going to be used, tweaked with the specifics of the context of your organization, and given boundaries, similar to the way an employee would be hired. But I don't think each enterprise is really going to be manufacturing these agents, if that makes sense.

Nicky Pike (00:38:48):
No, it absolutely does. I don't even see it really being feasible, because a lot of companies out there are saying, well, we'll go and train our own models on our code, we'll go train our own models for our business. But when you really get to looking at that, to be effective you have to train on millions and millions of lines of code to be able to find the patterns and recognize them. I don't know that that's even feasible. And I mean, it comes back to the same thing that we see in technology every day. It's the buy-versus-build mentality. Which one works better for you? There may be some companies out there that have the capability and the time to go build it, but for the most part, it's better to rely on somebody that specializes in that.

Ben Potter (00:39:27):
Yeah, I completely agree. And Coder, before doing agentic developer environments, was in the cloud developer environment space, just normal dev environments. And we've worked with and talked to a lot of organizations that built their own to start, and then either they're happy with it and it's great, or they move to something like Coder because it's a lot easier to use and easier to maintain, because we have a lot of engineers thinking really hard about this specific topic. So I think we'll see the same with AI agents and the infrastructure. I think some people will do a great job building it on their own, but I think others will want to invest in something where a lot of smart people are already thinking about it and investing in that area.

Nicky Pike (00:40:07):
So as one of those smart people, Ben, solidify this for me. I'm a developer. I fire up Claude Code in my Coder workspace. Walk me through exactly what happens. What can it access, what can it not, and why should this matter to me?

Ben Potter (00:40:20):
Yeah. The main thing that we see developers doing for the first time when they spin up an agent like Claude Code is, you're in a new environment. Maybe you've just joined the company and you want to learn what's going on. So you ask it, what's in my environment? What the agent will proceed to do is scan the whole environment that you're in. In this case, it's safe, because it would be ephemeral and managed by something like Coder. It would read the file structure, the directory tree, read the files specifically, and it'll say, okay, this is what your environment has in it. Here are the files. Here's what each folder is kind of responsible for. And then you can ask additional questions: well, walk me through the front end, walk me through the back end. Or you could ask it a crazy question: which engineer works the craziest hours on this project?

(00:41:02):
And because it has access to Git, which would be one of the tools, it would be able to find those answers for you just by running the commands that you could probably run yourself, or doing the research you could probably do yourself by traversing it. And then you could ask it to write a feature, or build a project, or work on a test suite, and you're essentially just chatting with your environment. And that's where we see a lot of people getting wowed by a coding agent like Claude Code. It's this interface into an environment where you can learn about the project, you can have it write files, you can have it read files, and you can also attach additional tools. So you could give it a browser, you could give it access to your internal service catalog so it could read what other services it relates to, or who the service owner is, if that information isn't stored in something like GitHub, and you can give it additional context and capabilities.

(00:41:46):
So I think that's where people start. And I think what happens over time, if you get addicted to one of these coding agents, and I did, I got obsessed with Claude Code for a while, is you start to realize that once you find the right way to give it instructions, it can run for longer and longer without you in the loop. And maybe it can do really useful things, where you could have it run an experiment or work on a prototype. In the case where I asked it to prototype something, I think it took 10 or 15 minutes, in which case I was just doing something else. But these things can take hours. And then the next natural course of thought that I think a lot of people have is, well, I want to run dozens of these in the background in safe places. And that's where the next step of this goes: not just an assistant or an agent in one environment, but a lot of them working on different things in different parts of the software development lifecycle.

Nicky Pike (00:42:28):
And this goes back, again, to a point I made earlier. This isn't very different than what we do today. I mean, you don't, as part of Coder, go and work on a feature in isolation. You're talking to your coworkers, you're talking to your teammates. So you're doing the same thing, you're having a conversation. The difference here is I can have a conversation with Ben, and Ben says, yeah, I get what you want, and then 15 minutes later I've got something to work on or look at. I also think the huge benefit, and what a lot of people are discounting, is that software development is serial in nature. Find me the guy that can type on two different laptops, two different keyboards at the same time doing different things. It's just relatively not possible. I'm sure there's somebody out there that can do it, but in general, that's not the case.

(00:43:08):
But now we're able to parallelize that work. To your point, I'm able to kick it off to go do its thing. My next iteration, and we're seeing some of this already, is where, okay, now I've got a team of agents. I can have one agent go work on one thing, another go work on the other, and then they report back to each other, play off of each other. And I think that's going to be the next future, when we're starting to direct this orchestra of AI agents. I mean, have you played around with any of that already, or even tried it?

Ben Potter (00:43:36):
Yeah, I've had agents talk to each other before, and it's kind of funny. I had the coding agent that we're working on, Blink, talk to Devin, and they had a nice chat and ultimately got into an argument. But that's awesome. I think what we're seeing right now is one conductor agent, basically in charge of everything, spinning off little subagents, and then the subagents report back to the conductor, and it's just kind of one stream. It can be parallelized, but then there's one blocking decision-maker, conductor kind of thing. But where it gets really interesting is, well, what if there are multiple conductors, or what if the human is a conductor, and then each conductor... then it gets really weird. And I think that we're close to that, but we're not there quite yet, just because we haven't gotten agents to do extremely useful multi-step workflows effectively. I think it's great at make sure the docs are up to date, or take a screenshot of this page, but it's not necessarily implement a feature end to end. And I think as we get closer to that, we'll start to see more of those workflows.
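
The conductor pattern Ben describes could be sketched like this, with hypothetical `spawn_agent`, `plan_subtasks`, and `review` helpers standing in for whatever framework is actually used:

```python
# Sketch of the conductor/sub-agent pattern: one conductor splits a task, fans work out
# to sub-agents in parallel, and remains the single blocking decision-maker.
from concurrent.futures import ThreadPoolExecutor

def conductor(task: str, spawn_agent, plan_subtasks, review):
    subtasks = plan_subtasks(task)                            # conductor decides how to split the work
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda t: spawn_agent(t).run(), subtasks))
    return review(task, results)                              # one place where results are accepted or rejected
```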

Nicky Pike (00:44:36):
I want to hear more about the two agents getting into a fight. So give me a quick lead up and then at the end, did you have to step in as the parent or the referee and say, which one's my favorite? Who wins this one? That's an awesome story.

Ben Potter (00:44:49):
Yeah, I did. I think what happened was they were kind of talking past each other. So I took Blink, which is the agent we're working on, and asked it to find a nice issue for Devin, which is the other agent. Blink went off and started looking for an issue. Devin found an issue that it already wanted to start working on, totally unrelated, and it didn't follow instructions. Blink tried to correct it, Devin wasn't having it. I had to step in, and I won't say that'll be the last time I have agents talk to each other. It was very amusing, but it was a good realization that it's still very helpful to have someone in the loop watching over it, because besides watching it unfold, it wasn't particularly productive.

Nicky Pike (00:45:26):
So even with this new age of virtual peers and virtual agents, we're still going to have office drama. We're still going to have, well, Blink doesn't like Devin, Devin thinks that Blink's too slow, Claude Code jumps in and says, well, both of you are dumb. That's highly entertaining. All right, well, let me get us back on track here. So I know, and you've mentioned one of 'em already, you're working on a couple of features. Coder's working on a couple of features called Agent Boundaries and AI Bridge, which in my mind is kind of this defense-in-depth approach to securing AI agents, with Boundaries working at the workspace level and AI Bridge at the organizational level. Am I right in that? And how are these features able to do that securing without neutering the usefulness that is AI?

Ben Potter (00:46:10):
Yeah, that's a great question. And yeah, I think your analogy is right. So Boundaries works at the workspace level, like you said, and its goal is to give the agent a restricted set of permissions, separate from what the developer has, and allow them to work together in the same environment. I wouldn't trust an agent with my password to GitHub, for example. I can do things in GitHub that I wouldn't want my agent to do, and for that reason I would want to create a boundary between where my credentials are in my workspace and the credentials that the agent can work with. So the agent would have its own GitHub identity, for example. It would have its own permissions. It would have its own applications and tools that it could use. But then ultimately I can still review that work in my environment, to a quality standard that I feel comfortable with.

(00:46:51):
And at that point, as a human in the loop, I can approve that work and bring it in and say, this is my own, here's what I want to bring into the picture, and here's the code I want to merge. So Boundaries creates a separation between what the agent can do and what the developer can do in an environment, while still giving the software engineer permission to do things like install packages or clone repos, things that I should be able to do with my own identity. What Bridge aims to do is give people visibility into what agents are actually doing in the first place: a full log of the tools the agent called, the prompts the agent sent. Basically, "what have agents been doing lately?" is the question Bridge aims to answer, from a visibility and observability perspective. And then with that data you can add policies, identity and access management. You can start pre-configuring tools on behalf of developers so they don't have to think about it. You can do so much more, but the first problem we aim to solve with Bridge is essentially visibility into what your agents are doing, so in the case of an incident, you'd have a full audit trail back to exactly what happened.

Nicky Pike (00:47:52):
So I like what you said about Boundaries. You're separating out the agent from the developer, you've got different permissions. I mean, I've got some questions about how you recognize which is which, but is this also something where, okay, the agent cannot install packages, but it can recommend that the developer go and install them: hey, you're missing this, you might want to install it, but I don't have permissions. Now, would we consider that a capability of the boundary, or is that a capability of the boundary plus the agent itself?

Ben Potter (00:48:21):
I think it's both. I think the role of Boundaries is exactly that: to prevent an agent from doing a package install, for example. And Boundaries should be able to give the agent a good feedback loop back saying, hey, you can't do this, I'm a boundary, but ask the dev to. And then that's a pretty simple thing for the agent to pick up and tell the dev, hey, I'm recommending you install this. So I think that would totally work. And the way we're doing it is actually at the operating system level. We're creating a process isolation namespace, essentially, for the agent, so that at a system level the agent is prevented from doing certain things. We're not using LLMs or any complicated AI to prevent agents from doing these things; we're doing it at the operating system level, which keeps things nicely apart.
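
(A rough, Linux-only sketch of what OS-level separation can look like in this spirit: run the agent's commands as a child process with the developer's secrets stripped from its environment and fresh kernel namespaces via unshare(1). The environment-variable names and the agent token are made up for illustration; this is not Coder's actual Boundaries implementation.)

```python
# Linux-only sketch: launch an agent command in fresh user/network namespaces with a
# scrubbed environment. All variable names and the agent token are hypothetical.
import os
import subprocess

def run_agent_command(cmd: list[str]) -> subprocess.CompletedProcess:
    # Strip the developer's secrets out of the environment the agent process will see.
    agent_env = {k: v for k, v in os.environ.items()
                 if k not in {"GITHUB_TOKEN", "AWS_SECRET_ACCESS_KEY"}}
    # Give the agent its own, more limited identity instead of the developer's.
    agent_env["GITHUB_TOKEN"] = "scoped-token-for-the-agent-identity"  # hypothetical

    # unshare(1) drops the child into new user and network namespaces, so the
    # restriction is enforced by the kernel rather than by prompting the model.
    return subprocess.run(
        ["unshare", "--user", "--map-root-user", "--net", *cmd],
        env=agent_env,
        capture_output=True,
        text=True,
    )

if __name__ == "__main__":
    # Inside the new network namespace the agent only sees a bare loopback device.
    result = run_agent_command(["ip", "link"])
    print(result.stdout or result.stderr)
```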

Nicky Pike (00:49:01):
That's interesting. And then at the AI Bridge level, is this a direct tie into MCP servers? When you tell me this, I hear API gateway, but for MCP servers. Is that correct, or is it a little bit different?

Ben Potter (00:49:17):
Yeah. You can think of it as an API gateway for LLMs or MCP servers. So your requests go through Bridge. Bridge has visibility and auditing, it has the ability to layer in additional MCP servers, and it also has the ability to reject requests that use specific tools the gateway doesn't want you to use. So that's a great way of thinking about it.
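
(A minimal sketch of the gateway idea: one choke point that writes an audit record for every tool call and rejects tools on a deny-list before forwarding anything upstream. The function name, log path, and deny-list here are hypothetical, not AI Bridge's real API.)

```python
# Minimal gateway sketch: every tool call is appended to an audit log and checked
# against a deny-list before it would be forwarded upstream. Names are hypothetical.
import json
import time

DENIED_TOOLS = {"delete_database", "send_email"}   # example policy, not a real default
AUDIT_LOG = "agent_audit.jsonl"

def bridge_request(user: str, tool: str, arguments: dict) -> dict:
    record = {"ts": time.time(), "user": user, "tool": tool, "arguments": arguments}
    with open(AUDIT_LOG, "a") as f:                # full audit trail, one line per call
        f.write(json.dumps(record) + "\n")
    if tool in DENIED_TOOLS:
        return {"status": "rejected", "reason": f"tool '{tool}' is not allowed"}
    # A real gateway would forward the request to the upstream MCP server or LLM here.
    return {"status": "forwarded"}

if __name__ == "__main__":
    print(bridge_request("nicky", "read_file", {"path": "README.md"}))
    print(bridge_request("nicky", "delete_database", {"name": "prod"}))
```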

Nicky Pike (00:49:34):
Now, will AI Bridge be able to really take that gateway approach and say, this certain agent will be able to access our customer database while this agent can't, and be able to route those calls back and forth through the gateway based on agent identity? How are you identifying the agents and what their capabilities are?

Ben Potter (00:49:54):
So eventually we'll be able to do something like that. Right now it's tied to user identity. We think a lot of these workflows, at least in the short term, are going to be user-specific. And the next step would be system-specific, so if you created a system account or a service account for an agent, that account would then have access to different credentials. Down the road we'd look at things like process identity and so on, but I think the focus of something like Bridge in the short term is essentially visibility into what's going on. And then policy and governance will be layered on once we have really solid visibility into how agents are working.

Nicky Pike (00:50:24):
That's awesome. So you're building the security, the governance, and I'm guessing even some of the compliance aspects in at the outset, and you're centralizing that for all of your MCP servers. We do a lot of talking about how the thing with AI right now is you have infinite optionality in what you can use and how you can use it. Being able to use something like AI Bridge gives you a central point to make sure we're not bolting security onto this, we're not adding it after the fact; it's actually an integral part of how you're going to do communication and bring context to your agents.

Ben Potter (00:50:56):
Yeah, it's pretty central to the way you would run agents.

Nicky Pike (00:50:59):
That's awesome. Well, and we kind of talked about this, but in all of the talks that I'm doing, and you brought this up, I'm seeing this movement where we're going from one agent to do everything, here's the LLM to rule 'em all, to having those specialized agents. Enterprises are wanting an agent to provide documentation, to do that living documentation. We talked about one for test creation, another for code generation, and they want to do this all in the same environments. How does this not become complete chaos when you're looking at it? How do you manage that? Is this one of those places where AI Bridge is going to come in to help keep that context consistent?

Ben Potter (00:51:35):
Yeah, absolutely. I think if you start to decompose what an agent is and what it needs access to, the problem becomes a little easier to grapple with. For starters, we started by explaining that an agent is an LLM plus the ability to call tools, and a feedback loop from there. So if you think about what a QA agent is compared to a software engineer agent, you still need the LLM component, and you can standardize that. You still need a set of tools that you can standardize, such as writing and reading a file. There may be specialized tools for a QA agent versus a software engineer agent as well. But the main thing you're going to find yourself customizing is not the infrastructure, and it's not the LLM; a lot of the tools will be standardized. It'll essentially just be the system prompt that you're giving the agent and the context that you're feeding into it, which to some extent is tools. So when you think about all of these specialized agents, we're actually thinking about specialized system prompts, a little bit of specialized context, and a different set of tools. But you can have the same foundation, through Coder and through Bridge and through Boundaries, that protects these agents, runs them in consistent environments, and gives them the organization-wide context. And then you're just sprinkling in a little bit of extra context through the system prompt and the tools for these specialized agents. So that's really how we see this unfolding.
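
(Ben's decomposition can be sketched as data: a shared tool foundation plus a per-agent system prompt and a few extra tools. The names below, read_file, run_tests, and the Agent type, are placeholders, not a real SDK.)

```python
# Sketch: specialized agents share one foundation (LLM, environment, base tools) and
# differ mainly in system prompt plus a few extra tools. All names are placeholders.
from dataclasses import dataclass, field
from typing import Callable

def read_file(path: str) -> str: ...
def write_file(path: str, content: str) -> None: ...
def run_tests(suite: str) -> str: ...

BASE_TOOLS: dict[str, Callable] = {"read_file": read_file, "write_file": write_file}

@dataclass
class Agent:
    system_prompt: str                              # the main thing you customize
    extra_tools: dict[str, Callable] = field(default_factory=dict)

    def tools(self) -> dict[str, Callable]:
        # Shared foundation plus the agent's specialization.
        return {**BASE_TOOLS, **self.extra_tools}

qa_agent = Agent(
    system_prompt="You are a QA engineer. Write and run tests before approving changes.",
    extra_tools={"run_tests": run_tests},
)
docs_agent = Agent(
    system_prompt="You keep the documentation in sync with the code.",
)

print(sorted(qa_agent.tools()))    # ['read_file', 'run_tests', 'write_file']
print(sorted(docs_agent.tools()))  # ['read_file', 'write_file']
```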

Nicky Pike (00:52:42):
Okay. So I love everything about that and everything you said is great, it's exciting, but let's talk failure modes now. There are a ton of stories out there about AI doing unwanted things, the hallucinations, what happens when AI goes rogue and takes you on an unexpected trip. What safeguards or circuit breakers kick in to prevent damage, like the case we heard about where it deleted a production database? What have you got there?

Ben Potter (00:53:07):
I think it's really important to exercise the principles of zero trust when building out infrastructure for agents, meaning only give agents permission to do what they need to do, and even then find ways you can restrict things further.

Nicky Pike (00:53:21):
Okay, so just make sure you get the isolation; again, the ephemeral nature, ephemeral environments, comes into play here. Well, most of what we're hearing right now when we're talking about AI is really at the individual level of development. We are talking about agents, but it's mostly how can those help the individual developer? When we start talking about the enterprise, we've got to think about scale. So if I'm a company with a thousand developers, what do you expect is going to start breaking when I have thousands of developers and thousands of AI agents per developer all working in the same infrastructure?

Ben Potter (00:53:54):
You're going to need a lot of infrastructure, and you're going to need ways to scale that infrastructure out across a control plane, across provisioners. You're going to need a lifecycle for agents and infrastructure where they start, stop, update, and delete. You're essentially building a full infrastructure management platform, a full API for managing agents, and then a full way for humans to interact with agents. And it gets very challenging at large scales. So I think what's going to break is everything that comes with scaling a multi-tenant application, combined with the fact that this application is spinning up infrastructure and then running unpredictable workloads on it. It gets very, very complex very quickly. So I think even at a scale of 20, you're going to start running into issues around managing these environments.

Nicky Pike (00:54:45):
Well, and there's also, I mean, when I think about this in my mind, we're spinning up AI agents. So if I compare that to humans: for humans, their environments are, we'll say, long-term, but if we're looking at Coder, that long-term could be a day, right? We are talking ephemeral in nature. But when I start thinking about AI agents, I'm thinking almost task-based. These things may spin up for 15 minutes, they may spin up for an hour; if it's a really complex project, they may spin up for a day or so. So that nature of being able to spin up, that latency when you're talking machine scale, how important is it for these environments for agents to be damn near instantaneous?

Ben Potter (00:55:21):
It needs to feel fast if you are a developer or someone working with it, and it needs to feel very, very snappy. You need to have a pool of these on standby essentially to pick up a task. And on the other side, you're going to need something cleaning up the pool and pausing agents when they're no longer actively thinking and running to save on cloud costs, to save on storage. And it's a security problem too. You can't have a bunch of old environments running that may or may not have been updated. So not only do you need a fast developer experience, but you also need solid policies and governance around just managing the life cycle of these environments.
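
(One way to picture that lifecycle, as a hedged sketch: a warm pool that keeps a few pre-provisioned environments on standby so a claim feels instant, plus a reaper that pauses anything idle past a timeout. POOL_SIZE, provision, and Environment are invented placeholders.)

```python
# Sketch of a warm pool plus idle reaper. Environment and provision() stand in for
# whatever actually creates and pauses a workspace; the numbers are arbitrary.
import time
from dataclasses import dataclass, field

POOL_SIZE = 3
IDLE_TIMEOUT_SECONDS = 15 * 60

@dataclass
class Environment:
    name: str
    last_active: float = field(default_factory=time.time)
    paused: bool = False

def provision(name: str) -> Environment:
    # A real system would create a workspace or container here.
    return Environment(name)

class WarmPool:
    def __init__(self) -> None:
        self.standby = [provision(f"env-{i}") for i in range(POOL_SIZE)]

    def claim(self) -> Environment:
        # Hand out a pre-warmed environment so the task starts near-instantly,
        # then top the pool back up.
        env = self.standby.pop() if self.standby else provision("env-on-demand")
        self.standby.append(provision("env-replacement"))
        env.last_active = time.time()
        return env

def reap_idle(envs: list[Environment]) -> None:
    # Pause anything that has sat idle too long: saves cloud cost and storage, and
    # avoids stale, possibly unpatched environments lingering around.
    for env in envs:
        if not env.paused and time.time() - env.last_active > IDLE_TIMEOUT_SECONDS:
            env.paused = True
```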

Nicky Pike (00:55:57):
I agree with all of that. Well, I'm going to ask you one more question and then I'm going to get into some predictions with you. So Coder has the objective to define and become the leader of AI dev infrastructure. Paint me a picture, Ben. What needs to happen in the next 12 months for you guys to realize that objective and make it real?

Ben Potter (00:56:13):
I think we're really good at developer environments. We've been the leader in enterprise developer environments for at least the last three years, and we have some fantastic customers and design partners that we're working with. We are not yet experts in AI workloads and what it means to build AI-first environments, AI-first workloads. So we're now taking a look at a lot of the core principles we set when developing Coder and saying, well, some of these are great. For example, the way we start and stop workspaces, we still need to do that; we were just talking about that. The way developers can connect in to work alongside it is great. But there's a whole set of things that need to either be adjusted or built for this AI-first landscape. And we think we're in a pretty great place. We have some awesome customers we're working with, we have a solid foundation for developer environments, but I think we need to become experts in AI workflows. And part of it is this: it's us talking and saying, here's what we're seeing working, here's what we're seeing not working. It's going to be a collaboration with our current customers and us finding new people who want to use Coder for AI workloads and will work with us on that journey.

Nicky Pike (00:57:15):
Yeah, I can only imagine that your partnership with Anthropic is helping out there. Well, having a company that is the tip of the spear when it comes to AI, that's got to be awesome.

Ben Potter (00:57:23):
Yeah, they're very smart people and I learn a lot from them every time we chat.

Nicky Pike (00:57:27):
Alright, well, let's get into some predictions. You've been noted as saying that sandboxes won't suffice for AI development. It's September 2026 and I'm a developer at a Fortune 500 company. Are agents shipping code to production, and what does my daily interaction with those agents look like?

Ben Potter (00:57:42):
It's a really good question. I don't think agents are shipping code to production in 2026. I think agents are helping developers write more code and write more tests and write documentation and all these other things that feel boring but help in the day-to-day workflow, and they're very bite-sized pieces. I don't think 2026 is going to be the year of the AI software engineer for the enterprise. It might be for the startup; we might have a few AI software engineers there. But a lot of enterprises, and especially Fortune 500 companies, move a lot slower, and they look for very safe and predictable ways they can bring in technology for demonstrable improvement of their lifecycle, while also showing people that it's modern and it's a cool place to work. I think agents are going to be assisting developers and QA engineers and product managers in a lot of different workflows, and they already are today, and they're going to do that in an accelerated way. But I don't think my AI software engineer at a Fortune 500 company is going to be shipping code to production by itself.

Nicky Pike (00:58:37):
Gotcha. All right, we'll see. We'll invite you back in September to see if that holds true. Here's my last question, Ben. I'm going to put you on the spot, I'm going to stir things up a little bit. I want you to give me your most controversial prediction about AI development, something that you think most people in the industry will disagree with, maybe even something that gets you booed off stage at a DevOps conference. What is your most controversial opinion?

Ben Potter (00:59:00):
Yeah, and this is interesting because I spend a lot of time talking to enterprise leaders and very talented software engineers. Who would you guess would be the bigger optimist about AI?

Nicky Pike (00:59:13):
Well, I'm looking at the AI companies. I would say OpenAI, Anthropic, those type of companies.

Ben Potter (00:59:19):
That makes sense. And I think they are, obviously, they're building the models. But the average software engineer, I think, sees AI in a very pragmatic way: it either helped me in this workflow or it didn't. And I think enterprise leaders are really excited about AI as another way to transform the way work is done overall. I'm finding that CISOs and CTOs are often a lot more excited about AI agents than a software engineer is, because they're thinking of ways they can automate systems, and they view it as less of a threat than engineers do. So I think my controversial take is just an observation of that polarization, those two different sides around agents. And I'm really curious if other people have seen that too. I've worked with plenty of software engineers that are extremely optimistic about agents, and they have multiple of them working on experiments all the time, but I'd say that's definitely the minority. Most software engineers I work with are like, yeah, it's an incremental gain, it's pretty cool. And then I'll talk to a CISO and they're like, yeah, this agent's going to patch all my security vulnerabilities overnight in six months and I can't wait. And I'm like, huh, that's not at all what I expected.

Nicky Pike (01:00:23):
Interesting. Yeah, I wouldn't have guessed that either. I wouldn't have guessed that CISOs would be more optimistic than developers. Now I can get it. When we look at LinkedIn, when we look at all the debates going on out there, we only hear the extremes; we talked about that earlier. And if you're not out there actively using it, or if you're only considering the narrow scope of code generation, I can see where software developers would be concerned. But if you broaden your mind outside that scope, I think there's a lot of potential there. It's here to stay. I don't see that changing anytime soon.

Ben Potter (01:00:53):
I totally agree. I think what people get excited by is the idea of the grunt work that developers don't really want to do, and probably aren't doing, being automated with AI, such as making sure the docs are up to date or patching a security vulnerability or something like that. And the leaders I've talked to are like, the hardest part of my job is getting this code written or suggested or contributed, the code that helps the health of the company but isn't something that a developer or a team can always prioritize. So I think that's kind of where the split is: developers aren't really doing some of this stuff anyway, it's just not happening, so it has to be incorporated at a systematic level. And even then, I think the answer is somewhere in the middle, right? People have an inflated expectation of what AI can do if they're not trying to use it every day for code generation. And then I think it takes a little bit of extra effort on the software engineering side to really get a large amount of value out of it in the software engineering workflow.

Nicky Pike (01:01:47):
Excellent. Well, Ben, I want to thank you for coming on Devolution and talking to us. It was a very interesting conversation about agents and where we see the future going. Anybody out there listening, do you agree with Ben? Do you disagree? Do you want to yell at him or do you want to congratulate him? Get in the comments, let us know. And don't forget to hit the like and subscribe buttons so you know when the next episode comes out. But Ben, it was great talking with you. Any parting thoughts before we wrap up?

Ben Potter (01:02:10):
I'm not great on the spot, but thank you for having me. This was a great conversation.

Nicky Pike (01:02:15):
I enjoyed it as well. Alright, until next time, I want to tell everybody: be looking at what you can do. Welcome to the Devolution. Thank you. Thank you for listening to Devolution. If you've got something for us to decode, let me know. You can message me, Nicky Pike, on LinkedIn, or join our Discord community and drop it there. And seriously, don't forget to subscribe. You do not want to miss what's next.