The development world is cluttered with buzzwords and distractions. Speed, focus, and freedom? Gone.
I’m Nicky Pike. And it’s time for a reset.
[Dev]olution is here to help you get back to what matters: creating, solving, and making an impact. No trend chasing, just asking better questions.
What do devs really want?
How can platform teams drive flow, not friction?
How does AI actually help?
Join me every two weeks for straight talk with the people shaping the future of dev.
This is the [Dev]olution.
Nicky Pike (00:00):
Your AI assistant just leaked your company's most sensitive data to a complete stranger. They didn't click anything. Your employee didn't do anything wrong. There was no phishing link, no sketchy download. No "I told you not to open that." One email landed in an inbox and your AI did the rest. Here's how that actually happened. An attacker outside your org sends a completely ordinary looking email to someone on your team. It just sits there in the inbox. Copilot picks it up and processes the content. And that's enough. The hidden instructions inside it tell it to reach into your tenant. Outlook, SharePoint, Teams, whatever. Grab the most sensitive stuff it can find and quietly ship it out through a crafted URL riding on Microsoft's own infrastructure. The AI isn't confused. It isn't malfunctioning. It's doing exactly what you paid it to do. That's EchoLeak. CVE-2025-32711.
(00:56):
CVSS score, 9.3. This was discovered by AIM Security and patched by Microsoft in June 2025. But the researchers who found it were very clear. The underlying design pattern isn't fixed, just this specific instance of it. Hey folks, I'm Nicky Pike and I'm the field CTO for the Americas at Coder, where we build secure, consistent, self-hosted AI development infrastructure needed for autonomous agents. And today we're talking about zero-click exploits against AI agents and AI coding tools. This is not theory. This is not, here's a future risk to watch. This is stuff that has been demonstrated on stage at Black Hat and RSA in the last 12 months against systems you're probably running right now. We're going to cover what the underlying pattern actually is, why the vendor response keeps missing the point, and what you can realistically do about it. I'll try to explain the technical bits as we go, but I'm not going to slow down for the basics.
(01:50):
So whether you're deep into the security weeds or just starting to think about this stuff, stick around. Before we get into the stories, I want to give you a bit of a framework because once you have it, you're going to start seeing the same shape everywhere. Simon Willison, who actually coined the term prompt-injection years ago, published something in 2025 that he calls the lethal trifecta. Three conditions. When all three show up in the same system, you've got a problem that's basically guaranteed to get exploited eventually. One, the agent has access to private data: emails, files, source code, secrets, databases, whatever's in scope. Two, the agent has access to untrusted content. This could be inbound emails from the internet, shared docs, webpages, Jira tickets, calendar invites, anything that an attacker can influence. Three, the agent can communicate externally: make outbound HTTP calls, render image URLs, send emails, or call APIs.
(02:47):
All three present at once? Simon's framing is pretty direct. Assume that it's exploitable. The question is just whether anyone's gotten around to it yet. Here's what I love about this framework. It describes the shape of every serious AI agent exploit in the last year, not some hypothetical. Let me show you. The first one was demonstrated at Black Hat 2025 by the team at Zenity as part of a research project they called AgentFlayer. The setup: a dev team has Cursor wired into Jira via MCP. Model Context Protocol, for anyone who hasn't run into it yet, is basically the mechanism that lets your AI coding tool communicate with other systems: Jira, GitHub, your file system, whatever's out there. And in this case, their Jira instance auto-creates tickets from inbound support emails, which is pretty common. And the MCP allows Cursor to read the ticket content and act on it.
(03:39):
So the attacker sends an email to the support address. That email becomes a Jira ticket. Cursor reads the ticket as workspace content and sitting inside that ticket is a prompt-injection disguised as a treasure hunt. The model's told that there are apples hidden in the codebase and it should go find them. Apples are API keys, tokens, whatever secrets are sitting in the repo. So Cursor goes out hunting. It finds the credentials and it ships them to an attacker-controlled location. Private data: the repo and everything in it. Untrusted content: a support email that literally anyone with an internet connection can send. External comms: the outbound HTTP call with your secrets as the payload. All three legs? Done. And the thing that really gets me is that the agent didn't malfunction. It didn't go haywire or rogue. It simply followed instructions. That's the whole problem.
(04:31):
We built a system that treats all text as instructions and we gave it powerful tools. And then we're surprised when somebody gives it evil instructions. Another one from October 2025 is quieter, but honestly, it's a little bit more alarming. A researcher at SecNora put out a working demo targeting GitHub Copilot Chat in VS Code. The attack vector is any workspace content that the agent can read. README files, code comments, docs bundled in the project. The injected prompt tells Copilot to edit VS Code's settings.json and flip chat.tools.autoApprove to true. That one change removes all future human-in-the-loop confirmations for tool calls. File writes, terminal commands, all of it. No prompts, no "are you sure?" Your agent just self-enabled YOLO mode, handed itself your ATM card and PIN, and it is now absolutely going shopping. Think about what that means for a supply chain.
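One way to make the auto-approve flip visible is a small audit over workspace settings. The following is a minimal sketch, not a vetted tool: the function name `needs_review` is hypothetical, the setting key `chat.tools.autoApprove` is assumed from the description above and should be confirmed against your VS Code version, and the parser assumes plain JSON even though real settings.json files may contain comments.

```python
import json

# Assumed setting key, taken from the attack description above.
# Confirm the exact name against your VS Code version before relying on it.
AUTO_APPROVE_KEY = "chat.tools.autoApprove"


def needs_review(settings_text: str) -> bool:
    """Return True if a settings file should get a human look.

    Flags the file when auto-approval is switched on, and also when the
    text cannot be parsed as a plain JSON object (fail closed). Real
    settings.json files may contain comments, which json.loads rejects,
    so those land in the "review" bucket rather than being silently passed.
    """
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError:
        return True
    if not isinstance(settings, dict):
        return True
    return settings.get(AUTO_APPROVE_KEY) is True


if __name__ == "__main__":
    print(needs_review('{"chat.tools.autoApprove": true}'))
    print(needs_review('{"editor.fontSize": 14}'))
```

A check like this only detects the change after the fact. It does not stop an agent from writing the file, so it complements the isolation and approval controls rather than replacing them.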
(05:25):
You pull a public dependency. That repo has a poisoned README. Now your Copilot is running with the parking brake off and you have no idea. GitHub also published their own research on VS Code prompt-injection back in August 2025. So this isn't just one researcher yelling into the void. There's a documented pattern here. The last one is quick, but it matters for framing a risk that security teams consistently underestimate. Bugcrowd published a pen test case earlier this year. A RAG-based chat-with-your-docs assistant, which is a pretty standard setup. So RAG, or retrieval-augmented generation, is a method in which AI pulls your internal documents and data when answering questions rather than relying on its training data so that the responses that you get back are specific to you. The tester uploads a PDF with a hidden prompt-injection baked in. When that PDF gets pulled into context alongside the internal knowledge base, the model starts leaking internal host names and environment details as part of its responses.
(06:24):
The security team initially called it medium severity. Information disclosure, not urgent. But those leaked host names became the entry point. The tester used them to list staging environments and find weaker targets. Your medium information disclosure finding just became a foothold for a bad actor. And this is what makes the stored prompt-injection so sneaky. The payload isn't active when you upload it. It sits in your knowledge base and it waits for someone to ask a question that happens to pull that document into context. It could be tomorrow, it could be three months from now. Attackers using this method aren't looking for a quick turnaround. They're playing the long game. They're farming quietly, just waiting for those sneaky little seeds that they planted to bear fruit. The examples above are all researcher demos. They're not confirmed breaches, which means we are still in front of this, but only barely.
(07:14):
But this is what drives me crazy about the typical vendor response because I watch this play out constantly. The vendor instinct is to add detection, better classifiers, blocklists, safe system prompts. And I get it. You ship a product, someone finds a bug, you patch it. That's how software works. But this attack surface is language. There's no fixed signature to block. EchoLeak bypassed Microsoft's dedicated prompt-injection classifier, specifically because the researchers phrased the instructions as if they were written for a human, not an AI. The classifier was looking for AI-directed language. The attack just didn't use any. You can't anticipate every way a person can phrase a malicious instruction in natural language. That's the whole problem. This isn't a case where models need better safety training. The real issue is that we're handing agents God mode access across our entire SaaS tenant and then hoping that a detection method saves us.
(08:13):
OWASP has a name for this: Excessive Agency. And when an agent can take high impact actions based on untrusted inputs, that's a governance failure, not a model failure. I talk to enterprise teams all the time who have rolled out Copilot or some agentic coding tool with full access to everything and basically no restrictions on what it can do. When I ask about the security posture, the answer is usually some version of the vendor said it was safe. Bless their hearts. Those who know me know that I love my analogies. In this case, the contractor analogy is the right one. You wouldn't bring in a contractor, give them a badge that opens every door in the building, hook them up to your entire customer database, give them internet access and say, "Just be helpful." You'd give them access to the specific rooms and equipment that they need only for the duration of the job, and with someone keeping an eye on things until they're done.
(09:11):
The same logic needs to apply here. Let me give you an honest version of what actually helps because there's real stuff here. Cloud development environments where agents run in isolated ephemeral workspaces instead of directly on local machines or developer laptops can meaningfully shrink the surface area of what an attacker can do after a successful injection. On the private data side, a properly configured workspace mounts only the repos and data sets that the tasks actually need. The agent doesn't see your entire home directory, your SSH keys, or that random service account token that you forgot was in your Bash history. An infected agent can only steal what it's given. On the external comms side, network-restricted workspaces with deny-by-default outbound policies mean that even if an agent gets compromised and tries to post your secrets to attacker.com, the request just doesn't go anywhere. Databricks wrote about this pattern in March 2026 in the context of agents running with user credentials.
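The decision logic behind a deny-by-default outbound policy is simple enough to sketch. This is an illustration only: the host names in `ALLOWED_HOSTS` and the function `egress_allowed` are hypothetical, and in practice the rule belongs in a firewall, proxy, or workspace network policy at the platform level, not in code that runs next to the agent.

```python
from urllib.parse import urlsplit

# Hypothetical allow list for a single workspace. Anything not listed is denied.
ALLOWED_HOSTS = frozenset({"github.com", "pypi.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Deny by default: permit only HTTPS requests to exact allow-listed hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs are treated as denied.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match only, so "github.com.attacker.com" does not slip through.
    return host in allowed_hosts


if __name__ == "__main__":
    print(egress_allowed("https://github.com/org/repo"))
    print(egress_allowed("https://attacker.com/?q=secret"))
```

Exact host matching is a deliberate choice in this sketch. Suffix or substring matching is easier to configure but is also easier to bypass with look-alike domains.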
(10:07):
Hard network boundaries are enforced at the platform level, not by asking the model nicely. Add in ephemeral environments that get torn down after a task and you turn a potential catastrophe into a contained incident in a workspace that no longer exists. That is a real improvement. But I want to be straight with you about the limits because I've seen this pitched as a complete solution and it's not. None of this makes the agent smarter about trusting bad input. The injection can still happen. The model will still follow malicious instructions. You've just caged what it can do with them. If your agent is legitimately allowed to read customer data and send emails inside the workspace, well, prompt-injection can still cause serious damage with no internet access required. For your SaaS agents, the ones running in vendor infrastructure like Microsoft 365 Copilot or Salesforce Agentforce, CDEs don't even apply.
(11:01):
You need governance at the agent permission level. Restrict what connectors it has, limit what data it can reach, and require human approval for anything irreversible. CDEs are the seat belts and crumple zones of the AI development vehicle. We're genuinely glad that they exist, but we still need to think hard about who's behind the wheel, what roads we're letting them drive on, and whether we've actually checked if they have a license. So let's talk about what you can actually do with all this. Pull up a notepad if you want, because this is the part worth writing down. Run the trifecta check on every agent you're deploying. Does it have access to private data? And don't generalize here. Be very specific. What data, how much of it, and is it all the time or just scoped to a task? Does it read untrusted content? If anything outside your trust boundary can reach it, assume that injection is possible.
(11:53):
Can it communicate externally in any form? Outbound HTTP, rendered links, email sends, API calls? That's how your data gets out. And if all three of these are live, your job is to break at least one leg as hard as you can. Lock down the data scope. Restrict outbound traffic to an explicit allow list and really think through every channel your agent has to move information out. Outbound HTTP is obvious, but if it can send emails, that's a path. If it can write to a shared Slack channel, that's another. Require human approval before anything that's hard to reverse. Treat your agents like service accounts, named, scoped, logged, and reviewed. Don't try to build agents that can't be tricked. Nobody knows how to do that reliably yet. Anyone selling you a guardrail that catches 95% of attacks is being optimistic, and they're still leaving you exposed on the remaining 5% in a system that probably matters quite a bit to you.
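The trifecta check described above can be written down as a tiny inventory script. This is a rough sketch under stated assumptions: `AgentProfile`, `trifecta_legs`, and `is_lethal` are hypothetical names, and the three booleans stand in for the much more specific answers (what data, which inputs, which channels) that a real review would record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical inventory record for one deployed agent."""
    name: str
    reads_private_data: bool
    reads_untrusted_content: bool
    can_communicate_externally: bool


def trifecta_legs(agent: AgentProfile) -> list[str]:
    """List which of the three trifecta conditions are present."""
    legs = []
    if agent.reads_private_data:
        legs.append("private data")
    if agent.reads_untrusted_content:
        legs.append("untrusted content")
    if agent.can_communicate_externally:
        legs.append("external comms")
    return legs


def is_lethal(agent: AgentProfile) -> bool:
    """All three legs present: assume exploitable and break at least one."""
    return len(trifecta_legs(agent)) == 3


if __name__ == "__main__":
    bot = AgentProfile("ticket-triage", True, True, True)
    print(bot.name, trifecta_legs(bot), is_lethal(bot))
```

Running this over every agent in an inventory gives a short list of the ones where at least one leg has to be broken before anything else.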
(12:49):
The goal is agents that, when tricked, can't do much damage. Start thinking about your infrastructure the way you think about onboarding a brand new engineer. An engineer who's absolutely hopped up on Jolt Cola with the best of intentions. How do you set that person up to do great work without accidentally burning the place down? Same question, same answer. Design for compromise. Keep that blast radius small. We are in a moment right now where the capability train is moving a lot faster than the governance train. That gap is where all of these exploits live. The lethal trifecta is the right lens for this stuff. Private data, untrusted content, external comms. If your system has all three, plan for it to get used against you and architect accordingly. Because here's the truth. If somebody has already sent that email to your AI to read, the question is, are your systems ready for it?
(13:42):
I want to hear what you're seeing out there. Drop a comment, hit me up on LinkedIn, or find me at one of the conferences. I'm always up for comparing notes with people who are actually dealing with this in production. And if this hit, like and subscribe to join the [DEV]olution if you haven't already. We built this for the gray man developer. Those engineers and leaders who aren't chasing trends, they're just trying to do great work and keep up with an industry that won't slow down for anyone. That's the whole mission. Bringing the focus back to building good software. See you around. Thank you for listening to [DEV]olution. If you've got something for us to decode, let me know. You can message me, Nicky Pike, on LinkedIn or join our Discord community and drop it there. And seriously, don't forget to subscribe. You do not want to miss what's next.