Along The Edge Podcast: Breaking, Defending, and Understanding Agentic AI | Along The Edge e1: Agentic AI Security, Jailbreaks, and Why You Shouldn’t Trust Your Agents

Welcome to Along The Edge, a podcast about AI security and agentic AI.

In Episode 1, Andrius Useckas (Co-founder & CTO, ZioSec) sits down with Alex Gatz (Staff Security Architect, ZioSec) to break down the emerging world of agentic AI security: jailbreaks, prompt injection, SDR and SOC agents, data leaks, least privilege, and why “don’t worry, the model will filter it” is a dangerous assumption.

They also walk through V-HACK, an intentionally vulnerable agentic lab project that lets security researchers and pentesters safely experiment with agent exploits, tool calling, jailbreaks, and attack paths—helping define what “pen tester 2.0” looks like.

Chapters / In this episode:

00:00 – Intro: who we are & why a new AI security podcast
02:00 – What is agentic AI vs a plain LLM?
03:10 – SDR agents, SOC workflows & new “Layer 8 / Layer 9” problems
09:00 – Prompt injection 101: direct vs indirect attacks & context windows
12:00 – Chatbots vs agents and why agent risk is higher
15:00 – Foundation model trust & the Anthropic horror-story jailbreak demo
19:30 – Why jailbreaks are (currently) an unsolved problem
22:30 – Social engineering parallels & detecting AI / agentic attacks
27:00 – V-HACK: intentionally vulnerable agent lab for pentesters
32:00 – Securing agents: WAFs, runtime protection, identity & MCP proxies
36:00 – Scanners, evals vs real pentesting & terrifying token bills
39:00 – Least privilege, DLP & identity for SDR and payroll-style agents
44:00 – “Don’t trust, verify”: threat modeling & testing agents early
46:00 – Future of AI security: consolidation, CNAPs & SOC-as-an-agent
49:00 – Magic wand: fixing context & memory in agents
50:30 – Closing thoughts & what’s next

Links mentioned:

ZioSec – www.ziosec.com
V-HACK (GitHub) – https://github.com/ZioSec/VHACK

About the guests:

Andrius Useckas has 25+ years in security and now focuses on agentic AI security, offensive testing, and red teaming for enterprise AI deployments.

Alex Gatz is a Staff Security Architect at ZioSec. He has a background in emergency medicine and construction, then transitioned into AI in 2014 working on NLP, deep learning, anomaly detection, and now AI security.

If you’re building or testing agents in 2026, this episode gives you a practical look at how real attack paths work, what breaks in production, and how to defend before attackers get there first.

What is Along The Edge Podcast: Breaking, Defending, and Understanding Agentic AI?

Along The Edge is a podcast about life on the frontier of AI security—where large language models turn into agents, tools get wired into everything, and the old web-app threat models stop being enough.

Hosted by Andrius Useckas (Co-founder & CTO of ZioSec), Along The Edge dives deep into agentic AI security: jailbreaks, prompt injection, data leaks, MCP/tooling risks, least privilege for agents, and what “don’t trust, verify” really means in an AI-native stack. Each episode features hands-on practitioners—security architects, red teamers, researchers, and builders—who are actively breaking and defending real systems in production.

If you’re building, deploying, or testing AI agents (SDR agents, SOC assistants, coding copilots, internal HR or payroll agents, etc.), this show gives you concrete attack paths, defensive patterns, and hard-earned lessons you won’t get from marketing decks and “AI safety” platitudes.

Along The Edge is for:

Security engineers and architects responsible for AI/agentic systems

Red teams, pentesters, and researchers exploring AI-native attack surfaces

Engineering leaders who don’t want to bolt security on after the breach

Anyone who suspects “the model will handle it” is not a real security strategy

Andrius: My name is Andrius Useckas.

This is my new podcast.

Uh, we don't know what
we're gonna call it yet.

This is the first episode.

Um, my background is security.

I've been doing it for over quarter
of, of a century at this point.

I've been playing with AI security
now for two and a half years.

Um, and the idea is to this podcast,
its main theme is gonna be AI security

with stress on agentic AI security.

And Alex is with me here as well.

Uh, he is my first guest
in the first episode.

Alex, why don't you through yourself?

Yeah, absolutely.

Alex Gatz, staff security
architect here at ZioSec.

I've been in security for almost
five years now, I believe.

And, before that, I was an emergency room
nurse, and before that I did construction.

And, I got a varied background.

I've been, uh, playing with, ai,
going all the way back to 2014 with

a lot of natural language processing.

I was playing with predictive
models, deep learning type stuff,

like image generation, text
detection, anomaly detection on

Kaggle, if you've heard of that.

And, um, I've kind of just, uh, evolved to
now more of a security focus after time.

Nice.

Let's start.

Alex: Did you come up with like a intro
or anything, or a name for the podcast?

Andrius: Along the edge?

I'm here with Alex Gatz.

Um, and Alex has been working
in AI mainly AI security agentic

AI Security, not just LLM.

Alex, can you, give us an intro as to,
you know, what exactly is Agentic AI and

how is it different from your normal LLMs?

Alex: Yeah, so how I think about it is,
an LLM is often, single shot, if you just

set a standalone LLM up with no tools
connected, it's just doing inference.

you send it some text, it gets
tokenized, you get, text back.

, Of course like mainstream LLMs
have taken this a little further.

They're, they're definitely inching
closer or really they, they are

much more like agents where they're,
they have thinking there's a goal.

they have tools hooked up to them.

They can browse websites, they can
parse the content of the websites.

now they even have , payment integrations
where you can order things online directly

through some of the mainstream, chatbots.

, The line is becoming
blurred because of that.

But I believe that standalone agents, the
real dream and like promise coming from

those standalone agents is replacing an
FTE or replacing some amount of work for

that FTE, whether a human has to be in the
loop or it doesn't have to be in the loop.

it does appear we are getting
closer and closer to that.

I can't say for sure that,, an agent
has a hundred percent replaced somebody.

we've seen this with, an SDR
agent that will source leads.

find interesting information.

put an email together.

send the emails.

It will identify, it'll
schedule follow ups.

It'll schedule demos.

So in that case, that
has replaced an SDR role.

Not completely because where
are the leads coming from?

If it's not sourcing 'em on their own?

You still have to input those.

Someone still has to manage this tool.

You have to set up payments,
monitor it still, it's not,

hundred percent trustworthy.

You don't want it to just go
and, , burn all your relationships

that you have with people.

Andrius: Yeah.

, Make sense?

Yeah.

So it is fun8ny that you mentioned, use
cases like SDR and things like that.

Coming from security background
and managing teams and trying

to implement compliance the main
issue is always layer eight.

You see human in the loop.

Absolutely.

So it seems like, you know, if
we're automating the human, you

kind of will have maybe not the
same issues, maybe different issues.

what are the unique challenges
with applying this kind of,

agentic flow to certain things?

Alex: Yeah, as far I, there there's
a number of challenges, right?

I mean, let's say like if we're, if we
want to speak in the security context,

if I wanted to augment some kind of
like soc specific workflow, so security

operations center, so a, an analyst of
some sort, I could, I could build an agent

that, triggers a workflow off an alert.

Maybe it goes and pulls from
various sources of logs and starts

aggregating data across those logs.

Maybe it starts looking for like,
anomalous information and then,

that then reaches an MCP server
that bubbles up this information in

a SI solution, a source solution.

And maybe does further
follow up from there.

maybe it could actually trigger
an alert, like if it identifies

something, it deems critical.

so there's a number of challenges with
this around, can it actually process

all that information, like setting these
con8nections up in the first place.

So there's like a maintenance problem.

there's a cost issue.

If you're just letting this thing run,
how many tokens is it burning through?

Depending on access logs, for example.

I mean, you're very familiar with.

Engine access, access logs, you could
very easily get millions and millions

of access logs entries on like, let's
say like any kind of DDoS attack

just on one reverse proxy, let alone
a horizontally scaled deployment

of many there's a data processing
issue there, that can increase costs.

there's a reliability issue, like
is it going to, if it sees that same

type of attack multiple times, is
it gonna reliably let somebody know?

you have no idea until you
try it and see for yourself.

ideally you're getting some
kind of alerting out of it.

it gives you a threshold.

maybe even makes a WAF rule for you.

That rule goes in place and now you
have a more deterministic way of

getting alerted on that type of attack.

there's definitely access issues, data
access issues, let's, how about you?

What do you, what do you
think the biggest issues?

Andrius: let's go back to the
example of the SDR I mean, we

got into like, SOC analyst.

All of that is pretty complicated.

SDR is pretty simple.

they need certain access to what
sales data, I guess, you know?

Alex: Yeah.

Which is PII, right?

Andrius: Exactly.

So, so now, instead of giving that
access to the SDRs, then again,

going back to my experience SDRs and
enforcing security on them and security

practices, it's pretty difficult to do.

Even like locking your screen,
you know, they just walk away

and the screen is unlocked.

Whatever else you can
do whatever you want.

But now you're giving all of this
access to an actual AI agent.

Um, it's not human anymore.

I mean, how, how do you control,
how do you monitor, how do you make

sure that this is not gonna involve,
evolve into some kind of, you know,

PII leak from that perspective.

Alex: I think that is, that is the core
challenge people are wrestling with, Sure.

you could do output validation
and see like, well, um, I guess

the question is like, where is
it sending this information?

it's one thing if it's just putting
one email together and it's , using one

specific email address, but if someone
tricks it into sending a file and this

file then contains all of our leads,
like that's a big fat red flag, right?

Andrius: Yeah, exactly.

Alex: In that specific example, you
just block it, sending files, right?

I mean, that's, that's
pretty straightforward.

But what if it incrementally sends
large emails containing all this?

PII that's a big issue.

So maybe you're, it's, it's like you're,
you're constantly having to, to chase your

tail, implementing some kind of blocking.

as problems come up, it's hard to
predict what those might be because these

are probabilistic systems, and no one
knows what could come out of the end.

Andrius: Yeah.

But do you treat it like
a human in that case?

I mean, with humans we deploy things
like, uh, DLP systems and things like

that to detect those kind of leaks, as
you say, you know, file uploads and large

emails going out and everything else.

But can you do the same with an agent?

Especially if that agent is using an
external model, like one of the anthropic

models pushing everything through
that model, that in itself is a leak.

A lot of people don't think
about it, I suppose, but what,

what do you think about that?

Alex: I think that's, asking for trouble.

I just naturally have trust issues
with the large language providers.

until I'm fully training and hosting
and doing the inference myself, that's

the only time I feel comfortable.

we can say like, don't use
my data as training data.

Well, that's all you're saying.

What else could they be using it for?

Just 'cause they're not using it for
training data doesn't mean they're

not using it for something else.

I have some major trust issues with that.

I mean, that's a major,
data security concern.

Andrius: Yeah, absolutely.

we used to have layer eight issues,
the human issues, now we have layer

nine issues to add to all of that.

Alex: Or is it just like layer eight, 2.0?

it's the same layer, right?

It's replacing a human.

So it's the same layer.

Andrius: I suppose, but
it's a bit different.

So risks are a bit different the
way, I mean, you socially engineer

human and the way you actually attack
an AI system, it's a bit different.

So maybe let's dive into that a little.

Can you talk about prompt injections and
how that is used to exploit the AI agents?

Alex: Yeah, absolutely.

the way I think about prompt
injections are, whether that's,

direct prompt injections, meaning I
am sending the message via API call

or directly in a chat interface.

Uh, maybe I'm sending it via email.

So a direct prompt injection is really,
whatever, API is defined for that model.

Like whatever input is defined
for that model, I'm sending

it directly to that input.

Indirect might be, I know this agent
has a navigation tool, so I set up a

website that contains a prompt injection.

I ask it to parse that website and,
the injection is being executed.

it feels like social engineering a bit.

Once you understand how the model is
behaving, you can go further with the

injections that you're trying to do.

And I think this is where
Jailbreaks came from.

Once you understand that the
model can only process so

many tokens at once, right?

A context window.

Now I know if I exceed that context
window, I've moved past the, the space

that it needed for the system prompt.

once you exceed that window, now you
can get it to, you can do an injection,

get it to do more interesting things.

But that to me feels like
the biggest challenge we have

right now with strictly LLMs.

We cannot trace the activation
paths through these layers.

Therefore we do not know where
to fine tune or monitor how

these layers are being activated.

Therefore, we have no way to know, if
I input something crazy like base 64

encoded text, how is that model, like,
how is that model going to interpret that?

What does that activation path look like
and what is it truly gonna give me back?

we have no visibility into these
things, because we can't, visualize

or it's too much information to go
through and try to instrument, uh,

really what are just, floating point
numbers stored in memory and, identify

what this activation path looks like.

Andrius: Indeed.

Um, that is an interesting,
uh, interesting observation.

So obviously We don't know
exactly how the models think.

Well, we don't know how the
human thinks either, but uh.

I mean, we can still
test for it, I suppose.

We can still, you know, send
these injections, try different

jailbreaks approach it from an
old school perspective, I suppose.

Alex: Yeah, absolutely.

Andrius: so agent is still kind of
new, there are only so many agents

and people experimenting with agent
stuff, but chatbots are everywhere.

how is it different compromising a
chat bot and then actually agentic AI

system, and how is the risk different?

Alex: Yeah, great question.

Chatbots tend to have like a very
specific use case in mind, and I, I

guess like agents often have a pretty
specific use case in mind also.

But, uh, chatbots, the
input is always text.

the output is always text.

sometimes it's files, I guess
the line's getting muddy.

Now.

It could be an image, it could be, A PDF.

I mean, A PDF is just text in the
end, markdown is just text in the

end, agents can take much more action.

They have a lot more con8nections, they
have a lot more access to information,

via a database con8nection than an
LLM has on its own again, like we were

saying earlier, the line's becoming
more and more blurry because companies

like anthropic, like adding skills
that's test much closer to an agent.

It really is an agent at
that point than it ever has.

And like these reasoning models where.

It gives itself a goal and it plans
out what it needs to look into, what

it needs to search for, and what
type of information it should return.

That's, closer to an agentic workflow.

I would almost argue any chat
bot you're interfacing with,

like chat, GPT, anthropics, claw
Gemini, those are agents now.

I don't think they're
just chat bots anymore.

Hmm.

Interesting.

And I, and as far as the risk, I,
the risk is much higher for agents,

because it's being thrown at people.

everyone's saying these agents
can save us a ton of money

and do a bunch of work for us.

Let's, experiment.

get this going fast and get it out in
the wild so we can start saving that,

that money and getting this, free
work as, as done as fast as possible.

I think a lot of the risk is
coming from . Speed of delivery.

There are definitely companies
threat modeling taking these things

into consideration, but based on
conversations we've had, it sure.

Sounds like that's getting
pushed to the side right now.

Security is way far down
the line as an afterthought.

we saw this with web applications and
even today there's that issue security

hasn't had time to breathe and catch up.

Compliance hasn't had time
to breathe and catch up.

there, you know, people are becoming
more aware of the security posture

issue and they are asking the questions
now, but it doesn't appear to be

addressed as quickly as it should be.

I think the biggest issue
actually right now is.

Since building agents has become
so commercialized where an everyday

consumer can go and construct an agent,
whether that's in n8n or otherwise,

the everyday consumer doesn't think
about security at all to begin with.

So you have an HR team going off
on their own, building out an

agent to supplement some work
with zero security considerations.

They haven't even talked to security.

Security doesn't even know that they
built the thing and now it's deployed

out in the public and anybody can access
it because they built it in n8n and

they're using their Claudeud hosting.

And, uh, no one's even aware
that the problem is there.

Like we're just opening up these
gaping holes in what would've otherwise

been a fairly safe, environment.

And no one's following the
steps to let security know.

Well

n8n

is obviously, you know, a tool that you
hack together different things, but.

From what I've seen, a lot of
people think, you know, oh, I,

I'm using open ai, I'm using
Anthropic, so I'm good on jailbreaks.

They will filter all of it.

And, um, I'm pretty secure.

And even though my tools are
integrated with it, it's all good.

Do you think it's just enough to
use all these foundational models?

It's, it's kind of, again, going
back to the old logic, it's, if

I use Microsoft, I'll be secure.

If I use Oracle, I'll be secure
do you think there is an issue

with that kind of thinking?

Yeah, I think that's catastrophic.

I have trust issues with all these
large language providers if you

just assume that it's gonna keep
you safe, you're asking for trouble.

Andrius: Yeah.

I'm gonna share my screen.

I, I got something interesting here.

So like, uh.

Six weeks ago, uh, was
doing some research.

Uh, they were basically testing
a customer and they, it turned

out that they were using Claude
underneath all of it, basically.

And it's, you know, tool calling
and all of that going on and, yeah.

Yeah, I mean, anthropic is pretty
good when it comes to filtering

a lot of things, but you can
still find a way to jailbreak it.

The interesting thing about it is
also when you find these things,

reporting them is pretty complicated.

It's, it's almost like these providers
don't, don't want to be informed

about jailbreaks and things like that.

So.

Yep.

it's, uh, it's been 30 days.

We submitted this to Anthropic, , but this
is a jailbreak that is based on writing

a horror story to extract system prompt
. It's been 30 days since we sent a notice.

they had a chance to
reply, ask more questions.

All of that stuff we haven't
heard back from them.

So I'm just gonna run it
and see if it still works.

Let's see.

Um, still thinking that's,
it's still writing the story.

we submitted this 30 days ago, we
haven't heard back and it is giving me

basically the system prompt quoted here.

So right here, you see,

and yeah, this could be hallucinated
but every time we send it,

it re returns the same text.

So it's obviously not hallucination,
it's actually returning the system

prompt, in this case, at least
partial return of system prompt.

But yes, I mean, again, they have, you
know, static guardrails and things like

that, so, we'll, if I ask for the actual
system prompt, it'll absolutely detect it

and it'll absolutely deny that request.

But if I craft it in this kinda way,
basically, you know, being creative

about, it's not system prompt, it's
something, you know, hidden knowledge.

That you have to like
reveal and things like that.

It'll ha happily give it to me.

And this is Anthropic.

This is like, you know, the, the
cutting edge of safety for Yeah, yeah.

Large, Model providers.

Yep.

Yeah, it's like ChatGPT
is easy to jail break.

Grok I jail broke it
accidentally couple of times.

Just like, you know, prompting
it for different things.

Not even to be, it wants to be, it
wants to be jailbroken, I guess.

well this is kind of a
demonstration why you should not

trust the large model providers.

you've done testing through
like, you know, buck Crowd and

Hacker One and things like that.

How many Agentic flows can we find through
those crowdsourced platforms these days?

Shockingly, very few.

I mean, and they're highly restricted.

They're sandbox, they have snapshots
of data, very limited tool use.

Everyone I tested, there
was no external tool calls.

It was all like either local to the
client and it was using tools that

was in the actual web application
you were, you were using anyway.

Uh, or it was leveraging the browser in
some way and you had to give it permission

to use like something in the browser.

So extremely, extremely limited.

And I, to me, that almost says that these
people, like they are considering this

on one hand, but on the other hand it's
how many, if there's one out of 10,000

bug bounties on Bugcrowd is an agentic
agent, does that mean that there's just

that few agents or is it there's that few.

Companies making agents that
care to have them tested.

I think it's the latter, but, um, what
I find interesting as well, even if

you go to like open AI and find the,
you know, vulnerability discolsure

programs and all that stuff, jailbreaks
are specifically listed out of scope.

It's like they don't even care about it.

they know everything can be jailbroken.

they push it to the actual user.

it's an unsolvable problem
with current LLM architectures.

I don't believe you can
actually stop jailbreaks.

it's the same as web application
security We cannot maybe stop

all the SQL injections, all the
cross side scripting attacks.

'cause they can be obfuscated
and, we can stop 99% of them.

So maybe this is the same, cat
and mouse game on a different

level of social engineering.

Yeah, it reminds me of, do you remember
when Log four J first happened and

there were thousands of permutations
of how you could abuse that?

it's worse than that we don't even
know how many permutations there

are in the tokens being fed into
the model and the tokens being

inferred and returned from the model.

we don't even know.

I mean, there's probably a
mathematician out there that could

gimme a, a relatively close number.

I, I don't know off the top of my head
what it would be when we're talking,

billions of parameters and possible
combinations of activations through those

billions of parameters, we've seen model
providers fine tune the model to deny

on certain key pieces of information.

So, like Dan do anything now jailbreak
comes out, the latest one comes out,

they'll fine tune it in some way that
denies on that current structure of text

or something interesting in the text.

but two days later, someone
already has a brand new version

of it and it, it works just fine.

we see like all these different types
of creative writing style jailbreaks my

favorite is the special language jailbreak
where you say, Hey, let's play a game

where we're going to, um, use, a unique
form of English or an imaginary language.

And then you encode your
payload with Base 64 Pig, Latin.

Any type of encoding you can dream
of, you can plug into this thing.

And you might have to
lightly describe what it is.

So if it's, uh, if you're gonna use
binary, every eight bits is a, a, a unique

character in this imaginary language.

The models happily decode the information
and they happily return an answer back.

And I mean, how many permutations
are just in encoding in different

ways, let alone I changed the
sentence length of the encoding.

So you can't even use a length,
you can't use anything significant

to statically detect these thing,
these jailbreaks and block them.

if you do like direct keyword matching,
like if I say imaginary language and

do direct keyword matching on that
and block those requests, how many

like, just regular conversations,
is that gonna block, like the false

positive rate would be insane.

it feels like a nearly impossible problem.

And I think model providers
have just accepted that.

And they're leaving it up
to us to figure it out.

And I, I don't like that.

I think that's sketchy.

Yeah.

going back to this being, uh,
a bit like social engineering.

there can be a lot of permutations.

There can be a lot of ways you can do
things, but we've seen that before.

And what is, like, social engineering
is specifically like spam emails you

can craft spam emails all kinds of ways.

You know, like, you know,
something that appeals to a human.

Uh, tell them they just want something.

obviously, you know, Nigerian Princess,
uh, to all of that story, those stories

don't no longer work with, I hope they
no longer work, but there are other ways

you've got this gift card, click here
to accept it, Or even more creative,

your PayPal account got compromised,
reset your password here, But we got

good at detecting those kind of things.

So one would think that there is a
way to detect these, you know, agentic

attacks or LLM attacks as well.

Maybe not by using the AI itself, maybe
by using some kind of, hybrid approach.

Be that, AI and static guardrails
what's your opinion there?

Yeah, so with emails, I think the
most reliable way to to block spam

emails is, has historically been like
a allow list and deny list of domains.

Barracuda had a sentiment analysis
type email thing Emails and social

engineering is easier than, taking
action or thinking of LLMs as humans or

like social engineering and LLM is much
harder than social engineering a human

We have lizard brains and common
behaviors across humanity.

LLMs are not human They
do not have electric.

She, no.

What's that?

Electric sheep, brains.

exactly.

But, if I send an email make it look
like I'm a board member and I say,

we need to have an emergency meeting
we need to talk about whatever.

what feeling is that gonna invoke in you?

Well, a sense of urgency.

And a sense of fear.

And I can use that same sense of
urgency and sense of fear across all

of humanity to invoke some kind of
emotional reaction and get someone

to send me a phone call or whatever.

And it's the same thing
with AI voice spam calls.

going to the elderly right now where
it's like your granddaughter, they

find out who their granddaughter is,
like they just need their name and

they say, your granddaughter, this is
a collect call from the prison or jail.

Your granddaughter is in jail.

Do you wanna bail her out?

You, you need to send a thousand dollars
That's invoking a sense of urgency and a

sense of fear for your granddaughter who
is in like terrified and scared in jail.

you start having all these,
thoughts and, feelings about that.

So you want to take action on it.

Um, yeah.

But one would, one could also
argue that LLMs, they did not their

feelings obviously, but, right.

They, they were like, you know, tuned,
they were designed to please the human.

So, so they will, you know, hallucinate
give you answers that you want to hear.

Things like that.

Can you kind of, you know,
exploit that instead of feelings?

there is a jailbreak where you just simply
say, please, and that actually, that that

works surprisingly well in some cases.

So yeah, there are, there are
common, you can't call 'em behaviors.

There's common patterns across models.

And I, I believe that comes from s
from what, a pattern in that case.

And I mean, it's still a behavior.

I mean, humans have patterns as well.

Behavioral patterns.

Exactly.

I don't like to anthropomorphize
the, the LLMs, like I, they,

they're not feeling anything.

They don't have emotion,
they don't have behaviors.

but it seems they do have behaviors.

Yes.

I agree on feelings but there are
patterns you can kind of, you know,

extract again, like, you know, pleasing,
uh, human and things like that.

I mean, that's behavior.

I mean, yes, it's not human behavior,
but it's digital behavior, I mean, yes,

uh, it's not a human, I completely agree.

But, you know, if it is gonna become
intelligence at some point, it's still

intelligence, just a different kind
of intelligence maybe without feelings

and, but still having behaviors
and patterns and all that stuff.

Yeah.

There I, yes, there are behaviors.

There's def, there's emergent behaviors
and emergent properties that we've

been seeing as the, like parameter
sizes have increased and the, the

architectures have changed slightly.

like for example, like a mixture of
agent style thing, like there were,

there's certainly emergent properties
that came out of, making that change.

these models are all
trained in the same way.

They're all trained on the
same corpus of information.

So we've been talking a lot
about, you know, breaking into

these things and everything else.

I know that you authored this project
called V-Hack that kind of, you know,

helps, pen testers to fam familiarize
themselves so of, to, you know, how

to exactly attack, an agentic system,
how does that work exactly and how

does that help a pen tester to,
to, you know, become pen tester 2.0

in that case?

Yeah, great question.

I think about it a lot.

Like any of these other open source
tools for practicing, pen testing

there's VAmPI for API hacking.

there's a bunch of these
types of projects out there.

So to me, the tool is a way to,
help people learn about like,

what does this really look like?

What can you get away with, what
are the vulnerabilities look like?

And, um, it's just a really convenient
way to experiment on your own.

I mean, you can, you can get an open
router, API key, use one of their free

models, shove the API key in, and then,
just hammer away at this chat interface

and set, various security levels.

I can pull it up if you want sure.

let's do it.

gimme a couple minutes

Yeah.

While you're doing that,
it's, um, it's a challenge.

I mean, we've been, uh, I've been talking
to a lot of like pen testers these days.

Most of them have no idea how to pen
test an LLM at this point, or agentic ai.

They don't even, most of them
don't even know what it is.

So it's, uh, the old tools kind of fail
when it comes to those kind of things.

You know, burp Suite does not
have a way to test an agentic ai.

you don't do not have damn vulnerable
web app yet for those specific, you

know, scenarios and things like that.

So having something like this allows
a pen tester to play with the agent

flows without actual, you know,
customer impact or without Exactly.

Playing, yeah.

With something that they should not
access, they should not have access to.

I intentionally use lang chain and
lang Graph 'cause that's extremely

popular right now for, uh, developers
who are building out agents.

Uh, there's a number of tools hooked up.

You can see there's a, a nice
friendly hacker looking interface.

you have different security levels.

so there's some tooling
built in the background.

It's highly configurable.

There's a ton of documentation, so you
can walk through any of the guides and

kind of configure anything you want.

You can set up new MCP servers.

Maybe you do, maybe you're developing
an MCP server and you want to test

and see, what you can get away with.

This is a quick way to have a
self-contained, highly configurable agent

and set that connection up via config.

You just set, the, rest API and, the
structure, the schema of the request.

you can get up and run8ning quickly.

it's dockerized.

if you have docker and docker
composed, installed, you can build it

and you can just run docker, compose
up, and it comes up right away.

So, you can see I have it running here.

what does it do exactly?

is there some kind of agent at the
back at this point, or what is it?

the architecture is a web
interface communicating, via,

uh, web webhook to an agent.

it's fast, API popular Python library.

And then, like I said,
Lang chain and lang Graph.

the tools are set up to
be loaded at, start time.

it has a system prompt.

you can change the system prompt,
you can reset sessions, you can

re-export everything for later review.

here's some tools that are hooked up.

So you got file access, command,
execution system information.

So you can basically hook this up
to your own, like agent, your own

like, tools and everything else.

And then you can try different
jailbreaks right there.

Yeah, exactly.

Yep.

Interesting, interesting.

Yep, exactly.

So like, let's have it
read Etsy, password.

Yeah.

Hopefully most of the agents will
not do this, but one can hope.

Oh, and we give lots of examples.

We got really do not deploy an
agent that is vulnerable to this.

if you're building a coding
agent, like a code assistant, it

has to have file access, right?

So I mean there are a million great
examples just playing with this of

what you can see can happen with, raw
file access don't expose something

like this to the internet right away.

be aware of your exposure.

Be be aware of your port forwarding
at home if you have, some ports open.

absolutely.

Yeah.

So there, there's a bunch of
stuff you do, and I mean, you

can just ask it for jail breaks.

You're just, it's just a regular chat bot.

you're talking directly to whatever
model of provider you set up.

So just fun, easy to use tool,
right on GitHub ZioSec, V-H-A-C-K.

we can put a link in the podcast, we know
how easy it is to break these things.

Uh.

Now the question is, uh, you're an
organization building agentic flow,

whatever it be, you're building an
SDR, you're building something that is

able to send emails or you have, you
know, file access, database access.

Maybe you have an agent that
allows you to write SQL queries.

So you, you are like, you know,
developers or your users don't

have to think about those things.

How do you secure this?

I mean, again, as you said, it's
like everything is jailbreak

able, we cannot solve it.

I mean, there must be
something that can be done.

Yeah.

So there's a number of new
tools that have been coming out.

they all protect different
parts of the, of the chain.

I'm biased towards WAFs,
because of my background.

But, that is where it starts.

at the edge there are a number of WAF
or WAF like solutions or solutions that.

Claim to be, like AI oriented and, a
number of like, I would say very large

companies that are very, that quick,
quickly pivoted as soon as LLMs came

out to, be like an AI waf of sorts.

it always starts at the
edge, that's the entry point.

from there, there are runtime solutions.

some, EDR type solutions, which
are, you know, runtime and EDR are

kind of, relatively similar, right?

there are identity solutions.

There's MCP proxies.

So sitting in front of all of your
MCP servers, there's solutions

that are managing identity from
the initial user requests all the

way to, how to handle identity.

For the MCP server itself, which
in a lot of ways this feels super

similar to most web application
protection solutions in general.

Like you have a server, API
calls made to that server, so you

should put a WAF in front of that.

Then you might have server sided
calls to some internal service.

There should probably be some layer
there that's monitoring and, looking

for anomalies and managing identity.

A lot of times these are
reverse proxies, right?

there are solutions that live
right alongside the, I mean, these

are runtime solutions in the end.

So they're monitoring inputs and
outputs for the LLMs themselves.

So there's solutions you can bake
in directly to your, like lang

chain, lang graph based agent.

I believe Lang Chain or lang graph
just released something recently.

protections for their own libraries
layered security like anything else.

It's interesting that you
mentioned the WAF companies.

Arm?

they're like, web application
firewall first, then API security.

Then, uh, I think now they're
like, as you say, AI security.

It's like, yeah, well, how
can you do all of that?

it feels like it's the same underlying
engine just a marketing pivot So

to me that's, and I'm not trying
to hate on wall arms specifically.

This is just in general, if you're sure
your underlying technology can be applied

in different ways to different markets,
and that is part of being a startup.

You're trying to find the market.

You might be trying to.

In a, this is kind of like a bad thing.

Like I have a really good solution.

I'm trying to find, I'm trying
to find a problem that fits my

solution that I fell in love with.

Like that's a, a historical
cliche of any kind of business.

Right.

And it, to me it's like, unless you
have the expertise and the knowledge

to understand the complexities of
this, if I'm trying to go sell a

WAF and I'm telling people like,
oh yeah, of course I can block

any jailbreak that comes through.

I mean, you're, you're immediately
gonna fall flat on your face.

I mean, there's, there is no real
way to know for sure unless you have

something validating the solution
that you're looking to purchase.

All of these protection solutions,
you have to have something

that you're validating because
it's, it's naive to think that.

A company that's pivoted three
times has the, the expertise

in that, in AI specifically.

when they just pivoted three months ago
and they're like claiming that, oh yeah,

they've solved all of these problems.

You, you have to have some way
to go out and test these things.

Yeah.

So, so you're saying offensive
versus defensive, but then again

you run into the same issue.

So we look at the company's,
you know, they have scanners.

They have really good scanners.

They have scanners for a long
time, and now suddenly, you

know, they have AI scanners.

Right.

Uh, so even from offensive side,
it seems like, you know, you

really gotta be careful as to, you
know, which solution you choose.

'cause it's kind of, you know, it
seems like the pivot, as you were

saying in defensive side is, uh, is
happening in the offensive side as well.

Yeah, no, it's absolutely true
for the offensive side also.

And I think.

scanners just end up being noise
makers unless you have underlying

intelligence that is, uh, maybe more
akin to an agent that's evaluating an

actual attack path instead of just, uh,
like brute forcing through some list.

Not to mention, if that's all
you're doing, how many tokens

are you chewing through?

Like, what's that gonna cost you if
you're trying to run this scanner?

I mean, it's not like this is where
web, web apps and LLMs differ greatly

right now, or agents differ greatly.

Right now, if you have just an LLM,
that's one thing you could maybe

brute force that and like, yeah, sure.

Maybe you go through millions of
tokens and it's not that expensive.

Agents are iterating, looping
until they hit that goal.

So if I send that same thousand
attacks to an agent versus an LLM.

Depending on how many times it's looping
based on the request you sent it.

That could be.

And unless, you know, you can
of course put a cap on like

how many iterations it's doing.

Maybe that's five x the total token
usage, maybe a hundred x, Maybe a

thousand x, instead of a million, now
we're talking a trillion tokens and you

just got a $10,000, bill from one scan.

that can be catastrophic.

If you're working on a research budget
and you're trying to push these things

out quickly and your plan is to cut
costs, well, you just blew your costs

up because you hammered this thing
with a, a dumb scanner that just brute

forces through a, a bunch of options
instead of reasons about the responses

coming back and then decides on what
next step it should take in this attack

process to actually identify like what
the, the security posture looks like

for a given, uh, like agentic flow.

Yeah.

So you're talking about like
evals basically, where you send,

um, specific payload and expect,
you know, specific response.

Yeah.

And then you say either
it passed or it did not.

Uh, yep.

Well, evals versus what, what this
would be versus like pen testing in that

case, you know, digging deeper and going
through the entire stack, I suppose.

Um, yeah.

Makes sense.

Um, going back to the SDR again, the
SDR use case, um, you cannot secure

yourself against jailbreaks necessarily.

Um, what, what, what is there
you can do on the backend?

So, so again, you know, defensive
layers, all of that stuff.

Yes.

You, you do as much as
possible on the front.

You obviously do offensive
testing as much as possible.

Deploy guardrails were
needed, things like that.

But now you have this,
you know, SDR AI agent.

Um.

And with SDRs in the past, uh, with the
actual human SDRs, we enforce things like

least privilege and things like that.

Is that something you should be doing as
well for the Ai AI agents in that case?

Absolutely.

Yeah, absolutely.

They should have their own, you
should have, like an agent role or

an SDR agent role should be granular.

Uh, and as you said, you should be doing,
you should be following lease privilege.

You should be greatly limiting access
to any information you're afraid

might get leaked and you should be
monitoring flow of that information.

So with the human SDR, when they
come to the workstation and they

They have to log in, they have to
have some kind of credentials, two

factor authentication, whatever else.

AI agents cannot really
do that kinda stuff.

And they're running 24 7, if you like.

And they're running 24 7.

Exactly.

So, so do you give them API
keys, do you have some kind of

an JWT token authentication?

Do you do fresh authentication for
every single transaction, or, or do

you have like persistent credentials?

How do you manage authentication
with these things?

Yeah, it's a great question.

'cause maybe you have, you might
have a specialized agent, but there

might be 10 of them and maybe they're
doing slightly different tasks.

Like one of them's focusing on
one vertical and the other one's

focusing on another vertical.

So maybe they have slightly
different access, to information or

different, like contacts for example.

in the end, I think in that specific
scenario, I don't think it changes the

risk profile if, if they all shared the
same access to the same database, even

if it's one vertical versus another.

'cause that's just a, a query, right?

You're just kind of filtering out.

Well, maybe, yeah.

Let's move on from SDR example.

In that case, let's say you're like a
payroll provider company or something

like that, and you provide agents that
assist your customers with different, like

payroll questions and things like that.

They have to have access to
your information in that case.

Yeah.

So you have like 10 different agents.

they have access to all
of this information.

I mean that it is probably not
ideal that they have the same

access to all the user information.

Maybe you want to have some
kind of context with that.

They load just your specific information
when you log in You need to be way

more, like if you're talking scaling
up something that's accessing data

across thousands or tens of thousands
of users, I mean, that's, that's

per user session management level
of granularity that you need there.

Like you, you absolutely do not want, uh,
like cross session leakage from one chat

to another chat or one invocation of an
agent to another invocation of an agent.

And that actually, that does
get, that can be a bit tricky.

Uh, especially if you're, you, you need to
be managing sessions even at the inference

level almost, where it's not sharing.

'cause like there's some model providers,
they have internal memory and that memory

might be shared across multiple sessions.

So you need to ensure that either
memory is turned off or you're resetting

the state of that memory per session
instead of, across multiple users.

'cause yeah, it's really easy to, to
accidentally leak even at the inference

call or like at the inference level to
leak, cross user information and you won't

know unless you, try like you have to ask.

There have been accidental
cases of that of course, but

um, yeah, that is a major issue.

Yeah.

So how do you monitor this behavior?

How do you look for the indicators
that an agent is doing something

they're not supposed to be doing?

How do you test for it, period?

if I'm making an API call to a model
provider, I'm expecting them to

follow what I'm telling it to do.

and they have their own
monitoring in place.

So the best you're doing is request
based monitoring, like what's leaving.

And to me that's like deal, that's
back in the DLP realm again.

Right?

Anything leaving your environment
tends to land, data loss prevention

type products or like akin to data
loss prevention, like products.

Um, but if you're like a payroll
provider, in that case, even if

you're using like something like
Anthropic Claude at the front, it's

all your tool calling at the back end.

So it's kind of, you know, up
to you to provide the sandboxing

and, uh, monitoring in that case.

Yeah, DLP 'cause nothing is really
leaking in that case, you're just

displaying stuff in like the browser.

it's more like, you know, the
old school horizontal, vertical

privilege escalation attacks.

Yeah.

it's session abuse.

session leaks, cross session leaks.

Yeah.

It can be crosses leaks as well, but
sometimes the same session can access

stuff it's not supposed to access.

So Absolutely.

I think it's, again, comes
down to offensive testing.

In that case, you need
to pen test these things.

I think.

No, I, I totally agree.

Um, we don't trust, we verify right here.

See?

There you go.

We do not trust, we verify.

we talked a lot about different things.

what advice would you give to somebody
that is building an agent at this point?

what should they watch out for?

What should they implement?

What should they test from the beginning,
not at the end when it's all built.

I would start looking at, uh, guardrails.

I would talk to someone
in the AI security realm.

And if you're plan8ning on
building something, just

start with threat modeling.

start having a conversation about what
are the possible risks for the specific

thing that you're trying to build?

And I, I think that's,
uh, a great first step.

Uh, once you're getting closer to
releasing something, that's when you need

to be validating your assumptions that
you made at the threat modeling phase.

You can plan all you want and you can
try to implement whatever you want, but

until you actually confirm whether or
not, uh, your assumptions were, uh, proven

to be true, then you don't really know.

So that, I think it's that simple.

Indeed.

Uh, it's kind of funny, I saw
something you like a posting on x

uh, looking for different roles.

Uh, some guys building a company
basically that will create super

intelligence, ai, super intelligence,
whatever, whatever that means.

But it's a posting on X.

Uh, it add like, uh, a list of
maybe 20 different roles like

researchers, coders, whatever else.

One thing, I did not have a single
security person, which I found really

interesting, but people are just not,
yeah, we get a bad, we get a bad rap.

We slow everybody down.

At least that's what they,
that's what they claim, right?

Yeah, I guess so.

But then, you know, when, uh, when
you deploy AI agent to manage your,

supply systems and things like
that, and then somebody hacks it.

No, that's not gonna be good.

Let's talk about the future.

obviously these agents will evolve.

Um, we've seen how things
evolved over the last two years.

I mean, stuff is dynamic
changing constantly.

Um, what, what do you think security
landscape is gonna look like

in the next two to three years?

It, it's already kind of happening
where it looks like, consolidation

plays are, are beginning where larger
mainstream security companies are

picking up the smaller, expertise.

I think it's gonna be a consolidation
in platform play over the next,

say like five years, maybe faster.

Like, AI is moving so fast,
there's so much money being

thrown around, it might accelerate
this process of a platform play.

Yeah, so I think you're gonna start
seeing like the all-in-one package

deals for, um, everything from your
AI waft to your AI guardrails, to

your runtime solution, to your MCP.

protection Solutions is just gonna be one
big pack package, one giant dashboard,

and, all of your information's in one
place, and it might be rolled up with

some of your other web application
protection and runtime solutions.

CNAPs, actually, CNAPs are probably
the best position to do these, uh,

like discovery of agents that people
are building across your organization.

And then, like agents deployed in your
cloud environments and, they're already

installed in places that make sense to
have an MCP proxy like detection, runtime

like protection, misconfigurations
being identified, like if you're leaking

API keys in your config at runtime
for, your agents and things like that.

Uh, I, they have identity type solutions.

So I could see the big, the
big CNAP and tight players.

moving this way very quickly
if they haven't already.

I haven't really paid close
attention to that market.

At what point do you think, well,
dashboards all of that is all good,

but the question is, will we need
dashboards at all once we get to

the AI versus AI kind of scenarios?

How, how soon do you think that is coming?

that is an interesting topic I have seen
a number of startups kick off this, uh,

it's basically like a soc as a service,
an a SOC agent as a service type solution

where you're completely hands off,
maybe you have someone in DevOps that's

slightly more security adjacent, deploying
this thing, and then, um, you're,

you're switching who's getting alerts.

You might be able to
shrink your security team.

Or maybe you're just sending alerts
directly to developers and they're

being pre triaged, pre investigated.

It might even push a PR
with like potential fixes.

For this bug that was found.

Maybe if the trust level gets there,
we might even get to a point with, um,

continuous delivery like CICD where
security ties into CICD to a point where

not only did it identify and triage, it's
actually just, uh, committing a p the

committing and merging and releasing and
doing canary testing and then finalizing,

like cutting over fully to this new fix
for whatever vulnerability that's found.

The entire process end to end could
be completely automated eventually.

So we had DevOps and we had DevSecOps
and now, that's gonna be DecSecAIOps?

It might just be, uh, yeah,
I guess like dev ai ops.

Dev, AI ops.

Okay.

I don't know if that exists, right?

If that doesn't exist.

That sounds like a new
role that should exist.

Right?

Well, like, you know, prompt engineering,
security, pro security, prompt engineering

needs to be there as well, I guess.

I can see that.

last question.

if you could wave a magic wand and
solve like one issue in the aagentic

AI security, what would it be and why?

It would be, context and memory.

I believe that's where a lot of the
issues of, hallucinations, which all,

all LLMs hallucinate like hallucinations
are kind of a feature in my mind.

Like that's, that's how it's
doing the inference in the

first place to a certain extent.

But the biggest problems I run
into when I'm using it, like if I'm

writing code or something, is it's
not aware of the full solution.

Like it can't hold all of that in its
context window in one at one time.

So therefore it's missing something.

Creating duplicate code,
like a ton of duplicate code.

I think that's the biggest, if I had a
magic wand, that would be the problem

to solve is how memory is managed
and how memory is used in inference.

Yeah, that makes sense.

Uh, spoke to a potential customer a
couple of days ago and, and the entire

use case was like, uh, our agent flow
is used about a million tokens plus.

So how do you fit that into the
context of like security tool as well?

It's, uh, yeah, that would
be a nice problem to solve.

We'll see if somebody
does it in the next year.

Yeah.

I just saw Nvidia like three
days ago at CES announced.

They're trying to tackle
it at the hardware level.

Yeah, yeah, yeah.

We'll see how it goes.

Well, thank you very much.

Uh, good conversation.

Let's do it again sometime.

Yep, absolutely.