In this episode of Crazy Wisdom, I, Stewart Alsop, sit down with Naman Mishra, CTO of Repello AI, to unpack the real-world security risks behind deploying large language models. We talk about layered vulnerabilities—from the model, infrastructure, and application layers—to attack vectors like prompt injection, indirect prompt injection through agents, and even how a simple email summarizer could be exploited to trigger a reverse shell. Naman shares stories like the accidental leak of a Windows activation key via an LLM and explains why red teaming isn’t just a checkbox, but a continuous mindset. If you want to learn more about his work, check out Repello's website at
repello.ai.
Check out this GPT we trained on the conversation!
Timestamps
00:00 - Stewart Alsop introduces Naman Mishra, CTO of Repel AI. They frame the episode around
AI security, contrasting
prompt injection risks with traditional
cybersecurity in ML apps.
05:00 - Naman explains the
layered security model:
model,
infrastructure, and
application layers. He distinguishes
safety (bias, hallucination) from
security (unauthorized access, data leaks).
10:00 - Focus on the
application layer, especially in
finance,
healthcare, and
legal. Naman shares how ChatGPT leaked a
Windows activation key and stresses
data minimization and
security-by-design.
15:00 - They discuss
red teaming, how Repel AI simulates attacks, and Anthropic’s
HackerOne challenge. Naman shares how adversarial testing strengthens LLM
guardrails.
20:00 - Conversation shifts to
AI agents and
autonomy. Naman explains
indirect prompt injection via email or calendar, leading to real exploits like
reverse shells—all triggered by summarizing an email.
25:00 - Stewart compares the Internet to a castle without doors. Naman explains the
cat-and-mouse game of security—attackers need one flaw; defenders must lock every door.
LLM insecurity lowers the barrier for attackers.
30:00 - They explore
input/output filtering,
role-based access control, and
clean fine-tuning. Naman admits most
guardrails can be broken and only block
low-hanging fruit.
35:00 - They cover
denial-of-wallet attacks—LLMs exploited to run up massive token costs. Naman critiques
DeepSeek’s weak
alignment and
state bias, noting
training data risks.
40:00 - Naman breaks down India’s AI scene:
Bangalore as a hub,
US-India GTM, and the debate between
sovereignty vs. pragmatism. He leans toward India building
foundational models.
45:00 - Closing thoughts on India’s AI future. Naman mentions
Sarvam AI,
Krutrim, and
Paris Chopra’s Loss Funk. He urges devs to
red team before shipping—"close the doors before enemies walk in."
Key Insights
- AI security requires a layered approach. Naman emphasizes that GenAI applications have vulnerabilities across three primary layers: the model layer, infrastructure layer, and application layer. It's not enough to patch up just one—true security-by-design means thinking holistically about how these layers interact and where they can be exploited.
- Prompt injection is more dangerous than it sounds. Direct prompt injection is already risky, but indirect prompt injection—where an attacker hides malicious instructions in content that the model will process later, like an email or webpage—poses an even more insidious threat. Naman compares it to smuggling weapons past the castle gates by hiding them in the food.
- Red teaming should be continuous, not a one-off. One of the critical mistakes teams make is treating red teaming like a compliance checkbox. Naman argues that red teaming should be embedded into the development lifecycle, constantly testing edge cases and probing for failure modes, especially as models evolve or interact with new data sources.
- LLMs can unintentionally leak sensitive data. In one real-world case, a language model fine-tuned on internal documentation ended up leaking a Windows activation key when asked a completely unrelated question. This illustrates how even seemingly benign outputs can compromise system integrity when training data isn’t properly scoped or sanitized.
- Denial-of-wallet is an emerging threat vector. Unlike traditional denial-of-service attacks, LLMs are vulnerable to economic attacks where a bad actor can force the system to perform expensive computations, draining API credits or infrastructure budgets. This kind of vulnerability is particularly dangerous in scalable GenAI deployments with limited cost monitoring.
- Agents amplify security risks. While autonomous agents offer exciting capabilities, they also open the door to complex, compounded vulnerabilities. When agents start reading web content or calling tools on their own, indirect prompt injection can escalate into real-world consequences—like issuing financial transactions or triggering scripts—without human review.
- The Indian AI ecosystem needs to balance speed with sovereignty. Naman reflects on the Indian and global context, warning against simply importing models and infrastructure from abroad without understanding the security implications. There’s a need for sovereign control over critical layers of AI systems—not just for innovation’s sake, but for national resilience in an increasingly AI-mediated world.
What is Crazy Wisdom?
In his series "Crazy Wisdom," Stewart Alsop explores cutting-edge topics, particularly in the realm of technology, such as Urbit and artificial intelligence. Alsop embarks on a quest for meaning, engaging with others to expand his own understanding of reality and that of his audience. The topics covered in "Crazy Wisdom" are diverse, ranging from emerging technologies to spirituality, philosophy, and general life experiences. Alsop's unique approach aims to make connections between seemingly unrelated subjects, tying together ideas in unconventional ways.