Building production AI systems is hard, especially when you're pioneering entirely new categories. In this episode, Yuval speaks with Guy Becker, Group Product Manager at AI21, to trace the evolution from task-specific models to agent planning and orchestration systems. Guy shares hard-won lessons from building some of the first RAG-as-a-service offerings, at a time when there were no handbooks to follow.
Key Topics:
- Task-specific models vs. general LLMs: Why focused, smaller models with pre- and post-processing beat general-purpose LLMs for business use cases.
- Building RAG before it was cool: Creating one of the first RAG-as-a-service platforms in early 2023 without any established patterns.
- The one-size-fits-all problem: Why chunking strategies, embedding models, and retrieval parameters need customization per use case.
- From SaaS to on-prem: Scaling deployment models for enterprise customers with sensitive data.
- When RAG breaks down: Multi-hop queries, metadata filtering, and why semantic search isn't always enough.
- Multi-agent orchestration: How AI21 Maestro uses automated planning to break complex queries into parallelizable subtasks.
- Production lessons: Evaluation strategies, quality guarantees, and building explainable AI systems for enterprise.
What is YAAP (Yet Another AI Podcast)?
YAAP brings you practical conversations with the people actually building generative AI solutions. No hype, no sales pitches, just honest discussions about challenges, solutions, and lessons learned.
Listen to developers and engineers share what works, what doesn't, and what they wish they'd known sooner. Simple, useful insights for anyone working with AI — hosted by AI21's Yuval Belfer.