Practical AI
Episode 290
Season 1
Towards high-quality (maybe synthetic) datasets
As Argilla puts it: “Data quality is what makes or breaks AI.” But what exactly does this mean, and how can AI teams collaborate with domain experts to improve data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.
Changelog++ members save 11 minutes on this episode because they made the ads disappear. Join today!
Sponsors:
- Fly.io – The home of Changelog.com — Deploy your apps close to your users — global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.
- WorkOS – A platform that gives developers a set of building blocks for quickly adding enterprise-ready features to their application. Add Single Sign-On (Okta, Azure, Google, Microsoft OAuth), sync users from any SCIM directory, HRIS integration, audit trails (SIEM), free magic link sign-in. WorkOS is designed for developers and offers a single, elegant interface that abstracts dozens of enterprise integrations. Learn more and get started at WorkOS.com
- Eight Sleep – Take your sleep and recovery to the next level. Go to eightsleep.com/PRACTICALAI and use the code PRACTICALAI to get $350 off your very own Pod 4 Ultra. You can try it for free for 30 days - but we’re confident you will not want to return it. Once you experience AI-optimized sleep, you’ll wonder how you ever slept without it. Currently shipping to: United States, Canada, United Kingdom, Europe, and Australia.
Featuring:
- Ben Burtenshaw – GitHub, LinkedIn, X
- David Berenstein – GitHub, LinkedIn, X
- Chris Benson – Website, GitHub, LinkedIn, X
- Daniel Whitenack – Website, GitHub, X
Chapters
- Welcome to Practical AI
- Sponsor: Fly
- What does data collaboration mean?
- Understanding your data
- How to start curating data
- Practical steps to scale
- Sponsor: WorkOS
- Traditional & new use cases
- Virtues of smaller models
- What Argilla looks like
- User backgrounds
- The non-technical POV
- Sponsor: Eight Sleep
- AI feedback
- Hallucination issues
- What is Distilabel
- Usage & adoption
- Where things are going
- This is muy bueno
- Outro