Switching gears, we focus on how Yannick’s been training his model using reinforcement learning. He explains the differences from David’s supervised learning approach. We find out how his system performs against a player that makes random tic-tac-toe moves.
A writer and a software engineer from Google's People + AI Research team explore the human choices that shape machine learning systems by building competing tic-tac-toe agents.