TalkRL: The Reinforcement Learning Podcast

Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst, MA.

Featuring:  

Creators & Guests

Host
Robin Ranjit Singh Chauhan
🌱 Head of Eng @AgFunder 🧠 AI:Reinforcement Learning/ML/DL/NLP🎙️Host @TalkRLPodcast 💳 ex-@Microsoft ecomm PgmMgr 🤖 @UWaterloo CompEng 🇨🇦 🇮🇳

What is TalkRL: The Reinforcement Learning Podcast?

TalkRL podcast is All Reinforcement Learning, All the Time.
In-depth interviews with brilliant people at the forefront of RL research and practice.
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute.
Hosted by Robin Ranjit Singh Chauhan.

Speaker 1:

My name is Ann Huang. My work is about the learning dynamics and geometry of recurrent neural controllers, and this is work done at Harvard. Here we're really opening up the black box of deep RL agents and trying to apply dynamical systems theory to their analysis. We're looking at the networks' learning dynamics at three different levels of analysis. The first one is the policy level.

Speaker 1:

How does the policy landscape evolve over time? The second level is the network's representations and dynamics: we look at how the agent's internal representations change over training, and by applying fixed-point analysis from dynamical systems theory, we try to find the network's equilibrium states, where they are, and whether those equilibria are what the policy is pushing the agent toward.
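A minimal sketch of this kind of fixed-point analysis, assuming a vanilla tanh RNN update and a constant input; the architecture, names, and optimizer choice are illustrative assumptions, not taken from the poster:

```python
import numpy as np
from scipy.optimize import minimize

def find_fixed_point(W_rec, W_in, b, x_const, h0):
    """Search for h* with h* ~= tanh(W_rec h* + W_in x_const + b), starting from guess h0."""
    def q(h):
        step = np.tanh(W_rec @ h + W_in @ x_const + b)
        return 0.5 * np.sum((step - h) ** 2)  # "speed" of the dynamics; zero at a fixed point
    res = minimize(q, h0, method="L-BFGS-B")
    return res.x, res.fun  # candidate fixed point and residual (near 0 means a true fixed point)
```

Running this from many initial hidden states sampled along trajectories is the usual way to map out the equilibria the recurrent controller settles into.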

Speaker 1:

And we're also analyzing the network's weights. If you look at the eigenspectrum of the recurrent weights of the recurrent neural network, you can actually see over how many timesteps it is integrating information from its past. So this really gives some insight into the memory usage of the network in an RL task. In short, we're applying dynamical systems theory to deep RL agents at three different levels of analysis, trying to establish a more coherent, more comprehensive understanding of the learning dynamics of RL agents.
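A minimal sketch of the eigenspectrum-to-memory idea, assuming the dynamics are linearized so that each eigenvalue's modulus sets a decay timescale; the helper below is illustrative, not the authors' code:

```python
import numpy as np

def integration_timescales(W_rec):
    """Timescales (in timesteps) of the decaying modes of the linearized recurrence h_t = W_rec h_{t-1}."""
    eigvals = np.linalg.eigvals(W_rec)
    moduli = np.abs(eigvals)
    decaying = moduli[(moduli > 1e-12) & (moduli < 1.0)]  # drop unstable and numerically-zero modes
    return -1.0 / np.log(decaying)  # a mode with |lambda| near 1 remembers its past for longer

# Example with a random stand-in for trained recurrent weights:
rng = np.random.default_rng(0)
n = 64
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
print(f"longest memory ~ {integration_timescales(W).max():.1f} timesteps")
```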

Speaker 2:

I'm Jannis Blüml from TU Darmstadt, and my work is HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning. The idea is basically this: you know that Atari is the most used framework for reinforcement learning. In many cases it's been used since 2000, I don't know.

Speaker 2:

Since forever. What we want to do is add to the entire environment, to the family of Atari games, because we think that Atari can be much more. In our work, we create new Atari environments, or variations of existing games, by manipulating the RAM so that we can create test environments. So you can test robustness, generalization, or even test whether your agent falls apart under slight changes in the game, for example a color change. By adding these new variations, these modifications, we basically make infinitely many environments possible by parameterizing the actual Atari environment, so that you can create exactly the environment, or the position or state, you want to have in your training.
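A hedged sketch of the RAM-patching idea, written against the plain Gymnasium/ALE interface rather than HackAtari's own API; the wrapper name and the RAM address below are hypothetical placeholders:

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # register ALE/... environment IDs (needed on recent ale-py/gymnasium)

class RamPatchWrapper(gym.Wrapper):
    """After every reset and step, overwrite chosen RAM bytes to force a game variant."""

    def __init__(self, env, patches):
        super().__init__(env)
        self.patches = patches  # {ram_address: byte_value}

    def _apply(self):
        ale = self.env.unwrapped.ale
        for addr, value in self.patches.items():
            ale.setRAM(addr, value)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._apply()
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._apply()
        return obs, reward, terminated, truncated, info

# Hypothetical patch: the address/value pair is a placeholder, not a real game offset.
env = RamPatchWrapper(gym.make("ALE/Freeway-v5"), patches={0x2A: 0})
```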

Speaker 2:

For this, we have multiple mechanisms. We can change colors and dynamics, we enable curriculum learning, and we also make it possible to change the reward function of the game so that you can set up new games. For example, there is a game where you have to shoot enemies, and we now punish shooting because we want a peaceful agent to play the game. It doesn't collect many game points, but that doesn't matter, because it gets reward from us for not being a killer, which is quite interesting.
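A small sketch of the reward-modification idea, here as a generic Gymnasium wrapper that penalizes the FIRE action; the action index and penalty value are assumptions for illustration, not HackAtari's actual interface:

```python
import gymnasium as gym

class PeacefulRewardWrapper(gym.Wrapper):
    """Shape the game reward so that shooting is punished."""

    def __init__(self, env, fire_action=1, penalty=1.0):
        super().__init__(env)
        self.fire_action = fire_action  # index of FIRE in the environment's action set (assumed)
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if action == self.fire_action:
            reward -= self.penalty  # discourage shooting regardless of the score it earns
        return obs, reward, terminated, truncated, info
```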

Speaker 2:

And it's less about reinforcement learning. It's more about the set of environments we provide, but you can use it in any way you like. And that's the main selling point in our opinion.

Speaker 3:

My name is Benjamin Fuhrer. I'm affiliated with NVIDIA. Here we're introducing Gradient Boosting Reinforcement Learning, a new gradient-boosting-based library dedicated to reinforcement learning. In our work, we show that in certain environments and certain domains, we can outperform neural networks or perform just as well. And we believe that by using gradient-boosted trees in reinforcement learning, we can improve interpretability.

Speaker 3:

We can perform better in environments with categorical or structured representations, and we can lower the computational burden, for example by training on edge devices.
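A from-scratch sketch of the general idea of using gradient-boosted trees as an RL function approximator: each boosting round fits a small tree to the functional gradient of a policy-gradient objective with respect to the ensemble's action logits. This is an illustration under those assumptions, not the GBRL library's API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedPolicy:
    """Softmax policy whose logits are the sum of small regression trees."""

    def __init__(self, n_actions, lr=0.1, max_depth=3):
        self.n_actions = n_actions
        self.lr = lr
        self.max_depth = max_depth
        self.trees = []  # each tree maps state features -> per-action logit increments

    def logits(self, states):
        out = np.zeros((len(states), self.n_actions))
        for tree in self.trees:
            out += self.lr * tree.predict(states)
        return out

    def action_probs(self, states):
        z = self.logits(states)
        z -= z.max(axis=1, keepdims=True)  # numerically stable softmax
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def boost_step(self, states, actions, advantages):
        """Fit one tree to the policy-gradient direction A * d log pi(a|s) / d logits."""
        probs = self.action_probs(states)
        grad = -probs * advantages[:, None]
        grad[np.arange(len(actions)), actions] += advantages
        tree = DecisionTreeRegressor(max_depth=self.max_depth)
        tree.fit(states, grad)  # multi-output regression: one target per logit
        self.trees.append(tree)
```

On-policy training would alternate collecting rollouts with calls to `boost_step`, so the ensemble grows by one tree per update rather than adjusting fixed neural-network weights.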

Speaker 4:

I'm here with Paul Festor from Imperial College London, and he's presenting his work, How Vulnerable Are Doctors to Unsafe Hallucinatory AI Suggestions? A Framework for the Evaluation of Safety in Clinical Human-AI Cooperation. Paul, can you tell me a bit about this?

Speaker 5:

Absolutely. So in my lab, we work on the deployment of reinforcement learning systems for clinical decision support in hospitals. I think there's a long way to go before these systems automatically give treatments to patients, so we've had a lot of interest in the collaborative mechanics of human-AI decision making.

Speaker 5:

The doctor assesses a patient and has an AI on the side: how do they work together to make decisions? In this poster, we present results from bringing doctors into a physical simulation center, so an actual hospital room, interacting with an AI system, and following that interaction through eye tracking. What we have seen, in a nutshell, is that doctors are really good at spotting recommendations that are outside of what you would expect. We sometimes tweaked the AI recommendations to something completely unreasonable, and they were great at noticing this.

Speaker 5:

However, we also provided them with explainability around these recommendations and thought that doctors would use these explainable sources of information to understand why a recommendation looks odd. What we realized after running the study is that this was absolutely not the case: while doctors looked more at the recommendation itself to notice that it was unreasonable, they did not try to debug it. There can be many reasons for this, and we discuss them in further detail in our paper published in npj Digital Medicine.