TalkRL: The Reinforcement Learning Podcast

Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict in PyTorch.

Featured References

TorchRL: A data-driven decision-making library for PyTorch
Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens 


Additional References  

Creators & Guests

Host
Robin Ranjit Singh Chauhan
🌱 Head of Eng @AgFunder 🧠 AI:Reinforcement Learning/ML/DL/NLP🎙️Host @TalkRLPodcast 💳 ex-@Microsoft ecomm PgmMgr 🤖 @UWaterloo CompEng 🇨🇦 🇮🇳

What is TalkRL: The Reinforcement Learning Podcast?

TalkRL podcast is All Reinforcement Learning, All the Time.
In-depth interviews with brilliant people at the forefront of RL research and practice.
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute.
Hosted by Robin Ranjit Singh Chauhan.

Robin:

TalkRL Podcast is all reinforcement learning, all the time, featuring brilliant guests, both research and applied. Join the conversation on Twitter at @TalkRLPodcast. I'm your host, Robin Chauhan. Today, we're joined by Dr. Vincent Moens, an applied machine learning research scientist at Meta. Vincent is an author of TorchRL and TensorDict in PyTorch.

Robin:

Thank you for being here, Vincent.

Vincent:

Thanks for having me.

Robin:

Mostly we're gonna talk about TorchRL today. Can you tell us what is TorchRL and what is the goal of TorchRL?

Vincent:

TorchRL is an initiative from PyTorch. We started this project a couple of years ago. At the time, PyTorch wanted to have more domain libraries. You probably know Torchvision, which is the most famous one, but there are others, like TorchRec for recommendation systems and other things. So we had a couple of projects that started at that time, and one of them was TorchRL.

Vincent:

So we wanted, basically, to better support the reinforcement learning community. The idea being that RL usage of a framework like PyTorch is quite specific, with specific needs, and so we wanted to have a dedicated tool for that. What we did is that we started by exploring a little bit what people were asking for. When I started on this project, I was looking at various documents that had been put together, some of them by my colleagues, but also some by people from the open source community.

Vincent:

And I reached out to people from the open source community to see what they thought was lacking. It was kind of a fun thing because, as you can imagine, it went in several directions. Some people were saying, what we need is a library with low-level primitives, things like value functions, that are efficient, reliable, properly tested, but also not too high level, because at the end of the day, RL people like to recode more or less everything from scratch and build their own stack of primitives. And other people were like, oh, well, no.

Vincent:

What we need is something that is actually high level, that has very nice features like distributed training or distributed replay buffers, all those kinds of things that can be hard to do yourself because they are quite intricate. And so I started looking at all of these things and trying to shape what the library was going to be based on that. So, yeah, that's basically how the project started. And after I looked at that, and maybe we can dig into that later, we started to think about, okay, how can we, if possible, make everyone happy with a single library? And that's what we're trying to do, basically.

Vincent:

So, to help the RL community that is using PyTorch as much as we can.

Robin:

So when people talk about RL, I mean, the scope of RL is really vast and growing every day, and it has so many unusual corners to it. So I gather TorchRL has certain things that are its focus. Can you talk about what types of areas are in scope, and maybe not in scope, in terms of different types of RL?

Vincent:

Yeah, definitely. So, first, you're totally right that RL is very heterogeneous, and it's very hard to make everyone happy with a single library, as you probably know. And the other thing is, there is RL, but there is also everything that is RL-ish.

Vincent:

You know, things like imitation learning or other types of decision-making algorithms that have to do with control and those kinds of things. So I always wrap everything inside the RL term, but it can be quite confusing, because sometimes you interact with people and they're like, oh, yeah, this is not, properly speaking, RL. What I would like TorchRL to be is more like the decision-making library for PyTorch rather than an RL library. But the thing is that if you call it decision making, you will always have people that are a bit confused and say, oh, well, I did not want that. I wanted the library to do RL.

Vincent:

So we thought, okay, let's go for RL, even though it might only be, like, 70% of the community that really talks about RL. But I think we cover other things. So what does TorchRL cover and not cover, to answer your question? The philosophy is really that we want to provide people with the basic building blocks to do a lot of things. But we don't want to provide an API that is too advanced and too high level, such that you can, in two lines of code, create a PPO agent and train it on your environment.

Vincent:

That's not the scope of TorchRL. I think there are great libraries that already do that. You can, in a very simple script, say PPO agent, you just put your environment in it, and it's gonna train that for you with some default hyperparameters. And it's really great and it works really well. Things like Stable Baselines do that really, really well.

Vincent:

So we absolutely don't want to replace that. The philosophy is much closer to what you would see in something like Torchvision. And Torchvision is actually a good example, because a lot of the inspiration for how TorchRL was built at the beginning comes from people working on Torchvision, who are actually my close colleagues.

Vincent:

So, the idea is to have some reusable components that you can easily plug into your algorithm, so that you don't have to code everything from scratch, like a replay buffer. In order for you not to recode the replay buffer for the thousandth time and make sure that you did not forget a tiny detail somewhere, we provide that to you. And the way I see TorchRL is not really as something that should replace other libraries like RLlib or Stable Baselines or Pearl or Tianshou.

Vincent:

I think that all of these libraries have something that they do really, really well. And in the case of TorchRL, what we want to do is say, okay, guys, maybe there is a primitive within your code that you don't want to maintain because you think that it's kind of orthogonal to what the goal of your library is. And we can take that upon us. Like, this primitive, we can maintain it for you, to enable you to do the best you can within the field that you're working in. So in this sense, I would see our work with TorchRL more as that of a service provider, if you see what I mean.

Vincent:

Like PyTorch is, basically. No one is ever gonna say, oh, I'm competing with PyTorch if they're using PyTorch. You're using PyTorch as a service provider to build your library on top of. And I would like TorchRL to basically be the same thing, you know, to help people build their library on top of it.

Robin:

Can you tell us about the road map for TorchRL, like, over the next couple of years, and maybe also a longer term vision?

Vincent:

Sure. So, maybe first... can you pause a second?

Robin:

Yeah.

Vincent:

Yeah. Sorry about that. So, maybe before talking about the road map, I could talk a little bit more about the philosophy of the library, because I see that you have questions about TensorDict. I think that if I can place TensorDict now and talk about what the philosophy of the library is and what problem we're trying to solve, then once I've done that, it's gonna be easier to talk about the road map for the future.

Robin:

Yeah. Sure. So, do you wanna say more about the philosophy behind the project?

Vincent:

Yes. So, the philosophy behind TorchRL is basically that, when we started working on it, we saw that there was an essential issue when you're building an RL library, which is that the various components you have will have very different signatures depending on the context where you are using them. To see that, you can think, for instance, of a policy class.

Vincent:

The very simplest way you can see a policy is as something that takes an observation as input and outputs one action. Right? But then you have other contexts, like multi-agent. And in multi-agent, you have a policy that takes, for instance, one observation or a set of observations and outputs a set of actions. And those actions may or may not be contiguous.

Vincent:

You might have different tensors that represent different actions for different agents. But you might also have a policy that does that and, on top of it, provides you with the probability of the actions you have taken, something like in PPO. So, basically, the problem you're looking at really conditions what your class is gonna do.

Vincent:

But still, at the end of the day, it's a policy. It's something that outputs an action. And the same goes for an environment. An environment does not have a precise set of outputs, etcetera. And that is a huge contrast with other libraries that we have in PyTorch.

Vincent:

If you think about Torchvision, I think their job is much easier, because they know that every component they're gonna look at is gonna read an image as input and output some transformation of that image, whether it is the prediction of a label or something like that. In our case, we can't really do that. RL is about the algorithm rather than the data type, to put it very simply.

Vincent:

So that seemed like the biggest challenge we had when building the library. Because if you want reusable components that you can easily plug and play, you can't really do that if you don't know in advance what the signature is gonna be, because things won't fit together. So what we thought, as I think many other people do, is: okay, let's build an abstract class that is our data carrier across the entire ecosystem we're building in the library.

Vincent:

And so the only thing that people who want to use the library need to buy into is this data carrier. And it's not something that is absolutely new; other people have done the same. For instance, Tianshou has something called Batch.

Vincent:

This Batch class is basically solving that problem, and if you look at other libraries, they will have a similar solution. Our solution is called TensorDict. TensorDict is basically, as the name says, a dictionary of tensors, with some extra features on top of plain tensors. And so in TorchRL, most of the time you will see that classes talk to each other through TensorDict.

Vincent:

They will write data into a TensorDict and read data from a TensorDict. And by using this abstraction, we are basically capable of writing abstract data collectors, to which you can provide a policy and an environment, and they know exactly what to do. Because the only thing they need to do is pass a TensorDict to the policy, receive a new TensorDict, send it to the environment by calling environment dot step with that TensorDict, and receive a new TensorDict out of that. So basically, by exchanging TensorDicts all the way through, you don't really need to care about what your policies are outputting, etcetera. And that has a lot of nice features, like, for instance, moving a TensorDict from device to device.

Vincent:

You just need to call tensordict dot cuda. If you want to serialize it to disk, it's very easy too. So you have a lot of things like that you can do with it. That was the first building block for the library. And once we had that, we were like, okay, now we can really build the library with as many components as we want, because we know that for most of the things we want to do, TensorDict is gonna be the right solution.

Vincent:

And it proved to be very, very effective for things like multi-agent, for instance, because it's very easy to organize your data even if it's very nested. You might have things like groups of agents, and then individual agents within those groups, and each of these agents is taking an action or receiving a certain observation. And it's very easy to represent this nested structure through TensorDict, because TensorDict supports nesting in a very intuitive way, I would say. Another thing that TensorDict allows us to do, and maybe we can chat a little bit about that later, is on the parameter side: representing the parameters of models to do functional calls and vmap and things like that.

Vincent:

So basically, we're using TensorDict extensively in the library, and that's really, I think, the one thing we're usually putting at the forefront as the one problem we're trying to solve.
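
As a rough sketch of that pattern (assuming gymnasium is installed; the key names follow TorchRL's usual convention, but treat the exact API as approximate):

```python
import torch
from torch import nn
from tensordict.nn import TensorDictModule
from torchrl.envs import GymEnv

# An environment that reads and writes TensorDicts.
env = GymEnv("Pendulum-v1")

# Wrap a plain nn.Module so that it, too, reads and writes TensorDict entries.
policy = TensorDictModule(
    nn.Linear(3, 1),              # Pendulum: 3-dim observation, 1-dim action
    in_keys=["observation"],
    out_keys=["action"],
)

td = env.reset()   # a TensorDict holding "observation", "done", ...
td = policy(td)    # reads "observation", writes "action" into the same TensorDict
td = env.step(td)  # reads "action", writes the next observation, reward, done
print(td)
```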

Robin:

And do I understand correctly that TensorDict is part of PyTorch, but it doesn't depend on the rest of TorchRL, and it might be useful as a general deep learning primitive?

Vincent:

That's exactly right. So the story is that, at the beginning, TensorDict used to be part of TorchRL. And our early customers came to us and said, well, this is really cool, but RL is only half of my job, and during the other half, I would like to use TensorDict too. So why should I install TorchRL if I'm not doing RL stuff?

Vincent:

And so we made this other library called TensorDict. If you go to github.com/pytorch/tensordict, you will find TensorDict, and you can also just install TensorDict on its own. And what we're seeing right now is that we're making two observations with TensorDict.

Vincent:

The first one is that it's being used by people outside of the RL realm, for things like diffusion models or fine tuning or data loading. There are many usages of TensorDict that go way beyond RL. But the other observation I'm making is that there are people using TensorDict for reinforcement learning without using TorchRL, which is kind of interesting, because it really feels like it's a primitive that is more essential to RL than anything else TorchRL can provide. So I'm quite glad about that. I think we nailed it right.

Vincent:

It's a new way of coding in PyTorch, because you have to get used to representing things as packs of tensors rather than individual tensors. And I know that many people don't like to work with dictionaries because you don't really know what's hidden inside, and that's a very valid thing. So there's a little bit of mental gymnastics to do to recode things that way, but usually you will find that your code runs smoother and is also clearer to read than it would be otherwise, at least in my opinion.

Robin:

I haven't used them yet, but it sounds great. I've wanted something like this before, and it seems like, as you say, in multi-agent cases, or in cases where you have an agent with multimodal inputs and you want to just keep everything organized... yeah, it makes so much sense. So I would imagine that it's performant. Is that right?

Robin:

A lot more performant in space and time than a naive implementation would be.

Vincent:

Yeah. So we took care of making sure that a bunch of things were more efficient than what most people would do at first. I can give you a bunch of examples. One example is, for instance, if you want to represent a replay buffer on disk. Right?

Vincent:

So what you can do with TensorDict is take a dictionary with the tensors of one step in your environment. So you have the observation, the next observation, the action, the done state, the reward, all those kinds of things in your dictionary. Okay? And then you just take that object.

Vincent:

Let's call it data. You call data dot expand, and then you can say how big you want it to be, for instance one million. So that's a no-copy operation.

Vincent:

There is no new tensor being created, because in PyTorch, if you call expand, the only thing that happens is that the same data gets represented as one million copies in memory. And then you can call dot memmap, and what that is gonna do is put it on a set of memory-mapped tensors on disk. What that means, basically, is that you now have those very big tensors stored on the file system that you can index very, very quickly, because the idea of a memory map is that if you want to read a single row of that dataset, you're not gonna load the whole thing into memory. You're just gonna focus on that row, even though the storage is contiguous. So you can do that.

Vincent:

It's very quick, very easy, and then you can do that indexing very easily, and also write very easily, just by saying, okay, I take my data, I index it with the indices that I want to write, for instance 0, 1, 2, 3, and it's gonna dispatch that indexing operation to all the tensors stored within the dictionary and write whatever you want to write in those places for you, without needing you to go through those tensors one at a time and rewrite the data. So it's gonna do everything for you, just from the outside, so to say. So it's easy. It's efficient.
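
Sketched in code, the pattern described here looks roughly like this (method names are from the tensordict package; shapes, sizes, and the path are illustrative):

```python
import torch
from tensordict import TensorDict

# One environment step as a TensorDict (shapes are illustrative).
step = TensorDict(
    {
        "observation": torch.zeros(4),
        "action": torch.zeros(2),
        "reward": torch.zeros(1),
        "done": torch.zeros(1, dtype=torch.bool),
    },
    batch_size=[],
)

# expand() is a no-copy view of that step repeated a million times;
# memmap_() then materializes it as memory-mapped tensors on disk,
# so the storage never has to fit in RAM all at once.
storage = step.expand(1_000_000).memmap_("/tmp/replay_buffer")

# Reading or writing a few rows only touches those rows on disk.
sample = storage[:32]
storage[0:4] = sample[:4].clone()
```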

Vincent:

And for other things, like, for instance, serialization to disk again, it's pretty fast because we are using multithreading, so you're gonna save those tensors very, very quickly on disk. And for other operations, we make sure that things are coded in a way that does not introduce overhead, and that is actually as fast as what you would have done yourself, if not faster. So let me think of another example of something that is pretty fast.

Vincent:

For instance, functional calls are pretty fast with TensorDict, usually slightly faster than what you would get with torch.func.functional_call, which is the official API to do functional calls in PyTorch. So we have a bunch of things where the performance is really there. And if you're not using TensorDict for performance, you can definitely use it for ease of use; for many, many things, it's just much easier.
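
A rough sketch of that parameters-as-a-TensorDict idea (API names as in recent tensordict releases; treat them as approximate rather than a definitive recipe):

```python
import torch
from torch import nn
from tensordict import TensorDict

model = nn.Linear(4, 2)

# Extract the module's parameters as a TensorDict ...
params = TensorDict.from_module(model)

# ... build an alternative parameter set, e.g. a slightly perturbed copy ...
new_params = params.apply(lambda p: p + 0.01 * torch.randn_like(p))

# ... and run the module with those parameters instead of its own.
new_params.to_module(model)
out = model(torch.randn(3, 4))
params.to_module(model)  # put the original parameters back
```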

Vincent:

What many people tell me is that things like reshaping all the tensors you have in a batch, or stacking all the tensors you have in a batch, the kinds of operations where you might otherwise use something like PyTree, become easy with TensorDict. And the other thing, if you want to compare TensorDict to PyTree (PyTree has a version in PyTorch, and it also has a version in JAX): as I said, PyTree is not very intuitive, but the other thing is that it does not carry metadata. So imagine that I have a set of tensors that are on CPU, and I want to send them all to CUDA. I could do something like pytree.tree_map(lambda x: x.cuda(), data),

Vincent:

and it's gonna dispatch that operation to all the tensors I have in my data structure. So far, so good. But now, if I want to know whether all the tensors in that batch are on CUDA or on CPU, the only way I can do that is to iterate through each and every single tensor and query its device. With TensorDict, that's not gonna be the case.

Vincent:

With TensorDict, you can call tensordict dot cuda, and it's gonna send everything to CUDA. But then if you ask the TensorDict whether it is on CUDA, it's gonna tell you yes or no, where your data is, basically.

Vincent:

So because TensorDict has that metadata, it makes it even more compelling, because you can keep track of the operations you have done in the past. It also has a shape, it also has dimension names, and other things that really help you handle it in the best way possible.
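
A small illustration of the metadata point (a hypothetical batch; device and batch_size are attributes of the TensorDict itself):

```python
import torch
from tensordict import TensorDict

data = TensorDict(
    {"obs": torch.randn(32, 4), "action": torch.randn(32, 2)},
    batch_size=[32],
)

# With a plain dict or a PyTree you would map .cuda() over every leaf and then
# have no single place to ask where the batch lives. TensorDict keeps track:
if torch.cuda.is_available():
    data = data.to("cuda")

print(data.device)      # device shared by the whole structure
print(data.batch_size)  # the common leading shape
```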

Robin:

That makes sense. Sometimes the hardest thing is just keeping track of what's going on with all these tensors flying around. I'm sure it makes debuggability way, way easier, which I think, at the end of the day, is sometimes the most important factor. Can you talk about any applications that you've seen of TorchRL so far?

Vincent:

Yeah, sure. So we have a community that is quite active. We have a Discord channel, which basically helps us talk with the various people that are using the library. And for me, it's quite interesting to see the various applications of the thing.

Vincent:

So let me give you a bunch of examples. I know there are people doing combinatorial optimization with TorchRL, other people piloting drones. We've just submitted a workshop paper with other authors of the library about drug discovery using RLHF with TorchRL. So, yeah, the applications are basically the breadth of RL. RL can do basically everything and nothing, and it's the same with TorchRL.

Robin:

And I see that TorchRL has some multi-agent RL support at this point, and I see from your Google Scholar history that you have a background in multi-agent RL as well. So can you talk about the multi-agent aspect here? What do we have so far in TorchRL?

Vincent:

Sure. So, to be clear, I don't really have a background in multi-agent; I have worked several times with people that do. I had the chance last year to have a marvelous intern working with me on TorchRL called Matteo. And Matteo did an amazing job in basically building the basic features that you need to do multi-agent with TorchRL.

Vincent:

So right now, we support a lot of stuff, like homogeneous agents but also heterogeneous agents. We have a lot of features; for instance, we have multi-agent MLPs and CNNs, and we're building multi-agent RNNs at the moment with another collaborator. In the library, we also have wrappers for environments like PettingZoo or SMAC and other things. The replay buffers are generic enough that they can support multi-agent as well as single-agent.

Vincent:

Same thing for the data collectors. So basically, across the whole library, we made sure that it was really compatible with multi-agent at every single stage. And I think it's not something that is easy to come by, because a very basic example is that usually you just have one done state. Right? You run your environment, and you will have a single boolean that says whether your environment is done or not.

Vincent:

But in multi-agent settings, that's not true anymore. You might have one agent that is done and another that is not, and so you need some custom way of dealing with that. And because we took that into account early enough when building the environment class that we have, our environments are robust to that. They are really tailored to have more than one action and more than one done state, and everything blends in as well as it could.
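
A hypothetical sketch of that kind of nesting (the key names here are illustrative, not necessarily TorchRL's exact convention):

```python
import torch
from tensordict import TensorDict

n_agents = 3
step = TensorDict(
    {
        "agents": TensorDict(
            {
                "observation": torch.randn(n_agents, 8),
                "action": torch.randn(n_agents, 2),
                "done": torch.zeros(n_agents, 1, dtype=torch.bool),  # per-agent done
            },
            batch_size=[n_agents],
        ),
        "done": torch.zeros(1, dtype=torch.bool),  # global done for the episode
    },
    batch_size=[],
)

# Nested entries are addressed with tuple keys.
per_agent_done = step["agents", "done"]
```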

Vincent:

And I think it doesn't feel like you're doing something wrong when you're using the environment in multi-agent settings, which I think is nice. So, yeah, we did that, and we have a bunch of benchmarks that are mostly written by Matteo, actually. The other thing that Matteo did is that he wrote another library called BenchMARL that you can find under Facebook Research. So it's github.com/facebookresearch/BenchMARL.

Vincent:

And in BenchMARL, you will have, as the name says, a lot of benchmarks in multi-agent RL. He took the time to rewrite those multi-agent benchmarks algorithm by algorithm, making sure that he was reproducing the results from the papers. And everything is based on TorchRL and has flexible configuration files that you can easily pull and modify. So I think it's really great work that really sets the tone. You can start using it and benchmark your new algorithm against existing ones in a very, very easy way.

Vincent:

So, yeah, kudos to Matteo for doing that work.

Robin:

Okay, we'll have a link to that in the show notes as well. But what about cases that are maybe a little more exotic, like model-based RL or inverse RL or different things like that? Are those things that you would want to see in the library in future? Are you looking for contributions in those areas?

Vincent:

Sure. Yeah. So first, regarding contributions, we're always looking for contributions. My main job is maintaining TorchRL, but it's a very hard thing to do on my own. And also, I kind of overfit, and, like anyone, I can make mistakes and stuff.

Vincent:

So I really like the fact that right now, I wake up in the morning and there's always a bunch of PRs from other people that are trying to help. That's really great to see. Regarding things like model-based: yes, we have an implementation of Dreamer already.

Vincent:

We're looking at implementing Dreamer v2, but it's not an easy thing to come by. I think it's challenging because things are moving all over the place and there are many different aspects to an algorithm like that. There is the simulation part, and then there is the model-based part, and those kinds of things that you need to organize properly, and that's not always easy. So, yeah, definitely model-based. Inverse RL, so far we haven't had any implementation within the library, but I know that we have users that are using TorchRL to do inverse RL.

Vincent:

So I would encourage anyone who wants to showcase an example of TorchRL in inverse RL to do so. And regarding the last thing that you said about our prospects for the future of the library, maybe I'm gonna take a step back. When I started TorchRL, I was basically coached by the founder of Torchvision. His name is Francisco. And Francisco told me, you know, usually in domain libraries in PyTorch, we have five pillars, and some of these pillars are: we have pre-trained models, we have datasets, we have transforms, we have IO. There is one missing.

Vincent:

I'm not sure. Anyway, what he told me is, you know, pre-trained models, that probably doesn't really make sense in RL, and datasets, yeah, that probably doesn't make sense in RL either. And so I built the transforms, for the environments and the replay buffers, and I left those two things on the side.

Vincent:

And what you're seeing right now in RL is that, actually, we're catching up, sort of. We're using datasets for offline RL, and we have pre-trained models with foundation models. And so that's really something that I think the library should cover more and more. So we already have a data hub where you can find a lot of datasets, like OpenX or Minari or D4RL, with a single data format. That basically means that you can just instantiate any of these classes, and it's gonna download the data for you and format it in a conventional way, such that you can really exchange one dataset for another.
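
As a hedged sketch of what using that data hub looks like (the dataset classes live under torchrl.data.datasets; the dataset id and constructor arguments below are from memory and may not match the current API exactly):

```python
from torchrl.data.datasets import D4RLExperienceReplay

# Downloads and caches the dataset on first use, exposing it as a replay buffer.
data = D4RLExperienceReplay("halfcheetah-medium-v2", batch_size=256)

batch = data.sample()  # a TensorDict of observations, actions, rewards, ...
print(batch)
```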

Vincent:

You can append your own transforms and all those kinds of things, so it's quite flexible. There is still some work to do to make it even easier to use, but I think we're on the right path with that. And the other thing is regarding pre-trained models.

Vincent:

I have colleagues at FAIR, here at Meta, who work a lot on that, and we're collaborating so that when they put some work on arXiv, the pre-trained models are available in TorchRL for you to play with and very quickly integrate into your algorithm without any hassle. And again, I think that having things that are interchangeable, like being able to easily swap one pre-trained model for another, really makes a difference. So those are two of the things that I would like to focus on more in the future. And finally, if I'm really ambitious and I'm looking at what we should have in the library that we don't have already, I think the first one that comes to mind is MCTS and related planning algorithms.

Vincent:

Those things are really, really hard to come by, but I think they will have a place in the future of fine-tuning large language models, or multimodal versions of those, in the sense that things like hallucinations can only be fixed with smarter models that can really think ahead when they're trying to solve a problem. And I think that things like MCTS must be part of the solution; it's impossible otherwise. There will be some planning in the algorithms that come in the future, and I would really love for TorchRL to be part of that solution.

Robin:

And do you see it being used for LLMs and RLHF?

Vincent:

Yes. So, something I already mentioned, but we have a workshop paper submitted where we use TorchRL for RLHF, in the domain of drug discovery. But there are other things, some probably more anecdotal than others. Yeah, I definitely think we have a small user base that is using TorchRL for these kinds of things. I would like to see more of these, obviously, and I think we can support them.

Vincent:

But, yeah, that's already the case. Yeah.

Robin:

Can you talk about the major themes in your own research, in the past and maybe in your future? I see other RL-related work on your Google Scholar.

Vincent:

Yeah, sure. So, it happens that I have, I guess, quite an unusual background for a researcher in RL, because I started as a medical doctor back in the day, and I had the opportunity to do a PhD in neuroscience. It was supposed to be something quite experimental at the beginning, working with Parkinson's patients, understanding how people form procedural or habitual behaviors. And what happened was that I started modeling people's behavior using Bayesian statistics and RL, and got really interested in that topic.

Vincent:

At the end of the day, my PhD was just about new techniques in ML, RL, and Bayesian stats. And when I finished my PhD, I basically had the opportunity to keep on being a doctor, going back to the hospital and working as a neurologist, or to switch to a career in machine learning. I didn't really have to think a lot about it; I really liked what I was doing in ML, and I decided to go for that. After that, I worked for a couple of years in the financial industry and then moved to Huawei, where I worked in a reinforcement learning lab. We were basically focusing on the problem of how you can make a reinforcement learning algorithm that incorporates planning.

Vincent:

So something like model-based RL, but one that is also safe and that uses active learning. We wanted to combine those three things to make sure that you had an algorithm that was efficient, that would not use too much data from the real world, but that was also safe: when you are gathering data, you would not do something completely crazy. The way we went about that was that we used Gaussian processes to assess how much uncertainty we had about the environment, and we tried to decouple the uncertainty that comes from the fact that the environment is stochastic, so it just reflects whatever is happening in the environment, from the uncertainty that you have because you have not explored part of the environment.

Vincent:

The other insight we had was that we wanted to give the agent an incentive to explore parts of the environment that had not been explored before, but also parts that we thought were relatively safe. So we tried to combine all of this within a single algorithm that we called SAMBA. I invite you to check out the paper. It was kind of a nice piece of work; I really liked working on that.

Vincent:

After working on SAMBA, I worked for a couple of years on things like generative AI, namely, at the time, normalizing flows and how to incorporate them in RL. That was also quite interesting, although I don't think we really managed to publish anything out of it. And after that, I had the opportunity to move here to Meta and start working on TorchRL, which I did.

Robin:

Besides your own work, are there other things happening in RL that you find interesting lately?

Vincent:

Obviously, I think that a couple of years ago, when ChatGPT came around, everyone was like, oh, well, this is amazing, and look, there's a bit of RL in it. So it kind of seemed like RL was becoming cool again, and people were enthusiastic about RL as a tool for fine tuning. And then came DPO, and people were like, hey, RL is dead again.

Vincent:

So I think that RL still has a huge opportunity within that space, but my bet would be more around planning and reasoning. I think that we as a community have basically all the tools that other people are looking for: things like trajectories, how to reason about sequences of events, and how to explore a tree of possible actions to select the best course of action, which is basically everything that reasoning is about. Right? So I think that, in the long term, even though maybe the next generation of GenAI models won't be fully based on RL,

Vincent:

I think it's kind of an illusion that that would ever fully be the case. But I think we will have a lot to say in that story. Another thing that I see moving a lot is that, now that we have those very nice tools to interact with and to query, like ChatGPT and all of that, people are starting to think, well, how can we scale that up to the real world with embodied AI?

Vincent:

And I think that, again, we as a community will have a huge impact there. So I would say that those two things really keep me very excited about the future of RL as a field.

Robin:

Is there anything else you wanna share with the audience, while you're here?

Vincent:

What I would really like for people to do more, well, a lot of people are already doing it, but what I would like to see more of is people actively engaging with the open source community. What I mean by that is that I think a lot of people, when they see a bug or they see that something doesn't work, are just like, oh, well, maybe I'm not using it right, or maybe this bug is just there and I just don't want to use that tool anymore. And I think there's a little bit of work for all of us, when we see this kind of situation, which is to go to the authors of the repo and tell them about the bug, or tell them about the behaviour. Or, if you have suggestions, like, oh, well, I'm seeing this primitive and it's amazing, but it's a little bit hard to use, have you thought about doing this and that?

Vincent:

That's the kind of feedback that we as developers are really looking for, and we don't have enough of it. Because when I start talking with people informally at conferences and things like that, people actually have a lot of ideas about how to make things better. But I think that sometimes they're a little bit shy, or we just think, oh well, I don't have the time.

Vincent:

But posting an issue on the repo doesn't take long, and it can really make a difference in the long term. So I would encourage everyone in the audience to be active members of the OSS community, not necessarily by posting new pull requests with new features to repos; that's obviously something that is amazing, but sometimes just posting an issue or suggesting a new way of interacting with stuff within the library can really make a difference, and it goes a long way. So, yeah, that would be my main take-home message.

Robin:

This has been great, Dr. Vincent Moens. Thanks so much for doing this.

Vincent:

Yeah. You're welcome. Thanks for having me.