Recsperts - Recommender Systems Experts | #13: The Netflix Recommender System and Beyond with Justin Basilico

This episode of Recsperts features Justin Basilico, who is Director of Research and Engineering at Netflix. Justin leads the team that is in charge of creating a personalized homepage. We learn more about the evolution of the Netflix recommender system from rating prediction to using deep learning, contextual multi-armed bandits and reinforcement learning to perform personalized page construction. Deep content understanding drives the creation of useful groupings of videos to be shown in a personalized homepage.

Justin and I discuss the misalignment of metrics as just one of many elements that are making personalization still “super hard”. We hear more about the journey of deep learning for recommender systems, where real usefulness comes from taking advantage of the variety of data besides pure user-item interactions, i.e. histories, content, and context. We also briefly touch on RecSysOps for detecting, predicting, diagnosing and resolving issues in large-scale recommender systems and how it helps to alleviate item cold-start.

At the end of this episode, we talk about the company culture at Netflix. Key elements are freedom and responsibility as well as providing context instead of exerting control. We hear that being really comfortable with feedback is important for high-performance people and teams.

Enjoy this enriching episode of RECSPERTS - Recommender Systems Experts.

Chapters:

(03:13) - Introduction Justin Basilico
(07:37) - Evolution of the Netflix Recommender System
(22:28) - Page Construction of the Personalized Netflix Homepage
(32:12) - Misalignment of Metrics
(37:36) - Experience with Deep Learning for Recommender Systems
(48:10) - RecSysOps for Issue Detection, Diagnosis and Response
(55:38) - Bandits for Recommender Systems
(01:03:22) - The Netflix Culture
(01:13:33) - Further Challenges
(01:15:48) - RecSys 2023 Industry Track
(01:17:25) - Closing Remarks

Links from the Episode:

General Links:

Follow me on Twitter: https://twitter.com/MarcelKurovski
Send me your comments, questions and suggestions to marcel@recsperts.com
Podcast Website: https://www.recsperts.com/

What is Recsperts - Recommender Systems Experts?

Recommender Systems are the most challenging, powerful and ubiquitous area of machine learning and artificial intelligence. This podcast hosts the experts in recommender systems research and application. From understanding what users really want to driving large-scale content discovery - from delivering personalized online experiences to catering to multi-stakeholder goals. Guests from industry and academia share how they tackle these and many more challenges. With Recsperts coming from universities all around the globe or from various industries like streaming, ecommerce, news, or social media, this podcast provides depth and insights. We go far beyond your 101 on RecSys and the shallowness of another matrix factorization based rating prediction blogpost! The motto is: be relevant or become irrelevant!
Expect a brand-new interview each month and follow Recsperts on your favorite podcast player.

Note: This transcript has been generated automatically using OpenAI's whisper and may contain inaccuracies or errors. We recommend listening to the audio for a better understanding of the content. Please feel free to reach out if you spot any corrections that need to be made. Thank you for your understanding.

Personalization is still super hard.
People need recommender systems when they don't know what they want.
So I think it's a little more of a back and forth between the system and the person using it because if you know what you want, you can just go and you search for it and you just start watching.
Any pixel that we're showing for our members, like how is it helping them make a good decision about what to watch?
Those have been some of the major stages and then beyond that, we've been focusing on also just increasing the level of interactivity with the recommender as well.
Page construction, so trying to optimize, not just the one-dimensional ranking of the items that we have, but instead thinking about how do we take those and then organize them in a way that people can find what they want to watch.
Deep learning is very useful for recommendation, but more in that it enables you to take advantage of the real form of the data you have when you have a real-world recommender system.
The metric is part of the recommendation system.
Your recommendations can only be as good as the metric that you're measuring it on.
To have really high-performance teams, you need to get beyond just being a collection of individuals, but really being able to work well together.
Being able to work well together is things like being really comfortable with giving each other feedback because there's always opportunities for everyone to improve.
Hello and welcome to this new episode of RECSPERTS - Recommender Systems Experts.
I'm very delighted to be joined by Justin Basilico.
Justin Basilico is the Director of Research and Engineering at Netflix.
He leads an applied research team that works on personalizing the Netflix homepage with the support of recommender systems, which is in particular why we are talking today.
He joined the company in 2011, has a degree in computer science from Brown University, and, I guess, is also known to many of the listeners as a co-organizer of the REVEAL workshop, most recently at RecSys 2022.
He also co-authored several Netflix publications at RecSys, along with several case studies on recommender systems.
Hello, Justin. I'm very delighted to have you on the show.
Thanks, Marcel. I'm really happy to be here.
Yeah, it's great to have you, and I guess we have plenty of topics to talk about for this episode of RECSPERTS.
I mean, Netflix has always been a great contributor to the community, organizing workshops, presenting lots of papers and sharing a lot of research.
I actually see your research blog as one of the great blogs that frequently shows new work.
Can you give us a bit of background about yourself, so how you joined Netflix and also how you got into research in recommender systems?
Sure. Yeah. I got into research in recommender systems about 20 years ago, and it actually started before that. When I was in college, I was really lucky to find my way into computer science and to take an AI class that gave me a little exposure to machine learning.
We learned about the Perceptron, and that sparked an interest in me in doing machine learning.
Then I was able to take a class in neural networks, and this was back at a time when there was very little work on neural networks; it was not the major area of study and application that it is today.
That kind of persuaded me to go into grad school and focus on trying to learn more about machine learning because I realized I couldn't do that just with a bachelor's degree.
Then when I was there, I was looking around for a project, for the end of my first machine learning class.
At that time, my wife and I were really into watching movies, and we would go to a Blockbuster that was down the street.
It also happened to be just a few blocks away from where my grandpa actually used to own a bakery in Providence.
We would spend 45 minutes or an hour trying to find a movie to watch, and the movie itself might be only an hour and a half, and it just felt like, oh my gosh, it's so hard to find things.
Maybe it's a bit of indecisiveness on our part, but there's got to be a better way.
At the same time, my advisor introduced me to the EachMovie dataset.
They said, oh hey, there's this dataset with movie ratings in it, and I put those two interests together, was able to start doing some work in recommendation, and then did that when I was in graduate school.
From there, it was really just a unification of some of the things I really liked doing with applying machine learning algorithms.
At that point, it was more like kernel methods and using those, trying to build systems, and then also just really liking to try to understand what people really like and enjoy, and trying to predict that to help them make better decisions.
That's how I originally got into it.
The initial work was around trying to figure out how to combine collaborative filtering and content-based filtering back in the day.
That was the focus.
Then from there, in terms of getting to Netflix, I spent about six or seven years at Sandia National Laboratories, working on a variety of different machine learning problems, from things like trying to understand when people are in difficult driving situations in a car, in a project we had with Daimler.
We built a personalized internal paper search engine at the labs, and a whole bunch of other things, lots of different applications.
Then at one point, Netflix reached out to me about this opportunity.
It was going back to something I was really fond of, the problem of recommendation.
So I made that transition and have focused on that ever since.
One of the first recommenders that you basically built for others was actually a paper recommender system that recommended papers to other people based on collaborative filtering, I guess.
It was more like content-based filtering because it had a very small user base.
We're just trying to get something started for the library there.
It was a very small pilot project.
There were definitely some engineering challenges with how everything was put together and the latencies and things there.
We learned a lot.
I would say that was the first one that got a little bit of use.
But we had done a lot of research.
In the group I was in there, we worked with psychologists too.
It was computer scientists and psychologists trying to understand people and how to make better training programs and things like that.
That was a really good experience.
But then I was able to leverage that broad view of how to use machine learning to solve a lot of different types of problems and bring it into how to build personalization algorithms at Netflix.
I'm actually missing something there, and I'm very curious about your take on it.
You joined Netflix in 2011.
In terms of recommender systems, Netflix had already become very popular with its Netflix Prize, which took place from 2006 to 2009.
Did you actually take part in the Netflix Prize as a researcher back then?
Yeah. That's an interesting question.
I remember the day the Netflix prize was announced and I actually went to my manager.
I was like, oh, hey, there's this competition.
Can we work on it?
Being a Department of Energy lab, he was like, no, I don't think that makes sense to focus on.
That was just where one of my passions was.
What I was trying to say is that the experience I had there gave me a broad view of lots of different ways you can use machine learning to solve problems and how to make it easy to put it into various different systems, including building infrastructure around that.
The combination of that, plus having previously done some research in recommendations, put me in a spot where, when I joined Netflix, we already had the rating prediction system set up and some ways of putting that together, but we were really starting the transition to machine-learned ranking and really stepping up how we were applying personalization.
Being able to leverage that broad view of machine learning and how you can use it to solve these recommendation problems, plus thinking through the systems and how to build them, put me in a really lucky spot to help work on those with a bunch of other great people to make that happen.
In the meantime, DVDs disappeared, and on the other side, I guess personalization has gotten so much better at finding the right stuff much more quickly, instead of having to go on a 45-minute walk through one of those video stores.
I would also assume that nowadays you can get a much better picture of what you actually decide to go with, in a much broader sense, for example by reading a synopsis or seeing a short trailer.
This was not possible back then.
So things have gotten much better and much more efficient there thanks to personalization in that domain.
Yeah, for sure. I mean, I think the Netflix Prize at the time represented a different type of recommendation problem in the DVD space, because there was such a delay between when someone would select something, when it would be shipped out to them, and when they would view it.
And then they would send it back.
And so it was much more of a high-level decision support tool.
And so it was about trying to figure out which ones you should prioritize, versus in a streaming world, where we can show you more information, we can play trailers, and you can also just start watching and see within the first few minutes whether it's something you're going to enjoy or not.
And so the feedback loop and the choices you're going to make are more immediate.
And so I think that really shifts the problem, and it also shifts the kind of data you can use.
That, I think, is also behind the shift in focus from rating prediction to ranking, because ranking better matches what we're trying to do when we're doing something like streaming.
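The shift from rating prediction to ranking can be made concrete with a toy example: a model can predict ratings more accurately (lower RMSE) and still put the wrong titles at the top of the list. What follows is a minimal Python sketch under stated assumptions; the titles, ratings, and the use of precision@k as the ranking metric are hypothetical illustrations, not Netflix's evaluation setup.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error of predicted vs. observed ratings."""
    squared = [(predicted[title] - rating) ** 2 for title, rating in actual.items()]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(predicted, played, k):
    """Share of the top-k titles (ordered by predicted score) that were played."""
    top_k = sorted(predicted, key=predicted.get, reverse=True)[:k]
    return sum(1 for title in top_k if title in played) / k


# Hypothetical toy data: observed ratings and the titles the member played.
actual = {"a": 5, "b": 4, "c": 2, "d": 1}
played = {"a", "b"}
model_a = {"a": 4.0, "b": 3.0, "c": 3.1, "d": 1.0}  # close ratings, wrong top-2
model_b = {"a": 3.0, "b": 2.5, "c": 1.0, "d": 0.0}  # biased ratings, right top-2

for name, model in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(model, actual), 3), precision_at_k(model, played, k=2))
```

Here `model_a` has the lower RMSE but places title `c` in its top two, while `model_b` is off on every rating yet ranks the two played titles first, which is what matters when the ordered list is what the member sees.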
Yeah, which already brings us to one of the first points that we want to dive into in this episode.
I guess that most of the listeners are to a certain degree well aware of the history, but to get the ball rolling, can you walk us through the evolution of the recommender systems at Netflix?
I mean, we already started with rating prediction.
How did you go far beyond that very popular starting point to the way you do personalization nowadays?
Sure, yes. I think we started covering that a little bit.
But yeah, the Netflix Prize and the DVD world were very much about trying to figure out how to predict what feedback someone is going to give for a particular title.
I guess it was mostly movies at that point,
plus some TV shows that people would watch on DVD.
That was really the primary way people would see the recommendations: those predictions were very front and center, so that people could use their personalized nature to judge whether a title was something we thought they would like and worth going with.
And then from there, we then kind of transitioned from just doing that rating prediction to trying to come up with personalized rankings.
So the idea is that we're trying to help surface the types of content that people are most likely going to want to watch and enjoy.
So that you can find that really easily on the site without kind of having to browse through lots and lots of different titles.
And I think that becomes kind of one of the core parts of the recommendations of figuring out of all the things that you could possibly watch, which are the ones that you're most likely going to want to watch and then you're going to actually enjoy watching.
And then from there, we focused on layering on top of it this problem of what we call page construction.
So trying to optimize not just sort of the one dimensional ranking of the items that we have, but instead thinking about how do we take those and then organize them in a way that people can find what they want to watch, you know, kind of no matter what mood they're in or all the variety of different reasons people might want to be watching Netflix.
How can we get them all the recommendations up there in a way so people can easily understand what they're seeing?
They can have a real nice diversity of recommendations that kind of cover their different interests.
And for us, we found that a two-dimensional layout of the content works well, with these different, what we call, rows of recommendations.
Each row has a title that explains the whole set we're showing, so you can understand what's in it and decide, hey, this is something I want to dig more into.
And then you can scroll along the row, or you can scroll up and down the page vertically to skip over the things you're not as interested in.
And I think that move was a big step, because you're not only changing the shape of the problem, but you're also going for a higher-level objective: it's not just about, say, helping discover brand new things, but also helping people jump back into the latest season of something they watched before that just came out, or the next episode of something they've been watching the previous evenings, and things like that.
So that was one of the next stages of evolution.
And then from there, we started then kind of focusing on not just kind of how we're presenting these sets of recommendations, but then also basically how do we explain them and like, you know, kind of help people understand what they are and personalize that.
So I think that's kind of really trying to personalize like the user experience layer that's on that and thinking about like any pixel that we're showing for our members, like how is it helping them make a good decision about what to watch?
Those have been some of the major stages and then beyond that, we've been focusing on kind of also just increasing the level of like interactivity with the recommender as well.
Besides that, also by introducing the double thumbs up button that, I guess, appeared within the last year or so?
Yes, I think that the double thumbs up kind of represents, you know, over time, we've tried to modify how we collect explicit feedback from the members about what they really enjoy.
And the star rating system kind of represented one way of doing it.
We moved to a thumbs system because it was something people would engage with a lot more, and it better conveyed that it's really about you and what you think about a piece of content, not some kind of critic rating or something like that.
But it's something we keep trying to evolve and improve over time, and the double thumbs up is meant to make it easier for people to tell us, oh, they really, really like this, versus something that they just like.
Yeah, it was good, but this is something I really enjoyed.
And so making sure that the system is really focusing on optimizing for those elements.
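One way to see why separating "really liked" from "liked" matters is to treat explicit feedback as graded relevance, so that a ranking metric rewards putting double-thumbs-up titles above plain thumbs-up ones. A minimal Python sketch, assuming a hypothetical label mapping (thumbs down = 0, thumbs up = 1, double thumbs up = 2) and NDCG with exponential gain; it is an illustration, not how Netflix trains or evaluates its models.

```python
import math

# Hypothetical mapping from explicit feedback to graded relevance labels.
FEEDBACK_LABEL = {"thumbs_down": 0, "thumbs_up": 1, "double_thumbs_up": 2}


def dcg(labels):
    """Discounted cumulative gain with exponential gain (2**label - 1)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels))


def ndcg_at_k(ranking, feedback, k):
    """NDCG@k of a ranked list of titles against graded explicit feedback.

    Titles without any feedback are treated as label 0 (not relevant).
    """
    labels = [FEEDBACK_LABEL.get(feedback.get(title), 0) for title in ranking[:k]]
    ideal = sorted((FEEDBACK_LABEL[f] for f in feedback.values()), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(labels) / best if best > 0 else 0.0


feedback = {"a": "thumbs_up", "b": "double_thumbs_up", "c": "thumbs_down"}
print(ndcg_at_k(["b", "a", "c"], feedback, k=3))  # loved title first
print(ndcg_at_k(["a", "b", "c"], feedback, k=3))  # liked title first
```

Swapping the liked and the loved title lowers NDCG even though both were rated positively; with binary thumbs up/down labels, both orderings would score the same.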
Yeah, you just mentioned that you are also trying to add more explanations for recommendations, or at least for the titles or rows being shown.
Can you give a more specific example of that?
Because I'm currently trying to think of where I see, or may see, some explanations.
What would be an example for explanations?
Maybe explanations is not the best way of framing it; we call it the evidence, meaning all the information that we're showing.
You know, what is it that the synopsis is conveying?
You know, the images we're using, the trailers we're using: are they able to really connect and explain the content?
Because as we've gone through this evolution, Netflix has also moved more towards original content.
And one of the challenges with original content is that there are a lot of new things that we made just for someone like you to watch.
But you may never have heard of it before, right?
So you could have a really good recommendation.
But if someone doesn't understand what it is, right, you have to be able to show it to them in a compelling way so that they get, like, oh yeah, this is something that I'm really interested in.
And so being able to make that connection and personalize what you're doing there.
So someone says like, oh, yeah, I get why you're showing this to me.
It's not something that they might have heard about before.
But like, yeah, now I want to spend time with this and check it out.
I think that's been part of the evolution.
And I think it also represents a little bit of at least for myself.
You know, I think back to the early days: one of my personal conceptualizations of the perfect recommender system was that you just turn it on.
It somehow knows the perfect thing that you want to watch and just starts playing it.
I think that's not quite enough, because even if that were the case, you also have to convince the person that it is the right recommendation.
Right. And to be able to do that for something someone has never heard about, you need to be able to show and explain what it is, so they can connect and say, oh, I get why I'm being shown this.
And I think we could do better at that.
People need recommender systems when they don't know what they want.
And so I think it's a little more of like a back and forth between the system and the person using it.
Because if you know what you want, you can just go and you search for it and you just start watching.
Yeah, it's like that kitchen table talk you might have with colleagues, where you talk about the most recent series you've been watching.
You get new recommendations from people in real life.
And then you would directly fire up, for example, Netflix and go to that very show without needing any recommendations at all.
Right. Exactly. Like you don't want to replace that.
That's great. You know, you want people to have those conversations like that.
It's such a strong way of getting recommendations for people, you know.
And in some ways, what I would love over time is for the recommendation system to feel more like it's that, you know, that that friend or that person who knows you well, that you could have that conversation with too.
But in some ways, you know, where people do need the recommender, though, is like when they're not exactly sure what it is that they want.
Like that's where it provides the most value. So it's like, let's help you find something.
And so that's why the way you're presenting the recommendation, you know, the evidence that you're providing and the ability to kind of have that back and forth with the members to understand like, OK, what is it you're in the mood for?
Because everyone has like such a wide variety of tastes. You know, it's super hard to just predict ahead of time what people want.
And we try to do that. You know, we try to do it as best we can.
And we keep trying to get better at that.
And part of the reason of having a page structure is that you can surface all these different potential interests that people might have.
So you can try to cover them, you know, match all of them so that no matter kind of what interests someone has, you can do that.
I think that's part of the interesting challenge we have in that regard.
So explanations here do not mean what we, as RecSys practitioners and researchers, would generally perceive them to be, for example, explaining why I am being shown a recommendation.
It's rather about explaining the content in various ways, so that people are able to understand it and better relate to it when deciding whether they like or dislike the content.
For example, they view some personalized artwork, but they also read the synopsis, and this makes it easier to convey mostly original titles to people.
Is that right?
Yeah, I think of it in the broad sense of all the information that you can show there and like how you can use it.
But then also like I think the traditional way people think about kind of the explanation of recommendation, that could be one thing that's useful to surface in some cases.
And again, it's something I think we could also get better at how we do that.
But I think that's kind of the general idea.
Okay. Among other things, you mentioned something I found quite interesting: the mood of people, and tailoring the page to their mood.
And that reminds me of the last episode, which I had with Rishabh Mehrotra.
And we were talking about intent: detecting the reason the user has for coming to the page, or the task the user wants to get done.
So, talking about mood, which I would say somehow shapes your intent, even though intent might be something more latent that then leads you to consume certain content.
Going back to that notion of mood, what would you say mood is, and what is the possible space of moods?
And how do you detect it when a user comes to the page?
Yeah, I mean, that's a hard one.
You know, when I think about it more, in our context there are a few high-level pieces of trying to understand, again, some of these tradeoffs: are they more in the mood of trying to find something new to watch, or of continuing with something they've already engaged with in the past?
I also think mood can match.
So for us, it's like kind of we have a lot of different, you know, micro genres and moods can be part of that.
When I think about mood, it's not necessarily about trying to predict the mood, but, for someone who's in a certain mood, trying to figure out what kind of content would really match that.
It's just kind of, you know, kind of this latent variable that you kind of have to deal with when you're trying to figure out what recommendations to surface and how to kind of have that good coverage and diversity in that set that you're providing.
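The "latent variable" framing can be written out as a mixture. The following is a hedged formalization for illustration, not a model described by Netflix: assume a member u in context c (for example time of day or device) is in an unobserved mood m from some hypothetical set of moods \mathcal{M}, and the probability of playing title i is obtained by marginalizing over that mood.

```latex
P(\text{play } i \mid u, c) = \sum_{m \in \mathcal{M}} P(m \mid u, c)\, P(\text{play } i \mid u, m, c)
```

Under this assumption, when P(m | u, c) is spread across several moods, a page whose rows cover the high-probability moods hedges against that uncertainty, which is one way to read the point about coverage and diversity.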
OK, OK. In terms of personalizing the Netflix homepage, you wrote a blog post about this.
It was, I guess, back in 2015.
I picked one sentence out of it and maybe we can take it to go further from there.
You wrote, part of the challenge and fun of creating a personalized homepage is figuring out new ways to create useful groupings of videos, which we are constantly experimenting with.
So I guess these useful groupings of videos are the foundation for everything on top,
if you look at it from a bottom-up perspective of creating that final personalized homepage.
In the blog post, there was a picture showing that there are tens of thousands of different rows that you come up with.
And there are all these micro genres.
Can you share how you actually come up with these groups and walk us through the steps, how that corpus of groupings given a user and the context results in the final page?
So, yeah, we have a team that really focuses on understanding all the TV shows, movies and now games that are in our catalog and trying to think through how do you organize all of that content so that it's really set up in a way that people can understand it.
And so they look at how people actually watch and understand the content, try to build a taxonomy of the types of content out there, maintain a lot of that, and provide a whole bunch of what we call candidate rows, derived from all of that understanding they have.
And they're constantly trying to come up with new ways of thinking through what would be interesting groupings that we can explain to people and that would really resonate.
And then, in the page construction problem, we take those and figure out who these groupings actually resonate with.
So we're recommending titles.
We're also recommending like these whole groups of titles and recommendations to try to figure that out.
So that's one way of coming up with the groupings; the ones people probably see most are all these specialized genres, like dark comedies from the 1980s and all of those things.
These can sometimes really speak to what you're interested in, but they can also be collections like, hey, here's our Emmy Award-winning or Emmy-nominated content as well.
And then we also are constantly trying to come up with other types of collections that are maybe kind of more focused on different dimensions of the problems.
Like we recently added one that was like, hey, here's some shows that have a new episode since you last watched them.
So you might have missed them when they came out.
So I find that really helpful, because sometimes I'm watching some TV show and a new episode comes out that I didn't necessarily watch at that point.
Maybe I saw it and I want to come back to it, but then it's like a good place to kind of come in and reengage with those.
So it's always fun to kind of think about what are new ways that you can kind of surface these things both in this kind of looking through and understanding the whole content space.
But then also in coming up with these new, more dynamic dimensions that we can put into the homepage.
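Deriving candidate rows from a content taxonomy can be sketched as grouping titles by single tags and by pairs of tags (such as "dark comedy" plus "1980s"), keeping only groups large enough to fill a row. This is a minimal Python sketch; the tags, the pairwise combination rule, the naming scheme, and the `min_size` threshold are hypothetical, and a real taxonomy team would curate and name rows rather than enumerate every pair.

```python
from itertools import combinations


def candidate_rows(catalog, min_size=2):
    """Build candidate rows from a title -> set-of-tags mapping.

    Each single tag and each pair of tags becomes a candidate row
    (e.g. "1980s + dark comedy"); rows with fewer than `min_size`
    titles are dropped because they cannot fill a row.
    """
    rows = {}
    for title, tags in catalog.items():
        for tag in tags:
            rows.setdefault(tag, set()).add(title)
        for pair in combinations(sorted(tags), 2):
            rows.setdefault(" + ".join(pair), set()).add(title)
    return {name: titles for name, titles in rows.items() if len(titles) >= min_size}


# Hypothetical catalog with hypothetical taxonomy tags.
catalog = {
    "A": {"dark comedy", "1980s"},
    "B": {"dark comedy", "1980s"},
    "C": {"dark comedy", "2010s"},
}
for name, titles in sorted(candidate_rows(catalog).items()):
    print(name, sorted(titles))
```

The output is only the candidate pool; deciding which of these rows a given member sees, and in what order, is the separate page construction step.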
OK, in terms of that, I sometimes have the feeling that there's just a single row at Netflix that uses pure collaborative filtering, whichever of the many approaches it might be.
But many more rows stem from these groupings of videos that are then brought into a personalized ranking according to my preferences.
And of course, the item ordering within that specific group is also personalized.
Yeah, OK, so and that means there's a whole corpus of sets that form these groups.
And now if I'm entering Netflix, so how do these two points connect together?
So if you see, OK, Marcel is there again, we know about Marcel.
He's into these certain genres or he has watched this the last time.
How is the page actually constituted that is there?
Because I remember that, for example, in your blog post you mentioned that you don't just do it very naively row by row, because then you run the risk of having rows that are not very diverse.
So you do it somewhat stage-wise.
Is this something that you could elaborate on?
What we try to do is this: there is a huge space of rows, and as in a lot of recommendation problems, you want to have really good recall of all the potential.
You can think of the rows as representing the potential interests of the member and different ways for people to browse and interact with the catalog.
So you want to like have high coverage of that.
But then when you're putting them into a page, there's very little or essentially no post-processing after what the algorithm chooses.
And so you have to balance all the factors that are important in recommendation when you're putting it together.
Right. So the accuracy of the recommendations we're showing, and really being able to get people's interest.
But then things like diversity are super important, and novelty is super important.
Helping with cold start and making sure that new content is treated fairly in the system is very important.
So is making sure that new rows or modules we're putting in the UI are treated in a fair way.
You have to balance all of those factors.
You can imagine it's very hard to do that with just one score that you somehow compute for each item individually, in isolation, as if that tells you everything about it.
Like you have to understand the relationships between items. Right.
You know, one way of doing that is by reevaluating the scores as you're building the page.
There's lots of different ways of trying to do that optimization.
And I think that's why we try to think of it as like it is a problem of putting like the whole page together.
And you're balancing how good you can make that like total page, that kind of balancing all of these variety of different factors you're trying to go after.
While at the same time being computationally feasible, essentially with like what we're doing.
And we had some also blog plus written back in the day to kind of explaining that, you know, a lot of our these algorithms run in like a precompute mode for, you know, at least some of the key pieces of it.
And so that's kind of how we can also kind of balance some of like the latency or dealing with, you know, people who have like really long histories and how long it takes to kind of process and build those pages versus others.
So that's why balancing the technique you choose that can try to put together a good page, which, you know, just kind of that the simple score and sort probably is not going to work enough for that with all these different factors.
And even little things like, you know, not showing duplicates, you know, really pervasively through the whole UI, like some something like that, which is like a very intuitive thing you want to have and output the recommendation system.
It creates huge challenges because, you know, everything in every position, every time you pick something, it kind of changes, you know, what everything else you're going to show.
And so how do you deal with that?
And, you know, again, make things efficient and build a really good experience out of all those different sets of recommendations.
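To make the stage-wise idea concrete, here is a minimal greedy sketch under loudly stated assumptions: each candidate row has one precomputed relevance score, a ranked list of titles, and a single genre tag, and the penalty weights and the `visible` cutoff are invented for illustration. It is not Netflix's page-construction algorithm; it only shows why re-scoring the remaining rows after every pick, instead of a single score-and-sort pass, lets deduplication and diversity interact with relevance.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    relevance: float  # hypothetical precomputed member-row relevance score
    items: list       # ranked titles shown in the row
    genre: str        # hypothetical single genre tag, used only for a diversity penalty


def build_page(rows, num_rows, dup_penalty=1.0, genre_penalty=0.5, visible=5):
    """Greedily fill the page one row at a time, re-scoring remaining rows after each pick."""
    page, shown, genre_counts = [], set(), {}
    remaining = list(rows)

    def adjusted_score(row):
        head = row.items[:visible]  # only the titles assumed to be visible on screen
        dup_frac = sum(t in shown for t in head) / max(len(head), 1)
        return (row.relevance
                - dup_penalty * dup_frac                           # repeats of titles already placed
                - genre_penalty * genre_counts.get(row.genre, 0))  # piling up one genre

    for _ in range(min(num_rows, len(remaining))):
        best = max(remaining, key=adjusted_score)
        remaining.remove(best)
        page.append(best.name)
        shown.update(best.items[:visible])
        genre_counts[best.genre] = genre_counts.get(best.genre, 0) + 1
    return page


rows = [
    Row("A", 0.90, ["t1", "t2", "t3"], "drama"),
    Row("B", 0.85, ["t1", "t2", "t4"], "drama"),
    Row("C", 0.60, ["t5", "t6", "t7"], "comedy"),
]
print(build_page(rows, 2))  # ['A', 'C']
```

With two near-duplicate drama rows and one comedy row, plain score-and-sort would place both drama rows first, while the greedy pass demotes row B because its visible titles are already on the page. Greedy selection is a cheap approximation and is not guaranteed to find the best whole page.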
Yeah. In terms of deduplication, or having duplicates across certain rows: would you say duplicates are bad in general, or are they okay up to a certain point?
For example, because they might help you push certain items? Or do you basically want to avoid having any duplicates at all in the rows you can scroll through? What is your take on this?
If there's one thing I've learned working on these problems at Netflix, it's that all these things are a balance, and you have to find the right thing to do.
Lots of duplicates is obviously bad, because you're just repeating yourself over and over.
But sometimes duplication has a purpose.
Right. Again, given the way people interact with our product and browse, we can show a lot of different TV shows or movies on a row.
So maybe you didn't actually see a title, even though it was rendered on the screen.
So sometimes being able to show duplicates is valuable.
That's especially true if you skipped a row because its name didn't really resonate with you,
but there was a good title recommendation in it, and then we might say, oh, in this other context it's actually a good fit.
So I think that's where it's challenging.
All these things are a balance, and you need to find the right balance for what people want.
Sometimes these balances might be hard to learn, but if you could figure out how to personalize them, some people might prefer duplicates because of how they scroll.
Right. If people tend to scroll really fast and aren't really considering a lot of things on part of the page,
maybe that's a sign that you want some of the good content you put in the first few rows to get another chance.
But there's no perfect answer for these.
I think you just have to try to find the right balance between all these different factors.
I really like what you're saying about the receptivity of the user: how sensitive a user is to being shown certain content several times.
And even then, you might show it in different ways,
for example by changing the artwork, which changes from person to person, and these sensitivities are not all the same.
For some members you may have another chance to show the same item, while others perceive it as boring or something like that.
That also makes me think of your evergreen presentations and talks on recent trends in recommender systems,
which are always a really good reference point.
I guess we'll come to the approaches in just a minute.
But something I would like to start with is how misaligned metrics can be.
Going from a training objective to the actual goal is, let's say, a rocky path.
How are you dealing with this problem and getting better at creating more alignment?
How far can you trust that the training objective you're optimizing for is the right one, or that this is the right offline metric
because it's well aligned with some online metric?
Yeah, I think it's a good question.
I put that in the talk. The example I use is kind of a cartoon one, walking through how metrics like NDCG or AUC can be misaligned with what I think our overall goal is, what we're really trying to optimize for, which I call member joy: helping people in the long term.
So it's about helping people get the most enjoyment out of the entertainment they're consuming through Netflix, and the most out of their Netflix subscription.
It's hard because you have to go through a bunch of different steps.
You might have a very localized metric that you're optimizing your recommender, your page construction, or any of these algorithms for.
Then you go to another level, which is the offline metric you typically use for offline model evaluation and tuning.
Then there's the A/B test metric. And then, again, there's the final overall objective, like joy, that you're trying to go for.
And you want to keep that whole path aligned.
When we ask how good the alignment is, it actually depends on the types of models you have.
For some of the models we have, the alignment is pretty good.
You can see that if there's a big increase in your offline metric, it directly leads to some online performance gains.
And there are other places where the problems get more complicated and you see more discrepancies.
You see that the model thinks it's gotten a lot better, but online it's not better, or it's actually worse.
Or any of the different levels can be misaligned.
Early on, when you start a recommender from scratch, as long as you pick a reasonable metric that roughly points in the right direction, you're probably fine.
But as your models get better at optimizing it, you'll start to find that they optimize into areas where you run into those misalignments.
What I've found is that it can be very easy for people to see these misalignments and think, oh, OK, it didn't work.
I'll just try another idea. Right.
But in some ways, when you find big disconnects like that, it's actually a good point to pause and say, oh, I've actually learned something really interesting:
my metric is not doing what it's supposed to be doing.
And now it's time to switch and focus on that.
So you want to iterate back and forth between them over time.
Those are very valuable learnings.
Sometimes you'll learn that it's just some corner case and most of the time it's fine.
But generally it's useful to go back, re-evaluate, and look at all the linkages in that whole chain of metrics, and keep improving them.
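One way to ask "how good is the alignment?" is to look across past experiments: did bigger offline gains tend to come with bigger online gains, and where did the sign flip? The sketch below computes a Spearman rank correlation between offline and online deltas and lists sign flips. The experiment names and numbers are purely illustrative, not real results, and with only a handful of experiments such a correlation is extremely noisy.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side has no variation
    return cov / (vx * vy) ** 0.5


def alignment_report(experiments):
    """experiments: (name, offline_delta, online_delta) per past A/B test."""
    offline = [off for _, off, _ in experiments]
    online = [on for _, _, on in experiments]
    sign_flips = [name for name, off, on in experiments if (off > 0) != (on > 0)]
    return spearman(offline, online), sign_flips


# Purely illustrative numbers, not real experiment results.
history = [
    ("exp_a", 0.02, 0.004),
    ("exp_b", 0.05, 0.006),
    ("exp_c", 0.08, -0.002),
    ("exp_d", -0.01, -0.001),
]
rho, flips = alignment_report(history)
print(round(rho, 3), flips)  # -0.2 ['exp_c']
```

Here exp_c is the "the model thinks it's gotten a lot better, but online it's worse" case: rather than discarding the idea, it is the experiment worth digging into to learn what the offline metric is missing.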
Yeah, OK. I really appreciate that advice, so that you don't take
the easy way of saying offline metrics are useless altogether, but rather use them.
Sometimes they are useful.
And even when they are not, they might provide you with a learning experience to do better.
Yeah, I'd go further and say the metric is part of the recommendation system.
You want to optimize and improve all parts of it, all the way from the UI to the data, the models, and all the levels of metrics you have.
They're all different parts of how you can make your system better.
I've started to say that your recommendations can only be as good as the metric you're measuring them on when you're trying to improve them.
Again, the gap between offline and online metrics is something you can think about, by noticing those discrepancies.
But also think about whether the online metric actually represents the overall goal you're going for.
We have a team of data scientists who think about how to make those better,
what we call the core metrics that we use.
One of the great things about Netflix is that we do lots of A/B testing, where we really try to make sure that the improvements we make to the product actually represent improvements in long-term member enjoyment and things like that.
But we also have to keep making sure we're measuring that well, which is very hard.
So you have to consistently keep making it better.
I really love talking with those people and thinking with them about how that works.
And when you think about the whole chain of metrics, there's a big bias-variance trade-off there too: some of the online metrics have very high variance,
because you have to measure user behavior,
and that has a lot of noise in it.
And then you can get very precise measurements of other metrics that might have a lot more bias, because they make a lot of assumptions.
You want to think about improving that whole chain together over time.
Yeah. One area where this can really hit you hard in the face is deep learning.
I would say deep learning for recommender systems has seen a steady rise, to the point where, for me, applying it to certain recommender scenarios has become self-evident.
And you actually provided a case study in 2021 on your experiences with using deep learning for recommender models at Netflix.
And your work goes even beyond that, into bandits and reinforcement learning.
That's not the same thing, but I guess that whole journey also started somehow with deep learning.
Can you share some of the insights from that case study, and give some advice on when to use deep learning versus not?
Yeah, sure. The deep learning journey at Netflix was definitely an interesting one.
When it became clear that deep learning was really taking off in different applications, like NLP and computer vision,
we had a lot of interest in trying it out for recommendations.
The first couple of times we tried it, we might have seen some really good offline gains, kind of back to what we were saying before,
but it didn't really pan out in the online system.
So there was actually quite a while where there was a lot of, hey, look, we tried this.
At that time there was still the question of whether deep learning was just hype or real.
In the overall machine learning community, I think we're past that now.
It's made its way into all kinds of amazing applications.
And even over this past year, it just keeps building on that.
But what we learned is that there are actually a lot of connections between what deep learning was trying to do and a lot of what was happening in recommendation at the time with different types of matrix factorization methods. Matrix factorization, again, is not deep, but it is pure representation learning, right?
All you have is a user ID and an item ID, you don't know anything about them, and you're learning embeddings and then combining them.
Those techniques just multiply the embeddings together, and that's how you learn it.
And I think the takeaway we presented in that paper on this part was that if you tune that type of method well, because it's pure representation learning, you can get it to be very effective compared to a deep learning method that basically has to learn the dot product between the two representations it's learning.
So all of the representation is just whatever you're learning from the data.
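As a reference point for "pure representation learning", here is a minimal matrix factorization sketch: each user ID and item ID gets an embedding, and their dot product is trained with SGD on squared error plus L2 regularization to match observed values. The toy data, dimension, learning rate, and regularization are illustrative assumptions, not the settings from the Netflix case study.

```python
import numpy as np


def train_mf(interactions, n_users, n_items, dim=8, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Learn one embedding per user ID and per item ID so that their dot product
    approximates the observed value (squared loss + L2 regularization, plain SGD)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, dim))  # user embeddings
    V = 0.1 * rng.standard_normal((n_items, dim))  # item embeddings
    for _ in range(epochs):
        for idx in rng.permutation(len(interactions)):
            u, i, value = interactions[idx]
            err = value - U[u] @ V[i]
            u_old = U[u].copy()  # use the pre-update user vector for the item step
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V


# Hypothetical toy data: (user_id, item_id, observed value)
interactions = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (1, 2, 1.0), (1, 0, 0.0)]
U, V = train_mf(interactions, n_users=2, n_items=3)
print(np.round(U @ V.T, 2))  # predicted scores for every user-item pair
```

Nothing about the titles or the members enters the model; everything is inferred from the interaction pattern, which is the point being made about how far such methods can go.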
I'd also say, for working on recommendations, one of the things that still astonishes me is how far methods like that can go, knowing nothing about the actual content of the item or about the user, right?
You can get so far just by learning from these interactions.
For a lot of applications that's really great, because until now it's been very hard to get really good content representations out of something like a video that's an hour or more long, or many hours long for a TV show.
I mean, it was one of your colleagues, Harald Steck, who provided a great paper on this, coming up with the EASE algorithm, and also some follow-ups on it, where I guess the title was "Don't Go Deeper, Go Higher", right?
Yeah, yeah. Coming back to that, I think one of the key things from that journey, if you're looking at the classic user-by-item matrix type of recommendation problem, is that there are a lot of similarities
and overlaps between the work being done on matrix factorization and some of the things being done in deep learning, and we could borrow methods from both sides to make things better.
I think Harald's work is awesome because he keeps pushing the learnings and building on things from both of those areas to show what you can do without necessarily going to the deep part.
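For reference, EASE (Steck, 2019) has a closed-form solution: with a binary user-item matrix X and L2 strength λ, compute P = (X^T X + λI)^(-1), set B[i, j] = -P[i, j] / P[j, j] off the diagonal and zero on the diagonal, and score with X B. The sketch below is a plain NumPy transcription of that formula on a toy matrix; the value of λ here is arbitrary, not a tuned setting.

```python
import numpy as np


def ease(X, lam=100.0):
    """EASE item-item weights from a binary user-item matrix X (users x items)."""
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularized item-item Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)       # B[i, j] = -P[i, j] / P[j, j]: divide each column j by P[j, j]
    np.fill_diagonal(B, 0.0)  # an item may not predict itself
    return B


# Hypothetical toy interactions: 4 users x 3 items
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 1, 1]], dtype=float)
B = ease(X, lam=1.0)
scores = X @ B  # in practice, items a user already interacted with would be filtered out
print(np.round(B, 3))
```

The model is a single item-item weight matrix with no hidden layer, which is what makes it "shallow"; its cost is dominated by inverting an items-by-items matrix, independent of the number of users.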
On the other hand, what I'd say is that deep learning is very useful for recommendation, but more because it enables you to take advantage of the real form of the data you have in a real-world recommender system.
You don't just have binary user-by-item interactions or ratings.
You have rich histories of how people interact and engage with different items over time, and all the context around them,
including the context in which things are presented. There's so much information you actually have.
And then you have, say, the content representations and other things you can start bringing in.
When you look at it from that perspective, there was a lot of work on different ways of pulling in some of these pieces here and there in recommendation approaches.
But deep learning becomes such a nice framework for pulling in all of them, unifying them, and extracting whatever information you can from a sequence or from time,
which are important dimensions there.
So it wasn't that it was such a clear step forward on the classic problem, but it suddenly enabled you to build a lot of other solutions better.
And we've seen a lot of advances come from that.
Going back to what you were saying about metrics, when you have a model that suddenly has a lot more parameters it can learn and fit, it has a lot more flexibility in how it can be tuned.
It can find its way into those places where your different layers of metrics diverge.
So, again, you have to go back and understand: I tried something here,
maybe the offline metric went way up,
and the online metric maybe didn't.
OK, that tells me something about my metric.
I don't just say, oh, look, deep learning doesn't work,
but rather, oh, wait, something in my metric is off here.
How do you fix that?
Having seen the evolution of our systems over time, from very simple models to more complex ones,
what I've often found is that changing just one component at a time doesn't necessarily unlock the value.
Sometimes you have to change two or three pieces at the same time to actually get that step forward, because suddenly you're enabling those new parts.
So for something like deep learning, it's not just about changing the model, but also about improving the metric and what it's optimizing for.
In other places, we change a model, and suddenly that enables you to use all these features, which you may even have tried in the past without success.
But now you have a new type of model that can actually make use of that data.
And that combination is what then leads to the really big improvement.
Okay, I see. I understand.
In that paper you actually differentiate between two categories: the sequential and the bag-of-items approaches.
And you already mentioned that most of the value of deep learning does not come from the pure interactions, but rather from considering many additional sources of information: as you said, the context, the sequences or history, and the content.
When it comes to these two directions you present in the paper, the bag-of-items approaches and the sequential approaches, would you say all these sources of information are equally important for both, or do sequential approaches work better on some specific data or data representations than bag-of-items approaches?
I think, again, it depends on the type of problem you're trying to deal with. If you have a problem where the sequence information really tells you a lot, then that type of representation, and understanding how that flows, is important.
Obviously, in something like language, sequence is crucial for understanding.
In recommendations there are some places where that's the case, right? An obvious example would be a sequel. If you watch The Empire Strikes Back, then recommending the original Star Wars is like, yeah, the person probably would like it, but they've probably already seen it.
So it might not be a great recommendation. You want to understand that there are these very natural sequences, and obviously there are episodes and seasons within a show.
There can be others, like how people get into certain genres and how their tastes evolve over time. There might be some sequential information there too, but in recommendation it may not always be as strict.
You might be able to deal with some reordering, which is why a bag representation, which gives you a different, maybe more holistic view of a full set, can sometimes be useful and also more efficient than, say, something like an LSTM or GRU.
Of course, with a transformer model you can get around some of that too. But yeah, I think it depends.
Like a lot of these things, it depends on the problem you're trying to solve and what your data is.
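The practical difference between the two families can be shown in a few lines: a bag-of-items summary (here, the mean of item embeddings) gives the same result for any ordering of the history, while a recurrent summary does not. The embeddings and weights below are random and untrained, and the simple Elman-style update stands in for an LSTM or GRU; this is an illustration of order sensitivity, not either model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4
# Hypothetical item embeddings and untrained recurrent weights, for illustration only.
ITEM_EMB = {t: rng.standard_normal(DIM) for t in ["a", "b", "c"]}
W = 0.5 * rng.standard_normal((DIM, DIM))


def bag_of_items(history):
    """Order-invariant summary: the mean of the item embeddings."""
    return np.mean([ITEM_EMB[t] for t in history], axis=0)


def sequential(history):
    """Order-sensitive summary: a minimal Elman-style recurrent update, h = tanh(W h + x)."""
    h = np.zeros(DIM)
    for t in history:
        h = np.tanh(W @ h + ITEM_EMB[t])
    return h


forward, backward = ["a", "b", "c"], ["c", "b", "a"]
print(np.allclose(bag_of_items(forward), bag_of_items(backward)))  # True
print(np.allclose(sequential(forward), sequential(backward)))      # False
```

The bag version can be computed in one vectorized pass and tolerates reordering by construction; the recurrent version can distinguish "watched the sequel, then the original" from the reverse, at the cost of processing the history step by step.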
Yeah, makes sense. We're already talking about sequential models, and I guess one of the major areas they help with is in-session personalization.
How important is getting this right for you at Netflix? You can do a lot of the work offline, I guess, but many things need to be done online and also need to change within a session.
So what role does this play for personalization, and specifically for recommendations at Netflix, in adapting my personalized experience to what I'm doing within a session?
Yeah, I think it's an important area. We had a paper at RecSys several years ago about doing that on the homepage.
And last year, a couple of my colleagues gave one of the industry talks about doing that for something called the pre-query canvas.
That's what we show when you land in search but haven't typed anything into the search box yet.
So we're trying to understand, when people are in that search mode, what they might be looking for, and whether we can learn what's happening from, say, what they were doing on the homepage.
So it's an important characteristic of building more interactive personalization systems in the future.
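One simple way to picture in-session adaptation is to blend a long-term member vector with a vector built from titles engaged with in the current session (for example, on the homepage before opening search), shifting weight toward the session as evidence accumulates. This is a hypothetical sketch: the blending schedule alpha = n / (n + k), the constant k, and the toy embeddings are assumptions, not the approach from the Netflix papers or talks mentioned.

```python
import numpy as np


def session_aware_scores(long_term, session_items, item_emb, k=3.0):
    """Score all items with a member vector that shifts toward in-session behavior.

    long_term: long-term member embedding, shape (dim,)
    session_items: indices of titles engaged with in the current session
    item_emb: item embeddings, shape (n_items, dim)
    k: hypothetical smoothing constant; alpha = n / (n + k) grows with n in-session signals
    """
    n = len(session_items)
    if n == 0:
        member_vec = long_term
    else:
        session_vec = item_emb[session_items].mean(axis=0)
        alpha = n / (n + k)
        member_vec = (1 - alpha) * long_term + alpha * session_vec
    return item_emb @ member_vec


item_emb = np.array([[1.0, 0.0],   # hypothetical title 0
                     [0.0, 1.0]])  # hypothetical title 1
long_term = np.array([1.0, 0.0])
print(np.argmax(session_aware_scores(long_term, [], item_emb)))       # 0
print(np.argmax(session_aware_scores(long_term, [1] * 6, item_emb)))  # 1
```

With no session signal the ranking falls back to long-term taste; after several interactions with title 1 in this session, title 1 moves to the top.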
Yeah, speaking of your colleagues, I remember well the one who talked about RecSysOps at RecSys 2021.
Yeah, yeah, that was the previous year. Most years we've tried to bring some interesting aspects of what we've been working on to the RecSys industry track.
So the previous year it was RecSysOps, which he had done a lot of work on with a bunch of other folks for some of our algorithms.
I think it's one of those interesting aspects: when you're doing this work, there are practicalities you have to deal with that don't necessarily come up when people talk day to day about what they're doing with recommendation systems, but that can matter quite a lot.
He walks through a framework for how you detect, identify, and diagnose different issues, how you respond to them, and then how you follow up on fixing them more deeply.
A lot of the motivation for things like this is that if you're building a recommender, there are always new items coming in, and each item only launches once.
You want to make sure the system is going to work and treat that item appropriately.
But there are so many steps to setting up all the metadata and everything in the system to make sure the recommender can work with it from day one.
One of the key examples we worked through with RecSysOps was how to make sure that when a new TV show or movie launches, we handle it in a good way.
And can we detect the problems that could happen earlier?
One of the really interesting things the team was able to do was figure out whether we can predict what the model is going to do in the future, so that we can tell if there's likely to be a discrepancy.
Discrepancy in that regard would mean a gap between what you expect, how the item should perform, and how it would perform if the system behaved as predicted?
Exactly. So can we predict what the recommender is going to do? If there's a big discrepancy there, that has helped us find things like missing metadata before launch, so we can fix it, right?
But sometimes you can also find other types of problems from it, where you might need to go in and make some deeper adjustments to the algorithms or the system.
When you think about these recommendation problems, you try to take a very broad view of the application: the whole system really matters.
There are so many things that can make the algorithms not behave the way you would expect for a certain piece of content.
So you want to minimize the places where there's just a setup problem or some other data issue, so that we know about them ahead of time instead of just reacting once something is already live on the service.
You can't prevent all of them, but at least there's a framework for how to think that through.
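The pre-launch check described here can be sketched as comparing a forecast of what the recommender will do for a new title with what we would expect, and attaching cheap diagnostics such as missing metadata. Everything in this sketch is hypothetical: the metadata fields, the "impressions" quantities, where the forecast and the expectation come from, and the 3x threshold. It illustrates the idea of flagging discrepancies before launch, not Netflix's RecSysOps implementation.

```python
import math
from dataclasses import dataclass, field

# Hypothetical metadata schema; a real catalog would have many more fields.
REQUIRED_FIELDS = ("synopsis", "genres", "cast", "artwork")


@dataclass
class LaunchCheck:
    title: str
    expected_impressions: float   # hypothetical prior, e.g. from comparable past launches
    predicted_impressions: float  # hypothetical forecast of what the recommender will do
    metadata: dict = field(default_factory=dict)


def diagnose(check, max_ratio=3.0):
    """Flag a pre-launch title whose forecast and expectation differ by more than max_ratio times."""
    log_ratio = math.log((check.predicted_impressions + 1) / (check.expected_impressions + 1))
    missing = [f for f in REQUIRED_FIELDS if not check.metadata.get(f)]
    return {
        "title": check.title,
        "flagged": abs(log_ratio) > math.log(max_ratio),
        "log_ratio": round(log_ratio, 3),
        "missing_fields": missing,  # a cheap first place to look when a title is flagged
    }


show = LaunchCheck("new_show", expected_impressions=10_000, predicted_impressions=800,
                   metadata={"synopsis": "A placeholder synopsis.", "genres": ["drama"],
                             "cast": [], "artwork": "key_art.jpg"})
print(diagnose(show))
```

Using a log ratio treats "forecast far too low" and "forecast far too high" symmetrically; when many flagged titles share the same missing field or feature, that pattern points at a systemic data or training issue rather than a one-off setup problem.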
And sometimes this might lead you to the point where you find just a simple cause, or a few simple causes.
For example, as you said, if you see a large discrepancy between how you predict the new item is going to behave and how you expect it to behave, it could really be attributed to
some missing metadata that you then need to fill in.
That's hard to detect and needs a whole system, so I guess a lot of work has gone into it.
But in the end, it's really nice if you can track it down to something that's so easy to fix, because if you see that a description is missing or something like that, you know what needs to be fixed.
On the other side, what if there are subtler discrepancies where, for example, after some initial investigation, the reason is not as evident?
For example, where member preferences have collectively changed in a way that this item just doesn't match people's taste anymore.
Is this something you also run into, where you have more latent issues with an item that is in cold start?
Yeah, part of how we think about RecSysOps is that we want to detect the challenges that come up because the system is doing the wrong thing, versus saying, OK, maybe some of our earlier predictions were off.
We have the ability to disambiguate between those in the different data we collect.
But you can also think about using these types of approaches to find deeper patterns or problems to follow up on, to understand whether there's something more.
It can sometimes be hard to tell from any one example, but you can look at patterns across them to understand whether there's some data, features, or something in the way we're training the algorithm that is causing these patterns.
And that then informs saying, OK, let's focus on this.
That can be on the item side, but it can also be on the user side.
And I think the patterns you get with RecSysOps are again there to help you detect and identify those.
Then, in the short term, you have some ability to deal with it, but you can also dig in and make deeper, longer-term improvements that hopefully make the system better overall once you're able to fix them.
I really loved that additional perspective on how to run a large-scale recommender system, what could possibly go wrong, and how you can prevent it and learn from it.
The first time I listened to that talk, I was a bit confused at first, because I thought, what does he mean by predicting cold start?
Isn't an item cold-starting by definition, simply because it's new?
Then after asking, I really understood: of course, it's an interesting perspective to predict how fast an item will be consumed by users and gain traction, versus not being used and therefore not getting enough signal.
So that was a really interesting new perspective that nicely illustrated the purpose of RecSysOps.
Again, I love working on machine learning problems. I always find it super interesting when someone takes a hard problem and figures out that it can actually be turned into a machine learning problem to get better at it.
So it's like, oh, let's predict ahead what our algorithm is going to do. Oh, yeah, that's cool.
I also remember learning about hyperparameter tuning using things like Gaussian processes.
Like, hey, you can learn these challenging things to help automate them. And there are lots of other applications like that.
I just find it super interesting that you can solve a hard problem like that.
Bandits and recommender systems.
This is an area where Netflix is very active, regularly contributing papers at RecSys and other conferences.
You are also a co-organizer of the corresponding workshop at RecSys, the REVEAL workshop, which last year was held in conjunction with the CONSEQUENCES workshop.
Can you share with us which problems you're trying to solve with bandits and reinforcement learning, and how effective they are for you?
Sure. I'll clarify one thing: I was brought in at the last minute to help out with REVEAL.
I'd been involved in previous years, attending and presenting, and sometimes giving some advice in the background.
But yeah, I was pulled in at the last minute there.
But bandits have been a big area that my team has done a lot of work on over the years, trying to figure out how to get bandits working in various personalization applications within Netflix.
In doing that, it's interesting to understand what you can learn from the bandit literature.
But there are also some differences when you actually try to take those approaches and put them into real-world applications.
The first thing I usually point out is that when you think about something like regret minimization in a single bandit, it assumes that's the one algorithm you're going to have there forever.
But if you're actually trying to innovate on bandit algorithms, like how they explore and how they work, you might need to adjust the exploration so that you collect enough data to actually improve upon it and add new data sources and new features, which might lead to very different policies.
So you might need much more coverage in the data you're collecting.
Then there's dealing with things like new items coming in and changing trends in how the rewards behave over time, and different approaches to exploration have different pros and cons, especially when you have a very small number of arms versus a very large number of arms,
or when the rewards of the different arms vary a lot versus only a little.
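A small sketch of why exploration and coverage matter for the data you log: epsilon-greedy, used here only as one simple exploration strategy (the episode does not say which strategies Netflix uses), picks a random arm with probability epsilon and records the probability (propensity) of the chosen arm. Those logged propensities are what later off-policy evaluation relies on. The arm names and scores are hypothetical.

```python
import random


def epsilon_greedy(scores, epsilon, rng):
    """Choose an arm with epsilon-greedy exploration and return (arm, propensity).

    scores: dict mapping arm -> current reward estimate
    The propensity (probability of the chosen arm) is logged for later off-policy evaluation.
    """
    arms = sorted(scores)
    greedy = max(arms, key=lambda a: scores[a])
    arm = rng.choice(arms) if rng.random() < epsilon else greedy
    # Every arm gets epsilon / n; the greedy arm additionally gets 1 - epsilon.
    propensity = epsilon / len(arms) + ((1.0 - epsilon) if arm == greedy else 0.0)
    return arm, propensity


rng = random.Random(7)
log = [epsilon_greedy({"row_a": 0.10, "row_b": 0.50, "row_c": 0.20}, 0.3, rng) for _ in range(5)]
print(log)
```

The propensity of a non-greedy arm is epsilon / n, so with many arms each one is explored only rarely: that is the coverage and sparsity problem for wide action spaces, and one reason a small-arm and a large-arm problem call for different exploration approaches.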
Over time, you try to understand these different problems, build algorithms, and also work on infrastructure to make it easier to stand up new bandits, because the challenge with them is that they work in a closed loop.
You need all of the data logging and everything really well hooked up just to get things up and running.
So we've tried to make that easier over time, working with our partners on the machine learning platform team and the data engineering team.
Yeah, going back to the metrics part, we've done a lot of work there too.
One of the nice things about bandits is that you can use off-policy evaluation approaches: you collect real data and then check how often the new bandit would pick what was actually shown.
Whenever it matches what was shown in the real system, you can say, oh, this is what really happened.
So if it's better at picking the good things and is able to avoid the bad things, the offline metric can correlate better with what you're actually going to see in the live system.
That framing is really nice and can help you make sure your algorithm is optimizing for what actually happens in the real world.
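The matching idea described here corresponds to what is often called the replay method for off-policy evaluation. A minimal sketch, assuming the log holds (context, shown arm, reward) tuples and that the logging policy chose arms uniformly at random, which is the condition under which plain replay is unbiased; `replay_estimate` is a hypothetical helper, not Netflix code:

```python
from typing import Callable, Hashable, Sequence, Tuple

LogRow = Tuple[Hashable, int, float]  # (context, arm that was shown, observed reward)


def replay_estimate(
    log: Sequence[LogRow], new_policy: Callable[[Hashable], int]
) -> Tuple[float, int]:
    """Average reward over the logged events where the new policy agrees with
    what was actually shown; returns (estimate, number of matched events)."""
    matched = [reward for context, shown, reward in log if new_policy(context) == shown]
    if not matched:
        return float("nan"), 0  # no overlap: nothing can be said about this policy
    return sum(matched) / len(matched), len(matched)


if __name__ == "__main__":
    log = [("a", 0, 1.0), ("a", 1, 0.0), ("b", 1, 1.0), ("b", 0, 0.0)]
    print(replay_estimate(log, lambda ctx: 0 if ctx == "a" else 1))  # (1.0, 2)
    print(replay_estimate(log, lambda ctx: 1))  # (0.5, 2)
```

With uniform logging over K arms, only about 1/K of the events match, which is one way to see the sparsity and variance problems that come up with many arms.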
But there are definitely a lot of challenges with it, because you have a lot of data sparsity and variance issues, and it's hard to make sure you're covering enough of the space, especially with a lot of arms, so that you actually get a lot of good matches there.
We had a paper that we presented at REVEAL a few years ago that looked at bandit problems and what happens when you're doing a recommendation problem.
You have a lot of different actions, and the idea we had was basically to take an idea from more classic evaluation of recommendation approaches, where you get very sparse feedback.
The idea is that if you have a ranking of all the items and you can get something that's good higher up on the list, you might not know whether the user would even consider the ones above it good or not.
But if you get the things people like towards the top and the things people don't like towards the bottom, that's a good thing.
So we took that same concept over to bandit selection: if you have a bandit and an arm that got a good reward, maybe the bandit wouldn't have picked it as number one.
But you don't know what would have happened there.
Still, if it ranks a good arm higher up, that's a good sign.
And if it pushes a bad arm down the ranking, that's also a good sign.
That was the metric we called Recap.
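The intuition behind the metric can be illustrated with a small rank-based score. This is not the published Recap definition (the paper has the actual formula, including how it handles propensities); it is a hypothetical sketch assuming each logged event is (context, shown arm, reward) and that the new bandit exposes a score for every arm:

```python
from typing import Callable, Hashable, Sequence, Tuple


def rank_position(scores: Sequence[float], arm: int) -> int:
    """0-based position of `arm` when arms are sorted by descending score
    (ties keep arm-index order because `sorted` is stable)."""
    order = sorted(range(len(scores)), key=lambda a: -scores[a])
    return order.index(arm)


def rank_based_score(
    log: Sequence[Tuple[Hashable, int, float]],
    score_fn: Callable[[Hashable], Sequence[float]],
) -> float:
    """Credit the new policy for ranking rewarded logged arms high and
    unrewarded ones low. Hypothetical illustration only: 1.0 means every
    rewarded arm was ranked first and every unrewarded arm last."""
    total = 0.0
    for context, arm, reward in log:
        scores = score_fn(context)
        depth = rank_position(scores, arm) / (len(scores) - 1)  # 0 = top, 1 = bottom
        total += (1.0 - depth) if reward > 0 else depth
    return total / len(log)


if __name__ == "__main__":
    log = [("x", 0, 1.0), ("x", 2, 0.0)]
    print(rank_based_score(log, lambda ctx: [0.9, 0.5, 0.1]))  # 1.0
    print(rank_based_score(log, lambda ctx: [0.1, 0.5, 0.9]))  # 0.0
```

Unlike exact replay, every logged event contributes something, so the score does not depend on the new policy picking the logged arm as its top choice.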
So that represents one of the areas of metric improvement for bandits that we've worked on.
But we're always trying to figure out how to improve our bandit models, where it makes sense to use them, and where bandits don't actually make sense, because they definitely introduce a lot of new challenges and complexities.
Something I find especially interesting is the coverage part you brought up, which is a challenge when using bandits for recommendations.
I always remember someone saying that you come to appreciate having some intended, or even unintended, randomness in your system, because it gives you more overall coverage of your items, which you can then use for various bandit approaches.
Intended, of course, when you do it with a certain degree of exploration, but also unintended, because faults or subtle problems in the system might give you better coverage across your catalog.
Yeah, yeah, for sure. You're alluding to some of the realities of bandits in the real world: sometimes you might make a certain selection, but for some reason that's not what actually gets shown, or something else happens.
And then how do you make sure that your propensities and all the other things you're using are actually correct?
That's another seemingly small thing I've seen: if those are off even a little bit, it can create huge problems in your metrics and in what your models are optimizing for.
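Why small propensity errors matter can be seen from inverse propensity scoring (IPS), a standard off-policy estimator that divides each logged reward by the probability with which the logging policy chose that arm. A toy sketch with invented numbers and a hypothetical `ips_estimate` helper: if a rarely shown arm is logged with propensity 0.04 when it was really shown with 0.02, say because a system issue changed what was displayed, its weight is halved and the estimate moves by a factor of two.

```python
from typing import Callable, Hashable, Sequence, Tuple

# (context, arm shown, propensity logged for that arm, observed reward)
LogRow = Tuple[Hashable, int, float, float]


def ips_estimate(
    log: Sequence[LogRow], target_prob: Callable[[Hashable, int], float]
) -> float:
    """Inverse propensity scoring: mean of reward * pi_new(arm|x) / pi_log(arm|x)."""
    return sum(r * target_prob(x, a) / p for x, a, p, r in log) / len(log)


def always_arm_one(context: Hashable, arm: int) -> float:
    """Target policy that deterministically picks arm 1."""
    return 1.0 if arm == 1 else 0.0


if __name__ == "__main__":
    correct = [("u", 1, 0.02, 1.0), ("u", 0, 0.98, 0.0), ("u", 0, 0.98, 1.0), ("u", 0, 0.98, 0.0)]
    # Same events, but the rare arm's propensity was recorded as 0.04.
    mislogged = [("u", 1, 0.04, 1.0)] + correct[1:]
    print(ips_estimate(correct, always_arm_one))    # 12.5
    print(ips_estimate(mislogged, always_arm_one))  # 6.25
```

An estimate of 12.5 for 0/1 rewards also shows the variance problem: with rare arms, a handful of events with tiny propensities can dominate the whole metric.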
So getting those right, and making your systems robust to the system issues that are inevitably going to come up, is super important.
I think one of the high-level lessons we learned with bandits, and with exploration generally, is that if you have a closed-loop system, you have to have some ability to explore, or else things really won't work.
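The point that a closed loop needs exploration can be seen in a tiny simulation. The sketch below uses a hypothetical function, invented click-through rates, and Bernoulli rewards, and counts how many arms ever get feedback: a purely greedy policy never leaves the first arm it tries, while epsilon-greedy keeps gathering data on every arm.

```python
import random
from typing import Sequence


def arms_observed(true_ctr: Sequence[float], epsilon: float, steps: int, seed: int = 0) -> int:
    """Run a closed-loop epsilon-greedy bandit and return how many arms got any feedback."""
    rng = random.Random(seed)
    n_arms = len(true_ctr)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for _ in range(steps):
        # The policy only learns about arms it actually shows (closed loop).
        means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        greedy = max(range(n_arms), key=means.__getitem__)  # ties -> lowest index
        arm = rng.randrange(n_arms) if rng.random() < epsilon else greedy
        reward = 1.0 if rng.random() < true_ctr[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    ctr = [0.01 * (i + 1) for i in range(10)]  # arm 9 is actually the best
    print(arms_observed(ctr, epsilon=0.0, steps=5000))  # 1
    print(arms_observed(ctr, epsilon=0.1, steps=5000))
```

With epsilon set to 0, the greedy policy stays on arm 0 forever and never learns that arm 9 is better; with a small amount of exploration, every arm collects data.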
But if you're able to train off an open loop, you might instead be able to use other approaches to help compensate for that.
For example, with the homepage, people can only play from the rows we surface there, so if we need to understand those rows, it's kind of a closed system.
Versus if you're doing general movie and TV show recommendation, people can go and search for things that aren't on the homepage, which is more of an open loop.
If you can learn from that, it can help you understand the broader space, and you can combine the two.
I think that's also where approaches like causal models, trying to understand some of the causality, and ideas inspired by econometrics can really help with adjusting for those effects. I know you also had a previous episode that talked about some of that too.
So then we definitely have a call to action for listeners who are also Netflix members:
please don't forget to use the search function from time to time.
Search is good. I mean, hopefully your homepage is great, and we'll try to make it better.
But yeah, it's good.
When it doesn't work, search is there for you.
But don't deprive Justin of all the data and signals they use for the bandits themselves.
We have been covering a lot of topics in terms of approaches and technology.
However, what Netflix is also well known for is its culture.
There's even a whole book about the Netflix culture called No Rules Rules.
And just recently, Reed Hastings declared that he's stepping down as CEO and moving to become executive chairman.
He was actually the co-author of that book, which also draws a lot from the culture deck.
When thinking about the Netflix culture and how you deal with each other, what do you find most exciting about how you work with others and collaborate?
What can you share that really excited you and where you learned a lot?
Sure. Having been at Netflix for 11 years, I've really enjoyed the culture.
I think it works really well, especially for the type of work we're doing in the recommendation space.
One of the things is that when I talk to people who have been studying machine learning and recommendations in grad school, and we talk about things like freedom and responsibility and providing people a lot of context, it really resonates with them.
I think if you have a bit of that researcher mindset of, yeah, give me some hard problems, I'll really dive in and come up with solutions, and we can all work together on solving things as a team, it resonates with you.
But I think the amazing thing is that this mindset runs throughout the whole company, right? It's not just in a research area. People are open to new ideas and try to work together to improve Netflix, and a lot of information is shared so that people can take on really interesting, big problems and have a big impact with their work.
And what I've found is sometimes surprising to people, and you touched on this a little, is that it actually promotes collaboration really well, because we want high-performance people working in really high-performance teams.
And to have really high-performance teams, you need to get beyond being a collection of individuals and really be able to work well together, and working well together includes things like being really comfortable giving each other feedback, right?
Because there are always opportunities for everyone to improve, and that really helps people understand where we can get better on a whole variety of levels.
It can be technical things, but also communication, or how we can collaborate better, plan better, and organize things better.
That feedback is such an important part of it.
There are so many elements of the culture and the values. I think selflessness is also such an important thing: being able to see the big picture and put helping our members first, and the company first, rather than just looking at your own personal needs. That's also really helpful for people working well together, prioritizing the things that are most important, and being adaptive and flexible with all of that.
I guess it also shapes how you lead as a director.
Is there something you can share about that?
Your team that works on personalizing the homepage is doing a lot of interesting, exciting work.
How do you shape that team and make it effective, and what would you consider crucial to your style of leadership?
My team is a mixture. There are applied researchers who work end to end: coming up with ideas, working really closely with product managers on where we can improve our algorithms and the Netflix experience, coming up with new models and things like that, and working through what to prioritize.
We try a lot of ideas and experiments offline with the data we have, or sometimes we need to figure out how to get new data and work with our data engineering partners on that.
When we see something promising, we put it online in A/B tests, partnering with a team where we provide the code and the models and they handle the actual serving and all the logic around that.
So the team itself is a mixture of applied researchers and some software engineers focusing on the specific machine learning infrastructure we need for page construction and the other problems we have.
With the team, I really try to keep the focus on understanding what problems we're going after and how we have an impact.
I try to get people to really think about the big picture, to understand the product and the members and how what we're doing fits into that, and to take a broad view of where to improve things.
I also try to have a very open, innovation-centric perspective, encouraging people to come up with ideas.
We try to do that in the directed parts of our work, when we think about what we want to prioritize, by being open to lots of different ideas from different people about how to solve the problems we might want to focus on over the next six months or a year.
But because we hire really smart people who get into the details of how all the algorithms, the data, and the systems work, we also encourage them to do exploratory projects to understand what the next big thing could be that really unlocks a step change in what we're doing with the algorithms.
We've seen some of our bigger algorithm improvements come from that exploration as well. I think of what we're doing as a mixture of focusing on what we think are the important areas for the business to improve and innovate in, and also spending some time trying out new ideas, exploring, and building out those concepts,
so they might become the next thing where we see that big advance. In some ways it's like bandits: you have the explore arm, and then the arm where you're just trying to optimize, and you try to balance the two.
Actually, I find one of the hardest things is sometimes getting people to do that, because they just love the main thing they're working on so much, which I think is good.
Something I try to do is really match up people's skills, interests, and experience with the high-priority problems, so that you hopefully find a really good match there.
Like I was saying before, we collaborate: my team focuses on the page parts and, as we talked about, on the evidence pieces and that layer as well.
I have a sub-team focused on that. Then there's another team we partner with that does a lot of the ranking algorithms that drive the initial ordering of the rows before we decide which ones to select.
There's a team that does search, a team that does messaging and outreach, and there are other machine learning teams throughout Netflix that we partner with too.
A lot of it is being really good collaborators, working across these different areas on projects both big and small in scope.
I really like that analogy you're bringing up with exploration and exploitation. It actually made me think of that term from your culture deck: freedom and responsibility.
Responsibility would be the responsibility for products that are in production or are to be brought into production, and that end-to-end ownership.
And the other one is the freedom to entertain new ideas, what you called the next big thing, and to explore what could significantly change something, or to have the time and freedom to elaborate on new ideas, for example to come up with a POC.
Yeah, I think that's exactly right. That's one way we try to take the culture and put it into action, and I think it's one of the things that works really well.
One other thing I would mention is that we also try to bring a lot of different perspectives into the team. When we're hiring, we hire a lot of people with a machine learning background.
Obviously, having experience in recommendations can be quite valuable, but we'll also hire people who might not have worked in recommendations before but have a really solid understanding of machine learning fundamentals and some other areas.
We've seen that bringing in diverse perspectives on a lot of different levels helps build teams that are able to learn from each other and solve problems together.
One thing about working at a place like Netflix is that when I talk to people, sometimes they feel like, oh, it's Netflix, can I work there? I don't know if I even want to apply.
But we're hiring, and we're always looking for really great people. We like people with a lot of different backgrounds and bringing them onto the team.
And then we make sure that everyone can participate in the discussions, share their ideas, and surface those concepts, so that we can really think about the problems we're trying to solve in new ways and keep improving all the algorithms and systems we're working on.
That definitely sounds like a great, enriching environment with lots of freedom, but also... no, there is no "but". I mean, responsibility can also be effective.
There is a lot of responsibility. Responsibility is important: being able to keep in mind, at a high level, that you're working on problems that are important for the business and for where we're going.
Making an impact, I guess, is very motivating for many people.
Okay. In terms of making an impact and solving problems with smart people, what further challenges do you see for the future, or what are you going to engage with?
We covered quite a lot already, but is there a specific major challenge you are dealing with at Netflix in terms of personalization, or maybe also for the whole field?
Yeah, I put this in one of my talks, because sometimes people ask, oh, is there still that much to do in personalization? And personalization is still super hard: trying to understand what the great recommendation to show someone is,
and how you respond there. So I think there are still a lot of challenges in all of the different elements we've talked about.
There are so many improvements to work on. I think about the problem of making sure we're optimizing for what people really want in the long term, right?
That's still an open piece for us as a field. It goes both into how you define the objective of the system and improve it,
and how you actually represent that and build a better understanding of what people really want, how to help them get the most out of the recommendations you can provide, and how to give them the best experience out of whatever product you're building.
But then also, how do you build models that can actually optimize towards that? I still think that's such an important aspect. It leads to lots of wide-ranging discussions about different aspects of your recommendations, how you solve these problems, and how you balance all the different factors that go into making a good recommendation.
So I still think that's such an important area. And then there are always lots of exciting evolutions of techniques and approaches in the field.
So we're always trying to keep pushing ahead on those problems and get better one step at a time, learning from all the data and all the experiments we run.
Okay, and maybe part of what comes out of that push can be seen at this year's RecSys, which will be in Singapore this time. I'm definitely looking forward to it. I guess we can count on you to be there, or has that already been decided? Can you already make a statement there?
Yeah, I'm hoping to be there. Last year I was the industry co-chair, and I'm an industry co-chair again this year, so hopefully we'll have the call for industry talks up soon. I think one of the great things about the RecSys conference is that there's an opportunity for people working in industry on all these problems to really surface the interesting work they're doing, in addition to all the research papers.
So if I could make a call-out to your listeners, and I'm sure a lot of them might be interested: if you have some interesting work to share, please submit it. I think it's a great conference, and it's always great meeting people there.
It's a great community, and there's just a lot everyone can learn from each other about solving these really hard problems of understanding what people really like and want. I think of recommendations as trying to help make people's lives just a bit better, day after day, helping them spend their time in useful ways on things they really enjoy.
So if people want to submit to the industry talks at the conference, Justin is the person to go to.
It's not just me. We also have Luis and Yang as co-chairs.
Wow. Okay, that was a great tour through personalization at Netflix, and I guess there is so much more we could talk about. That's what the show notes are for, where we link to the Netflix research blog and also to the papers we explicitly mentioned in this episode.
Given the time we've already spent, I will constrain it to a single question this time. Quick and short: do you have a guest recommendation?
Oh, gosh, I know so many people, I feel bad even picking one. Maybe we could take that offline.
Now you have to rank.
No, I can't. There are so many great people I've worked with at Netflix and in the community; I don't want to pick out a single one.
Okay, okay, there are too many. So, for the sake of fairness, this time we will have to leave without a guest recommendation.
Justin, it was really great having you on the show. Thank you so much for sharing so much of the great work you are doing, for giving us insights and pointers, a better understanding of the challenges you're dealing with, and a glimpse into how people actually work together effectively at Netflix.
It was a great learning experience having you.
Oh yeah, thanks so much for having me. It was great chatting with you too.
Then I wish you all the best, and enjoy your weekend.
Yeah, you too. So, great. Thank you and see you. Take care. Bye. Bye.
Thank you so much for listening to this episode of RECSPERTS, Recommender Systems Experts, the podcast that brings you the experts in recommender systems.
If you enjoy this podcast, please subscribe to it on your favorite podcast player, and please share it with anybody you think might benefit from it.
Please also leave a review on Podchaser. And last but not least, if you have questions, a recommendation for an interesting expert you want to have on my show, or any other suggestions, drop me a message on Twitter or send me an email to marcel@recsperts.com.
Thank you again for listening and sharing, and make sure not to miss the next episode, because people who listen to this also listen to the next episode. See you, goodbye.