Recommender Systems are the most challenging, powerful and ubiquitous area of machine learning and artificial intelligence. This podcast hosts the experts in recommender systems research and application. From understanding what users really want to driving large-scale content discovery - from delivering personalized online experiences to catering to multi-stakeholder goals. Guests from industry and academia share how they tackle these and many more challenges. With Recsperts coming from universities all around the globe or from various industries like streaming, ecommerce, news, or social media, this podcast provides depth and insights. We go far beyond your 101 on RecSys and the shallowness of another matrix factorization based rating prediction blogpost! The motto is: be relevant or become irrelevant!
Expect a brand-new interview each month and follow Recsperts on your favorite podcast player.
Note: This transcript has been generated automatically using OpenAI's whisper and may contain inaccuracies or errors. We recommend listening to the audio for a better understanding of the content. Please feel free to reach out if you spot any corrections that need to be made. Thank you for your understanding.
Over 60% of global time spent on the app is spent watching videos.
So videos is already a majority use case for Facebook.
User interaction data, first of all, is observational: which items users interact with depends both on which items are presented to them and on what items they choose to interact with out of those.
This understanding of what the content is about, what the video is really about helps much more than once it starts getting engagement.
Once it starts getting engagement, you can represent the video as some aggregation of users who have engaged with it.
Because of this large scale engagement, the engagement attributes really take over and define what the video is about.
Cross-domain plays a big role actually.
So it really helps us build a very rich and very comprehensive user profile, understand what the users are really interested in.
In general, what we have learned is no single prediction is a good measure of whether the post is valuable or not.
We need a combination of these to optimize for user retention.
Not all short-term objectives directly correlate with retention.
You probably need something additional on top of it to really directly optimize for our business objective.
You cannot go forward and just optimize for one metric.
You really need to understand engagement, satisfaction, long-term value and connect this.
So we need people who can think deeply of what are the problems we could face and how to solve them.
Hello and welcome to this new episode of RECSPERTS, Recommender Systems Experts.
And for this episode, I have invited an industrial expert from the field of video recommendations.
We are talking about video recommendations at Meta, more particularly within Facebook.
It's my great honor to have Amey Dharwadker on board.
Hello and welcome to the show.
Thanks Marcel for inviting me and thanks everyone for tuning in.
Hope you'll have a great chat today.
Yeah, I hope so as well.
You and Video RecSys are the main topic for today's episode.
And before handing over to you, as always, I will provide a brief summary of who my guest is for today, in case you haven't heard about him or don't know him already.
So Amey Dharwadker is an ML engineering manager at Facebook.
He is leading the Facebook video recommendations quality ranking team.
Prior to this, he has been working on feed ranking and ads ranking models, and he has been with Meta since 2015.
So almost 10 years.
So do I hear the anniversary from the future?
Yes, I'll be celebrating my 10th anniversary this coming January.
Okay, great.
So yeah, looking forward to that one.
So 10 years at Meta, that's quite a long time.
So you must have seen a lot that's going on there.
My guest has obtained a Master of Science in electrical engineering from Columbia University and has previously also worked on computer vision at university, as well as on advanced driver assistance systems at Analog Devices.
So he brings plenty of experience to the table when it comes to computer vision, which I guess is a great deal when it comes to video recommender systems, where you could apply or leverage that knowledge and those skills.
And Amey has also been quite actively involved with the community.
There are works at ICML or The Web Conference (WWW) that he published together with his colleagues.
He was also one of the co-organizers of the first workshop on large-scale video recommender systems that took place at the RecSys conference last year, in 2023.
And which we are also excited about taking place again this year at RecSys 2024 in Bari.
And with that, I will hand over to you Amey.
So first of all, again, thanks for being my guest today.
I'm very excited about what we are going to discuss and talk about.
And can you introduce yourself to our listeners and tell them some more about yourself and your work?
For sure.
Thank you for the kind introduction, Marcel.
So as you mentioned, I'm Amey.
I work currently as a machine learning engineering manager, leading the Facebook video recommendations quality ranking team.
My team works on improving the quality and integrity of Facebook video recommendations for our billions of users worldwide every day.
I joined in 2015 on the newsfeed ranking team.
I played a big part in basically migrating News Feed ranking from a purely connected surface, where you only saw posts or stories from friends, pages or creators that you are connected to, to what it is today: a combination of both a connected and an unconnected recommendation surface.
Then I spent a couple of years on ads ranking, working mainly on conversion modeling with long attribution windows for conversion ads.
And since the last four plus years, I have been on video recommendations.
Over that time, I started first on the core ranking team, where I led basically our work on cutting-edge modeling algorithms, using deep neural modeling as well as multi-task learning.
Then, a few years back, I transitioned to the quality ranking team and have been leading it since.
As you mentioned, I'm also active in the RecSys community and really excited for the upcoming VideoRecSys workshop that we are organizing again this year.
So, last year's workshop was the first of that kind and this year is going to be the second installment.
As we can already see from the conference schedule and from the workshop's website, it's slightly different this year.
So we do have the VideoRecSys workshop and the large-scale RecSys workshop together.
So what will it be like?
Can you maybe do some advertising for the workshop?
Why should people attend that one?
Of course.
Yeah.
So last year it was a standard half-day workshop.
This year it's going to be a full-day workshop, together with LargeRecSys.
You know, videos, I think, as a use case have been across all fields, right?
So you have seen it in education, entertainment, even e-commerce.
I even see that LinkedIn now has recommended video units there.
So even in professional media.
So it's a very ubiquitous use case.
Our goal was to create a platform where academics, industry professionals and researchers can come and also learn from the leading industrial experts about the trends, the challenges, and, you know, what they have done and how they have built very large-scale video recommendation systems for the billions of users worldwide.
Last year we had speakers from Instagram, YouTube, Google DeepMind, Netflix and Kuaishou discussing a variety of their work on different topics in video RecSys.
And this year also, you know, we plan to have multiple speakers there.
In general, we also got a lot of interest.
So I feel like there is a lot of interest in this topic.
We have our YouTube channel, where we have shared the videos that we have recorded.
We see a lot of interest, a lot of organic views there and so on.
So hopefully people can attend, learn from the best, learn from people who have experience building these large scale systems.
For the listeners and for everybody who's going to attend RecSys, it actually will be on the very first day of RecSys.
So Monday, October 14th, if you want to attend that one, then this is the very first day of the whole conference, where you have the chance to learn more about video and large-scale RecSys.
Great.
Yeah, so much about the very relevant advertising, especially for those listeners, which I guess there are many of, who are planning to go to RecSys or attend virtually.
But for today, our main topic is video recommendations.
To be honest, I wasn't even that sure where they actually live at Facebook and Instagram.
So I have learned that you are leading the Facebook video recommendations quality ranking team.
So when I learned that, the first thing I did was kind of ask myself: videos on Facebook?
And to be honest, I haven't been the most frequent user of Facebook lately.
So I guess when it comes to interacting with services by Meta, then I'm more on the WhatsApp and Instagram side, but not that much anymore on the Facebook side.
But as I have learned from one of your papers, there are like 1.25 billion monthly viewers who are interacting with video recommendations on Facebook.
And therefore, let's jump right in and first lay out the landscape of video recommendations on Facebook.
What's out there?
So what are the use cases, and how are they connected with video recommendations?
So in general, I can start by just giving you a sense of scale.
So Facebook is one of the largest online social media platforms, just by the number of people and also the time spent on the platform.
In a recent earnings report, we put out the number there of 3 billion plus monthly active people on the platform.
The corpus size is like tens of billions of public videos that are eligible for showing up as recommendations to users.
And right now, over 60 percent of global time spent on the app is spent watching videos.
So videos is already a majority use case for Facebook.
The time spent on native videos daily is growing at a fast pace and majority of that is coming from improvements we make to our ranking algorithm.
In terms of use cases of video recommendations, there are multiple surfaces that we show video recommendations on.
First is, like, recommendations in News Feed.
So News Feed has both connected content, like content from friends, creators and pages that you are already connected to, and also unconnected recommendations, which are simply recommendations where you may not be connected to the creator directly.
We also have in-feed units in News Feed, which are like horizontal scrolling units that contain many videos.
The idea there is to help the user discover relevant content and go deeper into that and then keep exploring.
We also have something like YouTube's Up Next, which is basically when you click a video, you get more related videos.
You can scroll through more related videos.
So that is what we call video channels.
And finally, a unified video tab, which is what we refer to in our paper.
Now that tab has both long-form videos as well as Reels, which is our short-form video content.
So that is a unified tab, which is the primary video destination.
So that only has videos.
Users can go there directly and just consume.
While you were saying this, I went to the Watch tab.
Yeah, maybe just to, of course, our listeners won't see that, but I guess it's easy for them to reproduce.
So just to better understand which things you are talking about.
So for the listeners who can't see this, this is still an audio-only podcast, but I'm just sharing my Facebook screen with Amey.
And first thing that I'm seeing right now is like the start page.
So is this something that you refer to as kind of the newsfeed or what would that be?
No, so this is Watch, like the home tab for the Watch feed.
And then we have sections there like live and shows and so on.
Okay, so there's Live to the left, which is stuff like, for example, you see some gamer sharing video gaming stuff here right now, or live gaming, which seems like an important or interesting thing.
Or maybe you have found out that this is something that interests me.
And then I see Reels, which you said is your short video content, which I guess is also something that I would associate the most with Instagram, just as an end user.
And you also said shows.
What was shows about?
So Shows was, you know, before, when we started Watch, more of like professional video content; you can think of it as an analog to, you know, shows on YouTube or Netflix series kind of thing, which are professionally made.
So that is how watch started before it became a platform for all publicly available videos on Facebook.
Okay, okay.
So you see a couple of channels here and then you can basically tune in those channels and see more content from that very channel.
Yes.
Okay.
Okay.
And I guess the last one is the one that you mentioned first before, which just says "Entdecken", which is German for Discover, and which then shows you basically that landscape of different interests that you could then get more from.
Yeah, so this was like, you know, interest feed, I think right now.
So right now we are working on actually unifying everything into a single surface. That single surface would have both short-form content, that is Reels, as well as long-form content.
So that is what we refer to as unified tab.
It was rolled out very recently and is rolling out more widely right now.
So hopefully we'll be getting it soon.
Okay, okay.
So, as you say this, I've just entered Home and Garden.
So this basically then, or in the future, will show that short-form and long-form content on that tab, for example.
Yes.
Okay, cool.
So this definitely helps me in better understanding the, let's say, different use cases for Facebook Watch or Facebook video recommendations.
When you say short form and long form content, could you elaborate a bit more on what sets those two apart and how important they are compared to each other if you can?
So traditionally on the platform, or, you know, on YouTube if you look, most of the longer-form videos were like movie style, more horizontally captured and so on.
So that was the prevalent form as Facebook evolved over the years.
Recently, we are seeing more and more content being created in the vertical format and in short form. It's good in the aspect that you can consume information and, you know, entertainment content really fast.
It's also easy to produce, it doesn't need to be like professionally captured.
So right now we are seeing that almost everyone is a creator, right?
Everyone with a phone in their hand is a creator.
So we are seeing this form of content, you know, rising a lot in terms of both creation as well as consumption.
And we feel like, you know, this is why you see that all the other platforms too are now having a mix of short and long-form content.
And I think this trend will grow over time.
So we want to be a surface which caters to all needs.
And you know, for example, our ranking is trying to rank all these videos together to find out actually what matters to users, what they are interested in and serve content irrespective of video length together.
Okay, I see.
So you mentioned your team, it is the video recommendation ranking quality team.
Does this also imply that there exist multiple teams who are responsible for modeling ranking algorithms for that specific kind of content?
How is that structured inside Facebook?
And how is it, let's say, at Meta?
So also taking into account the teams and folks on the Instagram side.
So how is the collaboration there?
So maybe splitting this into two parts.
How is it structured at Facebook?
Who is working on what?
And how is the collaboration with the corresponding teams if there are, I guess there would be on the Instagram side?
So I think, broadly, if you think of Facebook, the org is divided into core ranking pieces, which is just building better models and features to serve the relevance use case.
And then the quality ranking team is trying to ensure that the quality and integrity of videos is maintained.
So I can give you some examples of the types of problems we're solving.
For example, we improve the detection and enforcement of suggestive content or engagement bait content.
If a creator is trying to lure users to watch the full video, they may say like a watch till the end to see what happens.
In that case, watch time increases because users do keep watching till the end.
So your traditional ranking models will have a very high prediction for watch time or watch related events.
And so next time that could be reinforced in the ranking, because our training data sees that, and you could end up with a very baity kind of feed for a user who was maybe just baited into watching one video for a long time.
So we try to ensure that this doesn't happen.
The goal of our team is to ensure that users' time is very well spent on meaningful and satisfying video content.
So we try to balance relevance with satisfaction, which is captured through different signals on the platform.
For example, we have "see more" and "see less" signals; it's a user control where the user can say "I want to see more of this type of content" or "I want to see less of this type of content".
We also do, for example, deeper engagement signal learning.
You have some content which does well only on one event.
We don't want that to dominate your feed.
We want content that does well on multiple ranking events.
For example, on watch rate you might have a very high watch time prediction, but people may not like the content, or they may comment on it in a negative manner.
So you want to capture all those signals together while considering ranking.
So those are some of the types of work that our team does.
So and you said the core team is responsible for building features.
Are they also responsible for building models on top of those features, or is the whole modeling part already within your team?
So everything is split.
So all the teams are working on features, just the problems that we are tackling are different.
So the way we are tackling it is similar.
So all the teams are building ranking models or improving the ranking models.
All the teams are working on like incorporating their signals into features and so on.
So everything is shared here.
It's just that we're tackling different problems.
And obviously, there is a lot of collaboration here because all these need to go into the single ranking system that is serving all our users.
So it's just two separate parts of org with different focus, but they work very closely with each other.
Okay, I see.
That, or at least for myself, also explains a bit better some questions that I had already with regards to a paper that we will come to later.
A paper where you and your team elaborate on how you perform personalized interest exploration for your recommender, which is basically a system you build that works on top of an existing recommender.
And I was just like, okay, why isn't it embedded?
But then it starts to make sense for this kind of arrangement or way of working.
Your team might be incentivized by working on the quality of the recommended content, but maybe for another team, the main thing is relevance.
So are you sharing basically also the same objectives so that you don't run into compensating for different optimization goals of the other team or how do you manage and align that?
That's a great question.
I think it's both around like what objectives we're doing on, what metrics we care about, and also like how we are doing the work.
So for example, on the objectives, we actually have a multi-objective system, we kind of predict different engagement objectives and combine them to get the final relevance score.
But obviously, as you said, sometimes the objectives could be conflicting.
And in this case, we think of it more like a constrained optimization problem, where we're trying to maximize engagement given, like, quality, diversity, guardrails, and so on.
So the metrics part of it is taking care of like how we are optimizing because everyone is then trying to improve the same set of metrics.
In terms of work also, like as I said, a lot of work is shared.
We also see that quality improvements and some of these changes sometimes do have a short-term engagement loss, but we do see that in the long term, most of these things are very incremental and provide Facebook additional value.
So once the quality of users' feeds improves, they tend to come more to the platform.
Our models reinforce this.
So we know if you're consuming high quality content, next time you will get more high quality content and so on.
I see.
Maybe let's stay on that level of collaboration and look a bit more to another platform that META operates with, which is Instagram.
So how does the collaboration with the corresponding Instagram teams work?
Because, I mean, as I already said in the intro, I'm more familiar with Instagram, or a more frequent user of Instagram myself.
So for me, I'm much more familiar with how it works there and how I feel perceived as a user.
How much exchange is there, and are the same models also used there, or do you use models by them, or how do you arrange this?
First of all, on the basic level, there is a lot of content liquidity between Facebook and Instagram.
So we allow creators to post their content once and they can enable cross posting on the other platform.
And then this happens a lot because it just gives the creators a broader reach.
So most of the content is eligible to be ranked and shown across our platforms.
In terms of the technical work, a lot of the infrastructure is shared on how the models are trained, how they are serving and so on is shared between Facebook and Instagram.
And the actual models are different right now.
And the reason for that is that our user behaviors or the type of users are also very different on both platforms in terms of demographics or user behaviors, user attributes and so on.
So right now we have separate models on these platforms, but we do a lot of sharing of knowledge.
And for example, one modeling architecture, if it works on one surface, we try to incorporate it on the other surface and so on.
We also have regular content reviews to understand what content is doing well on one platform versus the other, what content is popular at one place versus the other and so on.
This helps us really understand reasons or any gaps, for example, in our models: is it gaps in the models that are causing this, or is it just because the audience is different on this platform?
So we try to understand that a lot.
Okay, I see.
I guess one thing that you quickly mentioned at the very beginning, but I just want to take a step back there and understand it.
So how large is actually the video corpus again and how much does it grow?
So the video corpus, we have tens of billions of public videos available that are eligible for showing to the users at any point of time.
And this grows by a few million every month.
So obviously, like, you know, some older videos could become less relevant and not get distribution. Some videos are evergreen and have distribution for a long period of time and so on. So we do see this varying a lot, but that's the scale.
And yeah, since this already poses, I guess, one fundamental challenge, which I guess is the freshness and the cold start of your items.
And maybe we can continue with those challenges and from there move a bit more into talking about the systems and their corresponding models.
So for video recommendation, which are the main challenges that you address with your team?
I think in terms of ranking, like there are multiple challenges.
I think the first one, I would say, is, as I said, we rank these different-length videos together.
So how do you actually, you know, what type of prediction events you use for ranking both long and short-form videos together, is a challenging problem.
For example, if you go rank videos by watch time, you would show more longer videos.
If you rank more based on what percentage of the video was watched, you would show more shorter videos.
So now we have come up with multiple derived events on this to debias this video watch time event.
For example, one of the common ways to do this is you can bucketize videos based on their length. And then for each user video interaction, you first figure out which bucket, which length bucket that video falls in.
And then you compare that engagement of that user with that particular video against what is the average and standard deviation of, you know, engagement of all videos across all users in that bucket. So you're trying to figure out what's the incremental value this video provided to this particular user over all similar length videos, you know, across all users.
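To make the bucketization Amey describes concrete, here is a minimal sketch (an illustration with made-up numbers, not production code) that computes a per-length-bucket normalized watch-time label:

```python
import numpy as np

# Hypothetical interaction log: (user_id, video_id, video_length_sec, watch_time_sec)
interactions = [
    ("u1", "v1", 30, 25),
    ("u2", "v1", 30, 10),
    ("u1", "v2", 600, 120),
    ("u3", "v2", 600, 400),
    ("u2", "v3", 45, 44),
]

# Length buckets (seconds): e.g. short, medium, long.
bucket_edges = [0, 60, 300, float("inf")]

def length_bucket(video_length):
    for b, (lo, hi) in enumerate(zip(bucket_edges[:-1], bucket_edges[1:])):
        if lo <= video_length < hi:
            return b
    return len(bucket_edges) - 2

# Mean and std of watch time per length bucket, across all users and videos.
by_bucket = {}
for _, _, length, watch in interactions:
    by_bucket.setdefault(length_bucket(length), []).append(watch)
stats = {b: (np.mean(w), np.std(w) + 1e-6) for b, w in by_bucket.items()}

# Debiased label: how much this watch exceeds the typical watch time of
# similar-length videos (a z-score style "watch-time gain").
labels = []
for user, video, length, watch in interactions:
    mean, std = stats[length_bucket(length)]
    labels.append((user, video, (watch - mean) / std))
print(labels)
```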
There are recently many papers on this because many platforms are facing similar issues.
I know, for example, one paper on watch time gain, by I don't remember exactly whom. There was also a ShareChat paper, at RecSys 2023, at one of the workshops, that addressed some similar types of problems.
We will find it out and add it to the show notes as well.
For sure. And then there are various bias issues that are common across, you know, all recommender systems. So user interaction data, first of all, is observational: which items users interact with depends on both which items are presented to them, and then what items they choose to interact with out of those.
And also, popular items and active users have, you know, a much larger impact on model training. The feedback loop causes this to get worse over time because exposure shapes user behavior.
And then this is fed back into training data. So there is this rich getting richer sort of effect here. So there are various types of bias that we, you know, deal with, like position bias, popularity bias, conformity bias, and how to overcome this is I think a big challenge.
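One textbook way to counteract this kind of exposure and position bias, mentioned here only as a generic illustration and not as Meta's approach, is to weight training examples by inverse examination propensities:

```python
import numpy as np

# Hypothetical impression log: (label_clicked, position_shown)
impressions = [(1, 0), (0, 1), (1, 3), (0, 5), (1, 7)]

# Assumed examination propensities per display position, e.g. estimated from a
# small randomization experiment (made-up numbers here).
examination_prob = np.array([0.95, 0.80, 0.65, 0.50, 0.40, 0.32, 0.25, 0.20])

def ips_weight(position, clip=10.0):
    # Inverse propensity, clipped to limit variance from rarely examined positions.
    return min(1.0 / examination_prob[position], clip)

def weighted_log_loss(p_pred, label, weight):
    # Weighted logistic loss: down-weights positives that were "easy" because the
    # item was shown at a highly examined position.
    eps = 1e-9
    return -weight * (label * np.log(p_pred + eps) + (1 - label) * np.log(1 - p_pred + eps))

for label, pos in impressions:
    w = ips_weight(pos)
    print(pos, round(w, 2), round(weighted_log_loss(0.3, label, w), 3))
```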
All right, all right. I hear there are definitely different biases. You are working on fighting watch bait, if I'm correct, and on the freshness of videos. Are there some more details or things that you could share on how you are alleviating this cold start problem for your items? Because some time ago I read about this, and I guess it was a paper by Netflix, if I'm not mistaken, where they were talking about predicting the cold start of new titles on Netflix. And my first thought was like, yeah, how do you mean predict cold start? I mean, it's either cold start or it's not.
But once I read it and watched the talk about it, I found it's more about predicting which titles are going to suffer more or less from the item cold start problem, and then it made sense, of course. So every time you introduce new content to your platform, this content is very fresh. And as you already mentioned, there are millions of items coming to the platform, I guess every month, being added to your corpus. And if you don't make sure they become candidates and finally also appear in rankings, then they are not consumed, which is bad. So how do you alleviate this cold start problem? Is there actually something that you can share about this?
I think Facebook is in a unique position, because, you know, when a lot of the creators post content, it first starts to get a lot of connected distribution. So you can think of connected distribution as like a prior for unconnected distribution. We can first understand how that content does with the people who follow you, in whose News Feed that content will appear. So are they engaging with it or not? And so on.
Another aspect of it is, you know, the exploration system that we have presented and that you mentioned, which helps with this item cold start problem. Yeah, I think exploration is one of the places where we handle cold start.
Makes definitely perfect sense: if I'm following certain creators, also with some intensity, so how long have I been following them, do I share, do I like, do I save their content? So actually, how much positive signal do I provide for that content? This basically could then somewhat elicit the strength of preference I have for a certain creator, and then, of course, you make the assumption that the things the creator created in the past, accounting for building that strong preference for them, will also be somewhat similar to what the creator is going to create in the future, so that this then, also by that relationship, becomes relevant for me and gets displayed. What about the actual content of videos? I mean, you have some audio, you have some video signals, you might have some text, you have a title, you might have some categories attached to it by the creator, all that metadata that comes with the video. I guess this is also definitely something that you can use to see whether that matches the user's interest.

We definitely use that; content understanding is a big piece there. And especially for newer videos, where we don't have a lot of engagement understanding, we find that content-level attributes and, you know, topics, for example, help a lot. Just based on our content understanding, what we found out was: when the video is cold start, that is, it doesn't have a lot of engagement, this understanding of what the content is about, what the video is really about, helps much more than once it starts getting engagement. Once it starts getting engagement, you can represent the video as some aggregation of the users who have engaged with it, because of this large-scale engagement. The engagement attributes really take over and define what the video is about. So the video could be about one particular topic, but there might be a very different set of audience that is interested in that video. And so the engagement data really helps us expand that reach. And that's why we see that that data becomes much more important than the content understanding once the video gets engagement.

Yeah, I guess this definitely aligns well with what Rishabh Mehrotra back then also shared from his experience at ShareChat: that behavioral data very soon outperforms content understanding, once you see which users are going to interact with it. But content understanding definitely helps you with alleviating the cold start to a certain degree. That's always a good thing to remember, and maybe it also holds true in, I would say, other scenarios, and then builds the power of joining forces: content understanding along with understanding user feedback correctly, or interpreting it correctly. Let's go a bit into the direction of actual systems and models. So you do have a video recommendation system in place that interconnects different models which have different objectives. Can you maybe start with a bird's eye view on how these models, and the systems they are embedded into, look and work, and from there maybe we could start diving a bit more into things that you already mentioned, like objectives, but also user feedback and features.
For sure, yeah. So in the whole stack of ranking, we start with the first step, which is candidate generation. As I said, we have tens of billions of videos; we cannot rank all those videos for every user every time. So we have something we call candidate generators, which are a mixture of heuristics and models that try to narrow this down to a few thousand videos, and then these go through the ranking stack per user for every query they have.

In the ranking stack, once you have candidate generators which produce thousands of videos, you have the first stage of ranking. There we mostly have two-tower based neural networks, which are very useful because they have very fast inference properties: you can cache the item-side embeddings and index them in a nearest neighbor service. When the user comes in, you generate the user embedding on the fly, and our objective is to predict engagement events as a dot product between the user embedding and the item embeddings. After training, we then serve whatever items are closest to the user embedding; these are passed to the next stage. That helps narrow down the items ranked by the final ranking model, which is much more complex.
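As a rough sketch of such a two-tower retrieval stage (a simplified illustration under assumed feature dimensions and layer sizes, not Meta's actual architecture): item embeddings are precomputed and indexed, user embeddings are computed at request time, and candidates are retrieved by dot product.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """A small MLP tower mapping raw features to a normalized embedding."""
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=-1)

user_tower, item_tower = Tower(in_dim=32), Tower(in_dim=48)

# Offline: embed the (toy) item corpus; in production these embeddings would be
# cached and indexed in an approximate nearest neighbor service.
item_features = torch.randn(10_000, 48)            # stand-in for real item features
item_index = item_tower(item_features)

# Online: embed the user on the fly and retrieve top-k candidates by dot product.
user_features = torch.randn(1, 32)                  # stand-in for real-time user features
user_emb = user_tower(user_features)
scores = user_emb @ item_index.T                    # engagement proxy = dot product
topk = torch.topk(scores, k=500, dim=-1).indices    # candidates for later ranking stages
print(topk.shape)
```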
Then we have a multitask DLRM model, which is the deep learning recommendation model that is open-sourced by Meta. We use categorical features and build embeddings out of them, to learn along with continuous features that are processed in a bottom-layer MLP. Then there are feature interactions that happen explicitly and are passed to a top MLP, which then goes to a sigmoid layer to get the final predictions of the different events. And finally, once you have the predictions of the different events, we have a weighted combination; the weights are mostly optimized through Bayesian optimization, to come up with a fixed set of weights for all these events, which are combined to get the relevance score per video. So that's how the whole ranking stack for videos looks.
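Below is a heavily simplified, self-contained sketch of that pattern: categorical features go through embeddings, dense features through a bottom MLP, pairwise feature interactions feed a top MLP with one sigmoid head per engagement event, and a fixed weight vector (in practice tuned, for example with Bayesian optimization) combines the event predictions into a single relevance score. All layer sizes, feature counts and weights here are made up for illustration.

```python
import torch
import torch.nn as nn

class TinyMultiTaskRanker(nn.Module):
    def __init__(self, n_cats=(1000, 500), emb_dim=16, dense_dim=8, n_events=3):
        super().__init__()
        self.embs = nn.ModuleList([nn.Embedding(n, emb_dim) for n in n_cats])
        self.bottom = nn.Sequential(nn.Linear(dense_dim, emb_dim), nn.ReLU())
        n_fields = len(n_cats) + 1
        inter_dim = n_fields * (n_fields - 1) // 2
        self.top = nn.Sequential(nn.Linear(inter_dim + emb_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_events))

    def forward(self, cat_ids, dense):
        # One embedding per sparse field plus the bottom-MLP output for dense features.
        fields = [emb(ids) for emb, ids in zip(self.embs, cat_ids)] + [self.bottom(dense)]
        x = torch.stack(fields, dim=1)                       # (batch, fields, emb_dim)
        inter = x @ x.transpose(1, 2)                        # explicit pairwise dot products
        iu, ju = torch.triu_indices(x.size(1), x.size(1), offset=1)
        feats = torch.cat([inter[:, iu, ju], fields[-1]], dim=-1)
        return torch.sigmoid(self.top(feats))                # P(event) per engagement event

model = TinyMultiTaskRanker()
cat_ids = [torch.randint(0, 1000, (4,)), torch.randint(0, 500, (4,))]
dense = torch.randn(4, 8)
event_probs = model(cat_ids, dense)                           # e.g. like, comment, long-watch

# Value model: fixed per-event weights combining predictions into one relevance score.
event_weights = torch.tensor([1.0, 2.0, 0.5])
relevance = event_probs @ event_weights
print(relevance)
```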
That's already a great level of detail, I think, where some people could go and build something similar. I mean, two-tower models have been well adopted across the industry, so every here and there you see two-tower models, because they have just proven so useful, and, as you already said, it's nice how you can disentangle the item embedding from the user embedding process. One thing that also just crossed my mind when working on these is actually the part about creating the user embedding from the user profile, or basically embedding the user profile on the fly. What are the main, let's say, reasons for that? Something that I could think of: you want to do real-time personalization, or let's say near real-time personalization, so you really want to take into account what the user has just done right before as a short-term signal, along with more long-term signals. Is this the main reason why you create it on the fly, or what are other reasons you are doing this?

That's one of the main reasons. With real-time personalization, we can use real-time slash near real-time features; that enables us to understand the context very well. For example, in a world where you have a lot of short video recommendations and users scroll through very fast, you need to understand at that particular time what the user is really interested in, what mood they are in, what videos they want to consume. So this really helps us in that aspect, to understand the user's mood and their dynamic preferences that are evolving. For example, a user spending a little bit more time on a tennis video doesn't really mean an interest in sports; it could mean they're interested in tennis only. So just their behaviors of engagement, what they are spending time on, how they are scrolling through, those behaviors can really be captured in these near real-time features, and that proves to be very useful in this case.
Okay, I see, makes sense. And in terms of the infrastructure and the engineering side of the whole story, I guess it's also a big challenge to constantly embed new items, because you might have some kind of an item tower that you use to embed your items, so the videos, into video embeddings that you can then use for approximate nearest neighbor search. How do you solve this challenge? Is it just building huge indices and updating them almost every couple of minutes, or how do you deal with that challenge of making sure that also the newest created content gets embedded fast and appears in the index, to be found during search?

Yeah, I think that sounds about right. We have video-only features, so a lot of features, even for new videos, like some type of content understanding features. Our features are a mix of attribute features of the video, content understanding, engagement and so on. So we believe that even though engagement features may be missing or less established if a video has less engagement, if it is new, the other features should be able to represent it. And as videos get more and more engagement, the indexes keep updating, and that will help them get more distribution over time.

Thinking about the different heuristics that you use for candidate generation, where of course approximate nearest neighbor search might also come in handy: a more specific thing I'm thinking about is that there exist so many techniques, actual algorithms, for performing ANN search. I mean, very early there was Spotify's Annoy, there's HNSW, there's Faiss, the Faiss library by Facebook. But what all of them have in common is that they basically do vector similarity search. How do you actually also take into account semantic information there? Because sometimes you want to make sure that you have, let's say, at least a certain amount of videos from a certain topic, or you have some continuous or other categorical features that you want to take into account along with the search. How do you solve for that? Because if I would just perform plain vector similarity search, then I could possibly not guarantee that I get the final distributions that I need. Is this a problem for you folks, or would you say, okay, this is nothing that we are concerned about, or how do you deal with this?

Again, I think the thing we do there is incorporate some business rules, to ensure, for example, that there is topic diversity and creator-level diversity. And we have something we call a value model which optimizes for this, where we have some additional business rules on top of the ranking: you take the ranked list and then you again sort of re-rank it to ensure that you have diversity and so on. We also have similar things at the later-stage ranking; we actually had a paper in WebConf 2023 on context-aware video recommendations that had a similar concept of how we do slotting of videos based on context features of all the videos on top of it, so it really slots videos and ensures some sort of diversity. We use similar concepts in the early stage.

Okay, what I'm getting from this, but please correct me, is: vector similarity search is the right way to go and it helps, but, I would assume, you make sure by sampling enough candidates that with re-ranking you could then guarantee a certain diversity, or whichever related objectives you want to take into account.

Yeah, that's good, yeah.
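As a generic illustration of that re-ranking idea (business rules applied on top of a relevance-ranked list, not Meta's actual value model), here is a small sketch that caps how many items any single creator or topic can contribute near the top of the feed:

```python
def rerank_with_caps(ranked, max_per_creator=2, max_per_topic=3):
    """Greedy re-rank: keep relevance order but cap how many items any single
    creator or topic can contribute (a generic diversity rule for illustration)."""
    creator_count, topic_count, out, spilled = {}, {}, [], []
    for item in ranked:  # ranked = list of dicts sorted by relevance, descending
        c, t = item["creator"], item["topic"]
        if creator_count.get(c, 0) < max_per_creator and topic_count.get(t, 0) < max_per_topic:
            out.append(item)
            creator_count[c] = creator_count.get(c, 0) + 1
            topic_count[t] = topic_count.get(t, 0) + 1
        else:
            spilled.append(item)
    return out + spilled  # over-represented items fall to the tail instead of being dropped

ranked = [
    {"id": 1, "creator": "a", "topic": "tennis"},
    {"id": 2, "creator": "a", "topic": "tennis"},
    {"id": 3, "creator": "a", "topic": "tennis"},
    {"id": 4, "creator": "b", "topic": "cooking"},
]
print([x["id"] for x in rerank_with_caps(ranked)])
```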
Already talking about objectives and also the signals that you have, I mean, sometimes they are actually the same: you want to predict clicks or you want to predict watches. Can you provide some overview, first going backwards from the actual business goals, of how this actually translates back into offline ML metrics or losses? How well are they aligned, and how do you translate this back along that chain, so making sure that what you optimize for is also directionally, and hopefully also in its sensitivity, aligned with the final business goal?

I think this is the billion dollar question. So our work here is to enhance user experience and engagement by delivering them relevant content that they might be interested in, at a very high level. We also believe our mission is to help users discover new content or new interests that they probably did not even know they have. And the goal of all this work in the longer term, if you look at how exactly this connects to the business goal, is: how do we increase the retention of the user, so that they come back more to consume such content that they enjoy or learn more from; in turn we show them more ads, get more engagement on ads, and that satisfies our business-side goals. So that is the entire life cycle of how we think. From our perspective, our goal is to improve the retention of users.

Now, when we build ranking signals, or the events, we have common explicit engagement events like likes, comments, shares, saves. We also see, for example, that from a video users can visit the profile of the creator and watch more videos downstream, so that shows deeper engagement in terms of the user. We also have different derived events, as I went through, because none of the naive watch time events satisfies the use case, or it's just biased due to different things, so we have a lot of these derived events. In general, what we have learned is that no single prediction is a good measure of whether the post is valuable or not, and we need a combination of these to optimize for user retention. In general, this is the approach that we have taken: we try to see how to weight these events appropriately to improve retention of the user.

Recently, we have also explored some work on modeling retention directly. Again, retention modeling is a difficult problem, because the reason for a user retaining is not clear: sometimes you really like the content, but you may just not have enough time the next day, and that's why you did not visit. So it's noisy in that sense. Also, each user can have hundreds of video engagements per day, but if you're doing, for example, daily retention, you'll just have one signal, whether they retained or not. So it's also sparse in that aspect. So our goal is to find some sort of "aha" videos which you enjoyed and that caused you to come back. A very naive way of doing this is to attribute to all the videos that you had some sort of deeper engagement with a signal of whether you retained or not, and to train with a contrastive loss on these, to then increase the gap between very high retention videos and very low retention videos, to model retention. These naive approaches we tried a couple of years back seemed to work well, and over time we have improved on this; it's still a very active topic of work. But this all stems from the reason that not all short-term objectives directly correlate with retention. You probably need something additional on top of them to really directly optimize for our business objective.
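As a rough sketch of the naive attribution idea described above (videos with deeper engagement inherit the session's retention label, and a contrastive-style loss pushes high-retention and low-retention videos apart), here is an illustrative pairwise margin loss; the attribution scheme and the margin are assumptions for illustration only, not the production formulation.

```python
import torch
import torch.nn.functional as F

def retention_contrastive_loss(score_retained, score_not_retained, margin=1.0):
    """Pairwise margin loss: a video attributed to a retained user should score
    higher than one attributed to a non-retained user, by at least `margin`."""
    return F.relu(margin - (score_retained - score_not_retained)).mean()

# Toy model scores for videos the user deeply engaged with, split by whether the
# user came back the next day (retention label attributed to those videos).
scores_retained = torch.tensor([2.1, 1.7, 0.9])
scores_not_retained = torch.tensor([1.9, 0.2, 0.5])

# Broadcast to compare every retained video against every non-retained one.
loss = retention_contrastive_loss(scores_retained.unsqueeze(1),
                                  scores_not_retained.unsqueeze(0))
print(loss)
```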
One problem is also, if you then talk to product managers and business stakeholders, everybody would directly nod if asked whether retention is something you want to optimize for or to increase, but I would say there are fewer people who are willing to also go the hard way of determining the right signals. I remember that paper, I'm not sure whether it was also by Google, in the context of YouTube, about surrogate metrics for retention. To find something, because, depending on the domain, what we could regard as retention is something that happens within the next day, or maybe within the next week or next month, so something that takes a long time to evaluate. It then makes sense to have a proxy that correlates well with it, something that we could measure and where we have high confidence that if we improve it, we also improve retention. Is this, so surrogate metrics, also something that you have been working on or trying out?

Yes, yes. I mean, we already had some work along those lines, and we have been trying out some variants of that.

As you just mentioned, and which I just want to highlight, it's the combination of different interaction signals, or their predictions, if we could say so, that is making the difference. And I guess there's also something for everybody to try out, because it reminds me of the RecSys Twitter challenge that took place a couple of years ago: there were multiple different signals that had to be predicted, and then you could weight them in a certain fashion to get the final score, which is more like a weighted combination of click probability, watch probability, and so on. And watch probability, again, we could distinguish into: do we mean full watch, or at least 50 percent watched, or a minimum 30 seconds watched, whatever, and combine this into a joint score that is used for ranking. Yeah, there are all these obvious signals which are partially implicit, partially explicit, like watching as, I guess, maybe the main implicit signal versus liking as a more explicit signal, and they all have different degrees of sparsity, but maybe the sparser ones have, let's say, higher importance. Apart from these that already exist, are you also engineering additional interaction signals of users, and how useful is this branch of work, possibly? So I'm more thinking about how to represent user behavior.

In that aspect, on features, right, we do have user, video, and user-cross-video features. On the labels also, we have multiple labels, and many of them have to be derived, so they kind of take a raw action of watch time or watch time percentage, for example, and debias these to get better labels which are more correlated with what we actually want to optimize for. We have attention mechanisms also in the model; for example, there we believe that if we model these well with attention, then we can dynamically weigh these different types of interactions better, by reducing the noise from less significant interactions. Other things we do are more with respect to, for example, cold start: when we don't know much about the user, don't know much about their interactions, you can enrich them with interactions from similar users. In cold start cases, for example, you could also cluster the users and understand more about them from the cluster that they are in. We've also explored graph neural networks in this case, where we can model user-item interactions and understand semantic relationships better by random walks on the graph. So there are these multiple approaches, on both the feature side as well as the label side, and on the learning side, to improve which signals are really useful and how to get over these sparse interaction signals.
When we think about optimization, the actual metric is one thing, but looking more at the timeline, we already mentioned retention there, which we might arguably say is more of a long-term goal. But what are short-term and long-term goals for you, and how do you balance between the two of them?

On short-term goals, I mean, we can clearly measure a lot of quality-side metrics, a lot of watch-time-side metrics, a lot of user-activity-side metrics. In general, the reason why this system is complicated is that there is no single metric which decides, for example, if I have a new model or a new ranking system, whether I should launch it or not; I cannot decide that just based on one metric. We look at this suite of metrics to ensure that we are doing the right things: we are optimizing engagement, but we also have guardrails on quality, we don't regress some other metric that we really care about, all the creators get the fair distribution that they deserve, and so on. So there is this optimization which happens, and metrics are looked at holistically, rather than as a single sort of metric.

And I guess something that sometimes helps with optimizing for the longer term is exploration. And with regards to exploration, I would like to go over one of your publications, if you would be so kind to share some details about it, what you did there and why, and what the effects were. I'm actually looking at a paper from last year that was published at WWW in 2023: PIE, Personalized Interest Exploration for Large-Scale Recommender Systems. So a paper that was based on the work that you and your team did. Can you walk us through what you have been doing there, or which of the challenges that you had seen you were solving with that personalized interest exploration?

For sure, yeah. The high-level goal is to provide users what they would love to engage with, and on the other side we need to make sure that creators, especially niche creators, get the reach that they deserve. But in our models, when we learn naively, models are able to learn and capture the more over-represented groups really well, but they may suffer from creating good representations for these niche creators, or less popular creators, or less popular content. So generally the approach to solve for that is to generate a list of recommendations that is optimized for engagement, but then re-rank it to ensure diversity and fairness and so on. In this paper we presented our framework for exploration that we mainly use. First we do a user-creator-level exploration using the personalized PageRank algorithm, which is a random walk on the interaction graph, and then we have an online exploration framework using contextual bandits, for example, to do the exploration. Now, it's complex because of how you measure the value of true exploration in this case: when you have a test model which has exploration, you are introducing new sorts of videos, but because we train our models online in a recurring fashion, even the control model is then learning on the exploration data.
So, you know, you have some leakage in that sense, and you will not be able to measure the real, true value. So we came up with different metrics to measure this true value. And once we generate exploration candidates, we have this feed composition layer, which is a probabilistic insertion layer that inserts exploration videos into your feed, to balance the explore-exploit component. Recently, along these lines, there was a paper, I think in WSDM 2024, by Minmin Chen and group, titled "Long-Term Value of Exploration", and there they also suggested a similar metric. It was a corpus metric: how many pieces of content receive more than x interactions after y days post exploration. So the idea is that exploration will help you bootstrap engagement, but then the content has to survive on its own in the ranking. So the question is: are we able to give new content enough distribution post the exploration phase, while keeping user experience neutral? Because you can do a lot of exploration, degrade user experience, and still get this done. But that is, you know, the beauty of the metrics: you need complex metrics to achieve your outcomes. I think that is the biggest one.

That's an interesting idea, to take it from that viewpoint of saying: after a certain period of facilitating content to be discovered through exploration, the content should be able to, let's say, live on its own, without that auxiliary help or support of exploration. But yeah, we need to make sure it's found, and then it can actually prove its relevance to the users, so to say. In that sense, it could also be correct, or fine, if the content that was supported by exploration is, after the exploration phase, not seen as relevant anymore, because it might not have been relevant in the first place, but just, let's say, deemed relevant because of the help or support of exploration. Is this maybe a way of looking at that complicated problem that kind of makes sense?

So the idea is that you are trying to separate out some type of content that doesn't get distributed because it doesn't have the seed engagement or seed interaction. So the idea of exploration is that we are providing you the seed engagement, the seed distribution, and bootstrapping you. Now, once you have been bootstrapped, are you able to survive on your own? That's the main question that we're asking.

Yeah. Have you also observed that users have a varying appetite for exploration, depending on the user and the context? This suggests a personalized approach towards exploration, so I would say that also the degree of exploration would adapt to the user. So how do you do this, or, let's say, have you observed that users have a varying appetite, and how do you make use of that information for tailoring exploration?

Totally, I think this is a very common use case: some people really do know what they want to engage with, what they want to see; some people are very open to exploring new things and then keep engaging with them. The way we kind of implemented it initially in our system was to have some level of exploration and then observe the effects in the longer term, without continuously adding more and more exploration for the users. So we could do something like a heterogeneous treatment effect analysis, right, to really understand the individual treatment effects of personalization. Different users have different appetites, so that comes out there, and then you can tune your exploration, like the magnitude of exploration, or how you do exploration for different users, by understanding the HTE effects on the data.
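A toy version of the "feed composition layer" idea from the PIE paper, probabilistic insertion of exploration candidates into the exploit-ranked feed, with a per-user exploration rate standing in for the varying appetite just discussed; the rates and slotting scheme here are my own simplification, not the paper's exact mechanism.

```python
import random

def compose_feed(exploit_ranked, explore_candidates, explore_prob=0.1, seed=None):
    """At each slot, insert an exploration video with probability explore_prob,
    otherwise take the next exploit-ranked video (a simplified composition layer)."""
    rng = random.Random(seed)
    exploit, explore, feed = list(exploit_ranked), list(explore_candidates), []
    while exploit or explore:
        if explore and (not exploit or rng.random() < explore_prob):
            feed.append(("explore", explore.pop(0)))
        else:
            feed.append(("exploit", exploit.pop(0)))
    return feed

# explore_prob could itself be personalized, e.g. larger for users whose estimated
# treatment effect from exploration is positive.
print(compose_feed(["v1", "v2", "v3", "v4"], ["x1", "x2"], explore_prob=0.25, seed=42))
```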
I see. So much being said on the exploration side of things. On evaluation, less about the tools but more about the approaches you use to evaluate offline and online: are things like, for example, counterfactual policy evaluation something that you use, or would you say our offline metrics are already so good we can trust them perfectly? What's your take on this?

So I think the default mode of offline evaluation is that we train on a few days of data and then we evaluate mostly on the next day. So we try to understand how this model does in the future. And as you said, that way of evaluation is mostly trying to evaluate how your new model would have done on the recommendations that were generated by the current production model, or the old model that you're comparing to. Because the old model has that bias, of it generating the recommendations, we don't really know what would happen if the new model would generate recommendations and how users would engage with that new set of recommendations. So that is exactly what we are trying to figure out; that is where counterfactual evaluation comes into play and is used broadly. We also use it in many cases where we're trying to understand how our new model would have performed if it had produced those recommendations in the past. Again, there are various methods here that we have explored, including inverse propensity scoring, as well as self-normalized inverse propensity scoring, doubly robust estimation, and so on.

We also run a lot of long-term tests. On the notification side, we published a blog post on running some long-term tests where we reduced the number of notifications to users, to only provide the extremely high-quality ones. What we found was that some of the notification metrics, or the short-term metrics, saw a loss; we saw user engagement or visitation loss in the short term, and it took around one year to get it back, and it was positive after one year. So for some of these tests, which change user experience a lot, sometimes users need time to adapt, so some of these tests also need to run for a long time. And we do have, for example, some tests where we try to understand whether the incrementality is increasing or decreasing over the long term too. So it's a mix of traditional offline evaluation, counterfactual evaluation, as well as long-term understanding.
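For reference, here is a minimal self-normalized inverse propensity scoring (SNIPS) estimator of how a new ranking policy would have performed on data logged under the old policy; a generic textbook sketch, not Meta's tooling, assuming the logging policy's propensities were recorded.

```python
import numpy as np

def snips_estimate(rewards, logging_propensity, target_propensity, clip=50.0):
    """Self-normalized IPS: reweight logged rewards by how much more (or less)
    likely the new policy is to show the logged item than the old policy was."""
    w = np.clip(np.asarray(target_propensity) / np.asarray(logging_propensity), 0, clip)
    r = np.asarray(rewards, dtype=float)
    return float((w * r).sum() / w.sum())

# Toy logged data: per impression, the observed reward (e.g. long watch yes/no),
# the probability the production model showed that item, and the probability the
# candidate model would have shown it.
rewards = [1, 0, 1, 0, 1]
logging_p = [0.50, 0.20, 0.10, 0.40, 0.05]
target_p = [0.30, 0.10, 0.40, 0.20, 0.25]
print(snips_estimate(rewards, logging_p, target_p))
```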
Before we talk a bit more about your role as a leader of a team, and also your involvement with the community and with the RecSys workshop, I want to conclude by looking into another model, or a modeling direction, that we can see folks from Meta working on and that plays a role there: how much have sequence models played a role? Because I see, just to mention it, it was published this year at ICML, it's called "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations". So, taking that very paper as an example of you working also on sequential models, where are those applied, and how do you make use of them for video recommendations?

Yeah, sure. So in general, in recommendation systems, users' next actions depend on the actions they have taken in the past; that could be some minutes back, hours back, days, weeks or months back. Back in 2016, in the first paper on YouTube recommendations, the way they were operating was that they had embeddings for the categorical features and they pooled everything together with some pooling. And over time we discovered that there are many issues with this particular approach. First, it doesn't account for the sequential nature of the inputs. It also doesn't consider the temporal information; for example, an action taken minutes ago would be considered the same as an action taken months ago, but in real life we know that this is not true. So I think a lot of it is coming from elements that are borrowed from LLMs, which model word sequences, and that's how a lot of this extension of sequential models for recommendation systems is being done.

In this paper that you mentioned, which is on the HSTU architecture, hierarchical sequential transduction units, it replaces the decoder-style transformer block with the HSTU block. Again, the details are in the paper, but one unique aspect was that we used pointwise normalization to replace the softmax normalization. And this works really well because our vocabulary is non-stationary: if you have LLMs which are operating on the English language, the words there are static, words that exist in the dictionary; but in this case, our IDs or tokens are video IDs, and we have new video IDs coming in every minute, while at the same time some older video IDs are becoming irrelevant over time. So it's a non-stationary vocabulary, and this particular way of normalization really helps in the sequential model. It was also shown that this model, which is purpose-built for sequentially modeling recommendations, scales better than transformers and has better predictive accuracy too. So in general, we have started using this in various places in our production models, especially in the late-stage ranking, which are the more complex models, and we have seen performance wins come out of it.
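To illustrate the specific point about replacing softmax with pointwise normalization over a non-stationary vocabulary, here is a toy attention computation contrasting the two; this is only a schematic of the idea discussed above, not the actual HSTU block from the paper.

```python
import numpy as np

def softmax_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # weights forced to sum to 1 over items
    return w @ v

def pointwise_attention(q, k, v):
    # Pointwise nonlinearity (SiLU) instead of softmax, normalized by sequence length
    # rather than by the sum over competing items, so each weight's scale does not
    # depend on the rest of a vocabulary that changes constantly.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = scores / (1.0 + np.exp(-scores))          # SiLU(x) = x * sigmoid(x)
    return (w @ v) / k.shape[0]

q = np.random.randn(4, 16)   # toy engagement sequence of length 4, dim 16
k = np.random.randn(4, 16)
v = np.random.randn(4, 16)
print(softmax_attention(q, k, v).shape, pointwise_attention(q, k, v).shape)
```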
All right, nice work. Maybe a bit too much to go into more detail here, but we will include it in the show notes, and the whole topic of sequential recommendations deserves its own episode in the future, I hope. Apart from that, one of my first thoughts when preparing for this episode was: yes, there are videos on Facebook, but there is a lot of other content that users generate and interact with. So within the social network you have different domains, which brings to my mind cross-domain recommender systems. Is this playing a role? Are you using them to leverage how users interact with other content, where users create content or, for example, comment on and like other stuff besides videos? How much does cross-domain recommendation play a role there?

Cross-domain plays a big role, actually. It really helps us build a very rich and comprehensive user profile and understand what users are really interested in. For example, if you primarily consume photos or links or other forms of media but not much video, it's a user cold-start problem, and in that case we really do see that information about your engagement on other forms of media helps. For example, if you are reading links related to tech on your newsfeed or through your network, you may be interested in tech-related video content, and that's how we can provide you more relevant recommendations even if we don't know anything about your video consumption. Similarly, we can mitigate a lot of these diversity and exploration types of problems by leveraging this more comprehensive user profile. Recently, Facebook also announced broadly that we are working towards a giant recommendation model that could serve different video products through the same model, and the way we think about this is that we can use it to generate more relevant recommendations, capture short-term trends, and be more responsive or culturally relevant, if you will, in our recommendations while learning everything directly.
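As a hypothetical sketch of how cross-domain signals could back up a cold-start video profile, the snippet below falls back to an activity-weighted average of a user's profiles from other domains when their video history is thin. All function names, thresholds, and data here are made up for illustration; the production system is certainly far more involved.

```python
import numpy as np

def cross_domain_profile(domain_embeddings, domain_counts, video_embedding=None,
                         video_count=0, min_video_actions=20):
    """Hypothetical sketch: build a user profile for video recommendations.
    If the user has enough video history, use it directly; otherwise fall back
    to an activity-weighted average of their profiles from other domains
    (photos, links, ...). Names and thresholds are illustrative only."""
    if video_embedding is not None and video_count >= min_video_actions:
        return video_embedding
    weights = np.array([domain_counts[d] for d in domain_embeddings], dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack([domain_embeddings[d] for d in domain_embeddings])
    return weights @ stacked

# Example: a user who mostly reads tech links but has barely watched videos.
dim = 8
rng = np.random.default_rng(2)
domain_embeddings = {"links": rng.normal(size=dim), "photos": rng.normal(size=dim)}
domain_counts = {"links": 120, "photos": 15}
profile = cross_domain_profile(domain_embeddings, domain_counts,
                               video_embedding=None, video_count=3)
print(profile.shape)  # (8,)
```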
All right, okay. So far about the models and the use cases, but something our listeners are maybe also interested in is your role. You changed from an IC into a managerial role, if I'm not mistaken, and now lead the team. What was the reason for that, and what would you say constitutes a successful recommendation team?

So as a senior engineer I was already doing a lot of this: writing roadmaps, working with cross-functional partners to make sure we were successfully delivering on our milestones for improving ranking, and so on. I wanted to focus some of my time on building a better team and helping people really excel in their careers, and as an engineering manager it helps me to spend time on really nurturing people's careers and making sure I can give back to others what I received from good managers in the past, by supporting them and making sure they are successful. At the same time, a lot of knowledge is required to set up this experimentation mindset, which is very critical for ML teams, so I thought that is where I can spend most of my time as an engineering manager. Secondly, I took it as a personal challenge. I feel this is a different role with different problem-solving skills that I can learn, for example balancing my time between technical depth and managerial breadth, making sure I do well on conflict resolution, team dynamics, and so on. Those were the two main aspects behind my decision to transition from IC to engineering manager.

And in your role as engineering manager leading that team, what would you say are the cornerstones, the critical things that make a good recommendation team that then also delivers on the roadmap, as you said?

I think there are multiple aspects. First, people working on ranking need a very diverse skill set, because a lot of understanding of data is involved: for example, as I said, understanding the bias that exists, how it exists, and how we can solve for it; you need to really connect your technical abilities to the real problems that users are facing. Secondly, we need a very user-centric view of metrics. You cannot go forward and just optimize for one metric; you really need to understand engagement, satisfaction, long-term value, and connect these. So we need people who can think deeply about what problems we could face and how to solve them. The next is a mindset of experimentation. Again, we are doing ML, which is a lot more experimental than maybe some infrastructure role focused on enabling services, so here we really need to be willing to challenge assumptions, take a very data-backed mindset, and look deeper into the results, whether the experiments are working or not; in either case, you need to understand things more deeply, come up with next steps, and decide what to try out next so that we find a path forward. And finally, another thing I feel is strongly needed for a good ML team is a continuous learning process: how do you integrate research into production? Our field is evolving very fast, so we need engineers who stay on top of industry trends, ML research, and so on, and find ways to incorporate them to improve our systems. All of this combined, I think, constitutes a great ML team.

All right, yeah. You have just mentioned, as your last point, staying on top of the field and keeping up with recent trends, which are also connected to challenges. So what is your outlook on the current trends and future challenges of the RecSys field in general, but also the VideoRecSys field in particular? What do you think are the major trends and challenges there that, if I were a RecSys practitioner, which I am, I should definitely not miss or engage with?

A couple of things come to mind, especially on RecSys and VideoRecSys. The first is the multi-stakeholder marketplace aspect that has been discussed very broadly now. We want to make sure the platform is sustainable, so we need creators who are creating content. To do that, we need to make sure that they are getting all the distribution they deserve, otherwise they will stop creating content on your platform. To enable that, you need to make sure that the right users are connected to the right creators, because then they will consume the content and both parties will be satisfied with how they are doing on the platform. That, I think, is very much needed, and we see it across platforms now, with creators and users being two major cornerstones and the business being the third; the platform should cater to all these needs. On the video side specifically, I see more and more use cases coming up with mixed video length formats being popular. For example, there are Shorts on YouTube, and on Facebook we also have long-form content and Reels. Over time I see short-form video maybe becoming a portal towards long-form content, because the barrier to entry is low on both the production side and the consumption side: if I watch a 15-second video and it's not super relevant, I maybe don't feel I wasted a lot of time, but if I'm interested I can go deeper and look at longer videos from that creator or dig into that topic. So this mixed-length video format will, I think, become prevalent across platforms. And finally, from a systems perspective, we have a large inventory to rank, which again is non-stationary, and we are building larger and larger models. How can we optimize our serving stack or training stack to serve and train these large models with minimum latency? Users are getting more and more impatient; everyone wants results fast, wants to see the recommendations fast, doesn't want to wait for the app to load, and so on. So connecting these large-model trends to improving latency under resource constraints will, I think, also be an important area.
Yeah, that's definitely an interesting and very insightful overview that you are providing there. Especially, I guess, the first aspect about multi-stakeholder platforms, or multi-sided platforms as they are also called, is definitely important. Once you adopt that way of looking at platforms, you see it in different domains and almost everywhere that you rely on providers of items or content on the one side and should not only focus on consumers. As you said, if your platform is not attractive for those who create content, then in the future there might not be any useful, fresh, or engaging content anymore. In that sense, this episode was definitely helpful in many ways, and it was also helpful for deriving ideas for future episodes. Looking forward, especially towards autumn and October, we see RecSys coming up. So first and most important question: will you be there for RecSys this year in Bari?

Yes, I'm planning to be there.

Excellent. I also just made my booking a couple of days ago, so I'm looking forward to meeting you there in person, and the VideoRecSys workshop is already part of the workshops that I want to visit. So: first day of RecSys, Monday, October 14th, don't miss out on the VideoRecSys workshop. What are we going to expect at the workshop? Will it be the same structure as last year, roughly four to five invited talks, or will it be a bit different? Can you maybe already shed some light on the topics?

Yeah, we're planning the same structure again. On the topics, we are still iterating and connecting with different speakers, trying to get a good combination of leading industry professionals and maybe some folks from academia, but at a high level it's going to be the same structure as last time.

Okay, cool. Same structure but newer topics, I guess.

Yes, for new challenges and new approaches.

Great. For this podcast, I've had many great guests, and today I had you as another great guest on board. Are there further people that you would like to see appear on RECSPERTS and that you would like to know more about? Maybe topics, maybe people, is there something crossing your mind?

Yeah, we had a lot of great speakers last year at our workshop. I think Minmin Chen would make a good guest here from industry, and I've been following the works of Julian McAuley on the academic side, so I think he could be a great guest from academia.

That sounds very great, yeah, I would love to have them. I guess with this, let's conclude today's episode. Amey, it was really great that you shared so much about the work that you are doing, and I'm very sure there will be lots of future interesting and insightful work coming out of that powerhouse. Thanks for attending and thanks for sharing.

Thanks for having me. It was a pleasure to chat with you, and thanks a lot for the insightful questions.

Yeah, then I would say have a wonderful rest of the day. I mean, talking from Europe to the West Coast, you at least have the advantage that most of the day is still in front of you.

Yeah, thank you so much. For me it will be a round of running projects for the rest of the day. So yeah, thanks again, wish you a nice rest of the day, and see you soon in Bari. Bye bye.

Thank you so much for listening to this episode of RECSPERTS, Recommender Systems Experts, the podcast that brings you the experts in recommender systems. If you enjoy this podcast, please subscribe to it on
your favorite podcast player and please share it with anybody you think might benefit from it. If you have questions, a recommendation for an interesting expert you want to have on my show, or any other suggestions, drop me a message on Twitter or send me an email. Thank you again for listening and sharing, and make sure not to miss the next episode, because people who listen to this also listen to the next episode. Goodbye.