Recsperts - Recommender Systems Experts

In episode 22 of Recsperts, we welcome Prabhat Agarwal, Senior ML Engineer, and Aayush Mudgal, Staff ML Engineer, both from Pinterest, to the show. Prabhat works on recommendations and search systems at Pinterest, leading representation learning efforts. Aayush is responsible for ads ranking and privacy-aware conversion modeling. We discuss user and content modeling, short- vs. long-term objectives, evaluation, and multi-task learning, and touch on counterfactual evaluation.

In our interview, Prabhat guides us through the journey of continuous improvements of Pinterest's Homefeed personalization, from techniques such as gradient boosting through two-tower models to DCN and transformers. We discuss how to capture users' short- and long-term preferences through multiple embeddings and the role of candidate generators for content diversification. Prabhat shares some details about position debiasing and the challenges of facilitating exploration.
With Aayush we get the chance to dive into the specifics of ads ranking at Pinterest and he helps us to better understand how multifaceted ads can be. We learn more about the pain of having too many models and Pinterest's efforts to consolidate the model landscape to reduce infrastructure costs and improve maintainability and efficiency. Aayush also shares some insights about exploration and corresponding randomization in the context of ads and how user behavior differs considerably between different kinds of ads.
Both guests highlight the role of counterfactual evaluation and its impact for faster experimentation.

Towards the end of the episode, we also touch a bit on learnings from last year's RecSys challenge.

Enjoy this enriching episode of RECSPERTS - Recommender Systems Experts.
Don't forget to follow the podcast and please leave a review.

  • (00:00) - Introduction
  • (03:51) - Guest Introductions
  • (09:57) - Pinterest Introduction
  • (21:57) - Homefeed Personalization
  • (47:27) - Ads Ranking
  • (01:14:58) - RecSys Challenge 2023
  • (01:20:26) - Closing Remarks


What is Recsperts - Recommender Systems Experts?

Recommender Systems are the most challenging, powerful and ubiquitous area of machine learning and artificial intelligence. This podcast hosts the experts in recommender systems research and application. From understanding what users really want to driving large-scale content discovery - from delivering personalized online experiences to catering to multi-stakeholder goals. Guests from industry and academia share how they tackle these and many more challenges. With Recsperts coming from universities all around the globe or from various industries like streaming, ecommerce, news, or social media, this podcast provides depth and insights. We go far beyond your 101 on RecSys and the shallowness of another matrix factorization based rating prediction blogpost! The motto is: be relevant or become irrelevant!
Expect a brand-new interview each month and follow Recsperts on your favorite podcast player.

Note: This transcript has been generated automatically using OpenAI's Whisper and may contain inaccuracies or errors. We recommend listening to the audio for a better understanding of the content. Please feel free to reach out if you spot any corrections that need to be made. Thank you for your understanding.

Inspiration is like a journey, right?
You see something, you get an idea, you want to see similar things, you might get another idea.
Pinterest as a platform is like a visual discovery engine.
If we can predict what the user is gonna engage with next, that's probably a good representation which can capture all the different interests.
If your candidate generators are not explorative at all, they'll never get things to explore and so basically they're still exploiting.
Whatever you'll evaluate, you'll evaluate with the bias of the current system even if you're developing a newer model to use for ranking next.
You don't want to optimize just for your advertisers.
For example, advertisers might want to promote clicks on their platform, they want to drive more clicks but driving more clicks can also lead to clickbaity or spammy content on the platform.
So you want to now balance between this advertiser value of driving clicks versus what the users get and preventing users to bottom up.
So at some point we had about 20 to 25 models in the production systems, each of them training incrementally and managing that was becoming a big problem.
We observed over time that the performance improvement in these metrics offline is not the same as the improvement we see online.
And so a team investigated that a lot and they came up with some metrics that can track it well.
Making sure what we see in the offline setting is directly related to what we see in the online setting.
Hello and welcome to this new episode of RECSPERTS - Recommender Systems Experts.
For today's episode, we are going to have an industrial focus again with one of the major players in the social network world.
And I have two great guests with me today representing Pinterest.
And I'm very glad that we are able to do this episode since I guess that Pinterest has shown a tremendous success and achievements over more than a decade in terms of personalizing its experience for users.
And there are many great publications that are nice to read and that show some interesting insights, especially when it comes to working with or on graphs.
So I'm very happy for today to do an episode where we are not only covering the recommendation part, but we will also be talking a lot about ads ranking and what Pinterest is doing there.
And therefore I am very glad to welcome to the show Aayush Mudgal and Prabhat Agarwal.
Hello and welcome to the show.
Yeah, thanks Marcel, thanks for having us, and thanks everyone for listening to this show.
Happy to have Prabhat here too.
Thanks Marcel for inviting us and thanks everyone for joining in.
Hope you have a good time with us today.
Yes, yes, I guess we have some great stuff that we are going to talk about: what you folks do at Pinterest, your work, your roles, your responsibilities, and how you provide a nicely personalized product to users and cater to the demands of many of your stakeholders.
So to provide a bit of background before I hand over to both of you, I'll do, as always, a short introduction of my guests, as all of the listeners should be aware of.
So let's start with Aayush.
So Aayush is a staff machine learning engineer at Pinterest.
He is also a tech lead leading the efforts on privacy aware conversion modeling and building and scaling recommender systems, doing video ads ranking.
He has published work at AAAI.
He was actually a co-organizer of the RecSys Challenge 2023 in collaboration with ShareChat, and he obtained his Master of Science in computer science from Columbia University, having previously studied computer science at IIT Kanpur.
Our second guest for today who does more of the stuff on the recommendation side of things is Prabhat.
He is a senior machine learning engineer.
He has also obtained a computer science degree, a master's degree actually, from Stanford University, and also obtained a degree from IIT Kharagpur.
Before working for Pinterest, he worked for Goldman Sachs, did user behavior studies, and has also published papers at SIGIR.
So hello and welcome to the show again.
And yeah, with that small introduction, let me hand over to Aayush.
So can you please introduce yourself to our listeners and tell us a bit what you do at Pinterest, how you got there and why you are actually working for Pinterest?
Yeah, thanks Marcel.
I think you did a very wonderful introduction already but I can share why I came to Pinterest in general.
Like back in, I think I joined Pinterest, like it's almost six years now.
I joined Pinterest on like 5th March of 2018.
So I just had my anniversary, like my six-year anniversary, but I think when I was looking for jobs, recommendation and ads were probably not the thing on my radar at that point.
I was basically working on like some computer vision or like natural language processing kind of like experience.
However, like when I was looking for jobs, I was looking for like mid-sized companies, like Pinterest had still not IPO'd at that time, and mid-sized companies because they have like a decent machine learning stack in general.
Pinterest even in 2018 was doing pretty good, especially in like graph recommender systems and all of those domains in general, and they had good papers.
So I decided to come to Pinterest given the size was mediumish, not too large.
And also I knew that IPO was about to happen.
So I could like cash in on a pre-IPO company.
So that was one of the other motivations in general.
And I think IPO happened one year after so that was good in general.
But yeah, I think ad was probably not the domain I was looking for, but I think Pinterest was still pretty flexible for someone who doesn't have experience.
And I think since then over six years, I haven't changed my team in general.
So it has a lot to offer.
I think we'll go over some of the details later but I think over the last couple of years, Pinterest did a pretty good re-architecting of our stack and I think Prabhat had a good hand in it in general.
But yeah, I think currently, I think the machine learning stack is pretty mature to do many good recent changes that you want to do which keeps me there.
Yeah, Prabhat, please go for it.
Yeah, thanks for the great introduction there.
Yeah, and yes, you did pronounce everything correctly.
Oh, I'm glad.
Yeah, so yes, currently, I work mostly in recommendation and search, as Marcel mentioned.
I joined about three years back and this was straight after my master's.
So yeah, in the US I haven't worked anywhere other than Pinterest.
The way I got here was I was working with Professor Jure while he was at Stanford, and he was the chief scientist at Pinterest back then.
So I tried for an internship in the summer.
One thing that I liked was that the machine learning stack was very nice.
So you can do a lot of stuff you don't have to take care of.
In general, you have to take care of just your model, not the other stuff, the infra stuff, which was pretty well sorted for three years back.
I guess it's very common nowadays, but three years back, not many companies, unless they were very big, had decent enough ML infra.
So that was nice.
Well, people were very, very smart and also helpful like working on like hundreds of problems.
And I liked my team a lot because it was kind of a horizontal team and used to work across all teams at Pinterest.
So it's like, you can choose whatever next project you want.
You're not bound by a particular area.
If you're interested in anything, I just shoot it up, discuss with people, work on it and see how it does in that area.
So that flexibility was what I liked a lot.
That is what has made me stay for three years, still going strong.
Oh, that sounds definitely like some nice experiences that you made so far, especially with Aayush heading for his six year anniversary soon.
And then also Prabhat what you're saying that it's quite easy to engage in different projects and basically do something where you see, you learn or which basically caters to your interest, but where you, I would also assume can make an impact.
So that sounds definitely like a nice experience and like an experience of, I would say constant learning.
And then you actually provide a nice product of that learning experience, which is actually Pinterest.
Yeah, when preparing, I was first thinking like, do we actually have to explain Pinterest or shouldn't listeners already know?
I mean, this podcast is being listened to by a lot of folks in academia, but of course also in industry, who are working in recommender systems, in search, or in information retrieval in general.
So there I would definitely assume that Pinterest is a well-known name, and maybe more than half of the listeners, I would just bet, have at some point in the past read a blog post or a paper by you.
But then I was actually going back and thinking about my personal bubble of people that are kind of or my friends or relatives.
And I was like, I actually, I can't really recall whether some of them are using Pinterest.
So in that sense, I would say Instagram for those people is quite self evident, but if I sometimes think I seldomly hear Pinterest.
So in that sense, nevertheless, let's maybe get started.
So would it even be right to refer to Pinterest as a social network or is it something different or how would you describe Pinterest?
What is it that you offer to your users and to your stakeholders?
So yeah, I guess, yeah, it's an interesting question.
So I'll describe Pinterest as like the positive place on the web.
And I guess I'll share my personal journey about Pinterest because I use Pinterest a lot as well.
I didn't use it before joining, but I got introduced to the app and then I'm hooked on it.
So I like cooking.
So whenever I find recipes that worked out, I save them to a board with the recipes I know.
It's like collection, I want to make that again.
I'll just go.
Like if I'm browsing, I see like different ideas because like my home feed is tuned to what I've saved.
And so I see ideas that are interesting or that I want to try for some particular occasion.
So I like have different boards for that.
Okay, some special new recipes to try, or like serve this kind of food if you're hosting someone, like for a theme or so, because that's how I feel Pinterest is used by most of our users: I want to do this.
I'm planning a wedding.
I am planning a date night.
I'm planning a dinner.
What can we do?
So, and the other part is you see how other people have done it.
It's a date night outfit.
So you're not just seeing the product, which is like a shoe, but you're also seeing how people have styled it.
And then you can get that experience and that motivation, that inspiration.
And then you can also like make your own styles and share it if you want.
And that's how I guess I find Pinterest useful, I guess.
That should be what is often probably.
One thing that I want to add here is like the thing that I see as most positive about Pinterest is that it's not about spending more time on the platform.
Like that's not something it's looking for.
It's also wanting people to go outside the platform and try things in their life as such.
Like if you're interested in cooking, trying out the cooking recipe, or in painting, trying out the painting, or whatever; for me, I use it extensively for travel ideas and stuff.
So the other thing is there's not much about like following counts, comments, likes, and all of those kinds of structures.
So it's more of like a personal space too in general.
So you're not compared, like on other social media platforms, on getting more followers or on how many likes you get on a post or an image that you upload.
So it has all of those positive features where you could be yourself and create a life that you love in general.
I think that's what I found really positive about Pinterest in general.
What I'm hearing is like provide inspiration, a positive place, having users return to, let's say, the real life and try out the stuff they were inspired by from their experience on Pinterest.
To make things a bit more concrete and also help our listeners with better understanding all the stuff that we are going to talk about in this episode:
Can you maybe walk us through the basic terminology and maybe some stats and figures about the people who use Pinterest, but also the people who provide content on Pinterest.
So I've heard about pinners and pins and boards.
Can you provide just some structure?
What are these concepts and how many of them do you have on your platform?
So that our listeners just get a concept of the stuff you are dealing with and also about the scale that is I guess highly important for this.
I'll start with where the name Pinterest comes from.
So you pin your interests, I guess that's what the name stands for, and that's where the pin comes from; as you might know, every image, or rather the main entity that you see on Pinterest, is called a pin.
So you can compare it to like a post and it works like that's the basic structure what people save.
And I guess I've used the word save a lot.
So at Pinterest, you make collections by saving the pins that you see in your search and your home feed.
And these collections are known as boards.
So I guess whenever you read our blog posts or papers, these are the three main things you will come across.
And I guess we have around billions of pins on the platform as of today, and more than like 450 million users.
Yeah.
Wow, okay.
That's quite a scale.
Yeah.
And these pins, are they always visual attached with some text or do they have different notions in terms of the modalities they are composed of or how can I perceive a pin?
I mean, you already gave that example, for example, recipe, but what could be different kinds of pins?
Pins can be like very different types.
So your inspiration comes in different forms.
So your pins have to come in different forms.
So like, I guess, as we mentioned, it's like a static image.
If I'm looking for like decor ideas, that's probably enough.
And most of these pins also have links, because images are saved with links in the description.
So if people click, you can go to the blog and see how to get the source and like get more details about it.
The second type of pins are videos because sometimes videos are the best way to express something.
For example, how to build a fence, how to repair something or maybe how to style your hair.
So that's like another type of content.
Even recipes, video recipes are interesting and engaging as well.
Then we also have products, which are known as items.
They're like things you can also buy, not just see, not just click, not just play, but also if you can buy basically furniture, fashion or whatever.
And then I guess we have ads of each type as well.
So that, I guess, covers most of the content you find on Pinterest.
I would say products come from your retail partners, but the pins themselves.
So for example, if there is a recipe or how I dress my hair or something like that, this is coming from the same people who pin stuff.
So in that sense, the people who consume are also the ones who produce, but I guess not all of them, but a pinner can basically do those.
So create content, but also consume content.
Yeah, at Pinterest, like we don't differentiate between like the person who is consuming content or the person who is creating content.
At Pinterest, everyone is a pinner.
You are saving what you like.
You're saving stuff from external sites, what you like to Pinterest.
You're expressing yourself.
You are sharing with other people.
So then you wear the hat of a creator.
When you're exploring ideas, you wear the hat of a consumer, but at Pinterest, I guess we don't differentiate because everyone has ideas to share and everyone has ideas to get inspired and realize them in their friends life.
Yeah, I think one thing I also want to add like the platform is more like visual in general.
So if you go to Pinterest app, like you will see more images, videos, not like a lot of text out there.
It's mostly a visual discovery engine too.
And that's where I think Pinterest has been at the forefront about like visuals and like multimodality of images and text, with text behind the images that is curated by users.
And that powers a lot of personalization systems.
Mm-hmm, okay.
And yeah, as you already touched on personalization systems and ads systems, it's mostly, I would assume personalized ads, but I mean, we do have recommendations on the one hand.
We do have ads on the other hand.
And both I would assume use some kind of similar mechanism, but also separate ones.
But actually what are the sites where you are showing all of that stuff?
Where are actually all the sites and venues where you inject, let's say your models and your mechanisms?
I think Pinterest has like three main like surfaces which are like kind of heterogeneous and also homogeneous.
Like information can flow from one to another, but like the first one is like home feed, which is very typical to most surfaces in general, where the user has past interactions or they don't have any interactions.
And the task is to show relevant content based on their kind of features or based on like what the platform understands from those users.
The second most important surface is also around search.
Like if you have a search query where the intent is very clear what they're looking for on the platform and you can have like search queries to find and like to continue the journey that a user might have.
And the third, which is probably, I think, one of the most important surfaces for Pinterest, is around related pins.
So this is like, if you click on a particular piece of content, can you show users more similar content, or like similar-looking content?
I think one thing that I want to highlight is like the surface on its own looks pretty simple.
Like it's just finding similar content.
But if you look back, it really helps to drive the recommendation systems: it helps to collect data on what content is similar, and to learn embeddings from similar content by getting signals from the users themselves rather than having to label this data.
Many other platforms I think lack this kind of a surface to self collect data also from users in general to power personalization models.
And I think the related pins does a very good job in being able to find like good data sets also for like machine learning purposes.
Yeah, I guess, like Aayush said, and just to add the non-machine-learning part: for the users as well, the surfaces are very flowy.
If you have a new thing you want to try, you start with search.
You like something, but you want to try other variations, so you go to related pins, and to the home feed as well.
Then you are browsing based on your engagement.
Then you see something, but inspiration is like a journey.
Like you see something, you get an idea.
You want to see similar things.
You might get another idea.
Then this kind of, again, you go into this flow, you like something that again, you go into related flows.
So that's like keeps that inspiration journey.
And then because we have this kind of journey based stuff, we can collect this data and observe this and try to improve this journey wherever possible.
So, and everything is powered by like machine learning.
So even within each surface there are also several stages.
And you have like a very typical recommendation stack powering each of them.
And that's because you need to understand like the users intent in lots of different ways.
And I guess some of that we'll touch on later on.
Okay, I see.
Good that you provided that overview.
So we do have the home feed with its recommendations, the search and the related pins.
Let's get started with the home feed.
As far as I know, the home feed is your primary entrance point of a user.
So for example, if I go in my browser to pinterest.com or if I open up my app, then this will be the first thing I will see.
What is actually happening there?
I guess that you, Prabhat, gave a very good talk two years ago at GTC 2022, where you basically went through the journey starting off with the first ML model in 2014 that you used to personalize the home feed.
And then going through that history up until 2022.
And I mean, I guess in the meantime, you have done even more on that side and showing and illustrating all the improvements that you made to the models, to the latency, to debiasing and so on and so forth.
Can you get us a bit hooked into the personalization and machine learning part there?
So what do you do to provide your users a personalized experience on the home feed?
Yeah, I guess that's a great question.
So it's because as I mentioned, home feed is the first point of entry.
So for users, at the end of the day, whether it's a new user, a daily user or whatever, that's the first surface where we get to meet the pinners, where we have to understand them and show what they would like.
I guess, as you mentioned, we started out with something simple in 2016, but today it is one of the largest surfaces on Pinterest.
And I guess most engaged as well, because we can personalize it a lot based on what the users have done.
To give a very brief view, we have a standard suite of candidate generators, which generate candidates based on different models and different heuristics, and we have some generators based on the graph, which are, I guess, Pixie graph walks, and which we have discussed in papers.
Some of them are ML-based, on the sequence of interactions users had, which we have talked about in papers like PinnerFormer, and then we also have other stuff that powers more interest-based retrieval.
We know these users are interested in this.
The machine learning is doing one part, but we can also inject some more coarse or granular signals.
So after the candidate generators, you get to the ranking stage, where you're now given these thousands of candidates, which we have extracted from billions of candidates, like the whole multi-billion corpus of pins that we have.
So now the next step is to rank these thousands down to about 100 or 400 or 500 or some small number, so that the next stage, which is the Blender, can take it and arrange it into a grid.
So here also we want to personalize the users a lot.
So we use PinnerFormer embeddings, which capture the long-term interest of the user, but we also then augment our model with sequence features, which are the most recent actions users have engaged in, to add more short-term understanding as well.
On the content side also, we have different kinds of content embeddings that we have discussed, like PinSage, ItemSage, and so on.
And then we use mostly state-of-the-art architectures to get to a multi-task model, which can generate the probabilities of different types of actions that can be taken on the pin, and which is then consumed by the Blender, to apply some business logic and other rules to get from this ranked list to a feed of pins that the user sees.
And here is where we also apply all of our diversity stuff.
We also do a lot of that diversity throughout the stack, but here we then use more sophisticated methods to make sure that the content is not just one type but also diverse, so that the users can learn from more of it and explore and exploit basically.
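
The funnel Prabhat describes (candidate generators, then a multi-task ranker, then a blender that adds diversity) can be sketched roughly as below. This is only a toy illustration under stated assumptions, not Pinterest's implementation: the pin fields, the two generators, the utility weights, and the per-topic cap used for diversity are all hypothetical.

```python
from collections import Counter

# Toy corpus; the fields and probabilities are made up for illustration.
CORPUS = {
    1: {"id": 1, "topic": "recipes", "p_save": 0.30, "p_click": 0.50},
    2: {"id": 2, "topic": "recipes", "p_save": 0.25, "p_click": 0.40},
    3: {"id": 3, "topic": "recipes", "p_save": 0.20, "p_click": 0.45},
    4: {"id": 4, "topic": "travel", "p_save": 0.10, "p_click": 0.30},
    5: {"id": 5, "topic": "decor", "p_save": 0.15, "p_click": 0.25},
}

def graph_generator(user):
    """Hypothetical stand-in for a graph-based candidate generator."""
    return [CORPUS[i] for i in (1, 2, 4)]

def sequence_generator(user):
    """Hypothetical stand-in for a generator driven by recent user actions."""
    return [CORPUS[i] for i in (2, 3, 5)]

def generate_candidates(generators, user):
    """Union the outputs of all generators, dropping duplicate pins."""
    seen, candidates = set(), []
    for generate in generators:
        for pin in generate(user):
            if pin["id"] not in seen:
                seen.add(pin["id"])
                candidates.append(pin)
    return candidates

def rank(candidates, predict_actions, weights, keep):
    """Order pins by a weighted sum of predicted action probabilities."""
    def utility(pin):
        probs = predict_actions(pin)
        return sum(weights[action] * p for action, p in probs.items())
    return sorted(candidates, key=utility, reverse=True)[:keep]

def blend(ranked, max_per_topic):
    """Walk the ranked list in order, capping how many pins share a topic."""
    per_topic, feed = Counter(), []
    for pin in ranked:
        if per_topic[pin["topic"]] < max_per_topic:
            per_topic[pin["topic"]] += 1
            feed.append(pin)
    return feed

# A multi-task model would output one probability per action type;
# here the "model" simply reads the made-up values stored on each pin.
predict = lambda pin: {"save": pin["p_save"], "click": pin["p_click"]}

candidates = generate_candidates([graph_generator, sequence_generator], user="u1")
ranked = rank(candidates, predict, weights={"save": 2.0, "click": 1.0}, keep=4)
feed = blend(ranked, max_per_topic=2)
print([pin["id"] for pin in feed])
```

In this toy run the third recipe pin is ranked highly but dropped by the blender's topic cap, which is the kind of trade-off between relevance and diversity that the last stage is responsible for.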
Okay, okay, I see, yeah.
You have been alluding to that PinnerFormer approach that you presented at the KDD conference in 2022.
It was a paper by four of your colleagues.
And it's actually about sequence modeling for user representation at Pinterest, where you posed that problem of short-term user interests.
So the user comes to the platform and they have some specific intent.
Maybe they wanna just go back to some recipes because they have the task of cooking something for tonight.
But there are also these more like longer term things that you wanna do.
What are the things that kind of helped you in coming up with that model, providing that experience, but also diversifying the results there?
I mean, this is sometimes always, I would say, maybe also a trade-off.
So it's quite easy to fall into that rabbit hole of just doing the most direct and next thing and predict what the user just immediately likes or would like to click.
But I guess it's sometimes much harder to also anticipate a longer term date.
I guess in that paper, you have that graphic, that one page where you showed there's a 28-day future window that you wanna somehow anticipate and predict.
So can you elaborate a bit on what you have been doing there and what PinnerFormer is then kind of based on?
So PinnerFormer is based on the idea that if we can predict what the user is gonna engage with next, that's probably a good representation which can capture all the different interests. I guess, because user representation is not new, we have tried different other methods as well; as mentioned earlier, PinSage was one of the earlier ways we used to understand users.
We would just cluster the content that the user has engaged with.
And then we'd say that, okay, these are the main clusters the user has engaged with, so these can capture user interest.
Then I guess basically, folks, my colleagues worked on this idea that what if we can just pose this problem of like predicting what are the next actions the user will engage with?
Will that capture enough?
So once you see a sequence problem, you apply a transformer model.
So yeah, because it's now 2024, so transformer models everywhere.
So yeah, so then we use like a transformer model, give the user's input sequence to it, and then take the embedding from the transformer encoder and see whether, using this embedding, we can predict all the next actions in a particular time window, so we can choose from next day, next week and so on.
And in general, I guess more details are in the paper, but we have seen that just modeling the next action is not enough and often does not lead to good embeddings, but when mixing in longer-term predictions, more information is captured in the embeddings and they are more useful in the downstream tasks, which, if you now think of it, makes sense, because just predicting the next action can sometimes be easy based on what journey the user is on or where on the journey the user is.
But if you really want to learn interests, you have to look at the harder problem, like not just right at the moment, but all the things they're gonna engage with next.
Most of this is based on our representations and the frameworks, and now, given this, how we optimize is by posing it as a contrastive loss where this embedding and the future embedding are the positive pairs, and then you have a mix of in-batch negatives and random negatives, which is a pretty standard framework that we use at Pinterest and has often been described in all the different Sage papers you will come across.
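
As a rough illustration of the contrastive setup mentioned here, the sketch below computes a softmax loss in which each user embedding is paired with the embedding of a pin that user engages with later, every other pin in the batch acts as an in-batch negative, and optional random negatives are appended. The function name, the temperature value, and the toy vectors are assumptions for illustration; details of the actual training objective are in the paper and are not reproduced here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(user_embs, positive_pin_embs, random_negatives=(), temperature=0.1):
    """Mean softmax cross-entropy over a batch.

    Row i's positive is positive_pin_embs[i]; all other rows' positives are
    in-batch negatives, and random_negatives are shared extra negatives.
    """
    pool = list(positive_pin_embs) + list(random_negatives)
    losses = []
    for i, user in enumerate(user_embs):
        logits = [dot(user, pin) / temperature for pin in pool]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_norm - logits[i])  # negative log-probability of the positive
    return sum(losses) / len(losses)

users = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(users, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = contrastive_loss(users, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, shuffled_loss)
```

The loss is close to zero when each user embedding points at its own future pin and large when the pairing is shuffled, which is the signal that pulls user embeddings toward what the user engages with later.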
Yeah, yeah.
When you say that just predicting the next item that is of interest for the user or the next pin that is of interest for the user is not enough.
So what does this actually mean not being enough?
So have you observed that users were less loyal? I mean, having users spend time on the platform is actually not a goal that you want to maximize for, but there must definitely be some engagement signals that you seek to optimize.
What are those signals?
What are the ways that users can interact with the content?
And what is it that you seek to optimize there and where you saw that there is potential and that just short term focused approaches are not enough?
Yeah, so when I was talking about that, I was talking very specifically about PinnerFormer.
So the short term was in terms of the model itself, not as a platform.
The way we were measuring is like, and that we wanted to see like given this embedding, how well we can predict the next action.
Let's assume this, so this is like a metric, even if you're training on the next action task; once you have posed the user sequence as like a sequence prediction problem, we can choose what you want to predict next.
So the evaluation metric we use, like, okay, let's see what is the recall at the next action given the user embedding we produce.
And the other treatment here is like you take, you're not just predicting the next action, but you're also predicting the all future actions or a sample within like some next k days or something.
And we observed that training on the second task, not only improves metrics on the second task, which is like how well I can predict the next k days using the recall metric, but also includes the next action one as well, which now if we see that you're making the task more challenging, hence you have to encode more information in the embeddings.
And as we know from contrastive losses that in general machine learning that the model that we'll learn will be always a function that your data supports.
So what is not in the data will never be learned because that's like the, I guess the pros and cons of machine learning as well.
So that's why like, if you bring a more task, you're encoding more information.
And because the goal is to now use these embeddings in like downstream ranking models or retrieval models to learn the user or know about the user, you want to get as much more information into the embedding as possible.
And so we are basically at this point designing SSL tasks because we are saying, okay, next action prediction is one, next key action is one SSL task and we kind of mix and match and see that gets good performance.
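The two evaluation settings described above can be sketched as one recall computation with two different target sets: the single next action, or all actions in some future window. The Python below is a minimal illustration under assumptions: brute-force dot-product retrieval over a small dictionary of item embeddings, and hypothetical names `top_k` and `recall_at_k`. It is not the evaluation code used at Pinterest.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(user_emb, item_embs, k):
    """Return the ids of the k items scoring highest against the user embedding."""
    ranked = sorted(item_embs.items(), key=lambda kv: -dot(user_emb, kv[1]))
    return [item_id for item_id, _ in ranked[:k]]


def recall_at_k(user_emb, item_embs, target_ids, k):
    """Fraction of the target items that appear among the top-k retrieved items.

    target_ids can hold only the next action (next-action recall) or every
    action in a future window (the "next k days" variant).
    """
    retrieved = set(top_k(user_emb, item_embs, k))
    targets = set(target_ids)
    return len(retrieved & targets) / len(targets)
```

Calling `recall_at_k` once with the next engaged pin and once with the set of pins engaged with over the following days gives the two numbers being compared in the discussion.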
Okay, okay.
In that sense, I mean, there are these different ways of how users can interact with your content.
So I mean, they can pin, or I'm not sure whether that's the right term, but decide to save a pin to one of their boards.
They can, I guess, do a closeup.
They can just simply click it.
Are you basically trying to predict all of these actions?
So is this actually what you're trying to predict, the probability of different actions?
So yeah, as you mentioned, users can take a lot of different actions on the content on Pinterest.
So you can save a pin, you can click on it, you can do a closeup, which gets you to the related feed, you can download it, you can screenshot it, and so on and so forth.
We generally at Pinterest look at what the high value actions are that users are doing, and it depends on which model we are at: the user embedding model, a retrieval model, or the ranking model.
Different kinds of multi-task learning are used to model different actions, and the set of actions that is modeled differs as well.
For example, in the PinnerFormer paper, which is the user embedding model that comes even before retrieval, we want to basically capture more meaningful information.
So for example, we only use clicks that are long in duration and saves as positives in our examples, because these are high value: they give you higher quality information and encode the right type of information into the embedding that we are learning.
But when we are at the ranking stage, we want to model the probability of all types of actions so that the blender can then decide based on how it is blending.
It can look at all these probabilities and, assuming the ranking was correct, the blender will then see what the best content is that should be shown to the user and arrange it in a particular way.
So the answer varies depending upon which system we are looking at.
You also mentioned candidate generation, and we touched a bit on the goal of not only showing relevant content, but also diversifying your content.
And this reminds me of a paper that was also published by you at KDD in 2020 about the PinnerSage algorithm.
The name of the paper was PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest.
In that paper, you basically outlined or motivated why there should not be a single unified embedding for a user, but why there should actually be many or a couple of embeddings that capture the different interests of users.
And then you provided a nice approach for how you come up with them based on your previous PinSage embeddings, but also how you could use this.
Does this directly go into candidate generators?
So to say, a user can be represented by those different embeddings that capture different interests.
So I could use all of them to then fuel candidate generators to come up with different pin recommendation candidates.
Is this one of the key components there or how does this fit or support how you come up with diversified content?
Yeah, sure.
I guess in this stack, diversity can mean a lot of different things.
For one, you can look at content diversity: is the content showing different interests to the user, or different types of content? And then there is a whole other set of diversity aspects that we look at, which I guess you can look up in the blogs and papers.
And so the solutions have to be across the stack and in different places as well.
You cannot do the stuff at one point in the stack and assume it carries through. For example, even if you have a very explorative strategy on your ranking side, if your candidate generators are not explorative at all, the ranker will never get things to explore.
And so basically it's still exploiting.
So I guess the thing here is that you need to have correct representations of whatever you're trying to do across the stack.
I guess the paper you mentioned, PinnerSage, was one of our user embedding models.
And that was used to make one type of candidate generator, but then as I mentioned, we have different types of candidate generators which are looking at different aspects.
You can think of them as looking at different aspects.
One is interest-based exploration.
Then one is more machine-learning based.
One is based on learned user embeddings.
For example, you can have PinnerFormer and PinnerSage, both used for different types of candidate generators, and so on.
So that's, I guess, one way we make sure that we kind of look at the user from different angles and different perspectives and try to make the candidate generators representative of that.
There is more stuff that happens across other parts of the stack, and even in the candidate generator stack, but I guess I'll skip that part because it can turn into a long discussion.
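One way to picture how several interest embeddings per user can feed a candidate generator is to retrieve separately for each embedding and then interleave the results. The Python below is a sketch of that idea only. The round-robin merge, the brute-force retrieval, and the names `retrieve` and `multi_interest_candidates` are assumptions for illustration and are not taken from the PinnerSage paper or from Pinterest's production system.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_emb, item_embs, k):
    """Brute-force nearest items by dot product (stand-in for an ANN index)."""
    ranked = sorted(item_embs.items(), key=lambda kv: -dot(query_emb, kv[1]))
    return [item_id for item_id, _ in ranked[:k]]


def multi_interest_candidates(interest_embs, item_embs, k_per_interest):
    """Retrieve per interest embedding, then interleave round-robin without duplicates."""
    per_interest = [retrieve(e, item_embs, k_per_interest) for e in interest_embs]
    merged, seen = [], set()
    for rank in range(k_per_interest):
        for results in per_interest:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

Because every interest gets its own retrieval pass, a user with two unrelated interests receives candidates from both, instead of candidates near an averaged embedding that may represent neither.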
No, I mean, for all the nitty gritty details, we always are going to reference all of the material that we touch here and that is somewhat related in the show notes.
So for our listeners who seek more detailed information, please go there for further references.
And there's also a nice blog that you have.
Maybe before going into more of the ads part and talking about what you're doing there, because obviously you don't have only Pinners, you are also a company that is listed on the stock market, and you folks also need to earn money.
So there must be some mechanism of doing that.
So let's also talk about that.
But before going into that, I'm still on one of my screens looking at that waterfall chart from that talk at the GTC conference in 2022.
And there are so many things that you did in that journey.
So we talked about PinnerFormer, real-time features, transformers, but there are also two things which are not about the model that does something, but about additional problems in the realm of personalization, one of which is evaluation and closing the offline-online gap.
And the other one would be the position bias that at least in that presentation came last.
Maybe let's start with the former.
So can you share some insight that you got from that online offline gap and how you were able to close it?
Yeah, sure.
I guess this is one of the things I love about Pinterest: when we are working on ranking models, the offline evaluation metrics are good.
So if you see something improving in the offline metrics, you have a good amount of confidence that you will see similar movements online.
So it makes the life of machine learning engineers much easier.
In general, for that very specific ranking model evaluation, we used to use AUC-based metrics, either PR AUC or ROC AUC.
Ranking is generally posed as a classification problem, where we are classifying whether the user will engage or not.
So people use classification and ranking metrics like AUC, but what we observed over time was that a performance improvement in these metrics offline did not correlate with the improvement we saw online.
So a team investigated that a lot, and they came up with counterfactual metrics instead of just evaluating predictions on the current system's feed.
So basically, it's kind of biased by your current ranking system.
So whatever you evaluate, you evaluate with the bias of the current system, even if you're developing a newer model to use for ranking next.
So that's why we try to debias a little bit by showing a random feed to the user.
Again, random in some sense: it still has some constraints based on what it has seen from the ranking.
But instead of just using all of those scores in a very particular way, this kind of randomizes the feed a little bit so that you can collect more signals from it.
And then we developed some more metrics around this, experimented with a lot of metrics and offline-online studies, and found some metrics which work quite well for predicting offline how well the model will do online.
And that led to one of the biggest speed-ups of experimentation, because now everyone uses it.
Otherwise, if you have to try something online, you have to make sure everything deploys, along with all the workflows that are associated with your model, and then wait for some amount of time, which could be days, to get some signal and know how this model is going to perform, if you don't see good correlations.
But now, with good correlation metrics offline, people can run hundreds of hyperparameters and try out different things, different modeling ideas, different data ideas.
And as I described, like some of the biggest changes in the model and data came after we launched this kind of metric.
So I guess that's why I talk about the importance of having a well-correlated metric offline as well, because that speed-up is like you can't imagine.
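The episode does not spell out which counterfactual metrics Pinterest settled on. One textbook way to use randomized traffic for offline evaluation is inverse propensity scoring, sketched below in Python purely as an illustration. It assumes each logged impression records the probability with which the logging system showed that item, and that a candidate policy can report its own probability of showing the same item. The names `ips_estimate` and `new_policy_prob` are hypothetical.

```python
def ips_estimate(logs, new_policy_prob):
    """Inverse-propensity estimate of a new policy's average reward.

    logs: list of (context, shown_item, reward, logging_prob) tuples, where
    logging_prob is the probability the logging system showed that item.
    new_policy_prob(context, item): probability the new policy would show it.
    """
    total = 0.0
    for context, item, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # new policy is to show this item than the logging policy was.
        total += reward * new_policy_prob(context, item) / logging_prob
    return total / len(logs)
```

With uniform randomization over n candidates, `logging_prob` is simply 1/n, which is one reason a randomized slice of traffic makes this kind of estimate straightforward.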
Yeah, I guess you were reporting about some like 50X speed up there in terms of experimentation speed.
That's quite tremendous benefit.
I would definitely say that if there is something offline that you can really rely on, you would see that it translates more consistently into online impact. Maybe getting to the position bias: you just mentioned you took a counterfactual approach there, having some randomization so that you could then, I would assume, apply some inverse propensity scoring techniques.
For the case of position bias, the approach looked slightly different.
There you were basically estimating that position bias or modeling that position bias directly, and then took it out as part of the inference path so that in training, you were taking both things, so the relevance and the position bias, and then in the inference part only the relevance.
Why have you taken a different approach, or am I just mistaken?
It's more close to the randomization approach anyways.
So why have you taken that approach for position bias where you basically estimate both of the things, so position bias and relevance directly?
Yeah, sure, and that's a great question.
So that's the difference between the two.
We are trying to solve very different problems.
So in the evaluation, the bias we are observing is not the position bias; we are observing the current system's bias.
So we want to debias against that.
Whereas the position bias work addresses another type of bias in recommendation systems.
So in general, if you look at all the different biases that are present: as soon as you have candidate generators, you get bias. You have popularity bias, because popular content will be retrieved more. Then you have data visibility bias, which is that your current training data is based on the current model.
So when it goes to the next model, it will have those kinds of issues.
Then one of the other biases is the presentation bias.
So this is coming from how you present your content to the users.
And this is typical of how humans interact with a ranking: if something is ranked higher, because we trust the system, we assume that it is also good content.
So there's that kind of bias, and in general just discoverability bias: higher positions have a big advantage.
And because we are now using this data to train our next model, we'll have these kinds of biases from the implicit feedback, because we don't have explicit feedback.
And this is true for all recommendation systems.
So that's why, when we train the next model, what we want to do in this particular position bias work is see if we can learn how much bias this content has, or on average a piece of content has, based on its position and on its display.
So we use different features, such as device-side features about the grid, because one position in one grid is different from the same position in a different grid.
The idea is that if we can learn this separately, we can remove this bias during serving.
So now we rank based only on the relevance, and these kinds of presentation biases are reduced.
There is a whole lot of literature on this out there if you're interested to read more, but we found this idea in one of the papers that we read, implemented it, and saw an improvement from it.
So I hope that answers the question.
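The approach described in this answer, learning a position effect alongside relevance during training and dropping it at serving time, can be sketched with a tiny logistic model. Everything specific in the Python below is an assumption for illustration: a single scalar relevance feature `x`, one learned bias per position that is added to the relevance logit, plain gradient descent, and made-up click data. It is not the model or the feature set used at Pinterest.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, num_positions, steps=2000, lr=0.5):
    """Fit P(click) = sigmoid(w * x + bias[position]) by gradient descent.

    rows: list of (x, position, clicked) with clicked in {0, 1}.
    """
    w = 0.0
    bias = [0.0] * num_positions
    n = len(rows)
    for _ in range(steps):
        grad_w = 0.0
        grad_bias = [0.0] * num_positions
        for x, pos, clicked in rows:
            err = sigmoid(w * x + bias[pos]) - clicked
            grad_w += err * x
            grad_bias[pos] += err
        w -= lr * grad_w / n
        bias = [b - lr * g / n for b, g in zip(bias, grad_bias)]
    return w, bias


def serving_score(w, x):
    """At serving time only the relevance term is used; position is dropped."""
    return w * x


# Made-up logs: (x, position, clicked). Position 0 gets more clicks than
# position 1 for the same x, and x = 1 gets more clicks than x = 0.
rows = (
    [(1, 0, 1)] * 3 + [(1, 0, 0)] * 1 + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 2
    + [(1, 1, 1)] * 2 + [(1, 1, 0)] * 2 + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 3
)
w, bias = train(rows, num_positions=2)
```

After training, the extra clicks that come from being shown higher are absorbed by the per-position bias rather than by `w`, so ranking by `serving_score` depends only on the relevance feature.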
No, I guess that provides some good background.
Maybe to wrap up that first section about the recommendations, with something that would also play a role in ads ranking to a certain degree: given the many steps that you have already taken so far to optimize your system, one could argue, wow, that's very mature.
So are there still many things left to do, or are returns diminishing more and more, with every step taking a larger effort, and is it actually still worth doing?
In your opinion, what are the major challenges that are still unsolved?
So what remains to be done to make Pinterest the perfect personalized experience?
Yeah, I don't think a perfect system can exist.
But yeah, I guess that's a good point: even though the system seems quite mature, there are always new ideas coming up and there are always new things to do. I guess the talk you were referring to was from 2022, but since then we have made so many improvements. There are again more blogs about that and more papers. There was the TransAct paper about how we added more sequences into this model, and how we serve that and how we log it.
And that gave us more gains in the model.
So I guess that's the thing: you can always keep improving, because there's always more to learn about the user, to understand them better and show better content.
And I guess one of the core challenges is the explore part, according to me: how well we can bring new ideas or new interests to the users and make them stick to it.
You don't just want to show them a different thing that they will probably click on; you want it to become part of their core interests, something that actually engages them. That's a hard problem to solve, because you're not looking for just clickbaity stuff that you like in that moment; you want to build something that users come back for and that also helps them realize something in their life.
So that's, I guess, the thing that I think is a very hard problem, and we have done a lot to get to it, but still more can be done here.
We are working on a lot of things, but yeah, that's the next thing I'd like to see as a user.
Yeah, in terms of recommendations and the personalization part, that is already pretty good coverage.
This is actually not the only thing I can do but I think that's a good idea.
And let's talk about it.
It's the ad.
I think Aayush is, I guess, the expert here when it comes to ads and ads ranking. I would be interested to hear a bit more about what sets ads apart from organic recommendations at Pinterest, and standard ads from shopping ads.
I think that sounds pretty good.
I think Prabhat gave a very good overview of our recommendation systems work.
But I think where ads come into the picture is: the user is on Pinterest, they're on some journey.
Ads help to connect users with advertisers.
Advertisers have content that they want to become discoverable or explorable by users.
So that's where advertising comes in: it's trying to balance between the users, the advertisers, and the platform itself, so that the platform can make money.
From that perspective, Pinterest provides a full funnel optimization.
For advertisers, there are three major products that Pinterest has today to complete this full funnel optimization as the user goes along their journey.
So the first step in the journey is the user might be just coming to Pinterest.
They are not aware of what they want to do.
So that's where the awareness product comes in.
The main motivation is you're just trying to create some brand awareness.
It could be something like Target as a brand: you don't necessarily need to interact with it.
And Target just wants to create awareness for their brand on the platform.
As the users start to interact more with the platform, there could be more consideration products like you want to drive more clicks or you want to drive more engagements with your content on the platform.
So that's the other kind of optimization that advertisers might look for.
And thirdly, as the user gets more engaged, let's say I'm trying to remodel my house and I'm looking for interesting ideas for remodeling.
But at some point, you might also want to buy the products so that you can do the remodeling.
You want to buy different products.
So that's where the conversion product comes in. There the idea is: okay, you are on a journey.
How can you fulfill this journey and take action on it?
How can you make it viable?
That's where, okay, you found this product; can you buy it outside the platform?
From, let's say, Target or Walmart or any other retailer or any other kind of advertiser.
So depending on where the user is, what kind of advertising goals the advertisers might have, they can run different ad objectives and ad optimization.
One thing that you touched upon is standard and shopping, so those are two common terms at Pinterest in general.
So just to demystify what they mean.
So shopping is essentially a shopping catalog, or buyable pins that we have on the platform.
You can think about, let's say, Target, or maybe Macy's or Nordstrom, for example.
So if it is Macy's, then as a category you might have clothing.
And then within clothing, you can go to shoes, then you can have Nike shoes.
Then Nike shoes can come in different versions, maybe the Nike Jordan version, and then you can have different sizes.
If you look at this catalog, the size can be pretty big in general.
You have different shoe sizes, and if you map it all the way down, the content becomes much larger.
But this is mostly shoppable content, which you see and can just buy from the catalog.
Standard ads are a kind of product that is not directly shoppable in terms of things that you're buying; it could be something like trying to create awareness for, let's say, BMW or a Volkswagen car.
The inventory is not that big in general: there's probably one or two products, or three to four, and it's not as nested as in the world of shopping. That is what distinguishes the two kinds of products that Pinterest has.
So standard ads and shopping ads where the shopping ads mostly relate to concrete products that then come from different retailers that you feature on your platform or that advertise their products on Pinterest.
Yeah, and I think one thing I just want to highlight there is that there are also non-promoted shopping pins, the organic shopping pins.
As a retailer, you can also choose to just share the catalog with Pinterest, and you don't need to run ads to make it discoverable either.
So that just flows through the organic kind of recommendations which Prabhat already discussed a bit before.
But you can also run ads on that if you really want to show them more explicitly to users and want to target them.
Why would I do this as a retailer or as an advertiser, to have those non-promoted pins?
So I think Pinterest as a platform is like a visual discovery engine.
Like a user is trying to search for like let's say Nike shoes and if Target has it, like Pinterest will just show it naturally to them in general.
They might just see the content that they want to buy or shop on the platform organically too.
I think there's this notion that you're on some journey, right?
If you're on some journey and you're trying to explore things, that's where I think the organic recommendations come in.
Early on, as you're trying to explore things, you can get organic recommendations.
As you get further into the journey, the advertiser might also want to directly influence it by showing particular content to them, and that's where the purchase journey or the advertisements can come into the picture, if the advertiser feels it's the right time for them to monetize and show promoted content.
Okay, so in that sense, they would also kind of build some awareness for the retailer or for the brands of the retailer, which they could monetize later on with standard and shopping ads.
Yeah.
Okay, I see.
I see.
In the preparation for this interview, I've been also going through a couple of the blog posts that you have been writing there.
There were some blog posts that were talking about the structure and how you use a lot of AutoML there.
You also have multi-task learning and multi-tower models, and there was a very specific blog post; I'm just opening it right now to have a better overview.
It was actually about the approaches that you use.
So maybe let's first start with that multitask learning aspect.
So when we talk about Pinterest ads, so what are the tasks that you want to learn and what are the approaches that you use to solve that multitask problem?
Yeah, I think that's like, I can go into big detail there.
I think one thing that I want to highlight for ads: if you look at ads, what they are trying to optimize for is engagement with advertisers.
But that is not the only objective; it's also trying to make sure the user gets value.
You don't want to optimize just for your advertisers.
For example, advertisers might want to promote clicks on the platform.
They want to drive more clicks, but driving more clicks can also lead to clickbaity or spammy content on the platform.
So you want to balance between this advertiser value of driving clicks and what the users get, and prevent users from having a bad experience.
So that's where even from the beginning itself, for an ad system to work in general, it needs to predict for multiple different objectives.
It could be as simple as predicting for clicks because that's what we are optimizing for.
But then you also need some other actions, which I think Prabhat also shared. Saves is an action that we feel translates well to the user journey and user satisfaction. Or is the user hiding the content? Or how long is the click duration?
How much time does the user spend on the content in general?
Do they just click on it and come back within a second?
Or do they spend something more than 10 to 20 seconds or somewhere there in general?
To understand how the user is interacting with the platform.
So for Pinterest, I think at least for the ads domain, there are these actions: clicks, how much time you're spending on the content, and saving, are you going to save the content or not?
Are you going to do a close up?
Are you going to hide this content?
At least for all generic ads, these are four to five different predictions that you definitely need before you can decide the value of showing an ad to the user.
But one thing is for ads, I think the heterogeneity is higher in general.
Some advertisers might not be looking for clicks on the platform.
They just want to create brand awareness.
So maybe you don't need a click prediction for these advertisers.
There could be advertisers who are looking for more conversions on their platforms.
So let's say if you see a particular ad on Pinterest, maybe in a day or two you go to their advertiser website and you buy the product.
So they're trying to drive that kind of an interaction.
So apart from just click, you might also need to predict the likelihood of conversions which not all the other advertisers need.
So in that sense, the kind of predictions and the values that you need can keep on growing with different products that you might have on the platform.
As you start to bring videos onto the platform, you might be making not just click predictions, but also predictions for video views.
So this scale of predictions can keep on increasing as the product is growing. When I joined in 2018, I think the modeling techniques were pretty basic.
They were GBDT kind of models: we were training gradient boosted trees and embedding that into logistic regressions.
And then we are making all these kind of different predictions in general.
So at some point, we had about 20 to 25 models running in the production systems.
Each of them training incrementally and managing that was becoming a big problem in general.
So that's where we started leveraging more multi-task learning approaches, to see whether we can combine these predictions. When we started, this was mostly from a maintenance perspective: we could not maintain these 20 different models.
Let's say, for example, you need to add a feature.
You just want to add a feature, something like PinnerFormer or PinnerSage, which Prabhat shared before, and you see promise.
Now you need to go into each of these 20 models and do different experiments, and rolling it out would take a year or so.
And similarly, if you have so many models to maintain with that workforce, Pinterest has been a lean company in general.
So with that workforce, you cannot keep on doing these repeated copies and things like that.
And similarly, if you want to remove features, it will take another year or so to remove features from the ecosystem.
So basically, it was from a maintenance perspective that we started to leverage multi-task learning.
Can we maybe combine different objectives?
For a particular prediction, we need four to five different objectives.
Can we combine them?
But over time, as the models became more complex, we realized there was another motivation: can a single model make all these predictions?
GBDTs cannot do multi-task learning in that sense.
So we moved to neural networks and deep learning, and I think Prabhat in his Nvidia talk covers a big journey of that evolution.
But moving to multi-task learning with these objectives helps us in two to three ways.
First, we can have fewer models.
Second, we can have a single model that makes inferences for multiple objectives, which lowers our infrastructure cost.
And thirdly, we can leverage multi-task learning for our sparser actions.
Look at advertisers who are optimizing for conversions. For example, let's say an ad is viewed 1,000 times; you might only get clicks maybe 10 to 20 times.
But conversions are even rarer, maybe just one or two conversions out of that.
So the tasks are very sparse in general.
So as the tasks become sparser, I think multi-task learning together with denser objectives helps us train more performant models.
And that's the third area where multi-task learning is effectively helping us, to make better predictions.
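The sparsity argument becomes clearer when the rough numbers quoted above are turned into rates. The short Python calculation below uses only the figures from the conversation (1,000 views, 10 to 20 clicks, 1 to 2 conversions). The helper names and the target of 100 positive labels are arbitrary choices for illustration.

```python
impressions = 1000
click_range = (10, 20)        # clicks per 1,000 views, as quoted above
conversion_range = (1, 2)     # conversions per 1,000 views, as quoted above


def rate_range(counts, total):
    """Turn a (low, high) pair of counts into a (low, high) pair of rates."""
    return tuple(c / total for c in counts)


ctr = rate_range(click_range, impressions)                    # 1% to 2%
conversion_rate = rate_range(conversion_range, impressions)   # 0.1% to 0.2%


def impressions_for(positives, positives_per_thousand):
    """Impressions needed to observe `positives` labels at a given rate per 1,000 views."""
    return positives * 1000 // positives_per_thousand
```

At one conversion per 1,000 views, collecting 100 positive conversion labels takes about 100,000 impressions, while the same traffic yields roughly ten times as many click labels. That gap is why co-training the sparse conversion task with denser tasks can help.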
Okay, yeah, I'm right now looking at that blog post that we are going to attach.
That was called How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads.
And there's actually a figure where you show this high level architecture where you basically start off with a single model where you learn representations, cross them, and then you finally end up with some last fully connected layer that is then fed into those towers that have different objectives or are designed to cater to different objectives.
What are actually the features and the data that go into that model, and what are those objectives we have been talking about?
I mean, you touched on conversion, but which others can we talk about?
And then also, is this multi-task architecture, I guess this is something that you mentioned there, helping so that learning one task or to predict one task can also help to a certain degree the other.
So maybe what was the reason to have that hybrid architecture of some part being separate and some part being joined?
Yeah, I think that's a good point.
I think one thing I would highlight like that blog was in 2020.
And I think the work was done in 2019, and the blog was published after the work was completed.
So I think over the last four years, there have been better techniques for handling different tasks and balancing weights or balancing learning across heterogeneous tasks, which in 2020 we didn't have access to.
So at that point, the main motivation we had was standard versus shopping ads; that's where we had the main structural split.
Standard ads are very different from shopping ads, and user behavior is very different.
So if you try to just combine all that data together without any sophisticated arrangements between them, we see that it drops performance.
So what we wanted to do: the first part of the machine learning model is trying to learn a good feature representation, and later on you're trying to fine-tune your networks on top of that.
Since we wanted to have a single model, what we did to reduce the influence between standard and shopping ads was to have separate parameters that split between them, so you're learning a good shared representation and then fine-tuning for your heterogeneous tasks.
So it was mostly an intuitive idea at that point.
I think there are better mixture-of-experts and gating mechanisms today that don't require you to explicitly structure the model into multiple towers based on your prior understanding.
So this was one case where we knew a priori that this would work, and we just went about splitting the architecture.
But then the question is, what if there are other splits that you need to do and you would not know what splits you need to do?
So I think that kind of structure gets into those complexities in general, but that was the best that we had in 2020, I would say.
And that's what we did.
And I think it was pretty successful in general, just based on intuition we could split them and that worked pretty well.
In terms of objectives, I think I already shared engagements on the platform, clicks, maybe longer duration clicks in general, saves on the platform, hides, I think those are the kind of objectives that the model is co-training.
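The split described in this answer can be illustrated with a minimal sketch: one shared layer learns a representation from all ads, and a separate set of parameters per ad type produces one score per engagement objective. Everything concrete here is an assumption made for illustration (the class name `SplitTowerModel`, the layer sizes, the random weights, the single-layer towers); the interview does not describe the actual Pinterest architecture at this level of detail.

```python
import math
import random

random.seed(0)

OBJECTIVES = ["click", "long_click", "save", "hide"]  # engagement tasks named in the interview
AD_TYPES = ["standard", "shopping"]                   # the structural split described above


def dense(n_in, n_out):
    """Randomly initialised weight matrix: n_out rows of n_in weights (illustrative only)."""
    return [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(weights, x, activation):
    """Apply one fully connected layer followed by an activation."""
    return [activation(sum(w * v for w, v in zip(row, x))) for row in weights]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SplitTowerModel:
    """Hypothetical model: shared bottom layer, then one tower per ad type."""

    def __init__(self, n_features, n_hidden):
        self.shared = dense(n_features, n_hidden)  # parameters trained on all ads
        # Separate parameters per ad type, one output per objective.
        self.towers = {t: dense(n_hidden, len(OBJECTIVES)) for t in AD_TYPES}

    def predict(self, features, ad_type):
        representation = forward(self.shared, features, relu)
        scores = forward(self.towers[ad_type], representation, sigmoid)
        return dict(zip(OBJECTIVES, scores))


model = SplitTowerModel(n_features=4, n_hidden=8)
example = [0.2, 1.0, 0.0, 0.5]
print(model.predict(example, "standard"))
print(model.predict(example, "shopping"))
```

The same input passes through the same shared layer in both calls; only the tower differs. A gated mixture-of-experts, as mentioned in the answer, would learn that routing instead of hard-coding it by ad type.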
Also something that we touched on in the home feed recommendations part, but also here, how are you evaluating the effectiveness and how do you make any differences between ads and organic pins?
So what is actually specific when it comes to evaluation of ads?
So anything that is different there or that changes from the recommendation context?
I think one thing that I want to highlight here: I know Prabhat shared the new randomized bucket where we do the evaluation for Homefeed, which changed the lives of ML engineers.
But I think, unfortunately, that kind of freedom is limited for ads in general.
One of the reasons is that running a randomized online experiment involves money, which you're charging your advertisers.
So it comes at a cost, and we don't have that much flexibility on the ads side for that kind of randomized experiment.
So it limits us to AUCs and PR-AUCs, which might not correlate well with your online experiments at some point.
To get around that, there's one thing we have done at Pinterest recently; we also had a blog about it, which I'll link here.
So there are two reasons why your offline might not correlate with online.
Some of those the randomized bucket helps to address, which we cannot use; but the other set of reasons could just be system-related bugs: issues in what you're training on, issues with your feature serving, or issues with your feature logging, which might also lead to your AUCs not directly correlating with online metrics.
So I think over time, we made a lot of investments in making sure our systems are more robust in general.
For example, let's say at some point your feature server fails and you're not able to fetch some features.
If you do not have robust models, they might just explode on those predictions, which you might not see in your offline setting, because all the features that you are training on are well-formed.
So investment in the secondary systems around the model, like more monitoring, is mostly what we have invested in.
But still, I think the translation from AUC to the business metric is harder just because of the nature of the ads business.
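One way to picture the robustness concern raised here is a guard in front of the model that replaces missing or malformed feature values with defaults and counts how often it had to do so, so that monitoring can alert on it. This is only a sketch under assumptions: the feature names, default values, and bounds below are invented for illustration and are not taken from the interview.

```python
import math

# Hypothetical per-feature defaults (e.g. training-set means) and valid ranges.
FEATURE_DEFAULTS = {"ctr_7d": 0.01, "advertiser_quality": 0.5, "price_usd": 20.0}
FEATURE_BOUNDS = {
    "ctr_7d": (0.0, 1.0),
    "advertiser_quality": (0.0, 1.0),
    "price_usd": (0.0, 10_000.0),
}


def sanitize(raw):
    """Return (clean_features, fallback_count) for one request.

    Missing, non-numeric, NaN, or infinite values fall back to the default;
    finite values are clipped into the range seen during training.
    """
    clean, fallbacks = {}, 0
    for name, default in FEATURE_DEFAULTS.items():
        value = raw.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            clean[name] = default
            fallbacks += 1  # a monitoring system could alert when this rate spikes
            continue
        low, high = FEATURE_BOUNDS[name]
        clean[name] = min(max(float(value), low), high)
    return clean, fallbacks


# Simulated partial feature-server failure: one NaN, one missing, one out of range.
features, n_fallbacks = sanitize({"ctr_7d": float("nan"), "price_usd": 1e9})
print(features, n_fallbacks)
```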
I see.
And is there anything specific that you can share about what you do to mitigate this?
Basically, what is it that you do instead,
if you don't have that freedom but are aware that offline gains might not transfer that well into online improvements?
So what are the mechanisms or approaches there that you employ?
I think one case where AUC does not translate is when you're looking at small gains.
So typically we try to reach higher AUC gains in the offline setting before we move to the online setting, because that increases the chances of seeing the gains in the online setting.
The other thing is around making sure what we see in the offline setting is directly related to what we see in the online setting in general.
Also running the models longer.
Selection bias can also be reduced as the model runs longer: as it ramps up on the online traffic,
it starts to also influence its own training distribution.
So we run models longer before we make decisions on the experiment.
That also slows down the experiment velocity in general, because you cannot put all models online.
But that's a challenge that I would say is a continuous challenge.
Could you touch a bit on importance of real time features here and how, let's say when a user comes to Pinterest and performs certain actions quickly and how well you can adapt to maybe showing some different ads.
So is there something in terms of that real time adaption that you could share?
Yeah, one thing I want to share even before I go into the recommendation model: ads always have some component of real-timeness in them, which is probably not needed for organic at times.
One thing is that ads have this concept of a budget: an advertiser wants to spend, let's say, $100 on the platform.
Now as a platform, we guarantee that we will never spend above your budget.
You never spend more than $100.
Now as the users interact real time, this information of what the user is doing and interaction and clicks needs to be computed real time to make sure that you don't overspend the budget.
So in that sense, there's always a budgeting and a pacing system that runs real time for ads businesses.
It is maybe computing every minute or every two minutes to understand how much budget an advertiser has spent and how much budget is left.
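The budget guarantee described here can be sketched as a small spend tracker that refuses any charge that would push an advertiser over budget. This is a simplified illustration, not the pacing system discussed in the interview: the class `BudgetTracker` and its methods are hypothetical, amounts are kept in integer cents to avoid rounding issues, and the check runs on every event, whereas the system described above recomputes spend roughly every minute or two.

```python
class BudgetTracker:
    """Hypothetical per-advertiser spend tracker with a hard budget cap."""

    def __init__(self):
        self.budget_cents = {}
        self.spent_cents = {}

    def set_budget(self, advertiser, cents):
        self.budget_cents[advertiser] = cents
        self.spent_cents.setdefault(advertiser, 0)

    def remaining(self, advertiser):
        return self.budget_cents[advertiser] - self.spent_cents[advertiser]

    def try_charge(self, advertiser, cents):
        """Record the charge only if it fits into the remaining budget."""
        if cents > self.remaining(advertiser):
            return False  # serving this ad would overspend, so it is not charged
        self.spent_cents[advertiser] += cents
        return True


tracker = BudgetTracker()
tracker.set_budget("acme", 10_000)  # $100 budget, as in the example above
results = [tracker.try_charge("acme", c) for c in (6_000, 3_000, 2_000)]
print(results, tracker.remaining("acme"))
```

The third charge is rejected because only $10 of the budget remains at that point.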
So what it means is that even before these sequences or these kinds of structures came in,
the ads system had features you could generate like: has this user interacted with this advertiser in the last one minute or the last two minutes?
So these kinds of features were already present in 2019, because the infrastructure to support real time is very much needed for ads.
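A feature of the kind mentioned here, "has this user interacted with this advertiser in the last two minutes", can be sketched as a sliding-window counter. The class `RecentInteractions`, the two-minute default, and the plain integer timestamps are assumptions for illustration; the sketch also assumes events arrive in time order.

```python
from collections import defaultdict, deque


class RecentInteractions:
    """Hypothetical sliding-window counter per (user, advertiser) pair."""

    def __init__(self, window_seconds=120):
        self.window = window_seconds
        self.events = defaultdict(deque)  # (user, advertiser) -> timestamps, oldest first

    def record(self, user, advertiser, timestamp):
        self.events[(user, advertiser)].append(timestamp)

    def count(self, user, advertiser, now):
        """Number of interactions strictly within the last `window` seconds."""
        timestamps = self.events[(user, advertiser)]
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()  # drop events that fell out of the window
        return len(timestamps)


recent = RecentInteractions(window_seconds=120)
for ts in (0, 50, 100):
    recent.record("user_1", "advertiser_a", ts)

print(recent.count("user_1", "advertiser_a", now=110))
print(recent.count("user_1", "advertiser_a", now=130))
```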
But as the machine learning techniques became more advanced, and I think Prabhat shared some of those papers around TransAct and PinnerFormer, ads models at Pinterest today very closely follow what the Homefeed ranking models are doing.
And we utilize the sequence of features, we utilize those real time sequences of how the users are interacting and use that to make the personalization better.
And one thing also I want to highlight is, I think we had a blog last year around machine learning environment.
We standardized our ML ecosystem.
One thing it helps us with is to unify and share knowledge between the organic and ads components and move faster.
Techniques like TransAct are all in the same ecosystem today.
So that makes it easier for ads to adopt them and also to share knowledge much more efficiently with the organic teams.
That was a bottleneck before, when we had two different ecosystems.
Mm-hm, mm-hm.
Yeah, I guess it was that blog post, I have it opened right here: "MLEnv: Standardizing ML at Pinterest Under One ML Engine to Accelerate Innovation."
There you also touched a bit on the pains that you have seen with different architectures, approaches, and systems being used, and on unifying that a bit, so it was definitely worth reading.
This is more about technical or infrastructural side of things.
Just by nature Pinterest is more of a leaner company.
So you have also to think about how things can scale horizontally to different use cases without reinventing everything every time.
How is this working in terms of organization?
So is the recommendation business quite separate from the ads business or are you folks collaborating quite closely on those topics or how could I think about that?
Because I guess there are some companies where there might be an ads team and there are personalization or recommendation teams.
And then both or all of these teams are somehow using similar techniques but they have different objectives or cater to different needs or stakeholders.
So how much of collaboration exchange of ideas and approaches is there actually or how do you organize this work to also stay lean?
I think one thing I would say like since Pinterest is leaner, I think collaborations are much, much easier in general.
For me, for example, if I have an idea that I want to pursue, I can basically collaborate with anyone in the company who I feel can help get that idea through.
So collaborations, at least here, are not defined by team names.
For example, for a very long time my team name was conversion optimization or ranking, but for a very long time our team also had bidding systems and pacing systems, which have nothing to do with ranking systems.
So basically anyone who has an idea that they want to pursue is free to collaborate with people.
I think collaboration in the machine learning teams between Homefeed, ads, and organic is pretty strong.
I think there are weekly meetings between all of those people together just to share ideas.
Each of the teams can also adopt or utilize them for efficiency.
In terms of challenges and what you're going to address, looking at ads ranking, ads personalization, I would also still assume that perfection isn't realistic there, but what is it that you take as next steps to reach for perfection in that sense?
Technically, I think the personalization models can definitely become better and better in general to deliver more personalization.
I think the other line of research or line of work that the company is also doing is around like, how can you blend ads and organic together better in general to have a more unified experience and a more whole page kind of optimal solution.
So I think that area is pretty exciting too.
How do you go about doing that kind of optimization?
And I would say that after last year, once we had done the machine learning standardization, the speed of doing experiments has greatly improved, bringing in more new ideas.
So what it means is we can iterate much faster than what we used to do before.
So the exciting things that are happening in maybe the LLM domain or the Gen AI domain are much, much easier to adopt today.
That's another exciting area that I'm looking forward to.
That sounds nice.
Just maybe as a reminder for the listeners, as we have just seen GTC taking place last week: there was again a talk by Pinterest where some of your colleagues shared more about how they made effective use of GPUs and also about bringing inference to GPUs, where you had some interesting experiences to share.
So I guess this is nothing that we will go into in depth today, but maybe something to check out for people who are more interested in the engineering-related stuff.
However, this was not the only conference that caught my attention when preparing for this, because there's actually always the conference that is at the center of this podcast, which is actually the RecSys.
I have seen that you were also actively engaged in RecSys 2023, if I'm not mistaken, being part of the organizing committee for the RecSys Challenge there.
What was it about and what was your role and what were the results or the outcomes for you, but also for all the people who took part in that?
Yeah, I think that was last year, somewhere around this time, I guess.
That was a RecSys Challenge that was organized by ShareChat.
ShareChat is an Indian social media platform.
I think one of the major themes of the challenge was how to improve deep-funnel optimization.
What deep funnel means is these conversion objectives, and how we can do that with user privacy at heart.
I think in 2020, 2021, Apple and other browsers and platforms have been pushing towards more user-privacy-centric advertising in general.
And this challenge was trying to bring that to researchers: an anonymized data set coming from ShareChat, from around 10 million, I think, random users who visited ShareChat over three months, so that you can work on personalization with user privacy.
So I think that was the main focus of this.
This is probably one of the biggest privacy-centric data sets released for privacy-aware conversion modeling around that time.
And I think it was pretty interesting.
We had a lot of good submissions, interactions with researchers, and also a lot of workshop and research discussions on how to make sure that we can have more private recommendations.
And for the future, can we also expect Pinterest to release a data set for an upcoming RecSys challenge there?
I think I'd need to check, but I think it needs a lot of work and coordination.
Yeah, maybe that would definitely be cool.
Maybe have an industry-grade data set there, contribute to the community so that people can work on it and try to reproduce interesting results, or maybe also have that embedded as a RecSys Challenge.
Is there maybe something, I mean, that is really a broad spectrum of topics, that I have forgotten to touch on, or anything that you feel like this is definitely something you want to share from the Pinterest personalization and ads ranking experience that might be important or interesting for all listeners?
One thing I also want to share is the difference between ads and organic in some sense.
If you look at ads systems, when I joined in 2018, our systems were already being incrementally trained, and the models were replaced at an hourly frequency.
The reason for that is that ads content dies pretty quickly: you might have one ad that runs for maybe a month, and you don't have that content coming up again, because campaigns have a short lifespan.
And because of that, models need to be retrained more frequently so that you can capture the change in distribution.
But I think for organic, that was not the case in 2018.
But over time, more frequent training is something that people realized helps, and organic also started training models more incrementally.
But for ads, from the start, that was one of the considerations people had: models need to be fresher.
And is it then the case that you are retraining those models from scratch, or is it basically that you do some increments of training with new and fresher data without resetting the model parameters?
Yeah, I think for ads in general, we don't restart from scratch in general, we basically train incrementally on newer data with a lot of validation checks coming in because the data sets might have issues and you don't want to publish models that might break the system.
A lot of validation checks, but mostly models are being trained incrementally to capture new distribution.
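The pattern described in this answer, warm-starting from the current parameters, training on newer data, and publishing only if validation checks pass, can be sketched with a tiny logistic regression. All names, the learning rate, and the `max_regression` threshold are assumptions chosen for illustration; the interview does not specify which checks Pinterest runs.

```python
import math


def predict(weights, x):
    """Logistic-regression probability for one feature vector."""
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(weights, x))))


def sgd_pass(weights, batch, lr=0.1):
    """One SGD pass over (features, label) pairs, starting from `weights`."""
    w = list(weights)
    for x, y in batch:
        p = predict(w, x)
        w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def log_loss(weights, data):
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def incremental_update(current, new_batch, holdout, max_regression=0.05):
    """Warm-start from `current`; return (weights_to_serve, was_published)."""
    # Data check: reject batches with bad labels or non-finite features.
    for x, y in new_batch:
        if y not in (0, 1) or not all(math.isfinite(v) for v in x):
            return current, False
    candidate = sgd_pass(current, new_batch)  # no reset of the parameters
    # Model check: the candidate must not be clearly worse than the serving model.
    if log_loss(candidate, holdout) > log_loss(current, holdout) + max_regression:
        return current, False
    return candidate, True


serving = [0.0, 0.0]
holdout = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
good_batch = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
bad_batch = [([float("nan"), 0.0], 1)]

serving, published_good = incremental_update(serving, good_batch, holdout)
serving, published_bad = incremental_update(serving, bad_batch, holdout)
print(serving, published_good, published_bad)
```

The malformed batch leaves the serving weights untouched, which is the point of running the checks before a model is published.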
Yeah, a lot from other side has to cover a lot.
Yeah, and the personalization on the organic side just covered one aspect of personalization.
But I guess there are a lot of other surfaces, like search and related pins, where similar interesting things happen.
So yeah, please feel free to check out labs.pinterest.com to get updated about all our work; more is coming this year.
Great, okay, I will definitely put that in the show notes as well so that people can easily find it.
Nevertheless, I guess labs.pinterest.com should also be easy enough to remember.
Great.
Yeah, thinking about personalization in the many products that we use in our daily lives: is there, apart from Pinterest, something that you definitely enjoy and where you think that personalization has been done really well?
I think Netflix. Netflix was one of the pioneers of open data set challenges, bringing the community together.
In my opinion, Netflix definitely is one of the companies that has been pushing the frontiers of recommendation systems.
Yeah, I guess apart from Pinterest, I guess the other.
Yeah, I think Marcel caught that in my bio.
It was not mentioned in the podcast today.
I did spend like four to five months at Netflix at some point during my masters.
Prabhat, sorry, I interrupted you.
What was it that you wanted to share?
Yeah, I was just going to share mine. I think for me it will be Instagram Reels. I do like how good the recommendations are and how it keeps you engaged.
They have the problem as well that it keeps you going and you have to take a break, but overall it's a good mix of engaging content and new things to do.
It's another greatly personalized product, I think.
Then this was a great overview of the stuff that you're doing at Pinterest, and I guess there is much more to come. I refer again to the show notes to find more information and maybe ideas and, to go back to where we started from, to find inspiration from papers and from the content that you share and generate. But this is where I would leave it for today.
Thank you very much for attending and sharing those insights, your experiences with the community. It was really nice having you for the show.
Yeah, thank you more soon. Thank you everyone for joining, for listening so far.
Thanks for enjoying, hope you enjoyed.
Thank you, bye.
Thank you so much for listening to this episode of RECSPERTS, Recommender Systems Experts, the podcast that brings you the experts in recommender systems.
If you enjoy this podcast, please subscribe to it on your favorite podcast player and please share it with anybody you think might benefit from it. If you have questions, a recommendation for an interesting expert you want to have in my show or any other suggestions, drop me a message on Twitter or send me an email. Thank you again for listening and sharing and make sure not to miss the next episode because people who listen to this also listen to the next episode. Goodbye.
Bye.