Recsperts - Recommender Systems Experts | #6: Purpose-Aware Privacy-Preserving Recommendations with Manel Slokom

Episode number six of Recsperts is about purpose-aware privacy-preserving data for recommender systems. My guest is Manel Slokom, who is a 4th year PhD student at Delft University of Technology. She served as student volunteer at RecSys for three years in a row before becoming student volunteer co-chair herself in 2021. In addition to her work on privacy and fairness, she also dedicates herself to simulation and in particular synthetic data for recommender systems - also co-organizing the 1st SimuRec Workshop as part of RecSys 2021.

Show Notes

In episode number six, we welcome Manel Slokom to the show and talk about purpose-aware privacy-preserving data for recommender systems. Manel is a 4th year PhD student at Delft University of Technology. For three years in a row she served as student volunteer at RecSys - before becoming student volunteer co-chair herself in 2021. Besides working on privacy and fairness, she also dedicates herself to simulation and in particular synthetic data for recommender systems - also co-organizing the 1st SimuRec Workshop as part of RecSys 2021.

This episode is definitely worth a longer run. Manel and I discussed fairness and privacy in recommender systems and how ratings can leak signals about sensitive personal information. For example, classifiers may exploit ratings in order to effectively determine one's gender. She explains "Personalized Blurring", which is the approach she developed to personalize gender obfuscation in user rating data, as well as how this can contribute to more diverse recommendations.
In our discussion, we also touch on "data-centric AI", a term recently formulated by Andrew Ng, and how adapting feedback data may yield underestimated effects on recommendations that can lead to "data-centric recommender systems". In addition, we dived into the differences between simulated and synthetic data, which brought us to the SimuRec workshop that she co-organized as part of RecSys 2021.

Finally, Manel provides some recommendations for young researchers to become active RecSys community members and benefit from exchange: talk to people and volunteer at RecSys.

Enjoy this enriching episode of RECSPERTS - Recommender Systems Experts.

Links from the Episode:

Manel on Twitter
Manel on LinkedIn
Manel at TU Delft (find more papers referenced there)
SimuRec Workshop at RecSys 2021
FAccTrec Workshop at RecSys 2021
Andrew Ng: Unbiggen AI (from IEEE Spectrum)

Papers:

General Links:

Follow me on Twitter: https://twitter.com/LivesInAnalogia
Send me your comments, questions and suggestions to marcel@recsperts.com
Podcast Website: https://www.recsperts.com/

What is Recsperts - Recommender Systems Experts?

Recommender Systems are the most challenging, powerful and ubiquitous area of machine learning and artificial intelligence. This podcast hosts the experts in recommender systems research and application. From understanding what users really want to driving large-scale content discovery - from delivering personalized online experiences to catering to multi-stakeholder goals. Guests from industry and academia share how they tackle these and many more challenges. With Recsperts coming from universities all around the globe or from various industries like streaming, ecommerce, news, or social media, this podcast provides depth and insights. We go far beyond your 101 on RecSys and the shallowness of another matrix factorization based rating prediction blogpost! The motto is: be relevant or become irrelevant!
Expect a brand-new interview each month and follow Recsperts on your favorite podcast player.

Note: This transcript has been generated automatically using OpenAI's whisper and may contain inaccuracies or errors. We recommend listening to the audio for a better understanding of the content. Please feel free to reach out if you spot any corrections that need to be made. Thank you for your understanding.

Ratings can say a lot about you.
User privacy. This is the main goal of my PhD.
I'm trying to protect the user's data. Simple.
If I will just destroy male's recommendation to make it equal to the recommendation for females, this is maybe easy to do.
What is hard is to improve the recommendation for females.
We stayed close to the user preferences.
We did not inject bias. We did not inject noise because it's personalized and we know you will like it. So you will not feel this is not your preferences.
It's in line with your preferences, but also it helps to protect your gender.
So a classifier will not be able to know if you are a female or male.
Synthetic data is data that looks like original data, but it is fake.
Changing small things, protecting users, and keeping the recommendation quality.
Hello and welcome to this new episode of Recsperts - Recommender Systems Experts. This time I'm joined by Manel Slokom. She's a researcher at the Technical University of Delft, where she is basically in the senior year of her PhD candidacy. She is also a teacher and supervisor at Radboud University and works for Statistics Netherlands, which is the national statistics office of the Netherlands. And she has actually also been serving the RecSys community to a great extent.
She has been a student volunteer a couple of times and, in 2021, also served the community as student volunteer co-chair. Besides all of these things, she also co-organized the first workshop on simulations for recommender systems as part of RecSys 2021. Hello Manel, great to have you on board.
Hi, wow. Thank you for the introduction. You really covered everything.
So, hi everyone. I'm happy to be here.
I really, really appreciate your invitation and I am more than happy to speak, and I hope the people listening to us will enjoy our conversation.
I hope so as well. And I hope we have lots of time to talk about your research, which basically centers around purpose-aware privacy-preserving data for recommender systems. So pretty dense, pretty interesting, and I guess it might also address many of the problems that we face in recommender systems.
I would really love to talk about this point. Maybe first of all, can you give us a bit of background on how you joined the RecSys community and how you found your way there?
Yeah, wow.
This already, I will need to say a lot to tell you my story with recommender systems. Okay.
My recommender systems path started in 2015 with my master thesis, where I worked with my supervisor Raouia Ayachi in Tunisia.
And I remember she just said she is interested in recommender systems, and then it was for me to do the literature review and find what I wanted to do in recommender systems for my master thesis. And I ended up looking into recommender systems in general, like everyone: content-based, collaborative filtering, hybrid. And then I found, okay, collaborative filtering, I'm more into this, and then I went to item-based. And I ended up doing my master thesis on item-based collaborative filtering combined with something else; it's called possibility theory. It's more about the preferences and the hypothesis that preferences are changing over time and there is uncertainty.
When there is uncertainty, we can talk about, I don't know if you know fuzzy logic; possibility theory is another thing.
If you like to give a couple of points about that, please enlighten us.
Okay, so we all know probability theory.
Possibility theory is kind of similar to probability theory, but it also has its own terminology and concepts. One of them is that we work between zero and one, so I need to map the user preferences.
If it's explicit feedback, I need to convert the ratings to the interval between zero and one. It's not as explicit as saying zero, you didn't like the movie, for example, or one, you really loved it, no.
So it's kind of a degree of preference you could refer to?
Yes. Okay. And then, back when I worked on it, I proposed a similarity measure different from Pearson correlation and all of the similarity measures that we know, one that is more in line with possibility theory.
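Editor's sketch: the mapping of explicit ratings to a degree of preference between zero and one can be illustrated with a plain min-max rescaling. This is only an assumption for illustration; the 1-to-5 scale, the function name `to_unit_interval`, and the toy ratings are hypothetical, and the possibilistic mapping used in the thesis may well differ.

```python
def to_unit_interval(rating, low=1.0, high=5.0):
    """Min-max rescale a rating from [low, high] to a degree in [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= rating <= high:
        raise ValueError("rating lies outside the rating scale")
    return (rating - low) / (high - low)


# Hypothetical explicit feedback on an assumed 1-to-5 star scale.
ratings = {"movie_a": 1, "movie_b": 3, "movie_c": 5}

# Each rating becomes a graded degree of preference instead of a 0/1 verdict.
degrees = {item: to_unit_interval(r) for item, r in ratings.items()}
```

With this scale, a 3-star rating ends up in the middle of the interval rather than being forced into "liked" or "disliked".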
Okay, I see. And this was basically the topic of your master thesis. So have you encountered recommender systems before already in your master studies or was it really like there is a master thesis and here is RecSys, now please get on board?
No, I didn't encounter recommender systems before. I didn't have any course, except I mean, if I interact with social media, this is where I see recommender systems, the famous notification, you may also like this or some suggestions about pages or something.
But I never thought about recommender systems as a topic for my research career.
If you picture yourself back then as basically an unknowledgeable consumer of recommendations or suggestions and nowadays as a researcher in recommender systems, what would you think has changed the most in your perception of recommendations?
A lot. A lot of things.
Okay, first, in my master, I was finishing with collaborative filtering. I was getting more and more into recommender systems. But at that time, because we are faced with social media, I wanted to know about social recommendation. And this is why I ended up doing social recommendation after my master.
And I wanted to know how Facebook is doing those friend recommendations, friends of friends or six degrees of separation. So I was into small-world theory and all of this. So you see, I was impressed by social media and I wanted to know all the details. The algorithm behind what we see, how is it working, how can I make recommendations for friends? And then it came to my mind. I see all of this and Facebook is doing something nice.
Okay, we have friend recommendation. What I feel is a bit missing is the recommendation of collaborations, authors' recommendation. What I mean by recommending authors: me and you, we work on recommender systems.
This is a topic in common. And then maybe we both work on simulations. But we never met each other and we don't know each other. You know, the world is, okay, no. I was talking about the small world and now I'm mentioning that the world is big and we are billions of users.
So if we didn't meet at RecSys, it's not sure we would know each other. So I was working on a paper for social recommendation, but mainly for researchers. I recommend you a researcher that you may be interested in collaborating with. And then I mention to you the topic that both of you can work on, probably ending up with a paper or something submitted.
And this was your first research work. So where was that point from the master thesis where you said, okay, this is so interesting, I also want to do research there and write my PhD thesis about this topic? So what was the kind of transition from your master thesis to your first paper on social recommendations or recommending collaborations?
The change: in my master, it was simpler and I could understand it. It's item-based recommendation.
So basically I know there is a distance, a correlation between items that I need to compute. And then, if you watched more movies than me, and we have many movies in common, then I get recommendations from the movies you watched.
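Editor's sketch: the item-based idea described here (compute a similarity between items, then score unseen items by their similarity to what a user already rated) could look roughly as follows. Cosine similarity is swapped in as a common choice; the toy ratings and all function names are hypothetical and not taken from the thesis.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 5},
    "carol": {"A": 1, "C": 2},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse item vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=1):
    """Rank unseen items by similarity-weighted ratings of the seen items."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[item]) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


suggestions = recommend("alice", ratings)
```

Here alice has not rated item C, so it is the only candidate and is returned as her suggestion.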
But then the next was social recommendation and specifically link prediction. This is where the changing started coming and how can I make the prediction of missing link or a link in a graph.
So you see the graph has nodes, has relations and the link is a relation that can happen between authors. And for that I chose to work on kind of Bayesian networks. It's called probabilistic relational models, PRM.
And PRMs are doing what I'm saying. So I tried to take this approach from Bayesian networks and apply it to recommender systems to predict links.
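Editor's sketch: to make the link prediction task concrete, here is a much simpler heuristic than the probabilistic relational models mentioned in the conversation: scoring unlinked author pairs by their number of common co-authors. It is not the PRM approach; the co-authorship graph and names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected co-authorship graph: author -> set of co-authors.
coauthors = {
    "ana": {"ben", "cem"},
    "ben": {"ana", "cem", "dia"},
    "cem": {"ana", "ben", "dia"},
    "dia": {"ben", "cem"},
}


def common_neighbor_scores(graph):
    """Score every pair that is not yet linked by its shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already collaborating, nothing to predict
        shared = graph[a] & graph[b]
        if shared:
            scores[(a, b)] = len(shared)
    return scores


scores = common_neighbor_scores(coauthors)
```

In this toy graph, ana and dia have never collaborated but share two co-authors, so they are the one suggested collaboration.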
Okay, I see. This was basically then the groundwork for your first publication that you also, I guess, presented at RecSys, right?
No, I presented it at another conference. Okay, until the beginning of 2017 I was not yet attending RecSys.
And I was really hoping and wishing and it was a dream for me to attend RecSys some point in my life.
And I remember, so I was in Tunisia and it's a bit hard to go abroad in terms of money, visa, everything.
So I was trying since 2015, since I came across recommender systems, to send messages and emails to the RecSys community, the organizers of the student volunteers for example, asking if they need help and if I can get a visa to go to RecSys. Okay, good idea.
Yeah, and I remember in 2016 there was a problem with the visa. I reached the people and they replied back to me, I remember it was a bit late. So when I saw the RecSys website or the LinkedIn publication, the call for student volunteers had already closed. Okay, I see. So I said to them, okay, I will be watching; if you need help, just remember me or let me know. And what happened? The year after? Yeah, this is 2017.
This is the year where my career changed.
Perfect. Okay, I said it changed it a lot in my view, maybe for people no, but in my view it's like 180 degrees. Suddenly I got a position in the Netherlands where I am now and I remember when I submitted for the position I was saying okay, it's like a game. I will not say I will get it. Anyway, I'm working on a recommender system. It's 50-50.
I will just do my best. If it works, that's nice. If it didn't work, it's okay too.
And I submitted the application in September 2016. And it was a very, very long series of interviews. It started in September or October 2016. Then I had the first interview in December. Second interview in January 2017. Then there were so many interviews, to a point that I forgot how many, but until May. And I remember it was the 15th of May, in the afternoon, I got a call on my phone. And I was surprised because I saw a call from a number that is shown as a private number. I cannot see what the number is. I cannot know who the person is. And I was really scared; I don't like to take a call from a number that I don't know. Normally the kind of calls that you would not accept. Yes. I was saying hello, in French. Then a person was speaking in English. And I was surprised because in Tunisia we speak Arabic, dialect, and, if it is official, French. And again, in my mind, I do the interviews, but I try not to focus only on those interviews. I don't want to make my life stressful, and I keep going. If something happens, I will wait. Just wait. I got a call and the person started speaking in English. And you know, in the background I'm trying to guess the voice.
Who is the person? And she was saying, hi, hello.
This is me. I am Martha Larson.
So I remember I heard that and then I was listening. I understood what she was saying, but I could not speak anymore.
It felt like I didn't know English. I was surprised. I kept listening and said yes. And then she said, you know, me, Alan, Alan Hanjalic is my promoter now, and Fabian and everyone I did the interviews with, they agreed that I am the first candidate. Wow, nice. To be selected for this position. And I was listening to all of this and, you know, I'm happy, I'm surprised. I don't know what to say at that moment.
And she was saying a lot of things. And then she said to me, you know, I feel that you are surprised. And what I will do, I will write an email and say all the details that I just mentioned to you and please you have time to think and then come back to me if you want to take it or not. And this is from where my career shifted. And I guess the outcome was obvious. You were suddenly all into it. Yes. So there was even not the question but you were very surprised. It sounds like that. But great, great story. Very inspiring.
So then I started meeting Martha. Of course I needed to convince my father, but my father had already convinced himself. And once I got the yes from my father, I replied back to Martha. I think it was the next day.
And after that, I remember we were talking and suddenly she mentioned to me RecSys summer school. She said, oh, by the way, like this I still remember how. She said to me, you know, by the way, there is a RecSys summer school and the RecSys conference in Italy, in Europe and it's nearby the Netherlands. Would you like to go?
I said, of course.
You know, if it's for me, I would go right now.
And yeah, this is how I started my PhD.
I started my PhD with a dream, it was a dream for me to attend RecSys, and the dream came true. I came to the Netherlands on the 15th of August, and on the 21st of August I took the flight to Bolzano.
So you almost didn't have enough time to settle in the Netherlands. No, I just signed my contract.
So I signed my contract and I went to Bolzano.
Wow. And next week this is where I met my supervisor for the first time at RecSys in Como, in the venue, face to face.
So the first time I met Martha was at RecSys.
Yeah. There's not a better place to meet your supervisor. No, I'm really grateful for all of this.
That's why I said it's really a dream and you may feel or not. I still, I can't believe this dream came true. Yeah. And this has already been a couple of years ago and you are now just at your dream getting maybe to the next stage or something like that. Yeah. With kind of finalizing your PhD this year, maybe. Hopefully.
Yeah. Really, really nice story. I really feel like you are totally overwhelmed still and very much into it.
And actually it's funny though a bit, because the RecSys in 2017 at Lake Como in Italy was actually also my first RecSys. So I guess we didn't meet at that RecSys, or maybe we didn't know each other then. But actually this was also my first RecSys after having finished my Master's thesis back then.
You know, for me that RecSys was a lot of emotions, a lot of things. The only word that can describe me at that moment is overwhelmed. Very excited. I was in the summer school and I was trying to note everything, trying to know everyone, trying to learn everything, which is impossible, to a point I really don't know.
It's too much. And this is one piece of advice that I would give to all the students or the younger generation who want to attend conferences: do not push yourself. You are not meant to just be there and learn everything.
Actually you see a recommender system is a very broad topic.
If you want to learn collaborative filtering, you want to learn about the graph in recommender system, about differential privacy, about privacy, all of this in recommender system it's too much.
We cannot do it. But what you can do and what I then learned to do is just to try to know people, socialize, try to position people.
For me I remember 2017, some names I still remember very well. Of course.
Michael Ekstrand, Joe Konstan, Robin Burke, these were the names, and more, Dietmar Jannach. These were names whose papers I had read. I knew the authors by heart but I had never seen them. And then I saw them.
That was the first. Then I started seeing and getting to know Nava, about group recommender systems.
You mean Nava Tintarev? Yes.
Now, recently, also explanations in recommender systems.
Christine Bauer, music recommendation.
Cynthia Liem, she is my colleague. She is also into music recommendation.
She works actually in your laboratory. We are both from the MMC group, the Multimedia Computing Group at TU Delft.
Nice. So what I ended up doing at RecSys, the first RecSys, was trying to get to know people, position people in my mind, in my brain somewhere, what they do, which topic exactly they work on. So in the next years I could be ready to know them more, to interact with them, get familiar with the topic, with RecSys in general.
It worked. Okay, that's a really, really nice approach.
And I definitely feel you: you are passing by all these famous people that are shaping that field and you just think, oh, okay, nice. Now I'm sitting in his or her workshop and we are talking about these topics. And yeah, some people from the outside think that recommender systems is a very narrow or specific topic, and then once you are in, you see that there are so many directions. This is also why I really like your comment or your advice, don't feel pushed or obliged to understand and know everything in recommender systems, because once you get into the field, whether you are in practice or in research, you see that you need to somehow specialize.
And then, of course, the second advice that you gave, get to know the people.
So, for example, there are people who specialize in sequential recommender systems.
There are people that specialize in multi-stakeholder recommender systems or people that have their expertise in privacy awareness like you, and then know these names and get to know the people at RecSys, and then give them a call, write them a message or mail when you are touching this field and need their assistance, because they, at some later point, might also ask you for some advice. And this is, I guess, how this great community can nicely collaborate and support each other, thereby also advancing the field in research and in practice. So, really, really great advice.
Love that.
Yeah. Nice. Yeah, so 2017 then was actually your, I would say, great RecSys explosion: being overwhelmed, getting to know the people, directly getting to know Martha.
And how from that point did you develop into your field? So, there must have been some process of narrowing down. I've seen you have done some work in simulations.
You have been thinking a lot about how to treat the data that many of our models sit on top of, and also about that aspect of purpose awareness, which I hope we will come to in a minute, but also about privacy.
Many, many topics. So, how did that evolve at that time and how did you somehow nail down your topic and can you talk about this?
Yeah. So, you will hear me saying a lot about the help of the supervisor and I feel it's really important. I didn't know it before.
When you apply for a position, you need to know the place, you need to know the person with whom you will work. I didn't do this. When I applied, I was just interested in recommender systems.
But now, I can confirm.
So, after RecSys, I came back and I was super enthusiastic, and it's my first year and I know I will be working on privacy and recommender systems. What exactly, we need to find. It's my first year of the PhD.
It's called Go/No-Go. So, after one year at TU Delft, you have a Go/No-Go meeting, like a small defense.
You need to present what you did in the one year, what you want to do in the next three years, etc.
In my mind, I want to go back to RecSys.
You know. And in 2017, I attended the doctoral consortium. I just, I don't have a paper. I'm still new. I have nothing. But, you know, I want to know everything in this recommender system conference.
Everything. So, I don't want to miss whatever.
I guess it's also a totally valid approach. First, go totally into breadth and pick up everything and then decide on yourself where you want to go deeper into, right?
Yes. So, it's kind of filtering. So, my first target for the next RecSys is at least to be at RecSys again, of course, but with a paper. And it would be nice if the paper is a doctoral consortium paper. Because I saw how the PhD candidates present their work and how the mentors are super kind in the way they give comments to the PhD candidate, and all those comments will be important for the PhD candidate in the next years. You know, they are senior in recommender systems and they have more of a vision compared to me. So, whatever they say, it's important. With that in mind, back to depth. And I need to find what exactly I will be doing in privacy, in this huge topic of privacy in recommender systems. And I started looking and I ended up with a big diagram, and every time I have a meeting with Martha,
And I say to her, see, this is a diagram. Privacy in recommender system is big, but we can try to make it into two parts. One, where we do security, where you try to secure the recommender system.
And in that one, you see there is some work by Robin Burke.
Secure recommender systems. And what they try to do is that there are some attacks, like push and nuke attacks, that can happen to the recommender system; there is an intention behind them, and the intention is not good. For example, I can tell you, me and you, we have two companies and we sell the same product to our users. And you are dominating the market and you are more famous than me. So, what I can do: I go to the recommender system and create fake users or fake items just to lower the ratings of your product, just to make your products not appear at the top of the recommendations. Yeah, yeah, makes sense. So instead, they may get my products instead of yours.
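Editor's sketch: the effect of a nuke attack as described above (fake profiles that rate a competitor's product as low as possible) can be shown with a few lines of arithmetic. The ratings, the number of fake profiles, and the function name `inject_nuke_profiles` are hypothetical; real attacks and defenses are considerably more involved.

```python
from statistics import mean

# Hypothetical honest ratings for the competitor's product (1-to-5 scale).
genuine = [5, 4, 5, 4]


def inject_nuke_profiles(ratings, n_fake, min_rating=1):
    """Return the ratings after n_fake fake users give the lowest rating."""
    return ratings + [min_rating] * n_fake


attacked = inject_nuke_profiles(genuine, n_fake=4)

before = mean(genuine)   # average rating seen by the recommender before
after = mean(attacked)   # average rating once the fake profiles are in
```

Four fake profiles are enough here to pull the average from 4.5 down to 2.75, which would push the product down any ranking that relies on average ratings. A push attack is the mirror image: fake profiles give the attacker's own product the highest rating.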
So, this is securing the recommender system from attacks.
Just to shortly interrupt you, this is, I guess, a good point for all listeners to also reference the previous episode we had with Felice Merra, where we actually talked about the topic of adversarial recommender systems. So, yeah.
Yes, good point. Well, yeah, I like this. Then the second part, it's about the users. The user's information, how to protect users.
And it's more into privacy preserving the user's data. And I like that topic. I know there are so many works there, but it's a matter, what I like in doing research is that you do what you, for me, I have to do what I want. If I do what I love, I will be enjoying and I will, you know, you will not feel it's painful or sometimes, yeah, I'm stressed, it happens.
But as long as you enjoy what you are doing, you will not feel that pain.
It's easier because you are constantly motivated and mostly intrinsically and then everything feels a bit easier to do, even though, as you said, sometimes it can become stressful, but there is also the distinction into positive and negative stress.
Yeah. Yeah. So I like the positive stress. This makes me productive. Yeah, that's cool. And this is then how you, I really like the distinction that you make when you say, okay, in that topic, you have that security aspect and you have the privacy preserving aspect and then you kind of quickly decided you want to go more into the privacy preservation, right?
Yes. And then at some point, you know, there are, again, different techniques, different ways to protect users and then it will depend on what you want to protect. And then there is a threat model came into play. Okay.
In my PhD life. Yeah. So threat model, people may ask what this threat model is. I really, really like the idea. It's not new, the threat model is from the 1980s. It's a theoretical formulation that you do before working on privacy, for your paper, for example. What it says: a threat model contains the objective and the resources that are available to an attacker. Okay.
So for the attacker, what information is available to him? What is the goal of the attacker? And then how can we protect against that attack?
It's the countermeasure. I give you an example. Go ahead.
Netflix Prize. You know, it happened in 2006: Netflix released data for the challenge and participants were competing, and there matrix factorization came into play, and there was a winner and they won a lot of money. After two years, exactly two years, in 2008, there was a group of researchers who were able to de-anonymize the Netflix data. And Netflix, the company, when they released the data, they said it's anonymous.
Anonymity means that you cannot link or identify a user. But what happened after two years: the data was de-anonymized. And you know what? For all the users that are in that data, the information about those users became public. And this is where I started from and why I like the idea of my PhD. For two things.
One is that k-anonymity and anonymization are good, and we need to anonymize the data as a first protection. It protects against identity disclosure, the first attack. And second, it may happen, like what happened for the Netflix Prize, that the data is de-anonymized. So try to protect against inference attacks. Which means: what kind of inference can you do if you have this data? What can you infer?
It's not explicit in the data, but you can get it.
And it's a second type of attack called inference attack.
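Editor's sketch: assuming the anonymization mentioned above refers to k-anonymity, the property can be checked by grouping records on their quasi-identifiers and requiring every group to contain at least k records. The records, column names, and the function name `is_k_anonymous` are hypothetical.

```python
from collections import Counter

# Hypothetical released records; age_group and zip3 act as quasi-identifiers.
records = [
    {"age_group": "18-24", "zip3": "261", "favorite": "Movie A"},
    {"age_group": "18-24", "zip3": "261", "favorite": "Movie B"},
    {"age_group": "25-34", "zip3": "262", "favorite": "Movie C"},
]


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


two_anonymous = is_k_anonymous(records, ["age_group", "zip3"], k=2)
```

The third record is the only one in its group, so this toy release is not 2-anonymous. Note that even a k-anonymous release says nothing about what can be inferred from the ratings themselves, which is the inference attack discussed next.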
We are nicely boiling down.
So based on the threat model, you can say the attacker wants to know who I am. So trying to identify me in a dataset.
And they can use resources, like for the Netflix Prize they used IMDb, scraping IMDb and ending up linking the two sources, and the attack happened. Or you can do inference attacks, where you have data and then you want to infer some information about me. For example, you want to know my gender just based on the movies I watched. And it is possible. You know, it is possible because I have that paper.
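Editor's sketch: the linkage idea behind the de-anonymization described above (joining an "anonymous" release with a public source) can be illustrated with exact matching on (movie, date) pairs. This is a strong simplification; the published attack on the Netflix data used approximate, scored matching. All identifiers, names, and records below are made up.

```python
# Hypothetical anonymized release: pseudonymous ID -> set of (movie, date).
anonymized = {
    "user_17": {
        ("Movie A", "2005-03-01"),
        ("Movie B", "2005-03-04"),
        ("Movie C", "2005-04-10"),
    },
    "user_42": {("Movie A", "2005-06-11"), ("Movie D", "2005-06-12")},
}

# Hypothetical public reviews scraped from another site, with real names.
public_reviews = {
    "jane_doe": {("Movie A", "2005-03-01"), ("Movie C", "2005-04-10")},
}


def link_profiles(anonymized, public, min_overlap=2):
    """Link each public profile to the pseudonymous ID it overlaps with most."""
    matches = {}
    for name, reviews in public.items():
        best_id, best_overlap = None, 0
        for anon_id, entries in anonymized.items():
            overlap = len(reviews & entries)
            if overlap > best_overlap:
                best_id, best_overlap = anon_id, overlap
        if best_overlap >= min_overlap:
            matches[name] = best_id
    return matches


matches = link_profiles(anonymized, public_reviews)
```

Two matching records are enough to tie the public name to `user_17`, which then exposes the third, non-public rating in that profile.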
That actually relates to your paper that you published last year.
A pretty long paper, I've seen, that you published in a journal: "Towards user-oriented privacy for recommender system data: A personalization-based approach to gender obfuscation for user profiles".
So now we are really at that point. Yeah. And then I think the reader will say, oh, it's a very long title and it's already a lot of detail. I can make it easier for you. User privacy.
Of course, this is the main goal of my PhD. I'm trying to protect the user's data. Simple. Of course, when I mention privacy, because there is a threat model, there is a threat that can happen to the data. So the user's information is leaked.
So what is my threat about? The threat says that an attacker can have a pre-trained classifier.
An SVM, a very simple classifier. We do not need to go to deep learning or other complicated black boxes. We just stay with a simple logistic regression or SVM. With this pre-trained model, once the attacker gets access to extra information,
some movies that I watched, he will be able to know my gender. Okay. What are in that case the input features of the binary classifiers or the classifiers in general for that model? It's simply your user vector. The user vector is you, I mean the ID, plus the movies that you watched. Okay.
But having access to the user ID itself shouldn't be a big deal.
I would say from a naive perspective.
So what is pretty relevant, you say, is the item IDs that I know or the item IDs of the items that you watched. And then of course I'm able to link those items to their metadata. So for example, I know what have been the genres, what have been the publication date of a certain movie. And this all gives me enough information to be able to predict the gender. Yes.
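Editor's sketch: the threat model described here (an attacker with a simple classifier who sees which movies a user watched) can be mimicked with a tiny logistic regression written from scratch. Everything below is hypothetical: the item names, the six training profiles, the label encoding, and the hyperparameters. It only illustrates the mechanism and is not the setup or data used in the paper.

```python
from math import exp

# Hypothetical item catalogue; a profile is the set of items a user watched.
ITEMS = ["action_1", "horror_1", "drama_1", "romance_1"]

# Hypothetical attacker training data: (watched items, binary gender label).
train = [
    ({"action_1", "horror_1"}, 1),
    ({"action_1"}, 1),
    ({"horror_1", "action_1", "drama_1"}, 1),
    ({"drama_1", "romance_1"}, 0),
    ({"romance_1"}, 0),
    ({"drama_1", "romance_1", "action_1"}, 0),
]


def to_vector(watched):
    """Binary user vector: 1.0 for each catalogue item the user watched."""
    return [1.0 if item in watched else 0.0 for item in ITEMS]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit(samples, epochs=200, lr=0.5):
    """Train logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(ITEMS), 0.0
    for _ in range(epochs):
        for watched, label in samples:
            x = to_vector(watched)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(watched, weights, bias):
    """Estimated probability that the profile belongs to label 1."""
    x = to_vector(watched)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


weights, bias = fit(train)
p_profile_a = predict({"action_1", "horror_1"}, weights, bias)
p_profile_b = predict({"drama_1", "romance_1"}, weights, bias)
```

The attacker never needs the gender attribute of the target user; the watched items alone move the predicted probability above or below 0.5. Obfuscation approaches aim to change a profile just enough that such a classifier can no longer decide reliably.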
Okay. We did some exploratory analysis for MovieLens 1M.
In MovieLens 1M, you are given that attribute about the user.
But now that I'm mentioning the exploratory analysis, how we ended up building the paper is that we found the signal, when we tried to find which movies are more of interest to females versus which movies are more of interest to males.
We ended up seeing some signals, like male users tend to watch action and horror movies.
However, female users like watching drama, comedy, and romantic movies.
And just there is one point why I like the inference attack.
Again, at the beginning of my PhD, I came across a PNAS paper where they showed that just from five likes on Facebook, just five likes, they are able to infer our orientation. Whatever.
It is political orientation, religious orientation, whatever orientation. So see how much it is.
That kind of study, in which they checked, for different numbers of likes, how well they were able to reconstruct the so-called OCEAN profile of a person. So these five psychological categories one could use to describe an individual. And this was very interesting because I guess in the end they said 100 or 150 likes are enough to judge your OCEAN profile better than your partner, or something like that.
So you see it's really possible. And I was impressed by that paper. And this is when I said, yeah, I agree with anonymization. Then I remember I said to my supervisor anonymization is key and important as a first protection.
But it's not the only protection that we have to make.
Especially if you want to release the data. Think hard about the inference attack. See how little knowledge and how few resources an attacker needs, knowing that we are in the age and era of big data. And unfortunately users' data is available everywhere.
And I guess once it's out, it never gets back. So with the Netflix data you see that, for example, it was somehow discouraged afterwards to use that data, or Netflix itself isn't providing that data set anymore. But nowadays still everyone in many research papers is using, besides the available MovieLens data, also the Netflix data to kind of reproduce results, because it's somewhere on Kaggle and then everyone has access to it. So it's like if you are publishing something about yourself on the internet, the internet does not forget.
And the same I think applies to data in this regard. So you should do a very good job in being sure that you anonymize properly, especially if you as a company decide to publish data that you claim is anonymized. Right. Exactly. So once the data goes out of your hand, my hand, anyone's, it's not in our hands anymore. Then do expect anything.
But for me at the end, I say why I like this topic and why I went to protecting users rather than securing the system.
I know so many people are working there. But protecting users is closer to my heart in a sense. I know many people who are not highly educated or who are not really aware of what is happening, the dark part of social media. Again, my supervisor Martha always says whenever there is a free app, do expect it can crash at any time. The cost is the user. Simply. I would definitely add to it that also people who charge you for using their app or buying their app might have an incentive to use your data in addition in order to increase revenue or profit or something like that. So I guess the probability is definitely higher. So I definitely go with her comment, but I guess it's not limited to only free apps. Right. Yeah. I remember she was mentioning this when we moved to COVID and we needed to use, I think, Zoom, and we always used Overleaf. And because with Overleaf there is now always an authentication, or it will be under maintenance, Martha always mentioned to me you should always download and keep a version on your local machine. Keep updating that version. We use the system and it's for free, but we should not totally blindly trust the system. There's no guarantee. That's the point. Yeah.
And I feel her advice can apply to all the free apps. In a sense, see how with Facebook, people are not really paying to get access to Facebook, but we see how many times a breach has happened to Facebook, you know, so how people are giving their data just for free. I'm trying to tell my parents. My dad has an account and when I moved I didn't want to use Facebook anymore. So I tried to convince him, let's use another app to have a call, just because I don't want to give my data to Facebook. I mean there are pros and cons here, of course. I would agree there is this kind of dark side. Sometimes I feel it's a bit exaggerated, or claims are made too lightly and are not really based on science or backed up by it. But I definitely see the risk and we have seen some scandals. But on the other side, this is sometimes maybe rather a problem of proper data education, and sometimes the possibilities and opportunities in our digital life are just growing or expanding faster than we are able to educate people about how to properly use them. Because for me it's rather a trade.
I am a Facebook user. I use it seldom, but I also have Instagram and I am basically very aware of it.
But I also recognize that we are among the people who understand these technologies to a certain degree. I would not claim that I understand Instagram perfectly. But you see I'm offering something about myself such that the platform can of course sell ads. But of course this platform needs to sell ads. It needs to somehow earn money. I mean earning money is not something evil by itself.
It's basically about how aware are these people about what they are sharing and that you are kind of trading off your data against kind of possibilities also to connect with people or to communicate with people. I guess a great example there is also WhatsApp.
WhatsApp makes communication very convenient. And of course, therefore we kind of grant access to our data.
I would say if there is no Signal, no other app, I have to use Facebook and I will use it. And I was a user of Facebook actually.
But you know, once you get into the domain, you read more and more, and I am more aware than before. So I know where I should give my data and how much data I can give. And for now I would say I am only active kind of on Twitter and a little bit on LinkedIn, to advertise for two things. So I didn't use LinkedIn for quite some time. I used it last year when I advertised for student volunteers, simply because I have so many friends who are in Tunisia and everywhere in the world, and they are on LinkedIn, and I want them to see there is an opportunity to be at RecSys. And also on Twitter. So I will not say I am not a user of those social media. I am still a user and I like them, and I use Twitter a lot when I am at RecSys for another reason, which is that I try to tweet about presentations and talks, just because I put myself back in 2015 and 2016 when I was not able to join RecSys. So always in my mind there are people all over the world who will not be able to join RecSys.
If I put out a little bit of text for them, because I don't have too much time or too much knowledge to say more, but more screenshots or pictures of the slides, they can say, oh, this paper looks interesting, I will have a look at it. It's really enough. So you basically use it to increase the accessibility of educational content for people who then just say, hey, I know Manel, Manel has published some stuff, okay, I will have a look into it, and who otherwise wouldn't even be aware that this content they are interested in exists. Yeah, and also to keep myself in contact. This was advice given to me in 2017 when I came to the Netherlands. My colleague Sude, before going to RecSys, asked me, Manel, do you have a Twitter account? I said yeah, but I am not using it. I never publish or tweet something there. She said to me, the first thing you have to do when you go to RecSys: follow RecSys people. You need to have this ground of people that you follow and know, and those people will tweet a lot of things about recommender systems that will be interesting for you. And this is also my advice. I think it's very valid advice. I remember a situation we had at inovex, the company where I work as a data scientist, when I was basically in my junior year there. We have these meetups where we give presentations and connect with people in the IT or data science community, and I remember I was asked a couple of times by our marketing people what my Twitter handle would be, and I just said I don't have Twitter, and they were always very surprised why I don't use Twitter. And this was even back in 2017 or 2018, and I guess I joined Twitter quite late, so I guess it was in 2020 or the year before or something like that. But I really found that there are these people that you should definitely follow.
You can get a really, really nice feed, which is almost a research feed if you don't follow other areas and just keep it for your research or RecSys work, and this gives you really nice updates on what's going on in the field. And especially, as you already said, during RecSys there is heavy traffic on Twitter.
So there is one quote, I'm not sure I'm saying it right, but I agree with it: tell me who your friends are, and I will tell you who you are. And it's exactly that. Tell me who you are following and I will tell you what you need. And I can recommend some active people, RecSys people on Twitter, to follow. I like Michael Ekstrand for RecSys in general, for responsible recommender systems, for fairness, for I think everything. And I would say you can also follow Martha Larson for specific news. For example, she is an organizer of MediaEval, so you know the deadlines and all of this. If there is something at RecSys, the call for papers, also Christine Bauer. I could give a lot of recommendations for active people. Nice, this podcast is also for people to connect and I would definitely agree with the people that you just mentioned.
Let's go a bit more into your research work. I would be interested in a couple of things, of course centering around how you treat the data that you work with in order to increase privacy. There are so many questions floating around in my head, and I will try to boil them down from top to bottom, starting maybe with the why first and then going into the what and how you have implemented it. Could you think about cases where it is acceptable to have access to the gender? If you think about different personalization products, would you say that preserving this private information of the gender that a person identifies with is always necessary, or are there also cases where you would as a RecSys researcher say knowing the gender is valuable and good for the recommender system?
I'm in between, because here when we mention gender I care about fairness. If I say no, you should not give your gender, I know it may impact the fairness. And we see the signal that male users get fairer, better recommendations than females, at least for some data sets. What I would say is it's again in the hands of whoever has my data, so the company. The company has information about me, and if the company is able to protect me, I'm able to give my gender, and then it will help to make fair recommendations. So what I would say to the company: please be careful, pay attention to that sensitive information, because we cannot give it everywhere. And I'm a person who doesn't like to give my personal data to everyone, but if there is a good intention behind it, I'm okay to give it as long as I trust you. So there is trust, and it's important. Again I will link it to the paper that we published on how we can protect the gender. The gender is there, but the problem is not only the gender. What you see is the user-item interactions that you, me, every user made with movies. So just from the matrix, if the company releases only users, products and ratings, the consumption interactions, pay attention, because these attributes alone can also leak the gender. As I said, because the attacker can infer the gender. So what you have shown in your research paper is that you can infer from the ratings what the gender might be, and then that information is implicitly leaked just by having access to the ratings. Yes, so the company is not releasing my gender, that's good. But another thing they have to be careful with is that even the interactions I'm having with the movies may leak my gender, and not only my gender but also other sensitive information. So before releasing or before doing something we need to pay attention to this.
It's not that obvious for everyone here. I guess even for people that are technically savvy it's not always obvious that my movie ratings have a predictive capability for my gender, or let's also think about other things, about my educational background or my age or something like that. But yeah, this is how this world sometimes works. It's complex and it's non-linear and these are kind of the unknown unknowns then, but it does not mean that they are not possible, right? Yeah, it is possible. But what is also possible is that we can do a small thing to protect, very small and simple and effective. Before maybe coming to that point: the topic of your research is actually not only privacy-preserving data, it's purpose-aware privacy-preserving data. And I know that this purpose awareness is pretty important to you. So what kind of purposes are we talking about? Is it really that we are talking about fairness there, about trust, or what does this purpose part mean and why is that so important for you? So it's related to a lot of things. I would say generally there is always a purpose behind doing something. The purpose can be from the attacker, where the attacker has some capability and a goal to do something. And a purpose can be for me: if I am the company, the recommender system company, the purpose is to maintain the recommendation quality. You can also say the purpose can be to make fair recommendations. And as I said before, depending on what kind of attacks there are and what you want to protect, you can then do the protection.
So you see, there is a purpose, there is a starting point where you say I want to protect the identity of the users. No one can know who this person is. Or I want to protect against the inference of the gender, of the political or whatever orientation, while you maintain the recommendation quality. Of course, as a company, I want to protect my users, but also I want to keep the users satisfied. If the users are not satisfied, they will quit my system.
And then there is no money coming. Of course, this is where nice work is being done, where you are doing nice work, because it's not necessarily a trade-off.
I don't want to say the trade-off. I know it exists, but I don't like to say it.
But I actually also didn't see it in your work. So what you have shown there is, hey, we can get better at privacy preservation, but this does not necessarily mean that we lose any quality of recommendation relevance, so we can improve in different areas without getting worse in a certain area. So in economics, this is what you call a Pareto improvement, where you just say, okay, I'm keeping my level, but I'm actually able to improve in one dimension of my problem. And this is, I guess, what you are achieving in your paper.
Is it right? Yes, that's what we did. Give us a bit of background what personalized blurring actually does and how you achieve it.
So, yeah.
So as you mentioned, PerBlur stands for personalized blurring. Blurring is like in images.
You want to blur some pixels to hide something.
I'm using some terminology, but yes, when we blur a part of picture just because we want to hide a face or something, not everything, that's exactly what we are doing for the context of recommender systems.
And the technique, if people are interested, is called obfuscation. So obfuscation is just making a small change in a user profile.
And a user profile is defined as the interaction a user has with the system. The items the user interacted with.
So basically, if you think about the interaction matrix, it's the row of the user. Yes. So we have, you imagine, we all know the user item matrix where users are the rows and the columns are items. Can be movies you watched.
Then a user vector or a user profile is the movies, the items this user watched or interacted with.
So we take that user profile and we inject a small thing. We make a small modification, very small, such that we don't impact the recommender system and we protect your gender. Then you may say, wow, what's this magic?
Actually, the magic is very simple.
Very simple. But I like it. Simple and effective.
That's a smart idea, right? Okay, yeah. I like it. I really like it because it's easy to understand for everyone and very simple.
What happens in PerBlur? What are the inputs?
Simply: I am a user of a recommender system and I am a female. You are a male.
We have so many users. What we do: first we try to find items or movies, whatever the items are, that are linked to your gender. Which means movies that are indicative for female users and movies that are indicative for male users. You may say how? It's also very simple. You know, we have logistic regression. And in the logistic regression function, we have the coefficients. Those coefficients are, you know, if we have, I don't know, a plus alpha times x1 plus beta times x2, etc.
The alpha, the beta, gamma, etc. Those weights tell us how much this feature x is correlated with the class label gender.
And the gender is binary here. So from this, we can know that the movie x, y, z are more correlated with males and other movies are correlated with females. This is the first input. And you can see it's really simple.
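The coefficient-based selection described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the paper: the tiny user-item matrix, the toy gender labels, the function name `fit_logistic_regression` and the list size of two are all made up for the example, and the classifier is a plain NumPy gradient-descent logistic regression rather than whatever implementation was actually used.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    n_users, n_items = X.shape
    w = np.zeros(n_items)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # predicted P(label = 1)
        grad_w = X.T @ (p - y) / n_users + l2 * w   # L2-regularized gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (hypothetical): rows are users, columns are items, 1 = interacted.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1],
], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # toy labels: 1 = male, 0 = female

w, b = fit_logistic_regression(X, y)

# One coefficient per item: large positive values point to label 1,
# large negative values point to label 0.
order = np.argsort(w)
L_f = order[:2].tolist()        # items most indicative of label 0
L_m = order[::-1][:2].tolist()  # items most indicative of label 1
print("indicative for label 1:", L_m)
print("indicative for label 0:", L_f)
```

In this toy matrix, item 0 is only consumed by users with label 1 and item 2 only by users with label 0, so they end up at the top of the two lists.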
Which means that you need to have some data beforehand that of course tells you what the ground truth is. So you need to have at least access to some data that tells you these are the ratings and these are the correct labels, so male or female, in order to train the proper binary classifier that you can then use to have these coefficients available that you then relate to female or male items, right? Yeah, if you are the company, you have those resources. But if you are an attacker, you may ask how? The attacker can easily go to IMDb or Twitter and web scrape. Because they put just a tweet or some signal about a movie and you see people are commenting, oh, I rated this movie 8.5 out of 10.
They gave the signal. Yeah. Okay, okay.
So basically for an attacker without having access to the data of the corresponding company, it is possible to have this information from other sources and be able to train a binary classifier.
Yeah. Okay. Okay. These two lists, they exist.
There is a paper published at RecSys in, I hope I'm not wrong, it was 2011, called BlurMe. Data obfuscation, the same, to protect the gender.
So this is the baseline that I build on to do my PerBlur. They have the two lists, a list of indicative movies for males and a list of indicative movies for females.
But what did PerBlur add? It makes the obfuscation personalized. We predict the movies that you will like and we rank them. We sort the movies with a confidence score, to say you will like first this movie with some confidence, with a score, then the second movie, third movie, fourth movie, etc. So for every user we have a personalized list of movies they may like. Okay.
Then we combine the two lists. Let's take me. I am a female. I'm taking my own case. Yeah.
So I am a female. What I do as a person who wants to protect myself: I go to the list of indicative movies for males, for the opposite gender, and I link the movies that I may like with this list. Which basically means if a movie appears at the top of my personalized list and it exists in the list of indicative movies for males, then I take it into a personalized list of indicative movies for me.
Ah, so you swap items, so to say. So as a female you take the items that are very predictive of you being a female and look for items that counteract that female signal by being the male ones, but only those that you would rate similarly to the items that you want to exchange. And then you basically remove these from your user profile and put in the male items. Is that right?
We did not yet get to the remove, but good point. Okay.
Sorry. Go ahead. So we are creating the list of indicative items, but it is a personalized list of indicative items.
Basically I will have LM and LF, list for females, list for males, but it is personalized for every user based on your preferences, based on what you may like, but also combined with your opposite gender.
If the movie that you may like, that you are really interested in, exists in LM because you are a female, then we put it in the personalized list of indicative items for the user, until we have for every user a long personalized list of indicative items. And this is where the obfuscation will start.
Could you refer to them as obfuscation candidates? Yes. Yes.
The input to the obfuscation, obfuscation candidates. Yeah. Okay.
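Building the personalized candidate list can be sketched as a simple intersection. The snippet below is only an illustration: the function name `personalized_candidates`, the `top_n` cutoff and the toy item IDs are assumptions, and the predicted ranking is taken as given from whatever recommender produces it.

```python
def personalized_candidates(predicted_ranking, opposite_gender_items, profile, top_n=50):
    """Keep items that the user is predicted to like AND that are indicative
    of the opposite gender, skipping items already in the user's profile.

    predicted_ranking: item IDs sorted by predicted preference, best first.
    opposite_gender_items: indicative items for the opposite gender (e.g. LM for a female user).
    profile: item IDs the user has already interacted with.
    """
    indicative = set(opposite_gender_items)
    seen = set(profile)
    return [
        item for item in predicted_ranking[:top_n]
        if item in indicative and item not in seen
    ]

# Toy example with hypothetical item IDs.
ranking = [10, 3, 7, 5, 8, 1]   # what this user is predicted to like
L_m = [7, 8, 2, 10]             # indicative items for the opposite gender
profile = [10, 4]               # items already in the profile

candidates = personalized_candidates(ranking, L_m, profile, top_n=5)
print(candidates)
```

The order of the predicted ranking is preserved, so the first candidate is the opposite-gender item the user is most likely to enjoy.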
Then we go through every user profile. We ended up having the personalized list and we want to inject, to make the protection.
So we go through every user and, let's say we want to obfuscate 1%, which means we add only 1% fake items into your profile.
1% or 5% means relative to the number of interactions that I have already seen. Yes. Okay.
And we add for you a certain number of fake items.
Then you will say this is the first part of PerBlur. Then you may say we are making the user profile maybe longer.
And then it's easy for an attacker to maybe guess it's obfuscated. It's not the real data probably.
What we do to make it neutral: we remove some items.
But now the items are removed not from the opposite gender, but from the same gender as the user. Ah, okay. Okay. Now I get it.
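The add-then-remove step can be sketched like this. It is a rough illustration and not the published algorithm: the function name `obfuscate_profile` is hypothetical, rounding the budget up with `math.ceil` is a choice made here, and removing exactly as many items as were added (so that the profile length stays unchanged) is an assumption, since the conversation only says that some same-gender items are removed.

```python
import math

def obfuscate_profile(profile, candidates, own_gender_items, rate=0.05):
    """Add a small share of personalized opposite-gender items and remove
    the same number of items indicative of the user's own gender.

    rate is relative to the number of interactions already in the profile.
    Returns (new_profile, added, removed).
    """
    in_profile = set(profile)
    budget = math.ceil(rate * len(profile))

    # Imputation: take the best-ranked candidates not yet in the profile.
    added = [i for i in candidates if i not in in_profile][:budget]

    # Removal (assumed to mirror the number of additions): drop items that
    # are indicative of the user's own gender and present in the profile.
    removed = [i for i in own_gender_items if i in in_profile][:len(added)]
    removed_set = set(removed)

    new_profile = [i for i in profile if i not in removed_set] + added
    return new_profile, added, removed

# Toy example with hypothetical item IDs.
profile = list(range(20))        # 20 real interactions
candidates = [100, 101, 102]     # personalized obfuscation candidates
own_gender_items = [5, 50, 7]    # indicative items for the user's own gender

new_profile, added, removed = obfuscate_profile(
    profile, candidates, own_gender_items, rate=0.1
)
print(added, removed, len(new_profile))
```

With a rate of 10% on 20 interactions, two items are added and two are removed, so an attacker cannot spot the obfuscation from the profile length alone.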
So I guess my idea was a bit too naive in the first place. So you don't want to make it too easy for the attacker to identify that user profile as somehow an augmented, defended profile. And therefore you have an incentive to keep that share of obfuscation pretty low.
So you mentioned 1%, 5% not too much. Yeah. And it is very small.
Yes. Okay. But also to then remove those that kind of resemble your gender. So at least on that point I was right with my idea. Yeah. So when you mention the removal, no one did it in obfuscation before.
And it was one of the criticisms that we raised: imagine a movie that was not popular, but it was very indicative, so because of obfuscation it's used too much. Then you plot the item distribution and an attacker will say this movie is not popular, but how did this movie become so frequent? I don't know this movie. It's such a...
How? Okay.
If he knows, then maybe he will start guessing that some modification happened.
As you mentioned, it's easy, very simple I would say, but a smart idea. And it has some implications. You may ask why we chose the gender. I say because I care about responsible recommender systems, and because since I've been attending RecSys, I liked the workshop on... it was FATREC and then it became RMSE, and then at RecSys I was attending and I love the topic.
The FATREC workshop is the one on fairness, accountability and transparency of recommendations, right? Yeah. I know that Joe Konstan a couple of years ago gave a speech as part of the FATREC workshop where he was also saying that he was getting a bit annoyed that everyone just seems to care about optimizing for tiny improvements in relevance. So I really, really liked that because it was so honest and so upright, and I guess this is pretty much what such a healthy community needs.
That there is criticism coming from the inside. Same point.
And I was impressed by this. Yeah. Yeah. The accuracy of the system is important. I totally agree with you. It's very important to make user satisfied because it's like a loop. If a user is satisfied, the company can survive and have money coming but it's not the only thing. Yeah. And this is where responsible recommender system comes to play.
Yeah. And in my mind, I like this fairness.
It started coming up as an emerging topic, and I work on privacy, and everyone says there is already the privacy-utility trade-off, which I don't want to say and you said it, and then you add privacy-fairness on top. If you don't have the gender, how can you achieve fairness?
And I saw a paper by Michael called All The Cool Kids, How Do They Fit In? In his paper, he did some exploration. He was testing recommender system algorithms, mainly collaborative filtering, from matrix factorization to user-kNN and so on, and how they make recommendations for males versus females.
Yeah. And he saw that male users get better recommendations compared to female users, and that was a key paper for me in making PerBlur. In my mind, I have to protect. I have to do the privacy because it's the main idea of the paper and the main purpose of my PhD, but I also like fairness.
Is it possible to make a link?
See, obfuscation is adding some interactions, some information into a user profile. A simple question just to check: would that help to improve, to boost the recommendations for females without destroying the recommendations for males? Because for me, we can achieve the same score, but how did you achieve it? Did you destroy the recommendations for males to make them fair with respect to females? This is not fair for me, because you are destroying recommendations for some people just to say I am fair. This is not fair actually.
So what I tried to see is whether the information I'm adding into the user profiles would help, or at least would not amplify this unfairness. And it was kind of, yeah, at least we are not amplifying, and we are trying to boost the performance for the females a bit. And another thing is diversity.
So this was actually an effect that you could prove in your experiments in your paper that you have compared the accuracy of recommendations for male and female and that after doing the obfuscation you also saw that the accuracy for recommendations for female users improved? Yes. Okay. And it did not, what I was caring about, at least I'm not amplifying.
At least I was saying let me see that I'm not making the recommendation for females worse than what it was.
If it's the case I'm really failing even though I am protecting the gender.
That's one part achieved, but the next part is also important for me.
And then I was watching and how can I quantify, how can I define the fairness, how can I measure the fairness. And this is where I said to you, if I would just, I don't know, say destroy male's recommendation to make it equal to the recommendation for females, this is maybe easy to do.
What is hard is to improve the recommendation for females.
And it's all thanks to the personalized obfuscation.
We stayed close to the user preferences. We did not inject bias, we did not inject noise, because it's personalized and we know you will like it. So you will not feel this is not your preferences; it's in line with your preferences. But it also helped to protect your gender, so a classifier will not be able to know if you are a female or male. And then it was helping to make fair recommendations, and not only this.
We have a kind of diversity. It's easy to say, a simple thing. We are injecting a kind of signal, if I am a female, coming from movies that are liked by males. Which means it should make some impact on the recommendation list. It succeeded.
How did you actually measure the diversity of recommendations or at least the increase in diversity? Yeah, so here we defined something called stereotypical items. There are some items, some movies that are stereotypical for males and some movies that are stereotypical for females. Take the example: I am a female but I don't like Barbie and actually I love horror movies and action movies. So if the recommender keeps recommending me Barbie movies or some other female-related movies, it's not always what I want, and I will be looking more and more to make personal lists for myself, to find the movies I like. You see? And actually sometimes if I want to watch a movie, I ask my friends, and my male friends, because they like watching horror movies.
Then with that, we try to see if the top ten list of recommended items still contains stereotypical movies for females or for males. Are we able to reduce the appearance of stereotypical movies in the top ten list? And yes, PerBlur was able to reduce them, which means it adds some diverse items to the list. You were quantifying how stereotypical certain movies are for a certain gender, and then you were aggregating these quantities for the lists before and after, and you basically saw that the aggregated stereotypical quantities somehow decreased, so that the lists were not that stereotypical anymore if you observe them as a list. Yes, which means you are getting non-stereotypical items in your recommendations.
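One simple way to picture this measurement is the share of stereotypical items in a top ten list before and after obfuscation. The sketch below is a deliberate simplification: it counts membership in a given set of stereotypical items instead of aggregating graded stereotypicality scores, and the function name `stereotypical_share` and all item IDs are hypothetical.

```python
def stereotypical_share(top_k, stereotypical_items):
    """Fraction of a recommendation list that is stereotypical for the
    user's gender (0.0 for an empty list)."""
    if not top_k:
        return 0.0
    stereotypical = set(stereotypical_items)
    hits = sum(1 for item in top_k if item in stereotypical)
    return hits / len(top_k)

# Toy example with hypothetical item IDs.
stereotypical_for_user = {1, 2, 3, 4}

top10_before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]        # trained on original data
top10_after = [1, 2, 11, 12, 5, 6, 7, 8, 9, 10]       # trained on obfuscated data

share_before = stereotypical_share(top10_before, stereotypical_for_user)
share_after = stereotypical_share(top10_after, stereotypical_for_user)
print(share_before, share_after)
```

A lower share after obfuscation means fewer stereotypical items in the list; relevance still has to be checked separately with the usual ranking metrics.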
And still it was a good relevance, yeah? So that is an interesting point here.
Higher diversity while maintaining the relevance of recommendations.
What I really like about this and about also the idea in general, I really think that like you also stressed, it doesn't need to be a trade-off and then what I also guess is that for platforms, it should also not be in the interest to ignore these factors because I guess when in the end people, and we are talking about personalization here and then I should be able to tailor recommendations to a person regardless of his or her gender but with regards to the interests a person has. And this is exactly what you mean and what you are providing by your horror example, horror movie example, that you want to have a system that personalizes this and then this would further increase your chance of staying with a certain platform. So this means that things like fairness, transparency, diversity of recommendation they are not necessarily add-ons, they are more integrated or should be more integrated because they also pay towards the user satisfaction and the user satisfaction is I guess what a platform earns its money with because if users are not satisfied, they are maybe very likely to leave.
Exactly. You got it.
Right.
Nice. Thanks for that really great explanation. I guess everyone can easily grasp the notion of what you have come up with there, and it's also nice how you extended these ideas and were able to develop a new method from your criticism of former methods or former work.
Something that came to my mind when I was thinking about what you did: you basically came up with a smart method to somehow alter the data. Before and after, you were still applying the same algorithms, so standard collaborative filtering of different kinds, neighborhood-based, model-based and so on and so forth. So you did not really change the method, but you changed the data. And this reminded me of one of the very, very famous figures in general, Andrew Ng. He actually gave an interview in February that was called Unbiggen AI. He gave the interview for IEEE Spectrum and this was kind of the starting point for the so-called data-centric AI which he is now advocating for. You were working on data-centric AI because you were saying the power really lies in the data and not really in the model. So if I have shitty data... I have the article here, and he's providing a good example when asked: you often talk about companies or institutions that have only a small amount of data to work with, how can data-centric AI help there?
He's mentioning an example of very skewed data. So for example, if I have 1000 examples, and I did imputation, I have binary classification and 30 of them are positive, but these 30 positive ones are very unreliable, then investing work in making these 30 examples really reliable pays off much, much better than investing a ton of work into the models. And you basically just came up with a smart idea to center your ideas around the data and not on the models afterwards, because then maybe the model's power to anticipate these underlying mechanics is not capable of obfuscating anymore, so it's wiser to do it already before and in the data. Can you somehow relate to this idea?
Yes, so you said a lot and I know about that interview and the emerging of data centric AI and if I'm not wrong he was mentioning synthetic data. This is another thing I work on and I like in my PhD topic and my Doctor Consortium was about comparing recommended system using synthetic data. You have also been co-organizing the workshop as of last year's RecSys a related topic so simulated data or simulations for recommended systems. Just tell us a bit more about this. So it is still ongoing. What's the difference between simulated data and synthetic data?
I would say simulated data and synthetic data are really different. Think of an axis with simulated data on the left. Synthetic data is not in the middle, but between the middle and the real data, close to the real data.
So simulated data and real data are a bit far from each other, because simulated data is more based on assumptions. It's theoretical. But what is synthetic data? Simply put, synthetic data is data that looks like the original data but is fake. When you look at it, you find the same structure as the original data: same attributes, same number of rows, same number of users. That's what I did in the 2018 Doctoral Consortium paper. I was able to generate so-called partially synthetic data, where again I have in mind to protect some information.
So for that paper I was caring about the ratings, and I said ratings can say a lot about you, because I'm influenced by the PNAS paper. So I was trying to see whether from your ratings I can infer your favorite actors, favorite directors and much more.
You know why? Because giving five stars versus one star is not the same. Then I was trying to synthesize the rating attribute, and it's one way of doing synthesis while staying close to the original data.
Changing small things, protecting users and keeping the recommendation quality. Synthetic data for this case is interesting.
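As an illustration of the "partially synthetic" idea described here, the following is a minimal sketch, not the method from the Doctoral Consortium paper. It assumes a toy list of (user, item, rating) rows and a deliberately simple generator: each rating is redrawn from the ratings observed for the same item. The function name `synthesize_ratings` and the toy data are hypothetical. What the sketch shows is only the structural property mentioned above: same users, same items, same number of rows, with the rating attribute replaced.

```python
import random
from collections import defaultdict


def synthesize_ratings(rows, seed=0):
    """Return a copy of rows in which only the rating column is resampled.

    rows: list of (user_id, item_id, rating) tuples.
    Assumption for this sketch: a synthetic rating is drawn from the
    empirical rating distribution of the same item.
    """
    rng = random.Random(seed)
    ratings_by_item = defaultdict(list)
    for _, item, rating in rows:
        ratings_by_item[item].append(rating)
    # Users, items and row order are kept; only the rating is replaced.
    return [(user, item, rng.choice(ratings_by_item[item]))
            for user, item, _ in rows]


# Hypothetical toy data.
original = [
    ("u1", "i1", 5), ("u1", "i2", 1),
    ("u2", "i1", 4), ("u2", "i3", 3),
    ("u3", "i1", 5), ("u3", "i2", 2),
]
synthetic = synthesize_ratings(original)
for row in synthetic:
    print(row)
```

A real generator would also have to be checked against both goals named in the conversation: what an attacker can still infer from the synthetic ratings, and how much recommendation quality is preserved.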
Another use case of synthetic data is data augmentation, where you don't want to just add noise: adding however many interactions to your data when those interactions bring no improvement because they are simply noise.
You can learn the real distribution of the original data, and then you synthesize: you augment your data with synthetic data. And I see it a lot now in many other contexts. Not yet in recommender systems; it is still coming. Thanks to the SimuRec workshop, it's coming to the recommender systems community.
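The augmentation idea can be sketched in a few lines. This is an assumption-laden toy, not a method discussed in the episode: the "learned distribution" here is just item popularity plus each item's observed ratings, and the function name `augment` and the data are hypothetical. New interactions are sampled from these estimates instead of being drawn uniformly at random, which is the distinction from plain noise made above.

```python
import random
from collections import Counter, defaultdict


def augment(rows, n_new, seed=0):
    """Append up to n_new synthetic (user, item, rating) rows.

    Assumption for this sketch: items are sampled in proportion to their
    observed popularity, and ratings from the item's observed ratings.
    Only user-item pairs that do not exist yet are added.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in rows})
    item_counts = Counter(i for _, i, _ in rows)
    items = sorted(item_counts)
    weights = [item_counts[i] for i in items]
    ratings_by_item = defaultdict(list)
    for _, i, r in rows:
        ratings_by_item[i].append(r)

    seen = {(u, i) for u, i, _ in rows}
    free_pairs = len(users) * len(items) - len(seen)
    target = min(n_new, free_pairs)

    new_rows = []
    while len(new_rows) < target:
        u = rng.choice(users)
        i = rng.choices(items, weights=weights, k=1)[0]
        if (u, i) in seen:
            continue  # reject pairs that already exist
        seen.add((u, i))
        new_rows.append((u, i, rng.choice(ratings_by_item[i])))
    return rows + new_rows


# Hypothetical toy data.
original = [
    ("u1", "i1", 5), ("u1", "i2", 1),
    ("u2", "i1", 4), ("u2", "i3", 3),
    ("u3", "i1", 5), ("u3", "i2", 2),
]
augmented = augment(original, n_new=2)
print(augmented[len(original):])
```

Whether such added rows help is an empirical question; the sketch only shows where a learned distribution would plug in.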
So it's still a bit related so it's not too far off.
Was that work on synthetic data what motivated you to organize that workshop? I have seen that you tried out a very different format for this workshop. How did it work, and what would you say are the challenges that currently need to be addressed in the field of simulations or also for synthetic data?
Okay, from where can I start?
I got an invitation to be a co-organizer of the workshop, and when I read "simulation and synthetic data for recommender systems", I was so happy.
You cannot imagine, I was reading the email and writing to my supervisor, see I got this email.
And of course I needed to discuss with her whether she would allow it, given my situation with my PhD and how much time I have. And she said, yeah, of course, go for it.
Yes.
It's so nice to always see how encouraging your environment is in the end. So I really like to hear this.
Yes. And again, as I said, do choose the topic, but also pay attention to the person.
And again, see how COVID came, and if you don't have that relation, that nice relation with your supervisor, you will end up being alone like me. I'm not with my family.
My family are my colleagues. My family is my supervisor.
Really. It's a different conversation, but just to mention to people how important this is: during the COVID period, Martha would just call me, and we called each other not only to talk about work, but to make sure that I'm fine, that I'm eating well, that I'm doing fine in all of this.
Right. Perfect.
It's not only about work. I really love that. So back again.
So this workshop, what I also like about it is that it's interactive. It's really interactive. It's not only submitting papers and then going to present your paper in the workshop as part of RecSys. No. We already started the workshop in August, and RecSys was in September. We have interactive sessions, meetings of I think one hour.
It may last longer. It depends. These are sessions with the participants who have a paper accepted at SimuRec, and it's of course about simulation or synthetic data for recommender systems from different perspectives, like algorithm development, user behavior, synthetic data, reproducibility, infrastructure, a lot of things that touch simulation in recommender systems. It was nice.
I remember the first time we had many participants, and they stayed for the three weeks.
Every week we have a session and all the people come. We start with an introduction and the main point of the meeting of the day. Then we have breakout rooms, and everyone can go to the room or the topic he or she is most interested in. This is how we ended up having different topics in the workshop. Then we met at RecSys. We had a round table. It was not official, because the official one was online, but because we had enough participants, we had a round table and discussed the topic. Because it is still a new, ongoing topic, we need to understand, at least for me, how to position simulation, data sampling, data augmentation, synthetic data and real data, why we have them and why we need them. And, as you said, why: we are going towards data-centric AI, data-centric recommender systems. It's not only about the algorithm itself. There is a lot to do in the input, and this is one of the things I love. Really, there is a lot. Just from the user-item matrix, we can get a lot of information.
Synthetic data is part of these things.
I really like that term. Let's move to data-centric recommendations.
Data-centric recommender systems. We are saying data-centric recommender systems.
This is becoming a pretty dense episode, but I'm really learning a lot and I hope all of the listeners are as well. Again, I really like how appreciative you are, but also how engaged, motivated and fascinated you are about recommender systems and about really making an impact there, which I guess is very important and will become even more important in the future.
I hope so. Thanks for that, definitely.
I guess for all our listeners, again, also for this episode, we will definitely reference all the papers and also the workshop. Looking at the time, I think the field will not lose you, and I guess you have been giving lots of advice to young researchers.
Is there something additional that you want to mention to maybe not only researchers, but also practitioners that enter the field?
Yes, I am more than happy to advise the younger generations, because I want them to learn what I learned, and because I crossed paths with people who really helped me and made me the person I am now. Back in 2015, I was a shy person. I didn't know how to approach people.
I feel scared to start a conversation, believe me.
I had so many thoughts in my mind that I ended up staying in my place. If someone comes to me, okay, otherwise I am in my corner, you know. Now I have changed a lot in many ways. Thanks to many people, and this is my advice.
First thing, if you are a PhD researcher and you are at the beginning of your PhD, or also in the middle, do think about it and please submit a paper to the Doctoral Consortium at RecSys. This is a very nice place. I remember that for me, my mentor was Joe Konstan.
I was happy, but I also was stressed.
I was stressed: I will be speaking, and all the mentors are actually in the room. One day before, because I had gained some confidence, I invited Robin Burke.
You know why? Because at FATREC I saw he has a paper on synthetic data and privacy, and how we can generate synthetic attributes for fairness, consumer fairness. This was his paper. So I went to him and I also invited him. So you can imagine my situation.
2018, my second RecSys, yes, but I'm still a bit scared, very stressed.
But you see in front of you all the big guys.
And on top of that, the first person who will be talking to me, giving me advice, maybe asking me questions, is Joe Konstan.
And he is GroupLens and MovieLens.
And what I'm going to say, synthetic data. This is in my mind, of course.
And because I don't know and I'm still new, and that's what I want to say. Yeah, of course, everyone will feel scared. Everyone will feel pressure, stress, but do take this opportunity. Those people will not do anything bad to you. See me, thanks to those people.
I say again, I am the person I am now, and I have a lot of confidence in myself, thanks to my supervisors, Martha Larson and Alan Hanjalic, my team at RecSys, my team at TU Delft, the MMC Group, and the RecSys community. This is where I go. Basically, this is the only conference I go to every year. So I was there, starting with the Doctoral Consortium. And while I was a Doctoral Consortium student, I was also a student volunteer.
Oh, and this is where I started with the love. I love volunteering. I love doing this job. And it's not too much. And you know, you will not pay. So you get free registration. And not only this, when you do volunteering, people are counting on you. And people are smiling to you. And because you do the volunteering, you will be in some places where people will know you. They have to know you.
Who are those people? They are the senior people. They are the people who come to RecSys every year. So see how many friends and people you can get to know just by volunteering.
And if it were up to me, I would be a volunteer at RecSys every year.
2018, I am a student volunteer.
2019, I applied by myself. It was in Denmark. And I had a late-breaking results paper.
But I also wanted to do student volunteer. So I applied and I got it. Thanks to Helma.
Yeah, to the... I applied and I was saying, you know, there is a motivation why you want to be a student volunteer. For many reasons.
Including that I wanted to be at RecSys. Including that I love volunteering.
Really. 2020, it's online. Unfortunately, we didn't go to Brazil. But I also did student volunteering. I was a volunteer.
2021, I am the student volunteer.
Co-organizer with Benedict. I wish I could do it again.
It's too much. Yeah. There are a lot of things.
We need to prepare. We prepared a lot. And I am a person maybe a bit picky. I was insisting a lot. I hope all the student volunteers who know me. I feel they all said they are happy. But I think if I look back...
Yes. If I look back, I would say I was insisting on many things. Because I really wanted, and I kept saying this to them, to make RecSys 2021 simply successful. This is the only word I can say to everyone.
Online student volunteers, in person student volunteers, please let's make it successful. And it was. And I heard back.
Everyone is happy. It was too emotional.
I love that moment. I wish RecSys had not finished.
Did not end. But thank you so much for your support.
So the work that people are doing there, the organizers, the student volunteers and everyone else, makes the whole thing a really, really nice community event. And I really like your advice for new, younger people who are doing research, because it really gets you into contact with people. And as you mentioned, it also changed you and maybe made you become a more open person.
Exactly.
And this brings additional opportunities with it. Because if you are not afraid of bumping into new people and asking them about what they are doing, this can bring you so many opportunities. And then of course starting as a student volunteer is, I guess, a great thing to do.
Yeah. Because you will be interacting with those big guys. They are workshop organizers. They are somewhere in the organization. And you will find a way to talk to them. And that's what I mean too. Please do not feel scared to approach those people.
Just count on me. Count on what I'm saying. Trust me I did it. And it really worked. Approach those people. You will see how they are kind. How they are open to talk to you. They will not judge how much you know.
And we all know I'm not yet the expert in the recommender system. I cannot.
I'm still a student like everyone. But just simply start a conversation what you are doing, who you are and the conversation will continue. And maybe it can stay for one hour or whatever. Go for it. Yeah. How to do.
People, students, will ask: how can I apply? Maybe they don't know. At RecSys there is a call for student volunteers every year. It's not yet open, but for sure it will be open soon. I'm saying hi to Amifa, Andres and Marcel. They are the student volunteer co-chairs for RecSys 2022 and I know they are working on it. So pay attention. Be patient.
Keep looking to the RecSys website. This is where you find this information and apply to be student volunteers. It's not hard. It's not too much work.
Maybe to make it easier, just follow Manel's Twitter account, which we can also include afterwards, because I guess you will definitely retweet once the call for student volunteers is open, right?
Yes. I keep tweeting when the call for the Doctoral Consortium comes out, and it's already out now, and when the call for student volunteers comes out, because they are both close to my heart, and I know many students are following me, so they may get the information.
Perfect. And then I guess if people still feel anxious about bumping into new people, they could just ask Manel and Manel will do the introduction.
Oh yeah, I did it actually. I like, I told you I worked on that paper because I like recommending people to people.
I like to be that bridge and make the connection between people.
Just come to me and I will introduce you or start the conversation and you will feel comfortable later.
So you are not only working on research in recommender systems, you are also a recommender system yourself. A social recommender system.
You see how confident I am now.
I like it. I like it. Okay, I guess this is definitely valuable advice that you gave there, and I hope many people are listening to this and paying attention to it. Another aspect, of course: you also mentioned that you use products that are influenced by recommender systems or that heavily rely on recommender systems.
Is there some product or some specific applications that you really like?
Good question. The only app I can think of now is Twitter, but I'm not 100% satisfied.
Okay, what could be improved?
Okay, sometimes I get recommendations that I have already seen and maybe even liked. I keep getting them again and again.
So there is no non-positive removal?
I don't know. I really don't know. So maybe the Twitter people know what is happening.
I'm not really a good user of LinkedIn.
I'm just watching movies.
I don't always watch movies, and when the movie deserves it, which means a horror movie, I like going to the cinema. This is where it fits. It's a non-linear movie. It's a movie where you really feel it, and that's why I like horror movies. You don't know what to expect; there are a lot of emotions, a lot of non-linearity in the movie.
A lot of adrenaline.
Last but not least, you have mentioned so many people that you collaborate with that also facilitated your career and development and I know there must be many names in your head, but if I force you to just name a single one that you would like to be only...
Maybe we can make two. Five? Very quickly.
Let us know who the people are that you would like to listen to in this show.
I would love to hear from Joe Konstan, Michael Ekstrand, Robin Burke and Martha Larson on recommender systems and responsible recommender systems: how they see recommender systems now, where we are, and how they see the future of recommender systems. We were there in 2017, when fairness and responsible recommender systems were still new. And you and I were listening to Joe Konstan. So that's why I really want to hear their opinion, the reflections they have. It's very important. And I would say, okay, I already mentioned some, but we should also hear about the evaluation of recommender systems, which is very, very important. Even when I do privacy, I need to have the evaluation framework.
Unfortunately, we didn't talk about it, but please look at the paper so you can understand what I mean by that. At some point, I needed to contact Alejandro Bellogín. So I really also want to hear from Alejandro, Pablo Castells and Alan Said on the evaluation of recommender systems. And I was in a workshop at RecSys last year where they talked about evaluation and how we should not take the average accuracy of the recommender system, but should plot the performance for every user.
And this is one of the points that I really want to hear more from them.
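The point about averages versus per-user results can be made concrete with a small sketch. It is only an illustration under stated assumptions: the metric is mean absolute error on predicted ratings, the data are invented, and the helper name `per_user_mae` is hypothetical. The single overall number looks moderate, while the per-user view shows that one user is served much worse than the others.

```python
from statistics import mean


def per_user_mae(predictions):
    """Compute the mean absolute error separately for each user.

    predictions: list of (user_id, true_rating, predicted_rating).
    """
    errors = {}
    for user, true, pred in predictions:
        errors.setdefault(user, []).append(abs(true - pred))
    return {user: mean(errs) for user, errs in errors.items()}


# Hypothetical predictions for three users.
preds = [
    ("u1", 5, 4.5), ("u1", 3, 3.5),
    ("u2", 4, 4.0),
    ("u3", 1, 4.0), ("u3", 2, 5.0),
]

overall = mean(abs(true - pred) for _, true, pred in preds)
scores = per_user_mae(preds)
worst_user = max(scores, key=scores.get)

print("overall MAE:", overall)
for user, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(user, score)
print("worst served user:", worst_user)
```

In practice the per-user scores would be plotted, for example as a histogram or a sorted curve, rather than printed.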
And I agree with them. Also for fairness and music recommendation, I like Christine Bauer and Cynthia Liem. Cynthia already gave a keynote at RecSys, and we saw how diverse it is, how it comes from a different point of view. It makes the field interdisciplinary.
And I like this. Yeah. Cool. Okay. Definitely a long list of people.
Thanks for sharing that. So maybe we will make it someday and also get these people on board.
Manel, it was really, really great to talk to you. So especially with that energy I feel when talking to you and that knowledge that you gathered and your contributions, but also how enthusiastic you are about this field. So thanks for sharing all of that.
Thank you. Thank you for giving me this chance.
And again, thank you to everyone who knows me and whose name I'm not able to mention. I would love to thank some in particular: CBS, where I'm now doing an internship with Peter Paul and Raynaud, because now I see it's not recommender systems, but it's related to synthetic data in practice, and Radboud, because they gave me the chance to be teaching and supervising. And I love doing this. And you can say, okay, I see you between industry and academia. I'm at that point where I have to decide what I want to do after my PhD. So I'm trying to experience both sides before finishing my PhD.
Very good idea.
So thank you to TU Delft, to my RecSys friends, to CBS and Radboud.
Cool. Thank you. Last but not least, I guess, see you at RecSys in Seattle hopefully.
I really hope. I really, really hope in every, after finishing every RecSys, I say to myself I hope I will be able to join the next RecSys.
Since 2017, yeah, I made it. I hope it will continue. Yes. I hope by that time too I will be able to say, yeah, I submitted my thesis, so I'm coming to RecSys. Then RecSys party is even getting bigger. Yeah.
Cool. Cool. Okay, perfect. Then I'm definitely looking forward to seeing you this year in Seattle at RecSys in September.
And Jant Bem, stay tuned. And thank you and all the best. You too. And stay safe and all the best.
Bye. Bye.
Thank you so much for listening to this episode of Recsperts, Recommender Systems Experts, the podcast that brings you the experts in recommender systems. If you enjoy this podcast, please subscribe to it on your favorite podcast player and please share it with anybody you think might benefit from it. Please also leave a review on Podchaser. And last but not least, if you have questions, a recommendation for an interesting expert you want to have in my show, or any other suggestions, drop me a message on Twitter or send me an email to Marcel at recsperts.com.
Thank you again for listening and sharing and make sure not to miss the next episode, because people who listen to this also listen to the next episode. See you, goodbye.


What is Recsperts - Recommender Systems Experts?