Roblox Tech Talks

Anupam Singh, our VP of Engineering for the Growth team, joins CEO Dave Baszucki to talk about the technical challenges we’re solving to enhance search and discovery for hundreds of millions of creators and users. They’ll explore how our approach to growth has developed in accordance with our values to deliver the best possible experience for our community.


What is Roblox Tech Talks?

Discover the people, ideas, and technological breakthroughs that are shaping the next iteration of human co-experience. Featuring a diverse cross-section of guest speakers from Roblox’s product and engineering teams, we’ll discuss how we got to where we are today, the road ahead, and the lessons we’ve learned along the way. Hosted by David Baszucki, founder and CEO of Roblox.

00:00-1:22
David Baszucki: Hi, I'm Dave Baszucki, founder and CEO at Roblox. You're listening to Tech Talks, a podcast about the people and ideas that are shaping the future of 3D human co-experience. In this series, we'll be exploring some of the most innovative technologies that have emerged in this new category and sharing stories with the Robloxians who are building them.

Today, I'm joined by Anupam Singh, VP of Engineering for the Growth team at Roblox. We'll be talking about our vision for growth engineering on Roblox and the technical challenges we're solving to enhance discovery and engagement for hundreds of millions of creators and users on the platform. Anupam, great to be with you today. Welcome. I remember when we first met, what really caught my attention is that I shared with you a vision for analytics on Roblox that was, what we might say, a little out of the box, and I think you immediately went and ran and calculated whether we could do it or not, which I thought was really exciting. You remember that coffee meeting?

1:22-1:38
Anupam Singh: Yeah, I sent you a spreadsheet, data sight unseen; I didn't know what our internal data sets looked like. And it turns out it's actually more than 3 exabytes. So it's been an awesome journey just getting to know all our data sets.

1:39-2:12
David Baszucki: Yeah. And I'll hint to the people listening that we will circle back to that discussion, because it was the notion that in 3D human co-experience, in virtual environments, it's conceivable to generate events every few seconds over the course of 5 billion hours a month. And that's a lot of events. And when people start thinking about querying those and running analytics on those, it creates some mind-blowing complexity for the engineering.

2:12-2:13
Anupam Singh: Absolutely. Exciting, exciting stuff.

2:14-3:23
David Baszucki: Okay. So hey, first off: for probably 10 years now, people have talked about, oh, the growth team. And I think over the last 10 to 15 years, at certain times, there's been almost a magic aura around the growth team: oh, our growth team! Is that why we're growing? Now, there's a subtle difference, I think, at Roblox, because we call it growth engineering. And I think in every company there's a bit of a thought and philosophy around how much of our growth is going to come from strategic innovation and new inventions, and how much is going to come from extremely good discipline on A/B testing, iteration, and making current systems better. And it's always a mix of the two. I think we lean, very thoughtfully, a little more on the innovation side. But at the same time, it's very important. Can you share, maybe at a high level, how we think of growth engineering at Roblox?

3:24-4:28
Anupam Singh: Yeah. So, you know, I come from the open-source world, where to get any growth, you really need the community aspect: the community has to adopt your code. You're not going to market your way through it. And I'm glad we are continuing that journey at Roblox, where if we could eke out 1% engagement growth by doing something really hacky, we will not do it. We won't even test it. We have this value that you talk about a lot, which is take the long view. So let's say I can do something really innovative on the platform that will enable creators to grow faster. We will take that path. In the short term, I may not look like a great growth hacker, and we as a company are very comfortable with that. Our engineers don't think about growth hacking. They think about one thing: innovation to enable creators and players to find more engagement. Growth will come.

4:28-5:23
David Baszucki: Cool. So one of the things growth does do at Roblox, and it's gotten better and better over time, is this thing called discovery. I'm gonna tell the listeners what V1 of discovery was at Roblox, and then I'm wondering if you can give them a more thorough understanding, because it affects video platforms, it affects Roblox, it affects a lot of platforms. V1 of discovery at Roblox, when we launched, was one home page showing which experiences on Roblox had the most people in them in real time, ordered from biggest to smallest, and that was called our popular sort. That was discovery V1. That's very different, I think, from how the industry thinks of discovery today. So could you maybe share for the audience: what is discovery, and what is it on Roblox now?
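
The "popular sort" described here amounts to ranking experiences by their current concurrent player count. A minimal sketch, assuming a hypothetical in-memory snapshot that maps experience names to live player counts (the names and numbers are invented for illustration, not real Roblox data):

```python
from typing import Dict, List, Tuple


def popular_sort(concurrent_players: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank experiences from most to fewest players right now (discovery V1 style)."""
    # Highest player count first; ties broken alphabetically so the order is stable.
    return sorted(concurrent_players.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical real-time snapshot: experience name -> players currently in it.
snapshot = {"Obby Tower": 1200, "Basketball Arena": 5400, "Haunted Hallway": 5400, "Cafe RP": 80}
ranking = popular_sort(snapshot)
for name, players in ranking:
    print(f"{players:>6}  {name}")
```

Note that nothing in this ranking depends on who is asking: every visitor sees the same list, which is exactly what later, personalized discovery moves away from.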

5:24-6:20
Anupam Singh: Yeah. Whenever I play Roblox, just as a player, the most exciting part is when I'm playing with somebody. And that's what we mean by co-experience. So everything that we do in discovery starts with us being a co-experience platform: how can I get you an experience with your friends, or with somebody that you like to play with? And the other part of our platform that may not be apparent is how real-time we are. I walk into an experience with Dave. We go up a mountain, we see a stream, and we jump into it. Now, how does that change our discovery algorithms? Our discovery algorithms are very different from discovering a video because, as a Roblox discovery engineer, I have to find the best content for you and also the next best action.

6:21-7:08
David Baszucki: That’s hard. And for a Roblox user, this is the home page. This is in addition to what I've recently played. I believe we call it Recommended For You, or something like that. And this is really interesting, because in conjunction with finding the next best action, I've sometimes said, Anupam, can you set an objective function to optimize for the enterprise value and size of the community on Roblox in 5 years? If we could, those would be wonderful selections, and the bigger Roblox is, the more economic opportunity all of the creators have. But that's a very difficult engineering task, right, to optimize for 5 years out. Can you share how we try to proxy and come close to that?

7:09-8:14
Anupam Singh: Yeah. Like any other big engineering task: to create a personalized system for a billion users that's real-time and co-experience. When you say that, you already feel intimidated by it, but like any other open-source coder or developer, the idea would be to break it down into the first thing that you can do. The first thing you can do, as a user, is just find all similar users, users like me. It doesn't mean just age, location, etc.; it could be more nuanced. I like basketball games on Roblox, so find other people who like basketball. So imagine that as the first step in the journey. The second step is to then find every game, or every experience, that is similar to those users' experiences. But then the most interesting part comes in, which is the activity of your friends: what are your friends doing on the platform?
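
The funnel outlined above (find similar users, gather experiences similar to theirs, then weigh in what friends are doing right now) can be sketched as a toy candidate-generation and ranking pipeline. This is a minimal illustration under stated assumptions, not Roblox's system: users are hypothetical interest vectors, similarity is plain cosine similarity, and `k_similar` and `friend_boost` are arbitrary illustrative parameters. The production ranker is described in this conversation as a neural network; hand-written scores are used here only to make each stage visible.

```python
import math
from typing import Dict, List, Set

Vector = Dict[str, float]  # interest -> weight, e.g. {"basketball": 1.0}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(user: str,
              interests: Dict[str, Vector],
              played: Dict[str, Set[str]],
              friends: Dict[str, Set[str]],
              playing_now: Dict[str, str],
              k_similar: int = 2,
              friend_boost: float = 1.0) -> List[str]:
    # Step 1: the k users whose interests are most similar to this user's.
    sims = {u: cosine(interests[user], v) for u, v in interests.items() if u != user}
    similar = sorted(sims, key=lambda u: sims[u], reverse=True)[:k_similar]

    # Step 2: candidates are experiences those users played, weighted by similarity.
    scores: Dict[str, float] = {}
    for u in similar:
        for exp in played.get(u, set()):
            scores[exp] = scores.get(exp, 0.0) + sims[u]

    # Step 3: boost whatever the user's friends are playing right now (co-experience).
    for f in friends.get(user, set()):
        exp = playing_now.get(f)
        if exp is not None:
            scores[exp] = scores.get(exp, 0.0) + friend_boost

    # Rank unplayed candidates by score (ties broken by name).
    already = played.get(user, set())
    return sorted((e for e in scores if e not in already), key=lambda e: (-scores[e], e))


interests = {
    "ana": {"basketball": 1.0, "racing": 0.2},
    "ben": {"basketball": 0.9, "racing": 0.1},
    "cy": {"horror": 1.0},
    "dee": {"basketball": 0.5, "horror": 0.5},
}
played = {
    "ana": {"Hoops League"},
    "ben": {"Hoops League", "Dunk Contest"},
    "cy": {"Haunted Hallway"},
    "dee": {"Dunk Contest", "Haunted Hallway"},
}
friends = {"ana": {"cy"}}
playing_now = {"cy": "Space Race"}

print(recommend("ana", interests, played, friends, playing_now))
```

In this toy data, "Space Race" appears only because a friend is in it at this moment, which is the real-time co-experience signal the third step adds on top of similarity.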

David Baszucki: We have a really huge responsibility here, because a lot of the creators on Roblox have communities, and a lot of creators have all kinds of social channels where their users learn about their experiences. But there are also a lot of creators for whom our discovery page contributes to getting more recognition. And so it's a huge responsibility to treat all of these creators in an egalitarian fashion, and to do this mathematically. I know we've gotten a lot better over time, but I do think it would be helpful, Anupam, for you to confirm with our creators that this has to be 100% fair, no matter how big or small anyone is.

9:31-10:27
Anupam Singh: We have to be fair and we have to be transparent. So let’s touch upon fairness first, which is that the algorithm does not have manual intervention. It's the funnel, the three- or four-step process that I described. It's a neural network that continuously evaluates new signals. The effect of that is that multiple times last year, we had experiences go from almost 0 users to a million users within weeks, sometimes in as little as 4 to 5 weeks. One of my favorites is Doors, in which you go through doors, and sometimes you get killed, you know, because it's a horror genre. But we were able to grow that game’s community by discovering people who loved this particular genre, the music, the effects, apart from just the fact that it was horror. So that's interesting.

10:28-11:00
David Baszucki: One thing. I guess we don't really have to comment on it, because we're not announcing new products. But if you're daring to: I think I've been asking you for a long time, could our real-time system get so good that the home page would animate literally as my friends went to different places and moved around? That's enormous engineering pressure on real time. I'm not saying that's the best user interface at all. But do you think we'll get to that level of real-time discovery, where we can literally know at every second what we think is the coolest place to go?

11:01-11:37
Anupam Singh: Yeah, totally. It requires that 3-exabyte data lake that we talked about. We have to take that event stream and evaluate it in real time. But we are the only company on the planet who can even think of doing it. It is difficult, but we are the only ones who can do it, because our entire worlds get created with this amazing data model inside our engine, and we can then actually reproduce this animation anywhere remotely. So we have the foundational work. So it is possible, but if you ask me, is it tomorrow? No.

11:38-11:57
David Baszucki: Okay. Now, more education for our users. Sometimes, when people hear about companies, in addition to growth, they hear about search and discovery. We've been talking about discovery. Could you, maybe for the users, say: hey, next time you hear "search and discovery," here's how you can be an expert. This is what search is, and this is what discovery is.

11:58-12:31
Anupam Singh: Yeah. Discovery is push. In some ways, you come to Roblox.com and you don't really know what to play, or you're just looking for your friend: what are they playing? And you want to jump in. Search is more of a pull: in search, you come in and you explicitly say what you want. Now, a lot of times, of course, our users may not spell correctly, but that's the job of the search engine. You know what you want, and we will try to get it to you.

12:33-12:44
David Baszucki: So would it be fair to say, for our users out there, that Google is more like search and TikTok is more like discovery? One is you asking a question, and the other is giving you information.

12:45-12:47
Anupam Singh: Yes, yes, and we need to solve both of those use cases.

12:48-13:48
David Baszucki: Okay, so now search is really interesting. Search has had a long history of getting better and better. For those of you out there who used the early version of Yahoo, that was search with, you know, a human-generated taxonomy and a tree. Google came along with links and general search. And now we're in the midst of what is arguably, in the last 6 months or so, people starting to ask what search is, as we see ChatGPT, Bard, and other much more intelligent types of search. When we map that onto Roblox, the same thing starts to be possible, and an early example of that is what we believe has contributed to some of our acceleration in Japan. Can you give a little hint on what semantic search is, and why it may be contributing to growth around the world?

13:49-14:58
Anupam Singh: Yeah. So, semantic search. If you want to distill it to one word, it would be intent: I want to know the intent of the user rather than just translating whatever search string there is. So let's say you come in and type in a very popular movie, any popular movie. Now, we are not a video site, and we are not going to, you know, replay that copyrighted content, but we understand the intent. Maybe you are interested in wizards and magic. So that's the first part of search: use machine learning. And we are starting to use more and more of these models that you talked about to understand the intent. The second part, for international users, is that we've invested heavily in automatic translation. So instead of just mapping that search to whatever English word it maps to, we actually use semantic search to understand intent within your geography. So if you're a Japanese user, we are going to give you very different results for almost the same word or phrase.

15:00-15:46
David Baszucki: So that makes it very complex, because you're talking about really personalized semantic search. I was playing a little "stump the search team" a few weeks ago in one of our engineering meetings, where we were starting to envision what level of complexity semantic, and then AI-driven, search might support. And there are a lot of classic things on Roblox that could be part of that, which is where it starts to get exciting. For example, that original discovery page on Roblox: show me all the experiences ranked by the number of people in them right now. What I just said could, I believe, someday be a semantic search; we would understand it and just generate that. Does that seem possible to you?

15:46-16:11
Anupam Singh: Yeah, Dave, it goes back to how fundamentally search has changed: from understanding a content corpus, to building a PageRank-like system, which obviously Google pioneered and built. But now it is all about semantic search, vector databases, and machine learning models. So we are adopting that heavily, and you will see more and more of that on our platform.
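
The move from keyword matching toward semantic search over vectors can be illustrated with a toy nearest-neighbor lookup. A minimal sketch, assuming each query and experience has already been turned into an embedding vector by some model: the three-dimensional vectors and experience names below are invented, whereas real embeddings come from a trained model and have hundreds of dimensions, and a vector database would use an approximate index rather than this brute-force scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vec: Sequence[float],
                    index: Dict[str, List[float]],
                    k: int = 2) -> List[Tuple[str, float]]:
    # Exact (brute-force) nearest neighbors by cosine similarity.
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; the dimensions loosely stand for (magic, sports, horror).
index = {
    "Wizard Academy": [0.9, 0.1, 0.2],
    "Spell Duels": [0.8, 0.3, 0.0],
    "Hoops League": [0.0, 1.0, 0.0],
    "Haunted Hallway": [0.1, 0.0, 0.9],
}
# A query naming a popular wizard movie might embed close to the "magic" direction,
# even though none of its words appear in these experience titles.
query = [1.0, 0.0, 0.1]
print(semantic_search(query, index))
```

The point of the sketch is that matching happens on meaning (vector direction), not on shared keywords, which is what lets a movie title surface magic-themed experiences.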

16:12-16:43
David Baszucki: Okay, trivia question. And once again, this is no projection of future product, but it's fun to brainstorm. Long term, when we imagine all of the things we might search for on Roblox: we have people, we have our friends, we have assets, we have clothing in the catalog, we have 3D experiences, and we have 3D objects we might want to put in an experience. Do you think these all coalesce on Roblox into a universal, text-based search bar, or do they stay separate?

16:44-17:16
Anupam Singh: They have to come together. Our architecture has to be personalized and it has to be semantic, and these are very artificial walls. I have to understand your intent. It is not just about getting you into an experience. Maybe you are feeling like you want a hat coupled with dancing shoes.

17:17-18:12
David Baszucki: Wonderful. Okay. Now we're gonna jump a little bit from search and discovery, the slightly more classic growth things. We're gonna start talking about something that for Roblox is a growth thing, but is a very heavy engineering problem: optimizing co-presence with people and friends throughout this 3D experience platform. Internally, we use the term "matchmaking." It's under the covers. We hope that most people on Roblox don't know about it, but we hope everyone's pleasantly surprised when they go to an experience and their friends are already there, and when they jump from place to place, their friends come with them. So what is matchmaking on Roblox, Anupam? I'm gonna ask you first conceptually, but then I can ask some engineering things as well.

18:13-18:40
Anupam Singh: Yeah, this is the one that I'm always excited about, day in and day out. Matchmaking is deceptively simple. Anybody who comes to Roblox comes to the home page, picks an experience, says, oh, I want to play, and presses the Play button. Behind the scenes, we then try to find the right data center to send you to, and the right friends to play with. And that's where it gets interesting.

18:41-18:58
David Baszucki: It's super interesting, because as we've grown... I don't know, is this an N-squared, N-cubed, or N-to-the-N problem? Because this is really a complex situation when we grow from a hundred to a thousand to a million to 10 million people trying to matchmake at the same time.

18:59-19:44
Anupam Singh: Yeah. And remember, when you press the Play button, the Internet has trained us to think that it should be instant. There's something about the Play button that makes our users feel that it has to be instant, so it is an N-to-N problem. Because, let's say, a million people are already playing this experience, and, Dave, if you click Play, I have to compare you with all of that million and see which ones would be the best to play with. But then I also have to figure out which is the best data center. Does it have space for you? How will your latency be? A lot of calculations go into that one second we have before you get dissatisfied and think the Play button is not working.
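
Comparing each joining player against every active player is O(N) work per join, so a herd of N simultaneous joins costs on the order of N². A minimal sketch of one way to avoid that, under assumptions that are hypothetical and not Roblox's algorithm: servers are simple records with a data center and a capacity, there is an index from each player to the server they are on, and the policy is "prefer a friend's server with room, otherwise the lowest-latency server with space."

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Server:
    server_id: str
    data_center: str
    capacity: int
    players: Set[str]

    def has_room(self) -> bool:
        return len(self.players) < self.capacity


def pick_server(friends: Set[str],
                servers: Dict[str, Server],
                player_location: Dict[str, str],
                latency_ms: Dict[str, float]) -> Optional[Server]:
    """Choose a server for a joining player (hypothetical policy, for illustration)."""
    def rank(s: Server):
        # Lower latency first; server_id breaks ties deterministically.
        return (latency_ms.get(s.data_center, float("inf")), s.server_id)

    # 1. Friends first: one dict lookup per friend, never a scan over every player.
    with_friends = {player_location[f] for f in friends if f in player_location}
    joinable = [servers[sid] for sid in with_friends if servers[sid].has_room()]
    if joinable:
        return min(joinable, key=rank)

    # 2. Otherwise the lowest-latency server that still has space
    #    (this scans servers, which are far fewer than players).
    open_servers = [s for s in servers.values() if s.has_room()]
    return min(open_servers, key=rank) if open_servers else None  # None: start a new server


servers = {
    "s1": Server("s1", "us-east", capacity=2, players={"ana", "ben"}),  # full
    "s2": Server("s2", "us-west", capacity=10, players={"cy"}),
    "s3": Server("s3", "us-east", capacity=10, players=set()),
}
player_location = {"ana": "s1", "ben": "s1", "cy": "s2"}
latency_ms = {"us-east": 20.0, "us-west": 80.0}

dee_server = pick_server({"cy"}, servers, player_location, latency_ms)   # joins friend cy on s2
eve_server = pick_server({"ana"}, servers, player_location, latency_ms)  # ana's s1 is full -> s3
print(dee_server.server_id, eve_server.server_id)
```

The design choice worth noting is the friend-to-server index: the cost of a join then depends on the size of one player's friend list rather than on how many people are already playing.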

19:45-20:16
David Baszucki: So we've made tremendous progress on this. A long time ago, if a creator ever had an event at an unexpected scale, we could literally get into situations where N-squared or N-cubed type behavior caused us a lot of problems. Now I think it's very routine: we can handle 1 to 2 million people joining simultaneously. Can you share a little about how we've figured that out?

20:17-21:46
Anupam Singh: Yeah, this brings me back to last year. I had just joined a couple of months earlier, and you were telling me about this opportunity. And I asked you, so, Dave, how many people do you think will press Play simultaneously? Because, as a computer scientist, you want to know. And you said, Anupam, maybe a million. So I went through the data and looked for the last time that had happened. It had happened once, for the Lil Nas X concert: a million people tried to get into the concert at the same time. And of course we went into that N-raised-to-N situation and got stuck. We decided to call this the thundering herd. And you asked us to build the system as if this just happens, and you raised the stakes. I remember you said 10 million joins in 10 seconds, and I thought, we have time. It'll be years before, you know... this is just Dave's ambitious long-view project. Fast forward to last November, December, January: every weekend, we went from 900,000 to a million to 1.5 million people pressing Play at the same time. It's amazing. And of course our system has stayed up. Not only that, we have been able to match you with your friends in real time, so that you can have a great experience on the platform.
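
The design target of 10 million joins in 10 seconds works out to a sustained 1 million join requests per second. One standard way to keep a burst of that size from overwhelming downstream services is admission control with a token bucket, which admits requests at a fixed average rate with a bounded burst and tells the rest to wait. A minimal sketch; the rate and burst values in the demo are assumptions, and the conversation does not say that Roblox uses this particular technique:

```python
import time
from typing import Optional


class TokenBucket:
    """Admit requests at `rate` per second on average, allowing bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should tell the client to retry later


# The stress target from the conversation: 10 million joins in 10 seconds.
target_rate = 10_000_000 / 10  # 1,000,000 joins per second

# Scaled-down demo with an explicit clock: 5 joins/s with a burst of 5.
bucket = TokenBucket(rate=5, capacity=5, now=0.0)
admitted_at_t0 = sum(bucket.allow(now=0.0) for _ in range(20))  # burst: 5 of 20 admitted
admitted_at_t1 = sum(bucket.allow(now=1.0) for _ in range(20))  # one second of refill: 5 more
print(target_rate, admitted_at_t0, admitted_at_t1)
```

Passing `now` explicitly keeps the demo deterministic; in a live service the method would read the monotonic clock instead.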

21:47-22:14
David Baszucki: Yeah. And just to highlight, thundering herd is a common term in the industry that really applies to these situations where everyone's doing the same thing at the same time. And that's really what we have to engineer systems for. We engineer for the thundering herd, and then everything else is like sitting on vacation on the beach, you know, sipping a cocktail, because the thundering herd is where it really happens.

22:15-22:30
Anupam Singh: Yeah. And this has become the best recruiting line for us. When we go talk to an engineer who's a distributed-systems computer scientist, thinking, oh, Roblox is just press Play and get in, then we describe this system to them. It's amazing.

22:31-22:49
David Baszucki: Yeah. And I think someday we're gonna see 10 million, and then 20, and then 40 million people at certain events around the world: concerts, things that go beyond TV. You mentioned a personal experience of a concert on Roblox with someone in your family, who even kind of appreciated the work we've done.

22:51-23:27
Anupam Singh: Yeah, so funnily enough, last June my pager goes off on my phone because we are having a thundering herd event. And before I could log in, my son, who's 16 and a big Roblox user, says, Don't worry, that’s just Pet Simulator having an event. And what was amazing was that my son had diagnosed the page before I could even get to it. This shows what the community is. And of course he was pressing Play and causing more traffic. But I didn't take that personally.

23:28-23:43
David Baszucki: So one of the nice things about thundering herds is that if you start to fail during the thundering herd, it gets even worse, because then you get users trying to join every second, and it goes into an even more difficult space.

23:44-24:18
Anupam Singh: Yeah, our users are demanding and our creators are extremely creative. I was taught in software engineering that you release your software when there is low traffic, but we have enabled our creators to create entirely new experiences in real time, and they want all of their users to experience them together. So when we don't satisfy that demand, people press Play again. They reload Roblox.com. They keep coming in. And now the 1-million thundering herd becomes 2 million, 3 million really quickly.
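
The amplification described here, where failed joins trigger immediate retries that turn 1 million requests into 2 or 3 million, is why clients in many large systems retry with exponential backoff plus random jitter: retries spread out over time instead of arriving in synchronized waves. A minimal sketch of the "full jitter" variant; the base delay and cap are assumed values, and nothing in the conversation says this is how the Roblox client behaves:

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(42)  # seeded so the demo is repeatable
delays = [backoff_delay(attempt, rng=rng) for attempt in range(8)]
for attempt, d in enumerate(delays):
    print(f"retry {attempt}: wait {d:.2f}s")
```

Because each client draws its own random delay, a million clients that all fail at the same instant come back spread across the whole backoff window rather than all at once, which keeps a partial failure from snowballing into a larger herd.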

24:19-26:00
David Baszucki: Thank you, Anupam, for solving that and engineering it. And now we're going to get into another two areas of engineering. I think that a long time ago, people looked at Roblox and, in addition to all the educational aspects, thought, that's going to be a great place for people to learn to code. And sure enough, millions of people have learned to code. When we met a few years ago, we knew that data science and analytics were also gonna be a place for people to learn on Roblox, and we're gonna talk a little about that. And then I'm going to hint to the audience, without going into it, that we believe ultimately learning how to leverage AI and machine learning, not just for engineers but for all of us, is also gonna be part of the future. And so in each of these three areas we provide a very accessible platform: millions of people can code on it. Let's talk about providing a platform where people can do data science. I want to talk a little about the vision we had on that napkin when we first met: the notion that, more and more, the creators on Roblox have access to a massive data platform, and, in addition to using canned queries, can write their own queries. This is a ridiculously complex problem, right? As you mentioned, with so many events, those queries can get very complex. Can you share a little about the magnitude of the data and where we are today on that journey? And we can riff a bit on it.

26:01-27:04
Anupam Singh: Yeah. 1 billion plus hours of engagement: that's the first number people should remember. Now, let's say you collect data every 5 seconds or something; pick your time period. In our case, from that 1 billion hours a week, we collect 500 billion events per day. So that's the first big data point. Just so that people don't have to do math while they are listening: that translates to 182 trillion events a year. Then, say, for the last 5 years, we have 910 trillion events. It's about a quadrillion rows. So, same as with thundering herds, if we don't optimize this, it will just get stuck there. The query will keep running forever. It may never come back.
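
These figures can be checked with simple arithmetic. The only input taken from the conversation is the 500 billion events per day; everything else follows from it:

```python
EVENTS_PER_DAY = 500 * 10**9                   # 500 billion events per day, as quoted
SECONDS_PER_DAY = 24 * 60 * 60

per_second = EVENTS_PER_DAY / SECONDS_PER_DAY  # about 5.8 million events every second
per_year = EVENTS_PER_DAY * 365                # 182.5 trillion
five_years = per_year * 5                      # 912.5 trillion, on the order of 10**15

print(f"{per_second / 1e6:.1f} million events/s")
print(f"{per_year / 1e12:.1f} trillion events/year")
print(f"{five_years / 1e15:.2f} quadrillion events in 5 years")
```

Rounding the yearly figure down to 182 trillion before multiplying by five gives the 910 trillion quoted in the conversation; either way, the total is close to one quadrillion rows.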

27:05-27:27
David Baszucki: Got it. We should start using giga-, peta-, exa- terminology on our events, because I think that's an exa-event or something like that. Okay, now I’m going to put you on the spot. What happens today when a new engineer says, table-scan the quadrillion events? And what might happen in the future?

27:28-27:36
Anupam Singh: Yeah. So today, of course, if you do that without any filters or predicates, we are just going to refuse to run the query.
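
The guardrail described here, refusing a full scan that has no filter or predicate, could look something like the check below. This is a hypothetical sketch: the `Query` shape, the `check_query` function, and the row-count threshold are assumptions made for illustration, not a description of Roblox's query engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Query:
    table: str
    predicates: List[str] = field(default_factory=list)  # e.g. ["event_date = '2023-05-01'"]


class QueryRejected(Exception):
    pass


def check_query(q: Query, table_rows: int, max_unfiltered_rows: int = 10**9) -> None:
    """Refuse unfiltered scans of very large tables before they reach the cluster."""
    if not q.predicates and table_rows > max_unfiltered_rows:
        raise QueryRejected(
            f"full scan of {q.table!r} ({table_rows:.2e} rows) needs at least one filter"
        )


EVENTS_TABLE_ROWS = 910 * 10**12  # roughly the five-year event count quoted in the conversation

try:
    check_query(Query("events"), EVENTS_TABLE_ROWS)
    rejected = False
except QueryRejected as err:
    rejected = True
    print("refused:", err)

# With a predicate, the guard lets the query through to the planner.
check_query(Query("events", ["event_date = '2023-05-01'"]), EVENTS_TABLE_ROWS)
```

Small tables are allowed to be scanned without filters; the check only blocks unfiltered queries whose size would make them effectively unbounded.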

27:37-27:40
David Baszucki: But what would happen if we ran the query, Anupam?

27:41-28:48
Anupam Singh: Oh my. In some ways, you can imagine the servers just melting down, because what you're doing is reading every row and doing something to it. And so what we have to do, though, is enable that, which is odd, right? You know, same as thundering herds, same as search and discovery, we have to enable our developers to do something really, really creative, including going through all the events that we have collected and are ready to share with them. How do we do it today? We did something very interesting. There's a paper by Google on learned indexes. And we have this very esoteric filter called a Bloom filter, and the trick to keeping this much data is to always filter it out. And so we've implemented some new tech in this area so that Dave can come in and run this query. As long as he has a predicate, I'll actually cheat, and I will not run it on an exabyte of data. But from your point of view, I just ran through exabytes of data in seconds.
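
One way to read the "cheat" described here is data skipping: keep a compact Bloom filter per data partition, and when a query has a predicate, only scan the partitions whose filter says the value might be present. Bloom filters can return false positives but never false negatives, so skipping a partition the filter rules out is always safe. A minimal sketch using only the standard library; the partition layout, filter size, and hashing scheme are assumptions, and the learned-index ideas from the Google paper (which replace or augment the hash functions with a trained model) are not shown:

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Set-membership sketch: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit: simple, not compact

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def build_index(partitions: Dict[str, List[str]]) -> Dict[str, BloomFilter]:
    """One Bloom filter per partition, built over the values the partition holds."""
    index = {}
    for name, values in partitions.items():
        bf = BloomFilter()
        for v in values:
            bf.add(v)
        index[name] = bf
    return index


def partitions_to_scan(index: Dict[str, BloomFilter], value: str) -> List[str]:
    # Partitions whose filter rules the value out are skipped without being read.
    return [name for name, bf in index.items() if bf.might_contain(value)]


# Hypothetical daily partitions of an events table, listing the user IDs in each.
partitions = {
    "day-01": [f"user-{i}" for i in range(0, 100)],
    "day-02": [f"user-{i}" for i in range(100, 200)],
    "day-03": [f"user-{i}" for i in range(200, 300)],
}
index = build_index(partitions)
print(partitions_to_scan(index, "user-150"))  # includes "day-02"; others only on a false positive
```

With a predicate like `user_id = 'user-150'`, the engine reads one partition instead of three; at the scale discussed here, the same idea lets a query touch a small fraction of the data while appearing to cover all of it.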

28:49-29:35
David Baszucki: That's really exciting, because it really hints at a future where, if I'm in high school and I'm learning data science, and we have a Roblox server collecting a lot of events, it's conceivable to learn about queries and see the results of those queries on production data. That's true for a lot of people who are learning this, in addition to our developer community. And then my favorite question is: how complex could you imagine those queries getting, where we still automatically figure it out for you and give you some answer, even if it's asked by a student?

29:36-30:14
Anupam Singh: Yeah, I think because querying is becoming simpler for our developers, they will ask more complex questions. Even today we have 2,000 pipelines. Right now, as we are speaking to each other, there are 2,000 pipelines running on this, you know, almost-exabyte of data, and that's hundreds of thousands of queries running simultaneously. So it is going to get complex. But here's the deal: the world of machine learning is coming to help in this area. In our case at Roblox, the machine learning platform is there to help run these queries. And we can talk more about that.

30:15-31:28
David Baszucki: Okay, that's cool. Now we're gonna get into some fun hints at the future. We've talked about search and discovery, we've talked about matchmaking, and we've talked about this analytics platform. Under the fabric of growth engineering, Anupam, you also have another special project at Roblox, which is really the growth of our machine learning platform. When we build stuff at Roblox, we tend to build platforms so that all of the groups and teams inside Roblox can use them. We've hinted a bit that for a long time we've been using ML and AI behind the scenes in our trust and safety systems. More visible to people on Roblox have been generative materials and code generation. But it might be fun to share a few numbers on the quantity of ML and AI pipelines we're running behind the scenes, since most people don't realize we're at that scale already.

31:29-32:28
Anupam Singh: Yeah, it's been amazing. You've seen this internally. 6 months ago, just a mere 6 months ago, we had maybe 10 or 20 machine learning models running in production. This could be, you know, when you load the home page, when you do semantic search, like you say, or image auto-moderation for safety. Come to today: there are 70 production models. These are not, you know, Hack Week projects just sitting there. These are actually things that are being run in production. And you mentioned some of them: material generation, real-time moderation, audio transcription. So that's the high-level view. The underlying view is that every one of us knows these large models are here to stay. And so the next step of our journey is: Roblox has many models that we have developed on our own, so how do they interact with these large models?

32:29-33:22
David Baszucki: Yeah, well, I think, for people out there listening, this is the incredible opportunity for us, whether it's safety, 3D model generation, or avatar generation: the more signal we can add on these various things, safely and in a privacy-compliant way, the more we can accelerate existing LLMs and other types of ML models and increase their quality, which I'm extremely excited about. So those signals are good. Can you give any hints, maybe without announcing any future products, of other places... I mean, it's very obvious, right, that someday we'll be generating more and more 3D assets generatively, beyond materials. Any other exciting areas you think of for ML acceleration?

33:23-34:11
Anupam Singh: I think one area would be just voice. Take you and me in this conversation. Let's say we also happen to be on a video call, and let's say you and I are also walking. Right? So now all of this has to go through a safety pipeline. Maybe you and I are friends and we can talk in a certain way. But if we are in a professional setting, we can't talk the way friends talk. All of this has to happen in real time. So, without announcing any future products: being able to figure out that this is a conversation between friends versus a professional meeting, and applying different safety filters, can only be solved with machine learning. It cannot be done with conventional technology.

34:12-35:09
David Baszucki: Yeah, and I think for double bonus points, I'm gonna add additional layers of complexity. One is that level of moderation. Two is language translation. And three is arguably the notion that ultimately some of us will choose not just to wear avatars, but to wear the avatar persona and change our voice to match the avatar. And that creates this very interesting demo use case where I'm in the US and I wanna look and sound like Mickey Mouse, and someone else is in Japan and they want to look and sound like Donald Duck, and we translate in real time. That's a lot of complexity in moderation, voice synthesis, and translation at the same time. Do you think such a thing will be achievable at some point?

35:10-36:05
Anupam Singh: Yeah. I think six months ago, when machine learning was being used for conventional use cases, I would have said this is difficult, but I think we are seeing step functions now. Also, remember the data sets. I brought it up here because they're becoming more and more 3D. How do you take a 3D object, when you change your hat, you change your voice, you change your shoes, and feed all of that into a training and inference pipeline? I'll just give you one nugget. I always tell my son: every time he changes his avatar, he increases the data that I have to store and process. We are processing 660 million thumbnails a day right now; it used to be 10 million thumbnails. And all of that gets fed into machine learning models that have to be trained and then used for inference.
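
To put the thumbnail numbers in perspective, a back-of-the-envelope conversion of 660 million thumbnails per day into a per-second rate, and of the jump from 10 million into a growth factor. Both inputs are the figures quoted above; the rest is arithmetic:

```python
THUMBNAILS_PER_DAY = 660_000_000
PREVIOUS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

per_second = THUMBNAILS_PER_DAY / SECONDS_PER_DAY  # about 7,639 thumbnails every second
growth = THUMBNAILS_PER_DAY / PREVIOUS_PER_DAY     # 66x the earlier daily volume
print(f"{per_second:,.0f} thumbnails/s, {growth:.0f}x growth")
```

That sustained rate holds before any training or inference work is counted, which is why each avatar change adds measurable load to the pipeline.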

36:06-36:19
David Baszucki: Well, I think that's actually a good problem to have, isn't it? Because that implies user growth. And it implies we're doing increasingly awesome things on the platform. So I kinda like it that we have that thing to handle.

36:20-36:34
Anupam Singh: Yeah. And the fact that people are trying on new clothes all the time... As a parent, I always look at that, and, you know, I love music, and so to be able to experience music by changing my persona is amazing.

36:35-36:58
David Baszucki: Well, hey, Anupam, that was a tour de force. I think we covered what growth is. We covered search and discovery. We covered thundering herds and matchmaking, and then finished, I think, in this super interesting area of data science and AI. It's just been a pleasure hearing about all the amazing engineering work you've been doing at Roblox, and we appreciate you coming on the program.

36:59-37:01
Anupam Singh: Okay, sir, thank you very much. And let's keep building.

37:06-37:22
David Baszucki: And that's all for another episode of Tech Talks. Thanks for listening. If you'd like to find out more about careers at Roblox, check out Roblox.com/careers. I'm your host, Dave Baszucki. See you again next time.