Roblox Tech Talks

In this episode of Tech Talks, founder and CEO David Baszucki sits down with Roblox product and engineering leaders Varun Mani, Sergey Makeev, and Tsvetan Tsvetanov to break down how SLIM (a hierarchical, dynamic level-of-detail system) and Cloud Transcoding are redefining what’s possible on Roblox.

What is Roblox Tech Talks?

Discover the people, ideas, and technological breakthroughs that are shaping the next iteration of human co-experience. Featuring a diverse cross-section of guest speakers from Roblox’s product and engineering teams, we’ll discuss how we got to where we are today, the road ahead, and the lessons we’ve learned along the way. Hosted by David Baszucki, founder and CEO of Roblox.

SLIM + Cloud Transcoding Transcript

Keywords

Roblox SLIM technology, Roblox Cloud Transcoding, Roblox game engine, Second-generation game engine, Roblox streaming technology, Roblox instant play, Roblox creator tools, Roblox cloud rendering, Roblox platform scalability, Roblox immersive 3D worlds, Real-time 3D streaming, Level of Detail (LOD), Asset transcoding, 3D content optimization, Cross-platform game development, Cloud-based game engine, Dynamic asset streaming, Hierarchical SLIM system, Roblox engineering innovation
Speakers
Dave, Varun, Sergey, Tsvetan

Roblox Episode #30
Dave: [00:00:00] Hey, this is Dave Baszucki, founder and CEO of Roblox, and you're listening to or watching Tech Talks. Today we're gonna be talking about what makes up a second-generation game engine, how it involves the cloud, and how we help users join instantly at high resolution. We've got the technical dream team here at Roblox: Varun Mani, Sergey Makeev, and Tsvetan Tsvetanov. We're gonna be diving into everything streaming: SLIM, asset transcoding, everything that makes Roblox ultimately able to support a hundred thousand players.
If we look at the gaming industry over the last 10 or 20 years, there's a traditional way that all this 3D content gets in front of you, and there are some [00:01:00] issues with that traditional way, because what we're trying to do at Roblox is join instantly. We're trying to have the same experiences run on a phone as well as a giant PC, and we're trying to do this without any latency. When we look at the way traditional games work, it's not quite the same, right? I can remember, in my day, games were distributed on these things called floppy disks. But even today we distribute games on big DVDs, I think. Is that fair to say?
Varun: Yeah, and I think the key to it is that it's all packaged together. Whether it's a DVD or a game you're downloading today, it comes as one giant package that you cannot split up. So you have to wait until the entire 4.7 gigs, or 10 gigs in some cases, fully downloads before you can start.
Sergey: Yes, and as Varun said, for some games that could be 20 or 40 gigs, and you have to wait a very long time before you can start playing. And that kills the flow, right? [00:02:00] You want to play, you have maybe an hour or two after work, and then you have to wait 30 or 40 minutes while it updates. That's pretty much been my experience playing.
Dave: So isn't that somewhat akin to what we've seen in video and movies? I can remember a time when movies were distributed on DVDs, and then a time with BitTorrent, for example, where you would download the whole thing. Finally we got to the point where I go to my Amazon or my Netflix or my Hulu or my Tubi and I can watch right now. But I don't think that's quite happened in the gaming market yet.
Not yet. Not yet. I think not yet.
Tsvetan: We're the only ones targeting that.
Dave: So historically, Roblox has worked that way behind the scenes. Most people might not notice it, but when you go to the Roblox homepage and you click play, you can [00:03:00] join instantly. This is actually very non-traditional, because on some very elegant gaming platforms I've gone as high as seeing 200 gigabytes of download occur before I can play. So we're already doing this instant play? Yes.
Tsvetan: Roblox, as far as I know, is the only platform that allows this instant join and play.
Varun: Yeah. And when you were talking about video streaming: what are you actually doing when you stream a video? I don't want to download the entire movie; I'm downloading just enough of the movie in front of me, and we call that buffering. I'm streaming the movie, right? But that's just one dimension, because there's only a time dimension there. That's the key difference here, and that's what's hard about doing it in Roblox, because it's a fully interactive 3D world where you have to stream both instances and assets, which are the basic building blocks of everything in Roblox.
Dave: And I would say this is made even more difficult because in the traditional video game market, I tend to see different versions of games on, say, a high-end PC versus low-end mobile. The way I interpret that is that on high-end PCs, where I can download hundreds of gigabytes, you're [00:04:00] gonna have these really big, complex assets, and on mobile it's almost like creators make a separate version at some point.
Sergey: Yes, and that's a very good point, because traditional games have this process called asset cooking: they prepare a very optimized version of the assets per platform. If you think about it from a video streaming perspective, on DVD the video was just encoded once, right? But on any streaming platform there are multiple encodings of the same video, so it automatically fits your device. All current games pre-cook their assets and ship something very specific, while Roblox is able to generate an optimized version of each asset per platform in the cloud. That's right.
Dave: So hold that thought, right? [00:05:00] Because a while ago we had the vision that if you're a young creator and you make your experience, it can run in any language. We had the vision that it can run on any device, at various resolutions, on any device. We're gonna dive into it. So: the traditional gaming market is packaged, downloaded. We want to solve those problems of instant access, publishing for any device, low latency, all of that together. And so we're gonna be talking today about something called, literally, 3D and 4D streaming. We're gonna be talking about content in the cloud. We're gonna be talking about something called SLIM, which I think everyone's gonna get really excited about. So maybe to dive in, Varun, first a little vision: what is SLIM, what is this thing we call cloud transcoding, and what does that all mean?
Varun: So if we take a step back, our overall vision is that we're trying to get to [00:06:00] 10% of the gaming market. The way we do that is to give creators complete freedom over what they build. And if you think about how we grow, there are two dimensions: platform expansion and genre expansion. Platform expansion is how we get the same content to work on any platform, so you create once and Roblox makes sure it works everywhere. And with genre expansion, we want you to create more and more impressive things, more and more crazy things out there.
Dave: And when you say run anywhere: literally, imagine something really big and crazy and fun, Grand Theft Auto, Red Dead Redemption, high-end console, running on a two-gigabyte Android phone and everything in between. That's right.
Varun: From a big PC all the way down to a two-gig Android, it should just work, without the creator having to do extra work. So that's the vision. Exactly. Cloud transcoding and SLIM are just two of the first steps we've started to take down that path.
Dave: Okay, so that's a bit of a teaser. We're gonna explain what SLIM is, and I think you'll probably explain what transcoding is, but let's keep introing everyone first. So Sergey, can you share a little of your background in engine [00:07:00] optimization? One of the most fun things in the world, right? C++, assembly language, SIMD. Can you share a bit of what you've been working on?
Sergey: Yeah, as you mentioned, I've pretty much been working on games and game engines my entire life, doing all those optimizations, starting from the original Xbox. When I first learned about Roblox, my mind was just blown away by this idea of a game, or really any experience, created just once and then able to run everywhere, on any platform, on any device, without requiring any additional user input. That's an extremely challenging engineering problem, as everyone can imagine. It requires a lot of traditional optimizations: SIMD, multithreading, very careful GPU scheduling, all of that. But at the same [00:08:00] time, there's a limit to how much we can scale on a single device. Fortunately for us, we always have our cloud, and we can offload some of this work to the cloud and help those devices render more efficiently, render bigger worlds. To the degree that we can render worlds that cannot even fit into a physical device's memory, because they also exist on the server side, and the server helps all the clients render them efficiently.
Dave: Yeah. And just as a highlight, part of the hint we're gonna be sharing underneath this: traditional game engines are typically written in C++, and you download all of the assets. But say a game you wanted to play was the entire world, for example. No one can put the entire world on a DVD. So what you're really hinting at, and we see it already with Google Maps and other [00:09:00] things like that, is that there's no way you can put all of that in the same place. Part of what we're gonna talk about when we say streaming is that same thing for games. And then, Tsvetan, what got you to Roblox, what's your background, and maybe a little hint of what cloud transcoding is?
Tsvetan: So, before joining Roblox, I spent my professional career in two domains. One is 3D design and simulation software for architecture, construction, and mechanical engineering. The second is large-scale distributed computing. I can apply my knowledge from both of those domains at Roblox, because that's pretty much what Roblox is.
Dave: Yep. Let's jump over to streaming on Roblox and talk about it. We know what video streaming is: streaming a video is a linear thing, right? I just need frame after frame, sent one after the other. Can you share what streaming might mean in [00:10:00] 3D, in real time?
Varun: If you actually look at it, almost everything in a Roblox game today is made up of instances and assets. Instances are the structure of the world, and an asset is the thing that's actually driving that instance, the actual content. If you think about streaming, it's about streaming instances and streaming those assets as well. In a video stream you have just one asset: the video. In Roblox we're doing that thousands, ten thousand times, over and over and over. So with instant streaming, you're trying to stream in just enough of the world around you, and you don't know what the player's gonna do. The player might walk this way, the player might walk that way. You want to stream, or buffer, just enough of the world. And then within the world you've buffered, you want to buffer the assets at exactly the right quality. As Sergey mentioned, you have different quality levels for video; we have the same thing for meshes, the same thing for images.
Dave: So if we broke it down, would it be fair to say the instances are things we might be familiar with: a car, a tree, another person, a hill, something like that? And the assets are [00:11:00] the technical things that make up those instances, inside those 3D objects. When we're in a game, we have blobs of triangles, we have blobs of textures, and we're talking about picking which of those to really pull in at any time.
Varun: Yeah, and which representation of them. As you're walking through the world, the device is saying: okay, I want that tree, or I need that tree. That tree is made up of a bunch of meshes and a bunch of textures, and maybe in the future animations, audio, and so on. So then it's deciding, based on the position of that tree, its importance, and how much screen space it's taking: I want the texture at high quality, I want the mesh at medium quality, and the audio, maybe I don't need the audio yet, so hold off on the audio. It's constantly making those decisions.
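To make that decision loop concrete, here's a minimal C++ sketch of the kind of per-asset choice Varun describes. Everything in it (the Quality levels, the thresholds, the chooseRepresentations function) is a hypothetical illustration of the idea, not the engine's actual code.

```cpp
// Hypothetical per-asset representation picker.
enum class Quality { Skip, Low, Medium, High };

struct AssetRequest {
    Quality texture;
    Quality mesh;
    Quality audio;
};

// screenCoverage: fraction of the screen the instance occupies (0..1).
AssetRequest chooseRepresentations(float screenCoverage, bool audible) {
    AssetRequest req{};
    if (screenCoverage > 0.10f)      req.texture = Quality::High;
    else if (screenCoverage > 0.01f) req.texture = Quality::Medium;
    else                             req.texture = Quality::Low;

    // The mesh can often sit one step below the texture unnoticed.
    req.mesh = (screenCoverage > 0.10f) ? Quality::Medium : Quality::Low;

    // "Maybe I don't need the audio yet": defer it until it matters.
    req.audio = audible ? Quality::Low : Quality::Skip;
    return req;
}
```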
Dave: So in a way, this is a little like when I use a mapping program: I bring in just the little images, a little chunk of the map. We're talking about the 3D version of that, with interactive 3D components as well. Okay. And there's something really essential [00:12:00] here behind the scenes. I would say our vision has always been: no matter what device you're on, we're trying to get the right ones in as quickly as possible. And sometimes we say we want to get them in as quickly as possible to maximize the human perception of magic. Like: whoa, the world just appeared for me. So Sergey, how do we pick, and how do we sort, what to load and bring in?
Sergey: So yeah, optimizing for perceived latency is very important, because, as Varun just mentioned, we're constantly measuring screen-space area, how important the object is. For example, if you see a tree at a very close distance, or you see exactly the same tree at a very long distance, those could effectively be two different trees, right? If you see it from a distance, you don't really care about fine texture details, so we can always fetch [00:13:00] a low-resolution texture for the tree. You won't be able to distinguish whether it's a low-res or a high-res texture, because it's just very small on your screen. And on the other hand, we can fetch it almost immediately, because it's way less data. But when you get closer to that tree, we'll progressively fetch more and more data, and it will look more detailed. You won't see the transition, you won't be able to tell the difference, but you will immediately be able to see and interact with it.
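A rough sketch of that distance-based fetch: estimate how large the object is on screen, then fetch only the texture mip level that resolution can justify. The function and constants here are assumptions for illustration, not Roblox internals.

```cpp
#include <algorithm>
#include <cmath>

// textureSize: full-resolution texture width in texels.
// screenPixels: approximate on-screen width of the object in pixels.
int mipLevelToFetch(int textureSize, float screenPixels, int mipCount) {
    // Each mip halves the resolution, so log2 of the texel-to-pixel ratio
    // says how many levels we can drop before the eye could tell.
    float ratio = static_cast<float>(textureSize) / std::max(screenPixels, 1.0f);
    int mip = static_cast<int>(std::floor(std::log2(std::max(ratio, 1.0f))));
    return std::clamp(mip, 0, mipCount - 1);
}
// Example: a 1024-texel texture on a tree 32 pixels wide gives mip 5
// (a 32-texel version): far less data, indistinguishable at that distance.
```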
Dave: And part of this is that when I'm using either my two-gig Android phone or my gaming PC and I want to join something instantly, even though the internet is really fast, we don't have unlimited bandwidth. We're trying to push the world through that pipe, whether you've got 10 megabits per second or a hundred megabits per second, and balance what we push through it. One way I sometimes think about it: one tree that's close to me can use the same amount of information as a hundred trees that are [00:14:00] far away from me. Essentially, the hundred trees far away may have the same visual importance as the one that's close, and we bring them in at different resolutions: the one close to me is much higher fidelity than the ones far away.
Sergey: Yes, that's exactly right. And the important point you also mentioned: we're not only measuring how important the object is, but how much compute resource we have, how much memory we have, network bandwidth, everything. We take into account everything that affects the user experience, and that includes performance: how much memory we have, and all of that.
Dave: So we've talked about instances, like the tree, and assets, like the mesh or the textures. Can you share more about how they differ, and what we're really doing under the covers with asset streaming?
Tsvetan: Yes. For instance streaming, the server decides what is important, and [00:15:00] based on that importance it pushes the information down to every client. Every client actually gets different instances, depending on whether they're visible or not, whether they're participating in the interaction or not. For asset streaming, it's the client that actually streams in what's in the world. Based on the quality we're targeting and, as Sergey said, the capabilities of the device, we fetch only those exact representations. That's pretty much how we do it.
Dave: So those trees way off in the distance hopefully are very compact, at lower resolution, not taking a lot, so we can spend more time and more compute on a tree that's closer to me. And we have this thing called the content pipeline, or the asset pipeline. If I'm a game creator, what does this transcoding pipeline look [00:16:00] like? Say I'm using Roblox Studio and I build a really nice car. What happens behind the scenes?
Tsvetan: Along with the improvements in the engine, we also wanted to preserve the artistic intent of the designers and game developers. So we came up with this idea of cloud transcoding, which allows us to ingest the content from the creators, store it, and optimize it based on the devices and capabilities the client needs. This pipeline is end-to-end, and it's lazy, on-demand. What that means is that when a client requests a specific representation, only then, if we don't have that representation, do we start a compute; if it already exists, we return it. So we're very optimal about the bandwidth and the compute we spend serving the client.
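A minimal sketch of what "lazy, on-demand" could look like, assuming a cache keyed by asset, platform, and quality: the first request for a representation computes it; every later request is served from the cache. The types and the transcode stub are hypothetical stand-ins, not Roblox's content platform API.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

using Blob = std::vector<unsigned char>;

enum class Platform { Android, IOS, Windows, Mac };
enum class Quality  { Low, Medium, High };

// Stand-in for the real, expensive transcode step.
Blob transcode(const std::string& assetId, Platform p, Quality q) {
    return Blob{}; // placeholder
}

class TranscodeCache {
    std::map<std::tuple<std::string, Platform, Quality>, Blob> cache_;
public:
    // Lazy: compute a representation only the first time it is asked for.
    const Blob& get(const std::string& assetId, Platform p, Quality q) {
        auto key = std::make_tuple(assetId, p, q);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Miss: transcode once, then serve every later device from cache.
            it = cache_.emplace(key, transcode(assetId, p, q)).first;
        }
        return it->second;
    }
};
```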
Dave: So there was a long time, until actually pretty recently, when we had to put limits on the fidelity of these assets. I [00:17:00] think we would say you can only have so many triangles in your car, and the more triangles, the more accurate your car looks. Or we'd say you can only have so many pixels in the size of a texture. What we're saying now is, I believe, that you can upload that car however you've built it, and then, when you say transcoding, behind the scenes we're making various versions of that car with fewer and fewer triangles.
Tsvetan: Yes, that is correct. Or up, actually, in the future. We have this capability now where, depending on the improvements in the engine and the compute power we have, we can improve the representation: number of triangles, texture mipmaps, audio, video, animation, whatever we need.
Dave: So this is on demand. And I think what you were saying earlier, Sergey, is that you used to have something called baking: you would bake the high-end asset. We're essentially baking, on demand, many versions of that car at various [00:18:00] resolutions. They're all sitting there, and the car I see way off in the distance may be one of those lower-resolution ones. And this is, I believe, what we're calling transcoding. Yes. So how does transcoding work? How do you take a million triangles and turn them into a thousand triangles?
Tsvetan: At ingestion, in Studio, we actually ingest that one-million-triangle mesh. We then store it in our backend content platform system. And then, upon request, we run LOD processing,
Dave: which stands for level of detail. Level of detail, which means?
Tsvetan: We try to preserve as much quality as possible in that limited representation of the mesh or texture.
Dave: So maybe, Varun, if we thought about how this works across devices: you're the creator, and you put in this giant high-res car. I'm on a low-end Android device; you're on [00:19:00] a high-end PC gaming device. Hopefully we can still play the same game. Exactly. What happens for both of us?
Varun: So basically, when my client comes online, it says: hey, I'm a low-end Android device, I want this specific representation. Because one thing we didn't mention is that it's not just about level of detail; it's also a platform-specific representation. Android, iOS, each of these platforms has very specific representations: you can compress the textures, you can conform them in such a way that they work far more optimally. So my Android device will come online and say, hey, I want the Android version of a low-end texture for that tree, and the cloud transcoder will go off and make that for me, and it'll cache it, so the next Android device that comes online doesn't have to do it again; it can just be sent the version that was made for me. When your PC comes online, it says: hey, I've got all the resources in the world, give me the high-quality version, give me the uncompressed version, I just want the highest fidelity. And it goes ahead and caches that too. Which means this is infinitely future-proof.
Dave: And also infinitely instantaneous. So, I love Grand Theft Auto, and they have this new version that's gonna be [00:20:00] absolutely mind-boggling, which I'm really excited about. If I were an artist who wanted to adjust an asset in, hopefully, our future system, I would re-upload in about five seconds. Your game and my game would trigger a new transcoding of that, and literally, at peak concurrency, in something like Grow a Garden with 25 million people, you could in five to ten seconds have all of those people possibly getting that new asset. This is why I think it's somewhat the future: part of the beauty of Grand Theft Auto is that they bake in so much quality. They're gonna put it on a giant DVD; they have to make sure it's perfect, and there's no way to iterate on it. You've got to get it absolutely perfect right there. And so then, finally, on the future-proofing: in addition to instantaneous changes, there's something that [00:21:00] seems good about storing artistic intent, storing the original work of art, in a way. Can you mention how that might help us?
Sergey: Yes. Since we're trying to capture as much of the original artistic content as possible, we're storing the source version of all the assets. And with this, as you can imagine, we can not only downsample, by generating simpler versions of a texture, but also upsample in the future, because we've already captured the semantics of what the original intent was. So we can upscale that asset in the future, without requiring any user input, and make it look better.
Dave: And that's gonna be a big hint as we get into the future. So let's add a second component to what we've been talking about. We're streaming, we can make instantaneous [00:22:00] fixes, and cloud transcoding delivers any LOD. And then, as we hinted at the start, we're gonna complement that with something called SLIM. So SLIM is not a workout program; it's not the Roblox snacks that are hopefully gonna make us all feel really good and think clearly. What is SLIM in the context of the Roblox engine?
Sergey: So, yes. Let me step back and bring up that example again, with the tree. You have a tree, and as I said, you might have multiple trees in the distance. SLIM is what turns all those individual trees into a forest and optimizes them, not individually anymore, but as something bigger: an aggregation of individual instances. And SLIM technology always optimizes those assets in a specific context. Let's say you have [00:23:00] a building, and a bunch of rooms inside that building. If you're outside the building, you don't care about the rooms inside. SLIM is aware of this context, and in that case it can very efficiently remove all the building internals and generate a very optimal version of that asset for you.
Dave: So there are three use cases that, as you've been designing this, I've been thinking about. One use case: people are building more and more complex avatars. Historically on Roblox we've been very careful: how many layers of clothing can you wear, how much jewelry can I wear, how much stuff. But in the future avatars may have many, many things, like a hundred. And I think our creators are very savvy about performance; they may actually constrain the fidelity of avatars because they want so much perf. So: avatars that are complex, number one. Number two, as you mentioned, there [00:24:00] may be a hundred trees over there, way off in the distance, with no one interacting with them. And the third is maybe a building with rooms inside. I think what we're gonna try to do here is: no matter what it is, a moving avatar, a bunch of trees, or a building, this exact same system is essentially going to composite it down into a single object, a single mesh, a single texture. And that's absolutely mind-boggling.
Sergey: Yes, you're exactly right, and that's a good word for it. The system dynamically generates a composite representation of multiple assets, and it determines the boundaries of that composition, and the level of detail for that composition, dynamically. It's always operating in this world context, so exactly the same asset in two different experiences might be optimized differently, based on the target device and on the context [00:25:00] in which it's used. And we also support dynamic updates to that content. It's not a static optimization: if something changes over time, that will be reflected in the SLIM representation, because we can update it dynamically.
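To picture the aggregation step, here's an illustrative sketch of compositing many instances into one mesh while culling context-hidden geometry, like the rooms inside a building seen from outside. The data structures are assumptions; a real system would also atlas textures and simplify the merged result.

```cpp
#include <vector>

struct Vertex { float x, y, z, u, v; };
struct Mesh   { std::vector<Vertex> vertices; std::vector<int> indices; };

struct Instance {
    Mesh mesh;
    bool visibleInContext; // e.g. false for rooms seen only from outside
};

// Bake many instances into one composite mesh, dropping geometry the
// context says can never be seen.
Mesh compositeSlim(const std::vector<Instance>& instances) {
    Mesh out;
    for (const auto& inst : instances) {
        if (!inst.visibleInContext) continue; // remove building internals
        int base = static_cast<int>(out.vertices.size());
        out.vertices.insert(out.vertices.end(),
                            inst.mesh.vertices.begin(),
                            inst.mesh.vertices.end());
        for (int idx : inst.mesh.indices)
            out.indices.push_back(base + idx); // re-base into merged buffer
    }
    return out; // one object, one mesh; textures would be atlased similarly
}
```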
Dave: One of the reasons I really like this is that it almost feels like what I'd be dreaming of if I were a Roblox developer who really wanted massive scale, for example really complex avatars. I can walk up to someone close to me, they can take their hat off, they can put on some new jewelry, which is the way this system is gonna work: objects that are close may work at regular high fidelity. Then as that avatar walks away, it composites down into a single object, and as it goes further away, with LOD, we joke that the ultimate is an avatar way in the distance that may be 12 triangles and [00:26:00] a really small texture. So this is almost what a developer has dreamed about. And I think one of the things this makes possible is larger worlds across devices.
Varun: Absolutely. What SLIM is essentially doing is giving us a new axis of scalability. We've talked so much about cloud transcoding and different levels of detail; now we're adding another axis, which is composition. You can think about a two-by-two here, and any point on it is a valid experience. You could be high quality, low compositing; you could be high quality, high compositing if you're on a very low-end device or a different device. And all of those points make sense. So if you're talking about a massive world, that entire world could also be 12 triangles, if you're far enough away or if it doesn't make a difference.
Dave: And so then, thinking about user experience, and how SLIM can lead to a better user experience: if we think about LOD, we think about cloud transcoding; if we think about SLIM, we think about streaming. Relative to Roblox today, or [00:27:00] a year ago, how can this affect the user experience?
Tsvetan: Yeah, so we built SLIM with the idea that we're enabling developers to create richer, bigger, more complex worlds that behave and interact with the end user seamlessly, and the end-user experience is exactly the same as if you had the most powerful machine. SLIM is the system that today enables static models, and in the future we're enabling avatars and dynamic models, generated hierarchically, so that we can make the system as fast as possible for the end user.
Dave: Which for the end user may mean richer worlds on lower-end devices. It might be that on my low-end phone I can play with you on your gaming PC. Creators won't have to worry about the [00:28:00] number of layers of clothing on an avatar. More complex vehicles and complicated mechanical objects are also gonna get compressed by SLIM, which is gonna be really, really cool. And then, maybe thinking through how they work together: we talked about cloud transcoding, which offers lower LODs, and then we talked about SLIM, which is really lightweight models. Do they work together?
Varun: Absolutely, and that's what I meant by that axis thing. All of the SLIM models go through the exact same cloud transcoding pipeline as well. Once you composite them, once SLIM has decided, based on the context and the content, what the composite should be, it goes through the same transcoding pipeline, which then generates a low-quality version, a high-quality version, an Android version, an iOS version, for everything. So they're very, very much complementary.
Dave: And so for the creator that's almost like a double whammy: a very [00:29:00] complicated avatar way off in the distance gets composited into a SLIM model, then transcoded. That avatar, once again, could be 12 triangles and a single color.
Varun: And what you'd hope to see, or what I would expect to see as a first-order effect, is that all of the existing experiences can just start to run on lower and lower-end devices.
Dave: Yeah.
Varun: But the really exciting thing is the second-order effects, right? Creators start to realize: oh, I can put more stuff into the world. Oh, I can add...
Dave: Well, you think about it: if I'm making video, it's almost as if the camera is pretty thoroughly designed. I get my 4K camera and I never worry; wherever I point the camera, I can take a movie. But our creators don't have that camera today. If they point the camera at the wrong place, the world might blow up, right? Too much resolution. If they point the camera at a crowd of 10,000 people... So this is almost like freeing up the immersive 3D camera.
Varun: And giving [00:30:00] creators freedom, so we can do platform expansion and then genre expansion. Yeah.
Dave: And I do think there will be a time, we don't know when, with some mixture of Moore's-law compute, SLIM, 3D streaming, AI, local upsampling tech, whatever, when this camera may no longer be a limitation. When, if you wanted a hundred thousand people in a giant stadium, photorealistic, hanging out with your friends, streaming on a low-end device with low latency, it just feels good. That, I think, is when we'll say the immersive 4D camera is mature. The good news, or the bad news, is there's a lot of work to do, so it makes our jobs kind of interesting. And in that theme of how fast we're moving toward that ultimate three- and four-dimensional camera: Sergey, what's next for SLIM, and what would we optimize after this?
Sergey: So for SLIM, there are a few big steps [00:31:00] in front of us. The very next step is to make SLIM fully dynamic, supporting avatars, moving vehicles, moving mechanisms, and all that. But the more important step after that is to make SLIM hierarchical. As Varun mentioned, SLIM generates pretty much the same kind of assets as regular assets; they go through exactly the same streaming pipeline, and we optimize them, because each SLIM model has individual LODs and so on. But what if we could take it even further? What if we could combine multiple SLIMs into one, into a mega SLIM, and then again and again, in this fractal way?
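A sketch of what "again and again, in this fractal way" could mean, assuming each composite is itself an asset that can be composited: a tree of SLIM nodes, refined only as deep as the camera warrants. The names and the refinement threshold are hypothetical.

```cpp
#include <memory>
#include <vector>

struct Mesh { /* merged vertices, indices, atlased texture, as above */ };

// Composites are themselves assets, so they can be composited again:
// planet -> continents -> cities -> stadium -> room.
struct SlimNode {
    Mesh composite;                                // baked result at this level
    std::vector<std::unique_ptr<SlimNode>> children;
};

// Descend from the root toward the leaves, stopping at the first level
// whose detail is enough for how large this node is on screen.
const Mesh& selectLevel(const SlimNode& node, float screenCoverage) {
    const float kRefineThreshold = 0.05f; // assumed tuning constant
    if (screenCoverage < kRefineThreshold || node.children.empty())
        return node.composite;
    // A full system would pick the child under the camera; descending
    // into the first child just shows the recursion's shape.
    return selectLevel(*node.children.front(), screenCoverage * 0.5f);
}
```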
Dave: So this was part of our design question, because when I originally proposed SLIM, we were looking at avatars, and we had a separate project [00:32:00] thinking about how you would put a hundred avatars together, and how we were going to simulate a hundred thousand people in a stadium. And there was this fascinating day when I think you said: you know what, we may be able to take SLIM avatars and then put a hundred of them into a collection of a hundred avatars with exactly the same technology. That's a little bit where we said: mind blown. That hierarchical nature?
Sergey: Yes, that's exactly right. And if you take this example to the extreme: our virtual camera can see the entire planet. We see the entire planet, and it's just a sphere with a texture. When the camera gets closer, you see individual continents, then you see individual cities, and then you can go all the way down to a stadium full of people, because we have all the simulation running on the server, and the client doesn't need to know about the entire world. Clearly, you cannot fit this entire planet into a device's memory, even if you're on a powerful PC. But our servers are very powerful, and by generating this [00:33:00] hierarchical representation for SLIM, we can go from planet scale all the way down to room scale, or even a microworld.
Dave: Right. We may go all the way down to the watch that you or I are wearing, looking at the little knob on the watch, and even further than that. So as we think about this, Tsvetan, how might this change what creators start to do? We're hopefully gonna get rid of the "oh my gosh, I have to worry about the performance of an avatar." Any other changes you think we might see?
Tsvetan: In my opinion, this will remove all the limitations they have today. As you said, they won't have to think about how complex the world they're creating is. They just have to think about what they want to present as game dynamics to their [00:34:00] players, nothing else. Everything else is going to be handled by us, automatically.
Dave: And diving into some of that automatic stuff: in the old, traditional Roblox, we did this thing called a texture atlas. We actually had a template for a texture, and you would have to figure it out. I'm assuming with a SLIM model that's completely automatic, like the generation of that texture at lower resolution?
Tsvetan: Yes, that will be automatic, as well as determining those building blocks that Sergey explained in the hierarchical, dynamic world.
Dave: Okay, so now we're really gonna blow our minds. It's easy to imagine something becoming simpler, right? If you're an artist, it's very easy to imagine making something cruder and flattening it out. It's a lot harder to imagine something gaining resolution. So I want to poke a little into the future of AI. There's a lot of work right now on AI: how we're gonna generate video games, real-time immersive streaming, world models, and [00:35:00] all of that. And I feel more and more that we're gonna find this is not a monolithic thing; it's gonna be a stack of three, four, five, or six models all working together, including a command line where, if we're hanging out, like we do with 4D, we can just say: turn the Washington Monument into Godzilla, and boom, it happens right there. And ultimately generate objects, or worlds, or full games, hopefully in a multiplayer context. So we're walking around together, the four of us, and we say: hey, over there, the four of us would like to design a new game, and have it all magically appear. There's a secondary aspect, in addition to command prompts and world gen and 3D, which is what you were saying: upsampling. Sometimes I imagine Classic Crossroads, which is a 20-year-old game [00:36:00] on Roblox. There's actually enough information there. Say we took Classic Crossroads and, Sergey, you said: make this look like a photorealistic, medieval version, where the gameplay is the same, all the fun little rocket launcher tools are the same, but all of a sudden everything has a new look. How would this work in that context? I don't know who wants to jump in on LOD, because now LOD has to go up rather than down.
Varun: I think it comes back to one of the points Sergey made before: storing as much of the original artistic intent as possible, so that we can do some of this stuff in the future. And my favorite example, and this is a random example: imagine I'm using a texture, but that texture was actually a picture someone took. If I actually stored the aperture size, if I stored the f-stop when that picture was taken, now my upsampling algorithm has so much more context. If I know where that texture is being used, if I know it's being used on a [00:37:00] castle, or on the floor, my upsampling algorithm can be so much smarter about keeping that original creator intent all the way through, on that high-end PC, or some future PC that's never even been invented yet.
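As a tiny illustration of what "storing the aperture" might mean in practice, a provenance record kept alongside the source asset could look something like this; every field here is an assumption about what could be preserved, not a description of Roblox's actual schema.

```cpp
#include <optional>
#include <string>

// Hypothetical provenance record stored next to the source asset.
struct TextureProvenance {
    std::string sourceAssetId;             // the original full-quality upload
    std::optional<float> fStop;            // aperture, if it began as a photo
    std::optional<float> focalLengthMm;    // more capture context
    std::optional<std::string> usageHint;  // "castle wall", "floor", ...
};
// An upsampler given this context can stay faithful to creator intent
// instead of inferring detail from pixels alone.
```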
Dave: So we showed this at RDC; I think we showed a 15-second video of Crossroads being upsampled. Let's talk about what might actually happen. The asset is in the cloud: the 3D objects, the textures. An additional prompt is sent into our content system that says: look, all of these assets are part of what should be a much more photorealistic, medieval world. One can imagine all of those different LODs then being regenerated to be much closer to the photorealistic version, by what we would call a 3D upsampler, or really a 3D generative creator. So who wants to dive in on what might happen to the castle, for example?
Sergey: I think, as Varun just mentioned, [00:38:00] we're capturing so much more semantics from that world, and that's extremely important. Take this Crossroads example: if you look at the castle from the perspective of individual assets, it's just a number of blocks, just boxes, and you don't have enough context; you can't make those blocks look any better. The castle is more than just individual blocks; it's the sum of them. And since SLIM operates in this context, SLIM is aware of this thing as a whole. Now any upsampler will know way more about the castle as a whole, not just about individual assets, so it can make a much better version of that upsampled castle, because it just understands it.
Dave: And in a sense, one could imagine this without having that existing Crossroads castle. And just so everyone knows, Crossroads [00:39:00] is a very old, blocky-looking Roblox game. That castle is actually a really good aid to the prompt. "Build me a castle over there" is a pretty open-ended prompt; we're gonna get a random castle of a random shape. But the castle in Crossroads actually has dimensions and size. That's a lot of good prompt information for upsampling into a really high-quality castle.
Sergey: Yes, that's exactly right. Because this three-dimensional part of the castle, first of all, is going to be very important from a gameplay perspective, right? You cannot just generate any castle and make it work for the game. But if you can capture all the dimensions, all the semantics, all the contexts in which that asset has been used, then you can upsample it very efficiently, and you won't break your game by [00:40:00] doing it. And that's another very important thing: you don't want to upsample an asset in a way that makes your game unplayable.
Dave: Well, I think there's a future world where, in addition to all your 3D information, you can be editing either your 3D information or your prompt, and they both come together to generate that world. So the five-second update we talked about, which is an asset modification, may in the future be a five-second prompt update, a little more hint in the prompt, and then we lazily generate that world on demand. And then I think there's one last thing. When we think about all the dimensions of how AI can come together to make these things photorealistic, there's a final layer, which is the 2D layer: what's happening on your gaming PC, or what would be happening in some video appliance in the cloud, where even higher upsampling may occur. So, for example, perfect hair on my head: the best place to [00:41:00] do that hair may be locally, with a 2D upsampler. I know you've worked on a lot of games that try to do hair in 3D. Do you think someday all hair will be done with 2D upsampling, or will it be 3D hair?
Sergey: Yeah, absolutely, I think we're moving in that direction, because some things you can only do in screen space. Or let me put it like this: it's more efficient to do them in screen space, like hair rendering. A lot of game engines today do ML-based upsampling, going from 1080p to 4K. But you can do way more than that: you can change the shading, you can change how things look. And if you take this example to the extreme, a game engine could just render some semantic information for an ML algorithm, and then that ML algorithm will generate good-looking hair, or good-looking materials, in screen space, directly.
Dave: Super good. [00:42:00] So then, as we start to wrap up, maybe for all of the C++ coders out there, the assembly language coders, which I wish I was; that's one of the most fun things in the world. We're doing something very difficult here, right? We're trying to make all of these technologies run on all devices. A question on transcoding: I know a lot of PCs have optimizations with SIMD registers and things like that to make these things run faster. Have we started to probe around with SIMD in any of our transcoders, or is that a future optimization?
Tsvetan: So first of all, by the way, transcoding can run in the engine as well today. Okay, excellent, good. We are taking advantage of SIMD optimizations, but we're also looking even further than that, at GPUs and other hardware.
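For the C++ coders Dave mentioned, here's the flavor of SIMD work being alluded to (not Roblox's transcoder): averaging two rows of 8-bit texels 16 at a time with SSE2, the way one pass of a box filter in a mip generator might.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// dst[i] = rounded average of rowA[i] and rowB[i], 16 texels per step.
// count must be a multiple of 16 in this simplified sketch.
void averageRows(const uint8_t* rowA, const uint8_t* rowB,
                 uint8_t* dst, size_t count) {
    for (size_t i = 0; i < count; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rowA + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(rowB + i));
        // _mm_avg_epu8 averages 16 unsigned bytes at once, with rounding
        // and no intermediate overflow.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_avg_epu8(a, b));
    }
}
```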
Dave: So, okay: transcoding on GPUs is coming, possibly.
Tsvetan: Well, yes, we're looking into it.
Dave: Okay, which is super, super fun. Okay, so: [00:43:00] great update, everyone. For those who want visuals, possibly see the RDC main stage presentation; we had visuals of a lot of this. Everything we've talked about is in the pipeline, and we're watching it carefully. I really do think all of you, and the various teams behind you, are contributing to what I sometimes call Game Engine 2.0, which is heavily cloud-based, heavily supported by transcoders and all of this technology. So thank you all. I think we blew everyone's mind today. Really appreciate it.
Tsvetan: Thank you.
Dave: Okay, so hey, this is Dave, once again with Tech Talks, and we'll look forward to seeing you next time around. Thank you to the whole team. Thank you.