Computer Vision Decoded

Join our guest, Keith Ito, founder of Scaniverse, as we discuss the challenges of creating a 3D capture app for iPhones. Keith goes into depth on balancing speed with quality of 3D output and how he designed an intuitive user experience for his users.

Show Notes


In this episode, we discuss…
  • 01:00 - Keith Ito's background at Google
  • 09:44 - What is the Scaniverse app
  • 11:43 - What inspired Keith to build Scaniverse
  • 17:37 - The challenges of using LiDAR in the early versions of Scaniverse
  • 25:54 - How to build a good user experience for 3D capture apps
  • 32:00 - The challenges of running photogrammetry on an iPhone
  • 37:07 - The future of 3D capture
  • 40:57 - Scaniverse's role at Niantic
Learn more about Scaniverse at: https://scaniverse.com/
Follow Keith Ito on Twitter at: https://twitter.com/keeeto

Follow Jared Heinly on Twitter: https://twitter.com/JaredHeinly
Follow Jonathan Stephens on Twitter: https://twitter.com/jonstephens85
Follow Jonathan Stephens on LinkedIn: https://www.linkedin.com/in/jonathanstephens/

-----

This episode is brought to you by EveryPoint. Learn more about how EveryPoint is building an infinitely scalable data collection and processing platform for the next generation of spatial computing applications and services: https://www.everypoint.io

Creators & Guests

Host
Jared Heinly
Chief Scientist at @EveryPointIO | 3D computer vision researcher (PhD) and engineer
Host
Jonathan Stephens
Chief Evangelist at @EveryPointIO | Neural Radiance Fields (NeRF) | Industry 4.0
Guest
Keith Ito
Hello, world! 3D enthusiast building Scaniverse.

What is Computer Vision Decoded?

A tidal wave of computer vision innovation is quickly having an impact on everyone's lives, but not everyone has the time to sit down and read through a bunch of news articles and learn what it means for them. In Computer Vision Decoded, we sit down with Jared Heinly, the Chief Scientist at EveryPoint, to discuss topics in today’s quickly evolving world of computer vision and decode what they mean for you. If you want to be sure you understand everything happening in the world of computer vision, don't miss an episode!

[00:00:00] Jonathan Stephens: Welcome to Computer Vision Decoded, a podcast about the quickly evolving world of computer vision. As always, today I'm gonna be joined by our co-host, Jared Heinly, the Chief Scientist at EveryPoint, and today we have a special guest, Keith Ito, the founder of Scaniverse, a 3D capture app for iPhones. Keith will be sharing with us the evolution of the Scaniverse app and the challenges that he faced with building an app for 3D capture.
He's gonna dive into the nitty gritty of balancing speed, performance, and quality of output in a 3D Capture app, and we're also gonna talk about the challenges of building an intuitive interface for end users. He's also gonna share a bit about the future of Scaniverse as they're now part of Niantic. Let's jump into this episode and decode what it means to build a 3D capture app on an iPhone.
Welcome to the show, Keith and Jared. Really excited to have you here. Keith, let's talk about Scaniverse and a little bit about the background of your career leading up to building this excellent 3D reconstruction app. Can you give us a quick background of what brought you up to Scaniverse and your experience working with 3D data?
I know that you previously had experience with Google Maps, among other places. So could you start out and just give us a little bit of background about yourself there?
[00:01:28] Keith Ito: Sure. First of all, thank you for having me on the podcast. I'm super excited to be talking to both of you. As far as my background goes, I started out pretty much straight out of school going to work on Google Maps. And this was in the mid-2000s.
It was just a really exciting time to be working on maps and at Google. When I got there, Google Maps had only launched in the US and the UK. So the rest of the world was just like a giant blue ocean without even any land there on the map.
While I was there I worked on the map tile rendering side of things. It was mostly 2D graphics, but taking all of this geographic data that Google was either licensing or collecting itself and turning that into a map of the world that you could zoom in on pretty much as far as you wanted, or zoom out to see the whole world. And I think that's when I first started getting really interested in this idea of being able to map the world, and being able to take data and turn it into something that you could share with people or that people would find useful.
So I was there working on Google Maps for a while, and at some point, I think it must have been around 2007 or 2008, Apple had just announced the iPhone, Android was gaining steam, and you could see this big transition toward mobile on the horizon.
At that point I moved over to start thinking about, first of all, how maps would work on a mobile device, but also how you would do a cloud-first navigation service. Back then, if you wanted turn-by-turn navigation, maybe you would buy a Garmin or one of those standalone GPS units and stick it in your car, or maybe your car would have one built into the dashboard.
But the data would usually be a little bit out of date, right? You'd have whatever it came with, or maybe you could get updates, but they'd be like once a year or something. And...
[00:03:26] Jonathan Stephens: That's if you even took the time to do it. I know many people who would buy a Garmin and never update it.
[00:03:31] Keith Ito: Right, right. Yeah. And so the idea of it being cloud first was something that Google could do, and that could really only be done once you had this device in your car that was constantly connected to the internet, so you could update the data.
But you could even get more granular than that, right? You could have real-time traffic and then also update the route that the user would take based on current traffic conditions and things like that. And so that was also a really interesting product to work on.
It really reinforced for me the importance of keeping your map up to date, and keeping the data that you're collecting up to date. And it was also when I first started my obsession with getting the most possible performance out of a mobile device, because back then things were not as performant as they are now.
You really had to optimize, and think ahead of time about how you were going to optimize everything, to even just be able to render at 30 FPS on a device. So
[00:04:38] Jonathan Stephens: You can almost see that in the trajectory of how you've taken Scaniverse. We will get to that, but it's like you made it work as is, and if you followed the Scaniverse blog, you could see you weren't finished with each update. You're like, no, we're not done here.
There's much more we can do and so I'm excited to kind of talk about the little optimization path you took, but,
[00:04:58] Keith Ito: Yeah, definitely. I think you're basically never done with optimization, right? Because there's always more you can do, especially in scanning or reconstruction, because you can always make it faster.
Or you can always bump up the resolution, right? You can get more detail. It's a really interesting problem because you're basically never done.
I think it was around that time that I really realized that. And it can be a lot of fun to optimize stuff.
[00:05:28] Jonathan Stephens: Jared could talk about that too. It's like when you're stuck with a problem, you'll take a fun break and optimize our backend and make it somehow faster.
[00:05:37] Jared Heinly: Yeah, that's fun for the variety, like you're saying: being able to work on these hard problems of how do you do navigation and mapping of the world, but then also make it go really fast. So it's the algorithms as well as that engineering effort to make things performant and easy to use. That's such a fun mix.
[00:05:55] Keith Ito: Yeah. It's, it's almost like a, it's almost like a game, right? Because you usually have a, some, some metric that you're trying to optimize for and, and then you try something and then, and, and then you know that, that time goes down or your, your FPS goes up and, and then you, you know, you're just trying to find, find, find different things that you can do that will, that will raise your score basically.
[00:06:11] Jonathan Stephens: And that's where it's helpful to talk to your clients or talk to your user base. Be like, well, would you put up with taking twice as long to get a 3D model if it'll be much better quality? And some people say, I don't mind waiting, and some people just say, no, I need it right now. So it's finding that balance of where you optimize for speed versus quality.
[00:06:35] Keith Ito: Right. Yeah.
[00:06:36] Jonathan Stephens: Ideally, you hope you get both
[00:06:37] Keith Ito: Yeah, hopefully you can get both eventually. Or maybe some people want one and some people want the other and you need to offer both. So, getting back to my background: after that there was this other project that was just spinning up inside of Google. It didn't have a name yet, but it was what eventually became Google Glass. I hadn't really done anything in AR up to that point.
But they needed someone to bring some of the mapping and navigation pieces over to Glass, and because I'd worked on that before, they asked me to do it. Glass was a really interesting project and an interesting product. I think it was probably a little bit before its time, both in terms of
what the hardware could do at the time, and in terms of people's perceptions, like what people thought of AR at the time. There wasn't much of a basis for people to go on in terms of what they had used before. So on the hardware side,
a lot of the display and optics technology maybe wasn't quite there yet, although you could see it on the horizon. A lot of the stuff that's coming out now, with this renewed interest in AR and mixed reality, was kind of on the horizon back then, but just
tantalizingly out of reach. But we learned, or I learned, a lot about what it takes to build experiences in AR, and what it takes to develop a product on new technology, where the hardware is still being developed, the software isn't written yet, and nobody knows what the UX should be.
[00:08:15] Jonathan Stephens: It's kind of the problem that Magic Leap ran into as well. They're trying to define what this new, or maybe not new but new-to-consumers, technology is, and a lot of people are like, oh, I don't know.
[00:08:26] Keith Ito: Yeah.
[00:08:26] Jonathan Stephens: You know, to be honest, probably one of my favorite gadgets or devices I've gotten in the last couple years was the Ray-Ban Stories, because
[00:08:37] Keith Ito: Hmm.
[00:08:37] Jonathan Stephens: just having that camera on my face while I'm playing with my kids. The last thing I wanted to do was be like, hold on, I gotta pick up my phone.
I couldn't care less that the quality of the images was not as good as my iPhone 13 at the time, but I got it in the moment and I didn't have to stop what I was doing. So I think there is still something in that form factor, but like you said, it's still very new, this thought of having overlays on your face and all those things.
We've all done a pretty good job at optimizing the iPhone and, you know, Samsung, all these great Android interfaces that everyone's used to, versus this is all new on the face.
[00:09:19] Keith Ito: Right. Yeah, it's all new on the face. It's something that's obviously very visible on your face, and I think it's something that will take some time to get used to. But yeah, like you were saying, the ability to capture something while you're in the moment, the ability to get information when you need it, or to augment the world in interesting ways, is, I think, a really promising technology for the future.
[00:09:44] Jonathan Stephens: Mm. All right. Well, let's pivot, because we really are excited to have you on the show to talk about Scaniverse, which I'm assuming most people who tuned in have heard of. But out of all of the 3D scanning apps that have been on an iPhone, it's by far one of my favorites.
I don't wanna say it's the number one, because I know there are other great apps that are slightly different from it, but from a usability standpoint it has been great. I love it. But for the people who maybe have never heard of Scaniverse, can you give a quick overview?
[00:10:21] Keith Ito: Yeah, so Scaniverse is an app that lets you capture, edit, and share the moments, the things, the places, and the people that matter to you. The way it does it is, it's an iPhone app where you can just go around and scan something, and it can be basically anything.
It will reconstruct a 3D mesh out of that for you, and then allow you to share it in messages, share it on the web, share it to sites like Sketchfab, and let other people experience it in 3D or in AR,
in whatever format they want to experience it. The way I think of it is, every medium for sharing has its own corresponding capture technology. If you think of paper, or 2D sorts of things, you had photography for that; then we got these screens that can show videos, multiple images in succession, and that really drove video capture.
And now, even today, we have VR, and on the horizon a bunch of interesting AR and mixed reality technology coming out. I think people are really going to need some way of easily capturing stuff in 3D in order to fully engage with those platforms, and that's where Scaniverse fits in.
[00:11:43] Jared Heinly: Yeah, totally. I like that, because we live in a 3D world, and so having efficient tools to capture that world and share it with others matters. Maybe just to make that more specific: what inspired you to build Scaniverse?
How did you decide you wanted to tackle that space of, how do I document my 3D world?
[00:12:03] Keith Ito: Right. It was an app that I had wanted to have for a long time. Going back, I used to do some photogrammetry, where the workflow, at least back in the day, was: you would take your DSLR, go take a bunch of photos of something, maybe a hundred or two hundred photos.
You'd copy them to your desktop computer and then let it crunch overnight while it processed them into a 3D model. Sometimes it would work great and you would end up with a really great 3D model in the morning when you woke up.
But sometimes you would make a mistake when you were capturing, or the object just wouldn't work very well for the capture. And so the next morning you'd go and the geometry would be wrong, or maybe it would just fail completely.
So one thing I always wanted to have was something that made it a lot easier, where you could capture and process directly on your device and get the model back in about a minute or so. You would immediately know, while you're still there, that something went wrong and could do it again.
Something that was more tolerant of user error, or that guided you while you were capturing to know what you've captured and what you still needed to capture, so that you could get a better scan and have more confidence that your model would turn out.
So this was something that I had wanted to exist for a long time. If it did exist, I would've gone to the App Store and been its biggest power user. But it didn't at the time.
And so I set out to build it. One thing that came along that was really quite helpful, or quite amazing really, was when Apple announced that the iPad would have a LiDAR sensor.
I think that was the 2020 iPad Pro. And all of a sudden it was like, wow, you can have this technology that used to be a standalone rig you'd have to carry around, and to think that they were making it standard equipment on the iPad Pro was just incredible.
Then later that year they brought it to the iPhone, and at that point I realized this could really be a platform that you could do really good 3D capture on. And it's just this thing that sits in your pocket. It's amazing how fast technology develops, and how all of these things that you thought were either impossible or years away just show up one day.
As a software person it's really great, because with Apple and Google and Samsung and all the others, it's almost like Christmas when they do a big announcement and you find out there is all this stuff to play around with and to build software for.
[00:14:57] Jonathan Stephens: It's been interesting watching GPU technology as well. Someone asked me one time, should I get the RTX 2080, because the 3080s are out now but I can get a 2080 for a better deal? And I was like, you don't understand, this technology is not just getting incrementally better.
It's like four times, five times better. That's what the GPUs were doing: if you went back a model, you weren't just half as good, you were like a quarter as good. And I'm seeing that even with the iPhones and the smartphones from Samsung, with the computational photography they're able to do on them.
It's just incredible. So I always say, just buy the best smartphone you can afford and that's gonna be amazing.
[00:15:43] Keith Ito: Yeah, it really is incredible. People say that a smartphone is like a supercomputer in your pocket, and it turns out it really is. I was actually looking this up the other day: an iPhone 14 can do something like two teraflops on the GPU. And if you look at the top supercomputer lists, they have these lists that go back in time.
The iPhone 14 would've been the top supercomputer in the world in 1998. These things cost hundreds of millions of dollars, they took up a whole room, and now you can carry around something with equivalent compute power in your pocket. It's just absolutely insane, and 1998 isn't that long ago, right? I remember 1998.
[00:16:29] Jared Heinly: Even just more recently, it's amazing. Just the raw power, like single-threaded performance. I've got a desktop PC that I do my development on day to day, and if I've got some single-threaded code, that desktop PC versus the iPhone is like head to head.
It's amazing how fast the computation is on modern smartphones.
[00:16:51] Keith Ito: Yeah. Yeah, definitely.
[00:16:53] Jonathan Stephens: And then it's not just single-threaded processors. They put in those performance cores and then different ones geared for, hey, I want my phone to stay charged for a longer period. You can optimize for that,
not say, oh yeah, we'll get this really fast but your phone will be dead in the next 30 minutes. They've really done a good job, and all the manufacturers are doing that now. So I guess, talking about when this app first came out: if you download it today, the app that you see looks very similar to the original app that came out in, was it 2021 or 2020?
[00:17:35] Keith Ito: Yeah, the very end of 2020, like
[00:17:37] Jonathan Stephens: So that app only supported LiDAR, correct? It was just for LiDAR. We could go really deep into the LiDAR on an iPhone. There's still a lot of debate: is it a LiDAR? I'm not worried about that. I think what people care about is not what it is, but what it can do.
People who are in the LiDAR community will always say, well, this isn't true LiDAR. I don't care. All I know is it gets me depth data, and that's really all you're looking for. But can you speak about what were the pros and cons of using that LiDAR?
Are there some things that are a headache about using the iPhone LiDAR? Because it is not the same as a terrestrial LiDAR. Can you speak about how it's been helpful, but also where you wish things were a little different on it, perhaps?
[00:18:28] Keith Ito: Yeah, so the biggest pro is that it's built into this device that you're carrying around all the time. It's always with you and you don't have to think about it; you just pull out your phone and start capturing. I think it's an amazing technical achievement that they managed to get this into a consumer device that everyone can carry around with them.
On the other side, they had to make a number of compromises in order to get it to be small and lightweight, not draw too much power, and also be inexpensive enough that you can make it a standard feature.
So I think some of the drawbacks are that it's fairly low resolution. The depth maps you get back from the depth API are 256 by 192, so they're tiny, like postage-stamp-sized things. But it turns out that even those are
actually generated by Apple using a neural network from about 500 individual depth readings on the frame. They take the color image and their depth readings, and they pass a neural network over it that will essentially fill in the gaps between the depth readings so that you get a nice depth map.
And while that works surprisingly well a lot of the time, sometimes, because it's a neural network, it hallucinates things. If you see an object, it might be unclear whether it's part of the foreground or part of the background,
and it might get it wrong some of the time. So if you're using this sensor, you have to be a little bit careful to be tolerant of that sort of error. It's a very different error from a typical sensor, where there might be noise in the sensor but you can accumulate multiple samples over time.
If your noise is distributed around the actual value, it'll average out over time. With the iPhone's depth maps, because it's hallucinating, it could very confidently hallucinate stuff on multiple frames, and that will just end up there.
The other interesting thing is that you can also end up with some systematic error in the LiDAR readings. Again, if it's just noise, you can average that out and it goes away. But if it's consistently telling you that something is, say, 3% or 5% farther out than it is, then that can create problems too, especially if you're using the poses, the camera positions from ARKit, as you move around something. Those poses are all calculated based on visual features in the scene, which is a different system from the LiDAR. So you have these two sources of data, the visual features and the LiDAR, and you're trying to combine the two, and that can be
difficult because they have completely different error characteristics. And I see Jared nodding his head because I know he's probably dealt with a lot of these problems as well.
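For readers who want to see concretely what Keith is describing, here is a minimal Swift sketch (an illustration only, not Scaniverse's code) of requesting the LiDAR scene depth from ARKit and reading back the small depth map, its confidence map, and the ARKit camera pose he mentions combining with it. The class and variable names are our own.

```swift
import ARKit

// Minimal sketch: request LiDAR scene depth from ARKit and read back the
// small (256x192) depth map plus the visual-inertial camera pose per frame.
final class DepthCapture: NSObject, ARSessionDelegate {
    private let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()
        // sceneDepth is only available on LiDAR-equipped devices.
        guard ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) else { return }
        config.frameSemantics.insert(.sceneDepth)
        session.delegate = self
        session.run(config)
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard let sceneDepth = frame.sceneDepth else { return }
        let depthMap = sceneDepth.depthMap              // CVPixelBuffer of Float32 meters
        let width = CVPixelBufferGetWidth(depthMap)     // 256 on current hardware
        let height = CVPixelBufferGetHeight(depthMap)   // 192 on current hardware
        // confidenceMap flags pixels the depth network is less sure about,
        // useful for down-weighting the "hallucinated" regions Keith mentions.
        let confidence = sceneDepth.confidenceMap
        // The pose comes from visual(-inertial) tracking, a separate system
        // from the LiDAR, so the two have different error characteristics.
        let cameraPose = frame.camera.transform
        print("depth \(width)x\(height), confidence: \(confidence != nil), position: \(cameraPose.columns.3)")
    }
}
```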
[00:21:28] Jonathan Stephens: Yeah. Jared, do you wanna add to that?
[00:21:31] Jared Heinly: That's a great point, your distinction there about the two different sources of errors. The LiDAR is gonna have its own set of structured errors based on that neural net's output, or the limitations of the sensor itself. And then with ARKit, it's doing the pose estimation using those visual features.
That's gonna be a different error, maybe long-term drift. So when you go and try to generate a fused 3D understanding, a fused 3D reconstruction of that scene, you've got these different errors: local, maybe structured errors from the LiDAR, plus the long-term drift, or different kinds of error, from that ARKit pose data.
Yeah, that's a good point.
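To make the distinction Jared and Keith are drawing concrete, here is a tiny hypothetical Swift simulation (the numbers are invented, not measured from any real sensor): zero-mean noise averages toward the true depth as you accumulate frames, while a systematic bias of a few percent survives any amount of averaging.

```swift
import Foundation

let trueDepth: Float = 2.0   // meters, hypothetical ground truth

// Zero-mean noise: each reading is off, but the average converges on the truth.
let noisy = (0..<100).map { _ in trueDepth + Float.random(in: -0.05...0.05) }
let noisyMean = noisy.reduce(0, +) / Float(noisy.count)

// Systematic bias: every reading is ~3% too far, so the average stays ~3% off.
let biased = (0..<100).map { _ in trueDepth * 1.03 + Float.random(in: -0.05...0.05) }
let biasedMean = biased.reduce(0, +) / Float(biased.count)

print("noisy mean ≈ \(noisyMean) m, biased mean ≈ \(biasedMean) m (true: \(trueDepth) m)")
```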
[00:22:10] Keith Ito: Yeah. I've gone deep on the negatives of LiDAR, but it's actually an amazing sensor. It really complements what you can do with photogrammetry. Photogrammetry won't work on flat, textureless surfaces like a wall, right? But the LiDAR works great on that. So in a lot of cases you can use it to fill in the gaps where photogrammetry isn't working well.
[00:22:34] Jonathan Stephens: Mm-hmm.
[00:22:34] Keith Ito: And then of course you're instantly getting back these depth readings.
You don't need to do any computation before using the depth, so it's great for live visualizations and things like that.
[00:22:44] Jared Heinly: Yeah, that live feedback. You mentioned that earlier, the benefit of having that live feedback and not having to wait till you get back home, run some photogrammetry software overnight, and just cross your fingers and hope that it works. Having that live feedback when you're in the field, like if you're on vacation, you're only gonna be at the museum,
you're only gonna be at the beach, for a certain amount of time. You want to capture that scene, capture that memory. Or if you're an inspector doing an inspection of a bridge or a building, you may have just driven a few hours to get to that location to do the inspection.
So having that real-time live feedback is super critical and really nice. I love how you ran at that problem and are providing people that feedback and the quick processing there on device to get those results.
[00:23:28] Jonathan Stephens: Yeah. For the people who may be watching this on YouTube as opposed to just listening, I'm sharing on my screen where you have this red covered area on an image, and that goes away as you capture an object. I think there's a lot of instant gratification when you're scanning something; you're seeing it come to life and you have that confidence:
okay, well, at least I saw it. A car's pretty easy to cover, but I tried to do a whole room and you're like, okay, did I get that corner over there? Did I get behind or between things? That feedback's magic.
[00:24:02] Jared Heinly: It's amazing. I absolutely love that feedback there. As you start to scan, you see it come into full color, where everything else is sort of hidden there in the background in red. As you were thinking about creating a 3D scanning app, what were some of the hardest parts of designing that experience, designing that app to make it easy to use so that anyone can just pick it up and do a great 3D scan?
[00:24:31] Keith Ito: Yeah. A lot of the challenge was that most people haven't 3D scanned anything before. Trying to communicate to people how to scan and what sort of motions to use, I know you guys did a whole video on good motions for reconstruction, but communicating that to people, not just in words but also through the UI and through the feedback that you're getting as you scan, was a big challenge.
I think when you just hand a phone to people and say, scan something, maybe the natural inclination is just to stand in one spot and do a lot of panning around. But if you do that, you don't get a whole lot of baseline, right?
So it's hard to do photogrammetry based on that. You really wanna encourage people to go around and scan things from all angles. And I think having a UI where it's kind of fun to make parts of the scene appear out of the striped area is one way of doing that. Making it so that you don't have to think about what you're capturing as you're doing it, or what the right technique is, and just having that come out of the experience, I think is really important.
And that's something I spent a lot of time iterating on, trying different things until I found one that seemed to work with people.
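As a rough illustration of the "stripes that disappear" idea, here is a toy Swift sketch of a coverage grid. The grid resolution, the confidence threshold, and the type names are all invented for the example; they say nothing about how Scaniverse actually implements this.

```swift
import Foundation

// Toy coverage tracker: the view is divided into coarse cells, and a cell
// stops being drawn "striped" once a confident depth sample has landed in it.
struct CoverageGrid {
    let cols = 32
    let rows = 24
    private(set) var scanned: [Bool]

    init() { scanned = Array(repeating: false, count: 32 * 24) }

    /// Mark cells covered by this frame's depth samples (u, v are 0...1 screen coordinates).
    mutating func mark(samples: [(u: Float, v: Float, confidence: Float)]) {
        for s in samples where s.confidence > 0.5 {
            let c = min(cols - 1, max(0, Int(s.u * Float(cols))))
            let r = min(rows - 1, max(0, Int(s.v * Float(rows))))
            scanned[r * cols + c] = true
        }
    }

    /// Fraction of the view that has been scanned; this drives the on-screen feedback.
    var coverage: Float {
        Float(scanned.filter { $0 }.count) / Float(scanned.count)
    }
}
```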
[00:25:52] Jared Heinly: Yeah. Yeah. That's impressive.
[00:25:54] Jonathan Stephens: What really resonates with me, in the struggle of having a good app experience that gets people to follow the best practices to model something to the best of their ability, is that Apple almost ruined it for us by making such a robust visual tracking system for the LiDAR that you can literally stand in the middle of the room and get a somewhat sane 3D reconstruction of a whole entire room
without really moving, beyond whatever arc your hand takes with the phone. But that's the wrong thing to do if you want to get a really good quality scan. So how do you say, well, that worked, but that's not what I want you to do in the app? Even RoomPlan, if anyone has heard of RoomPlan, where it gets you basically the primitives of the room, the wall shapes, a room layout: at first I was trying to walk around the room and it was like, no, no, no, you can just stay where you are and sweep your hand across the room, and it catches all these visual anchors and just works. And I was like, oh, this is dangerous, because again, we're teaching people to stand in one spot to make a 3D model, and that's not gonna help you if you want to get that good quality.
So good on Apple for making it work, but it also has its own set of unique challenges. With that, let's talk about Detail mode. I'm looking at your blog: in 2022, a little over a year after you had launched, you added Detail mode, you added actual photogrammetry besides the LiDAR.
Was that a challenge, to also say, okay, now you have to actually walk around an object? I'll share on my screen again too, just a shoe you had from Detail mode. It's striking how much more detail you get, but of course that's only if you do it correctly.
[00:27:51] Keith Ito: Yeah, definitely. I think it is a challenge to get people to really walk around the object. To a certain extent, showing the stripes helps, because you'll see around the corner that there are still stripes over there and be able to come around to the other side.
But it's interesting that you do want people to capture things a little bit differently for LiDAR versus photogrammetry. And in the future, I think the capture you do for NeRFs will also be a little bit different, because there you get all the view-dependent lighting effects, so you really need to get even more views of the scene from different directions, so that you can capture all the different specular highlights and things like that.
So I think it's an ongoing challenge to both educate people on how to actually capture, and also build out the UIs that encourage them to capture in the right way, or the way that will yield the best model.
[00:28:49] Jonathan Stephens: And when I think about NeRFs, part of the magic of a NeRF is you get all the context of everything behind the object that you captured. So you say, okay, we have an object, and it's also in this location. But when you're teaching someone to model an object with photogrammetry, you're usually telling them to fill most of the screen up with that object,
because everything behind it is not gonna help you, unless you're trying to use it as visual anchors. But now with a NeRF, I'm telling people to stand further back, I want to see the scene. So yeah, you're right, each mode is different. LiDAR from an iPhone is different from photogrammetry
and NeRF. So how do you have a UI that tells everyone what to do at the right time? That's where I think augmented reality, those AR overlays, will be very helpful as well, like Apple's added for RoomPlan. It'll say, move your camera down, find a corner. Adding some guided visuals in the app in the future could be very helpful.
[00:29:44] Keith Ito: Yeah, definitely. I think more active guiding of people to areas that they haven't scanned, or where maybe they don't have enough viewpoints of a certain area, just being able to call that out to the user and encourage them to go to that spot, because you'll get a better reconstruction if they do that.
I think these are all really interesting and exciting open challenges. Like, how do you do that without being annoying, where you're commanding them: now do this, now do that, now do this. Right?
[00:30:19] Jonathan Stephens: Yeah, well, you took the Magic Leap template, and I think now the Quest Pro that I have does it, where they almost gamify it when they map out your room. It's kinda like hide and seek: go find the next hidden anchor in the room, and by doing that you're moving to the areas that haven't been mapped
by the unit. So it's kind of fun, and then it's mapped, like, oh, I completed something. So how can you gamify this, give them something engaging?
[00:30:49] Jared Heinly: Yeah. And there is value there, like you mentioned: guiding someone to go through a space, and then adapting that for the particular technology or the particular application. Saying, oh, I just need to map my room, that might be a different motion pattern than, oh, I need to map the objects on the tables in the room.
So having those different experiences, and like you said, with the NeRF versus the LiDAR versus the photogrammetry, each of those is gonna have its own subtle differences. It helps to understand that common technology in the background, the underpinnings of, well, how does photogrammetry work?
How do NeRFs work? Oh, NeRFs need poses from photogrammetry. Okay. And so understanding that technology influences those solutions. But yeah, like you were saying, Keith, with the LiDAR, adding in those visualizations to help guide the user and turn it almost into a game:
oh yeah, I see a little bit of red over there, let me go over there and get rid of all the red in the scene until I've captured that object that I want. It's a fun experience. So let's continue in that line of thought. You mentioned Detail mode, and I've heard that Scaniverse now supports non-LiDAR devices as well, doing pure image-based photogrammetry.
[00:32:00] Keith Ito: That's right.
[00:32:00] Jared Heinly: That's awesome. So how did you achieve that? What were some of the limitations or challenges of running an image-only photogrammetry pipeline on a mobile device?
[00:32:11] Keith Ito: I think the biggest challenge was just that a photogrammetry pipeline can be very complicated. There's a set of stages where you do feature detection, so you're finding landmarks in the image.
You match them, you figure out where all the poses are, where all the cameras are, you do bundle adjustment to refine them, you do depth mapping, and then the meshing. So there's a large number of stages that you have to squeeze down a little bit to fit on a mobile device.
With a modern mobile device, like we were talking about earlier, it really is almost like a supercomputer in your pocket, so that makes it easier. But there still are limitations in terms of how much RAM is available and just how much work you can do in a short period of time.
I think another challenge is that whereas it might have been acceptable to run a photogrammetry pipeline overnight on your desktop computer, if you told someone, just leave your phone here for a few hours and it'll be done,
most people wouldn't be too happy, even if their battery lasted that long. So figuring out which trade-offs to make in order to get the pipeline to run in about a minute or less was one of the biggest challenges, along with figuring out what is actually important for quality and what people care about in terms of the model.
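To ground the stages Keith just listed, here is a compile-ready Swift skeleton of a generic photogrammetry pipeline. Every type and function here is a stub we invented for illustration, not Scaniverse's actual code, but the order of the calls mirrors the stages he describes.

```swift
import CoreGraphics
import simd

// Stub types for a generic photogrammetry pipeline (illustrative only).
struct Feature { var point: SIMD2<Float>; var descriptor: [Float] }
struct Match { var imageA: Int; var imageB: Int; var featureA: Int; var featureB: Int }
struct Pose { var transform: simd_float4x4 }
struct DepthMap { var width: Int; var height: Int; var depths: [Float] }
struct Mesh { var vertices: [SIMD3<Float>]; var triangles: [UInt32] }

// 1. Find landmarks (features) in each image.
func detectFeatures(in image: CGImage) -> [Feature] { [] }
// 2. Match features across images.
func matchFeatures(_ features: [[Feature]]) -> [Match] { [] }
// 3. Recover camera poses from the matches.
func estimatePoses(from matches: [Match]) -> [Pose] { [] }
// 4. Bundle adjustment: jointly refine all poses.
func bundleAdjust(_ poses: inout [Pose]) {}
// 5. Compute a dense depth map per image.
func computeDepthMaps(_ images: [CGImage], _ poses: [Pose]) -> [DepthMap] { [] }
// 6. Fuse the depth maps and extract a mesh.
func fuseAndMesh(_ depthMaps: [DepthMap], _ poses: [Pose]) -> Mesh {
    Mesh(vertices: [], triangles: [])
}

func reconstruct(_ images: [CGImage]) -> Mesh {
    let features = images.map { detectFeatures(in: $0) }
    let matches = matchFeatures(features)
    var poses = estimatePoses(from: matches)
    bundleAdjust(&poses)
    let depthMaps = computeDepthMaps(images, poses)
    return fuseAndMesh(depthMaps, poses)
}
```

Each of these stages has memory and time costs that have to be trimmed to run on-device in about a minute, which is the trade-off Keith describes above.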
I think we were talking earlier about optimization, and going really deep into optimizing things and getting speedups that way. It's really interesting that as a programmer you often think about algorithmic changes, like big-O notation,
where you're making very large changes to make something faster. But I found that oftentimes it's these little tweaks, like figuring out how to fit something into the cache, or how to move some computation from the CPU to the GPU and parallelize it that way, that can often make huge differences in terms of performance.
So it's a really fun process, I think, and it's also something that you're never done with. If we could make certain steps in the pipeline faster, we could maybe increase the depth resolution or increase the amount of detail in the mesh.
So you can go as deep into optimization as you want, or as time permits, right?
[00:34:35] Jared Heinly: Mm-hmm.
[00:34:36] Keith Ito: I think if I had infinite time, I might just sit in a room and optimize until the end of the world or something. But there's obviously always other stuff to do.
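As one concrete, made-up example of the kind of low-level tweak Keith describes, the same per-pixel operation can be spread across CPU cores with Grand Central Dispatch. The filter and the sizes here are invented purely to show the pattern; they are not from Scaniverse.

```swift
import Foundation

let width = 256, height = 192
let depth = [Float](repeating: 1.0, count: width * height)   // stand-in depth map
var filtered = [Float](repeating: 0, count: width * height)

// Serial version: one core walks every pixel.
func clampSerial() {
    for i in 0..<depth.count { filtered[i] = min(depth[i], 5.0) }
}

// Parallel version: rows are processed concurrently across available cores.
// Each iteration writes a disjoint slice, so the writes never overlap.
func clampParallel() {
    filtered.withUnsafeMutableBufferPointer { out in
        let base = out.baseAddress!
        DispatchQueue.concurrentPerform(iterations: height) { row in
            let start = row * width
            for i in start..<(start + width) { base[i] = min(depth[i], 5.0) }
        }
    }
}
```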
[00:34:49] Jared Heinly: Yeah, and I totally agree with what you said there. A lot of times people may think, oh, I need to change the algorithm. And yes, maybe you can change the algorithm, but then, as you hinted at, it's like, well, what changes to the algorithms can I make
that are still gonna achieve the detail and the quality that I need? If at some point I say, oh, I can just reduce the number of feature matches between my images, at some point you're gonna have too few feature matches and you're gonna lose connectivity between those images.
So there are certain trade-offs there to the accuracy and quality. But then, as you said as well, there are also just coding and engineering things: make better use of the cache, make better use of the hardware and the parallelism. Those tweaks too.
That is, that is a lot of fun.
[00:35:33] Keith Ito: Right, and of course there are algorithm changes you can make as well. I don't wanna make it sound like...
[00:35:39] Jared Heinly: No, but I think both are equally valid. There are a lot of times in my own code, when I'm optimizing the reconstruction's performance, there are algorithmic changes where I'm like, oh yeah, instead of n log n I'm using a linear algorithm, or I'm gonna use a slightly different method that is a bit faster in terms of its computational complexity.
But then there's also just a whole host of engineering tweaks where, like you said, you can spend quite a long time finding ever more ways to make that code more efficient.
[00:36:14] Jonathan Stephens: And I always like to take the viewpoint of the non-scientist, non-engineer and ask, how can we train a user to be more efficient in capturing? Because if you take twice as long to capture something, you'll have twice as much footage to work with, and that might not be ideal.
And how do you, I guess through an engineered augmented overlay, guide someone through a scene or around an object efficiently, and make sure they capture just enough, but not too much?
[00:36:44] Keith Ito: Yeah, definitely. It's interesting how the UI or UX plays in with everything else. Especially as an engineer, you often think about the algorithms and how you're gonna write the code. But equally important to achieving performance is
how you guide the user to capture things in an optimal way so that you can do it efficiently. So yeah, definitely.
[00:37:07] Jonathan Stephens: Yeah. Well, I don't want to keep you too long, but I did want to ask a few more questions. What are your thoughts on where the future of 3D scanning is going, in general and for your app? I mean, you don't need to tell us what you'll work on next, but where do you see this going?
Because to be honest, iPhones and LiDAR and 3D capture on a mobile device are actually pretty new. The stockpile reports that we run, we've been doing that for a long time, but it's always been sent to the cloud. What's new is having it on device; you don't have to send anything up.
Where do you see that future? Is it new hardware you guys could be putting on smart glasses? Where do you see this going as an industry as a whole, not necessarily even your app?
[00:37:51] Keith Ito: Yeah, I think it's a super exciting time, because like you said, being able to capture and reconstruct on a mobile device is very new. In terms of the fidelity of capturing stuff, that's also moving very quickly. NeRFs came on the scene relatively recently and are advancing really rapidly.
There are other ways of improving things too: constant advances in being able to generate depth maps better, in being able to create more accurate and visually appealing scenes. And I think we're actually just getting started on that front.
If you look at photos and videos, those have been around for a long time, and they're still making advances there in terms of computational photography, stabilization, and things like that. On the 3D side, it's very new.
So I think there's a long road ahead of improving fidelity and quality, and making it easier to capture stuff. Like we were saying earlier, educating the user, or finding some sort of user experience that allows the user to generate a good capture quickly every time, is really important,
and there's a lot of work to do in that area. And then finally, it's great to be able to capture something, but really it's what you do with that capture, right? How do you share it with people? How do you take what you've created, or what other people have created, mix those up, create new content, share experiences, and create meaningful stuff in your life in 3D? For a long time, 3D content creation has been really hard; you needed certain skills in terms of modeling, or being able to actually create stuff that looks good. I think, again, it's a really exciting time, not just for 3D capture, which is getting better every day, but also in terms of generative AI to generate 3D models, or to be able to construct scenes for you out of just text, for example.
We're on the verge of this really exciting time in 3D. And in addition, all this display technology is coming out too. We have VR headsets now that are actually pretty good, and in terms of AR and mixed reality, I know Apple is coming out at some point with their mixed reality headset. So I think all these things play together and are working together to create this really exciting 3D ecosystem that maybe a few years down the road we'll all be able to experience.
[00:40:38] Jared Heinly: Yeah, it is an exciting time, like you said. The convergence of hardware and software and machine learning and vision and cameras and sensors and displays. There are so many cutting-edge things happening in all those fields, and the convergence of that is awesome.
[00:40:56] Keith Ito: Yeah, definitely.
[00:40:57] Jonathan Stephens: So one thing we didn't talk about, and it's up to you if you wanna talk about it, but I know about a year and a half ago you joined Niantic, and all of a sudden this app became free for Pro users. That's another amazing thing: the quality of the app you've built, and it's free.
It's not like these photogrammetry software packages that used to cost thousands of dollars. So can you talk about that alliance? Am I gonna be able to scan objects and then have them in a Niantic game or experience? Is that something you can even talk about?
[00:41:33] Keith Ito: Yeah, I can definitely talk a little bit about that. So for people who might not know, Niantic is a maker of AR games, probably best known for Pokémon Go. But what people might not know is that Niantic is also building a very detailed map of the world.
The thing about their games, like Pokémon Go, is that you're not playing just in a virtual world. You're playing out in the real world, right? It encourages you to go to places in your community or to new places, to walk around and explore the real world, and to play at those places.
And so part of being able to bring gameplay everywhere is building a really detailed 3D map of the world. I was talking about Google Maps earlier, and you might have streets and businesses there, but we're talking here down to the centimeter level, so that if you're at the park or something, you know that this is a bench right here and this is a fountain, or whatever.
A while back, early on in Scaniverse's life, I was focusing mostly on how you capture stuff in the real world and create this digital copy of it. But there's also this other aspect of it. You can think of it as a two-way street, right?
On one end you want to take stuff that's in the real world and create a virtual copy of it, but you also wanna take the digital stuff that you have, whether it's content you've scanned or content you've created in 3D modeling software, and bring it out into the real world,
and have experiences with other people that center around that content. It turns out that scanning is important to both sides of that equation. If you're capturing stuff, you're capturing it by scanning; but if you're trying to get content into the real world, your device needs to know what that world looks like.
So you need to have this centimeter-level virtual map of the world, so that when you place content here, it's gonna stay there, and if someone else comes and wants to look at it, it'll be in that location. One of the things that Niantic is building that is super exciting is Lightship VPS, which is a visual positioning system.
It's taking basically the entire world and mapping it down from scans at about a centimeter level of accuracy, and that's what really enables a lot of this content placement. It is available to developers now. So if you are building your own game or your own 3D or AR experience, you can leverage all this work that Niantic has done, take this map, place your content there, and build immersive AR experiences that your users will enjoy, that will be really fun and exciting, and that are actually tied to real places.
With VR and AR it can be very easy to get into a state where you're just immersed in your own world and not paying attention to the world outside of you. I think what makes Niantic's approach to this exciting and different is that it really tries to seamlessly blend the real world and the digital world together into one space where everybody is.
[00:44:51] Jonathan Stephens: Mm-hmm.
[00:44:51] Keith Ito: So, getting back to how Scaniverse fits into this picture: we're providing the scanning piece of that. It's going into the Lightship platform, which is Niantic's developer platform for this. So if you're a developer, you can scan areas to map them for VPS, using the same engine as Scaniverse to do that.
That's in beta right now; it should be launching pretty soon, actually. And in addition, we're pushing forward with Scaniverse and developing new features, because it's just a really great way to figure out, like we were saying earlier, what the right UI for scanning is, how people want to share content, and how people want to use this scanning technology.
So we're going forward on both fronts, and it's a really exciting time to be doing all of this.
[00:45:47] Jonathan Stephens: All right. Yeah, I think people should look into that Lightship VPS system as well. I think it's gonna have a lot of implications for the 3D world moving forward. I personally feel, and some of us at our company feel, that the augmented reality world, that mix of real and virtual, is going to be more our future than the virtual reality world, which has its place as well.
But I think where we're gonna see the most advances, the world-changing things, is gonna be that mix of what you're scanning in the real world and placing in the real world, interacting with it in the real world, because it's a more social thing. Sometimes I don't get it, and then my kids are like, check this out, check out this AR thing.
They're really into it, so I know that's what they're growing up with and can expect from a 3D world. And I know Niantic is leading the way with that. Pokémon Go is a great example: 3D mixed with the real world and maps and everything. I love geospatial 3D. It's all there.
[00:47:00] Keith Ito: Right. Yeah, it's really interesting how all these different technologies are coming together to create this experience that hopefully we'll all be able to share together.
[00:47:10] Jonathan Stephens: And they're capturing all our imaginations with games. I think entertainment's gonna be the key. We can get applications running in industrial plants and manufacturing, Industry 4.0 is the term, but that's not gonna capture everyone's hearts and imaginations.
It's, it's gonna be through games and entertainment and sports or whatever it is. That's what Niantic I feel like has done so well is, Hey, let's make this fun for everyone and show them what's possible, and then eventually it'll bridge into more of your life.
[00:47:46] Keith Ito: Yeah, definitely. Finding the fun and getting people engaged is that important first step, because I think a lot of times people might not understand a technology until they've used it themselves, and used it in an experience that is really meaningful to them. So I think in a lot of ways games, and just the social aspects of things, are a great way to
get things into people's minds, so that they can expand later into these other contexts.
[00:48:16] Jonathan Stephens: Yeah. Well, Keith, it's been great having you on this show. I also wanna ask: if people want to keep up with what you're working on, would you suggest they follow you personally on Twitter or some social platform? How can people keep track of what you guys are up to, or is Scaniverse's website the place to go?
[00:48:37] Keith Ito: You can visit the Scaniverse website at scaniverse.com or follow us on Twitter. We are just @Scaniverse.
[00:48:43] Jonathan Stephens: All right, that's easy, and I'll make sure we link those in the show notes for people to find. And of course you can find the app easily by just searching Scaniverse in the App Store. And like you said, it's free now, right? So have fun with it. You don't even need a LiDAR, which is exciting.
[00:49:00] Keith Ito: That's right,
[00:49:01] Jonathan Stephens: So thanks a lot, Keith. Thanks for joining us as well, Jared. I know your insights are valuable because you've also spent a lot of time working in the world of mobile-based 3D reconstruction, and it's fun to have you guys as pioneers on here paving the way. Thank you for coming.
[00:49:20] Keith Ito: yeah, and thanks for having me on.
[00:49:22] Jared Heinly: Yeah, thanks Keith.
[00:49:24] Jonathan Stephens: That's a wrap for this episode of Computer Vision Decoded. We hope you enjoyed it and learned something about 3D reconstruction on smartphones. Don't forget to subscribe to our podcast on all major platforms, including Apple Podcasts, Spotify, and Google Podcasts. You can also catch all of our episodes on the EveryPoint YouTube channel.
I want to give a huge thanks to our co-host Jared Heinly for providing valuable insights on our show, and a special thanks to Keith Ito for sharing his experience building Scaniverse. I'll be sure to put a link to the app in the show notes. Thanks for listening, and I'll see you next time on Computer Vision Decoded.