Computer Vision Decoded

In this episode of Computer Vision Decoded, hosts Jonathan Stephens and Jared Heinly explore the various types of cameras used in computer vision and 3D reconstruction. They discuss the strengths and weaknesses of smartphone cameras, DSLR and mirrorless cameras, action cameras, drones, and specialized cameras like 360, thermal, and event cameras. The conversation emphasizes the importance of understanding camera specifications, metadata, and the impact of different lenses on image quality. The hosts also provide practical advice for beginners in 3D reconstruction, encouraging them to start with the cameras they already own.

Takeaways
  • Smartphones are versatile and user-friendly for photography.
  • RAW images preserve more data than JPEGs, aiding in post-processing.
  • Mirrorless and DSLR cameras offer better low-light performance and lens flexibility.
  • Drones provide unique perspectives and programmable flight paths for capturing images.
  • 360 cameras allow for quick scene capture but may require additional processing for 3D reconstruction.
  • Event cameras capture rapid changes in intensity, useful for robotics applications.
  • Thermal and multispectral cameras are specialized for specific applications, not typically used for 3D reconstruction.
  • Understanding camera metadata is crucial for effective image processing.
  • Choosing the right camera depends on the specific needs of the project.
  • Starting with a smartphone is a low barrier to entry for beginners in 3D reconstruction.
This episode is brought to you by EveryPoint. Learn more about how EveryPoint is building an infinitely scalable data collection and processing platform for the next generation of spatial computing applications and services: https://www.everypoint.io

Creators and Guests

Host
Jared Heinly
Chief Scientist at @EveryPointIO | 3D computer vision researcher (PhD) and engineer
Host
Jonathan Stephens
Chief Evangelist at @EveryPointIO | Neural Radiance Fields (NeRF) | Industry 4.0

What is Computer Vision Decoded?

A tidal wave of computer vision innovation is quickly having an impact on everyone's lives, but not everyone has the time to sit down and read through a bunch of news articles and learn what it means for them. In Computer Vision Decoded, we sit down with Jared Heinly, the Chief Scientist at EveryPoint, to discuss topics in today’s quickly evolving world of computer vision and decode what they mean for you. If you want to be sure you understand everything happening in the world of computer vision, don't miss an episode!

Jonathan Stephens (00:00)
Welcome to another episode of Computer Vision Decoded where we dive into the complex and quickly evolving world of computer vision.

And today I have, of course, my co-host, Jared Heinly, who has a PhD in computer science and knows a lot about 3D reconstruction. But today we're going to dive into something that crosses over, that we all need to make something into 3D: a camera. So today's the camera episode. And I'm really excited because while Jared has a lot of technical information about how distortion and cameras work and how they work together,

I have spent a lot of time in my career in the field using different cameras that he's built algorithms to reconstruct 3D environments from. So we kind of have different backgrounds, different perspectives. And at the end of this, the goal is we will go through different types of cameras, starting with a kind of consumer-based camera, up through more professional ones, and then maybe even some specialized cameras. And we won't spend as much time on those because fewer people use them.

But to give you an appreciation of what the differences are and maybe which one's the best for you. I can tell you there's no one right camera for everyone. Would you say that's true, Jared? Yeah, so, well anyways, welcome Jared. Thanks for joining me on another episode. I'm really excited about this one because we keep talking about things I think people can take home and use and try to implement in their work, and we kind of gloss over the camera, and it all kind of starts there. And I think a lot of the choices that you make

Jared Heinly (01:11)
That's true. That's true.

Yeah.

Jonathan Stephens (01:33)
in reconstructing a scene in 3D, be it a mesh with textures, be it point clouds or SLAM, even the camera can really dictate what you can get away with and how easy it is to process the data.

Jared Heinly (01:49)
Yeah, yeah,

totally, totally. It is a key piece in the process, the thing that's actually capturing that imagery.

Jonathan Stephens (01:57)
So, all right, let's jump into this.

So the format I'm gonna do is I'm gonna start out with a type of camera and then we're gonna talk about what's the strengths, what are some of the drawbacks of using that camera and then what would then be that camera useful for and it can be kind of limiting, say this camera's only good for certain things or things like that. I would say even if you only had a smartphone which we'll start out with,

you can pretty much do most of the things we're gonna bring up in this episode. There'll just be some drawbacks to using it versus some other cameras. So Jared, let's start with the smartphone. I just wanna say, off the top of your mind, what are the biggest strengths of using a smartphone-based camera?

Jared Heinly (02:41)
Oh man, well, it's always there with you. I mean, the form factor's great, you can put it in your pocket, it's always there right with you. And because it's a consumer device, camera manufacturers, smartphone manufacturers spend a lot of time making those photos look nice. So if you want a good looking photo, you don't have to fiddle around with a lot of settings to get a good looking photo.

Jonathan Stephens (03:01)
Mm-hmm.

So true. At this point, it's less about what's the right button and more about composition on a smartphone, and it will figure it out for you. We don't even know. I'll take a picture of my dog and it'll recognize it's an animal and it will put it in a portrait mode for animals to better accentuate the hair on it. I didn't tell it what to do. It just uses the machine learning in the background.

Jared Heinly (03:32)
There's so much, I think we might have touched on this in previous episodes, but talking about computational photography, like you said, there's so much machine learning happening there, image processing. You take photos of your dog, I take photos of the food I make, and it's like, wow, that photo looks better than the food sitting there in front of me. It's accentuated and heightened, you know, the color or some of the contrast, and it brings out some of the detail and things and really makes it stand out.

Jonathan Stephens (03:38)
Mm-hmm.

Mm-hmm.

Yeah, I'd say some other pros to using this camera, or a camera on, let's say, an iPhone: I have three different cameras. So that's the great thing, I've got a wide, I have a telephoto, and I have, call it normal, I think it's still wide. There's a wide, an ultra wide, and a telephoto lens, and they're all prime lenses. So when you're zooming on these, you're not actually physically zooming, you're not changing the optics and changing the focal length of the camera. Is that correct? You're more...

Jared Heinly (04:13)
That's a wide, wide, ultra wide idea.

Jonathan Stephens (04:29)
switching which camera is being used. I know they're doing some other tricks like that. There's like a 2x, which is using the quad sensor.

Jared Heinly (04:29)
Correct.

Yeah, it's still like

a cropped portion of the sensor. So yeah, when you're zooming on smartphones, it's not actually moving the lens elements. It's just changing which part of the image is used.

Jonathan Stephens (04:49)
Okay,

so then this is going to cross over to the other cameras too. What is the importance of using a prime lens, which these are?

Jared Heinly (05:00)
a nice benefit of a prime lens for purposes of computer vision, because that's what we're talking about here. As I will say, it's nice because it gives you more consistent focal length. And so if all of your photos are taken from the same focal length, ⁓ there's some benefit to the reconstruction settings and reconstruction process. When you go and try to solve for all of the poses, the position and orientation of those images around your 3D scene,

Jonathan Stephens (05:11)
Mm-hmm.

Jared Heinly (05:29)
If all of your images have the same focal length, then ⁓ you can add that as a constraint in the Bundle Adjuster, in the thing that's actually refining the positions and orientations of the images as well as the points in the scene. And so having a consistent focal length or a consistent camera that's taking all those images ⁓ is quite nice.
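
(A quick aside for anyone who wants to see what Jared means by adding a shared focal length as a constraint: below is a minimal, hypothetical sketch, not EveryPoint's or any particular library's code, of a bundle-adjustment residual in which every image shares one focal length parameter. All data and names are placeholders.)

```python
# Hypothetical sketch of the "shared focal length" constraint: one focal
# length parameter is optimized for ALL images, alongside per-image poses.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points3d, rvec, tvec, f, cx, cy):
    """Pinhole projection of Nx3 points into pixel coordinates."""
    cam = Rotation.from_rotvec(rvec).apply(points3d) + tvec
    return np.column_stack([f * cam[:, 0] / cam[:, 2] + cx,
                            f * cam[:, 1] / cam[:, 2] + cy])

def residuals(params, observations, points3d, cx, cy):
    """params = [f, rvec_0, tvec_0, rvec_1, tvec_1, ...]: a single shared f."""
    f = params[0]
    errs = []
    for i, (point_ids, uv) in enumerate(observations):   # one entry per image
        rvec = params[1 + 6 * i: 4 + 6 * i]
        tvec = params[4 + 6 * i: 7 + 6 * i]
        errs.append(project(points3d[point_ids], rvec, tvec, f, cx, cy) - uv)
    return np.concatenate(errs).ravel()

# result = least_squares(residuals, x0, args=(observations, points3d, cx, cy))
```

If every image instead got its own focal length, the parameter vector would grow by one unknown per image, which is exactly the extra freedom the shared-camera constraint avoids.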

Jonathan Stephens (05:42)
Okay.

And so a smartphone, I think you have maybe have a Samsung, I have an iPhone. When you take a photo, does it save that information about which lens and all those different called intrinsics, right? Of each image, are those saved in each photo you take?

Jared Heinly (06:08)
It is, to varying degrees. So like when I take a photo or you take a photo, there is EXIF data, E-X-I-F, which is the standard metadata. It's key-value pairs typically, but standard metadata associated with your photo. And so it'll say, this was taken with a focal length of three millimeters, but in a 35 millimeter format, that was...

a 20 millimeter lens or something. So it will tell you information about the focal length that was used. It'll tell you the shutter speed, the ISO, so like the sensitivity of the sensor that was used. And it gives you some other standard attributes telling you about what were some of the basic settings that went into capturing that photo.
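
(For the curious, reading that EXIF metadata takes only a few lines. A hedged example with Pillow follows; the file name is a placeholder, and the sub-IFD access assumes a reasonably recent Pillow version.)

```python
# Read the standard EXIF tags Jared mentions (model, focal length, shutter, ISO).
from PIL import Image, ExifTags

exif = Image.open("IMG_0001.jpg").getexif()            # placeholder file name
print("Make/Model:", exif.get(ExifTags.Base.Make), exif.get(ExifTags.Base.Model))

details = exif.get_ifd(ExifTags.IFD.Exif)              # exposure settings live in a sub-IFD
print("FocalLength:", details.get(ExifTags.Base.FocalLength))
print("35mm equivalent:", details.get(ExifTags.Base.FocalLengthIn35mmFilm))
print("ExposureTime:", details.get(ExifTags.Base.ExposureTime))
print("ISO:", details.get(ExifTags.Base.ISOSpeedRatings))
```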

Jonathan Stephens (06:44)
Mm-hmm.

Okay, so it does give you some basics. I think as we go on, we'll bring up some of the enterprise kind of drones and some things they have in there that go beyond what you get, but as for an iPhone, you get some basics. I do know, though, when you take a video, and let's say you zoom in with a video, I don't even know if videos in general give you that. They tell you shutter speed, things like that, but that might even be variable. Things get more complicated.

Jared Heinly (07:04)
Yep. Yep.

That's, and that's,

yeah, yeah. Because they're, again, my understanding is some of it's just limitations of file format. Like the JPEG, JPG file format, or Apple uses ⁓ HEIC, high efficiency image codec for some of their photos. Built into that format is this notion of metadata where you can associate all of, the make, the model, the focal length, all these things we just talked about that went into capturing the photo. For video, ⁓ there's not,

Jonathan Stephens (07:34)
Mm-hmm.

Jared Heinly (07:50)
is my understanding is there's not quite that same standard to say that, here's how you associate metadata with your video. ⁓ I think when you're in the Apple ecosystem, Apple does some tricks and says, yeah, you've recorded a video. We're going to store metadata along with that video, either in a separate file or in the database, so that when you're viewing that Apple video on another Apple device, you can get some metadata. ⁓ But then when I take a video with my Android Samsung phone, share it with you.

Jonathan Stephens (08:13)
Mm-hmm.

Jared Heinly (08:19)
It just looks like a video. You're not getting all of that. That metadata is not necessarily transmitted or contained within that video file. So yeah, there are some limitations and trade-offs there.

Jonathan Stephens (08:23)
Mm-hmm.

Yeah, and I know you have to be careful when there's all these special modes now they put in. So that's, I guess, one weakness is there is some finicky little thing. So like if I take a photo with an iPhone, it does a live photo or does just a photo or a raw photo. So there's like different options and the live one is like taking a little video and then probably doing compositing, trying to pick the best image out of that. So if I go to download those on my phone,

on my computer and I bring up the file explorer, it's gonna show me all these little video snippets and not just the photo. So you have to be careful with that. And then there's raw, which from an Apple phone is great. You're getting like the full sensor. Color might not be what you're used to because they use Apple ProRAW. So if you are a professional doing photogrammetry, raw matters to you because

You're trying to get these textures and everything to look good. You're trying to expose information in the darks and the lights that might be lost. so can you tell us like what that is? What is a raw photo? Why? How that helps you get that information?

Jared Heinly (09:33)
Yeah, because that raw photo typically has the raw information. So when you capture a photo, a lot of times that sensor may be recording a greater amount of information than can be stored in a JPEG file. So like, for example, typically in JPEG, there's all sorts of compression that's going on.

But like one standard technique is to say that, I only have eight bits per pixel, eight bits. So a value between zero and 255. And so, what is the brightness? How much red is in this pixel? Zero to 255, that's all that you get. And so there's only 256 unique values for the brightness of the red or the green or the blue. Now it gets even more complex because JPEG separates the luma from the chroma. So like the brightness from the color and each of those are compressed separately and so you can lose more information.

Jonathan Stephens (10:08)
Mm-hmm.

Jared Heinly (10:31)
So to go to a file format like JPEG or even HEIC, you are losing information. And so RAW tries to preserve that. And so it's going to give you more than eight bits per pixel. You might have 10 bits per pixel. So that would be like 1,024 unique different values for the brightness of the red, the green, and the blue. RAW a lot of times can also give you...

more information about how the photo was captured. I haven't dug into Apple's raw file format, but each camera manufacturer may have their own format. I know there are some standard formats like DNG, I think it's Adobe DNG. I haven't used it in a long time, but I used to have a Canon DSLR, and so it would save its own Canon raw file format. It was like a CR2 or something. But each camera manufacturer has their own potential

Jonathan Stephens (11:11)
Mm-hmm.

Right. yes.

Jared Heinly (11:27)
raw file format that's going to contain as much information from the sensor as possible, as well as the metadata that the camera was aware of when it captured that, so that you can take all of that data and figure out afterward how you best want to process that information to go after what you're trying to accomplish.
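
(To put numbers on the 8-bit versus RAW point: here is a tiny numpy illustration of how many distinct shadow levels survive when a 12-bit linear signal is squeezed into 8 bits. Decoding an actual camera RAW file would need a library such as rawpy, which is not shown here.)

```python
# Bit-depth illustration: a 12-bit RAW-style ramp vs. an 8-bit JPEG-style copy.
import numpy as np

raw_12bit = np.arange(4096)                                  # 0..4095 linear levels
jpeg_8bit = np.round(raw_12bit / 4095 * 255).astype(np.uint8)

shadows = raw_12bit < 256                                    # darkest 1/16 of the range
print("distinct RAW levels in the shadows:", np.unique(raw_12bit[shadows]).size)    # 256
print("distinct 8-bit levels in the shadows:", np.unique(jpeg_8bit[shadows]).size)  # ~17
```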

Jonathan Stephens (11:45)
Mm

hmm. I think they're an interesting because if you are in a high dynamic range scene, especially I think like outdoors or maybe I don't know. Someone recently posted ⁓ their post processing of raw imagery inside a cathedral where you have lots of shadows and lots of bright light coming through these big stained glass windows and up in the giant flying buttress rafters, they're able to pull this detail out that they didn't get in a JPEG, which was really cool because then they were able to get a much much more

Jared Heinly (12:02)
Mmm.

Jonathan Stephens (12:15)
detailed and accurate photogrammetry output. However, you have to be careful with that because not all software can take every single raw type of image. You need to know what you're doing. Usually you're doing something in Photoshop, and if you do some adjustments, you've got to adjust every photo to match or else you're going to have these mismatches. If you're just trying to get raw geometry, like a point cloud, it might not matter. But if you're trying to get textures, or a Gaussian splat, if you have

Jared Heinly (12:23)
Yeah.

Jonathan Stephens (12:42)
If an area goes from light to dark, you get what are called these kind of floating artifacts, because it doesn't know exactly what's supposed to be there, or things kind of pop into different colors and you're like, oh, what's going on there? Well, for one view it's supposed to be brown, for the other view it's supposed to be light brown. So it doesn't know what it's supposed to be. You run into those kinds of issues. I do know someone who told me once, I asked, do you do a global adjustment when you're doing things to raw? And they said, no, what I try to do is find a bunch of

Jared Heinly (13:00)
Mm-hmm.

Jonathan Stephens (13:10)
pictures, like groups of pictures taken from a similar area, and then you do a global adjustment to all those. And then the next batch you might have where it's a little darker, but of some of the same area. So it's like, okay, I need to adjust those to match the other batch. So you can do batches, but you don't necessarily need to adjust every image all at once, because sometimes a photo was taken in a darker part of a room or something like that, where it's done some compensation in the camera itself. So

Jared Heinly (13:34)
Interesting.

Jonathan Stephens (13:38)
At that point, you probably know what you're doing with the raw photo.

Jared Heinly (13:38)
That's it. Yeah, yeah. I didn't

even think about that. Because I imagine that then opens the door now, because it's moving beyond computer vision. But like the artistic side of it, it's like if you were trying to document the interior of a cathedral where one side of the cathedral is much darker than the other, it's like, what is your desired goal? In your final 3D model, do you want to represent that and see that? Wow, when I look at this model, this mesh, this Gaussian splat, whatever it may be,

Jonathan Stephens (13:56)
Mm-hmm.

Jared Heinly (14:05)
this half of the cathedral is much darker than the other, or are you trying to achieve a consistently illuminated result ⁓ where it just all looks equally illuminated? ⁓

Jonathan Stephens (14:13)
Mm-hmm.

Right. I think if I was preserving

it for like architectural preservation, I'd want it to be globally illuminated. And then later, if you were doing it as an artistic output, you could, there's all these illumination options within software. can add lighting and things like that. But yeah, you're right. It depends what you're trying to do. Are you trying to reconstruct it for an art piece? You probably want it as is or, ⁓ you know, save options. So I'd say if you're on a smartphone and you're trying to run around with raw photography, you're probably

Jared Heinly (14:30)
Yeah, yeah, yeah.

Jonathan Stephens (14:45)
using the wrong camera. You should probably be using more of a mirrorless, which we'll get to next, or a DSLR. I don't know if those are popular anymore. I think Sony kind of replaced those for the most part for every camera manufacturer. But okay, so the weaknesses, as we're kind of talking about: you don't have the best control. There are some third party apps that you can have on these that let you control it

a little bit more, and there's some control within each native camera app, but you're not manipulating a focus ring or anything like you've got with a nicer camera. They're also not always the best in low light. And why is that? Can you explain why that wouldn't work so well?

Jared Heinly (15:30)
It just comes down to physics and being able to capture photons, capturing light. And it comes down to the physical size of the camera sensor in the smartphone, so the thing sitting behind that lens. In a smartphone, that sensor is pretty small. It's millimeters. And in low light conditions, there's not a lot of light. There's not a lot of photons bouncing around. And so in order to...

Jonathan Stephens (15:33)
Mm-hmm.

Mm-hmm.

Jared Heinly (15:57)
Typically, in order to do well in low light, you wanna make your sensor as big as possible so that you can capture all of those photons and still make sense of what the scene looks like. But when your sensor is only so big and it's really small, you're not capturing as many photons. And so you're gonna end up with a noisier image because there's not enough signal. Your signal to noise ratio is out of whack. There's not enough signal, not enough real photons hitting that.

Jonathan Stephens (16:20)
Mm-hmm.

And so like if you, you mentioned ISO. So ISO is the thing that was like, if you push it up, it'll brighten your image, but you also then expose more noise in it, right? So you get these like little, it looks like dust in it.

Jared Heinly (16:33)
Yeah. Yeah, because you're just multiplying.

You're like, hey, whatever photons I am seeing, let me just multiply. So anytime I receive one, let me pretend that was two. Or anytime I receive two, let me pretend that was four. So you're just multiplying that signal that comes in. But yeah, that native sensor itself is not perfect. You put your sensor in a dark room, it's not outputting a perfectly black image. There's thermal noise.

Jonathan Stephens (16:42)
Mm-hmm.

Mm-hmm.

Jared Heinly (17:00)
You know, there's just that you get just slight noise in the image that then is always there. And so you typically want to just, you know, you want to have enough real light to overwhelm just the native noise that's present there in the sensor.
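
(A small back-of-the-envelope simulation of Jared's signal-to-noise point, using made-up pixel sizes and a made-up photon rate: photon arrival is roughly Poisson, so a bigger pixel that collects more photons gets a cleaner signal, and turning up ISO only scales signal and noise together.)

```python
# Toy shot-noise simulation: bigger pixel area -> more photons -> higher SNR.
import numpy as np

rng = np.random.default_rng(0)
photons_per_um2 = 2.0                                   # hypothetical dim scene
for label, area_um2 in [("small phone pixel", 1.0), ("large camera pixel", 30.0)]:
    counts = rng.poisson(photons_per_um2 * area_um2, size=100_000)
    print(f"{label}: mean={counts.mean():.1f} photons, "
          f"SNR={counts.mean() / counts.std():.1f}")

# Applying an ISO gain g multiplies the counts by g, so mean/std (the SNR) is unchanged.
```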

Jonathan Stephens (17:07)
Okay.

Yeah, that makes sense. We'll get to sensor sizes a little bit later, but they're just small on these. I've seen some of these crazy ones now coming out. I think Xiaomi has one where they put like a one inch sensor on the back. It's like this ring on my phone. If you're watching the YouTube version of this video, I have this ring on my Apple case where you can charge. That's about the size of the camera. But it's still not going to

approach the quality of a several thousand dollar, you know, professional camera. He just can't. And then also, I think they do a lot of lying in the literature on some of these cameras, because for example, there's a 48 megapixel camera sensor on this iPhone 16 Pro. And I think some of the Samsung's have, I think they've reeled them back, but some were 200 megapixels, 100 megapixels. Then

Jared Heinly (17:48)
Yeah. Yeah.

Jonathan Stephens (18:13)
that means the size of each pixel has got to be like microscopic. So they're having to do tricks. They use what's called a quad Bayer filter, or whatever it's called. They're doing tricks, pixel binning, right? So they're taking the average of four pixels to make one pixel. So if you're thinking you're buying a

Jared Heinly (18:20)
So is tiny.

Yeah, or like, like, like, yeah, pixel binning, you know. Yep.

Jonathan Stephens (18:41)
48 megapixel camera, you are technically, but I'm getting 12 megapixel photos out of here. I can get a raw 48 megapixel image out of it, but then again, I'm going from like a four megabyte file, or I can't remember what they are off an iPhone, to like a 50 megabyte file. So, you know, be careful there too. You're going to have to have lots of room on your phone when you start running around taking photos, and raw videos are even worse. So, all right. So then,

Given all these weaknesses and the fact that it fits in your pocket, which is kind of the big benefit to an iPhone and it's user, I'd say it's user friendly, right? You don't have to become an expert. You have one. probably already know how to use it pretty well. ⁓ What are these good for? What do you what do you like these for?
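
(The quad-sensor binning Jonathan described a moment ago is simple to picture in code: four neighboring photosites are averaged into one output pixel, which is how a nominal 48 MP readout becomes a 12 MP photo. Illustrative numbers only.)

```python
# 2x2 pixel binning: average each quad of photosites into one output pixel.
import numpy as np

sensor = np.random.rand(8000, 6000)                     # pretend 48 MP readout
binned = sensor.reshape(4000, 2, 3000, 2).mean(axis=(1, 3))
print(sensor.shape, "->", binned.shape)                 # (8000, 6000) -> (4000, 3000)
```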

Jared Heinly (19:29)
I I like the smartphone again because it's portable, easy to work with, ⁓ programmable. As a computer vision researcher and also as a software developer, it's nice being able to write software that can run on the iPhone itself and have that direct quick feedback from the camera. So that enables applications like augmented reality or real-time 3D reconstruction where someone doesn't have to go out to take a bunch of photos.

Jonathan Stephens (19:49)
Mm-hmm.

Jared Heinly (19:59)
offload it from the camera, import it onto their laptop or upload it to the cloud. It's like, no, that you can do processing right on those images as they're coming in to accomplish some task, either guiding the user through a capture or showing them the results in real time. ⁓ It's nice having the processor and the camera in one piece and being able to write code for that.

Jonathan Stephens (20:19)
Mm-hmm.

Yeah, I like to think you're not buying just a camera, or a phone. More than anything, you're buying a computer, a really good computer that fits in your pocket. So you can build apps on it, like you said. And it's packed full of sensors, which you can tap into. So this one has a lidar or structured light, I don't know if it's true lidar, but it's got a depth sensor on there. It's got two, one on each side for each camera. It's got three cameras. It's got

Jared Heinly (20:27)
Yeah. Yeah. Yeah. Yeah.

Yeah. Yeah.

Yep. Yep. Yep.

Jonathan Stephens (20:52)
inertial sensors so it knows if it's turning or pivoting. It's got a GPS on there. So you start to think how many sensors are in this tiny phone. It's incredible. And you, Jared, being a developer, can leverage those through APIs right from Apple, or there's ARCore for Android, and then speed up

reconstruction, right? Because now you already have a good idea of perhaps where that phone went as a trajectory and where the photos were taken, and you can probably speed up that reconstruction process quite a bit.

Jared Heinly (21:29)
Yep, exactly, exactly. so, and that general concept is applicable to other metadata as well. Like you see your drone has GPS position, or if you had photos that were geotagged and had that GPS embedded in the EXIF there's ways to use that as a constraint to help accelerate 3D reconstruction. But then, yeah, even more so with all those inertial sensors you just described and whether you're using a SLAM system, the on-device pose tracking system like ARKit or ARCore. ⁓

Jonathan Stephens (21:45)
Mm.

Jared Heinly (21:59)
those pose estimates can be used to accelerate ⁓ 3D reconstruction process.
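
(One concrete way device poses help, sketched below under the assumption that the poses are trusted: with known camera matrices you can triangulate a point's 3D position directly with a linear DLT solve instead of estimating everything from scratch. The matrices and pixels here are placeholders, and real pipelines still refine these estimates.)

```python
# Linear (DLT) triangulation of one 3D point from known camera poses.
import numpy as np

def triangulate(proj_mats, pixels):
    """proj_mats: list of 3x4 matrices K[R|t]; pixels: list of (u, v) observations."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])      # two linear constraints per observation
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]                   # homogeneous -> Euclidean
```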

Jonathan Stephens (22:05)
Yeah, and you get depths with your photos, things like that. That can just speed things up. So these are great. I've been doing lots of Gaussian splatting and of course we all know that. My favorite app ⁓ for that on this, I don't like to necessarily promote one app, but there's only really one app that does it on device and it's Scaniverse. you can go out and you can create 3D reconstruction or you can do like a Gaussian splat.

Jared Heinly (22:08)
Yeah, yeah, yeah.

Jonathan Stephens (22:30)
right here on the phone, takes like a minute to process and you can actually process over and over and get it better and better, it does more rounds. And so then you can learn how to make good captures without having to then go upload the images to a service and waiting or it's just you find out right there and you can go try it again. Same for photogrammetry, I think the iPhone now supports it where you can do photogrammetry right on the iPhone with their own object capture. there's just.

You can do lightweight computing on the phone, which is really cool. And you can go through trial and error before having to go sit at a PC or upload it to a cloud, which is big. And then, like, we power Recon-3D at EveryPoint, and you're doing detailed 3D reconstruction and point clouds right on the device as well. If you need data safety and you don't want that data to leave the device, well, there you go. So you have a decent computing platform.

All right, so I feel like we've talked iPhone and smartphones, I guess, pretty well. I think that's because everyone at least has one. Start there. If you're learning 3D reconstruction, I would start there. You don't have to figure out lenses or flying a drone camera at the same time, taking pictures. ⁓ So let's then move to then, I had camera props, but I don't have the DSLR on my desk just because... ⁓

Jared Heinly (23:41)
It's a no-brainer.

Jonathan Stephens (23:55)
I'm using it as my camera here, but I do have some lenses, some interesting lenses. So let's talk about that. I'd say the biggest strength of a mirrorless is the fact that I have two lenses here. I have a prime lens, and this one is a fisheye lens. And so there's no limit to the number of lenses you can use. Well, there's a limit, but it's usually price.

You can mix and match lenses on a mirrorless camera with it like or a DSLR because they're detachable for the most part. There's some that are come like all in one. But ⁓ Jared, what else would you think is a big pro to using one of these cameras?

Jared Heinly (24:29)
Mm-hmm.

Well, the DSLR, I mean, to contrast with the smartphone, typically your DSLR is gonna have a much bigger sensor. And I'll say DSLR slash mirrorless, maybe we're grouping those, but just in this class of camera, you've got a bigger sensor, so you'd have better low light performance. And yeah, you're touching on the lenses, so you can swap out the lenses. So it's the same sensor, just different lenses, compared to the phone where it was, I've got three cameras, which means I have three lenses and three separate sensors.

Jonathan Stephens (24:41)
Mm-hmm.

Mm-hmm.

Jared Heinly (25:03)
but now it's one sensor that then can be adapted to capture different types of photographs based on that lens. ⁓ Typically also now that my sensor is bigger and I've got these big lenses, ⁓ now it can hurt 3D reconstruction and hurt photogrammetry, but for artistic side of things, you can have more control over your depth of field. And so you can blur out the background easier when you actually have a sensor that's bigger.

Jonathan Stephens (25:28)
Mm-hmm.

Jared Heinly (25:32)
Because in order to blur out your background or blur out the foreground and really have a sharp plane of focus where everything else is blurry, you just need a big sensor and big lens. And so that's just something that's only really accomplished with that DSLR sized thing. ⁓

Jonathan Stephens (25:43)
Mm-hmm.

Yeah.

So I want to pick up on that line about the focal plane as well. A term I've thrown out in tutorial videos and just in my tech evangelism is, when you are capturing scenes for 3D Gaussian splatting, a big pro of that is that it doesn't recreate just the object in front of you. It's going to recreate the entirety of the scene that you capture. And therefore I always tell people you want as deep a

focal plane as possible. You want everything to be as sharp as can be, because even in game engines you can render this and then bokeh the background digitally. You don't have to actually do it in capture, but if it's not focused to start, it will never ever be in focus later. And so I was playing around with this one. This camera is a fisheye lens. It's almost hard to see in the picture, but it's like a bubble almost, and you get this 180 degree

It's not like the circular, it's, forget the name of it. It's been like, it's a really wide capture, but it fits just a normal rectangle in the end. And it's really easy to get everything in focus because it's got what's called a hyperfocal point. Can you tell us what that means? Like, cause I've had a hard time explaining to people what hyper, do you even remember that from your days? Yeah.

Jared Heinly (27:09)
man.

Yeah, the hyperfocal distance.

think it's... Okay, don't quote me on this. I'm sure go look it up online, go ask ChatGPT. ⁓ But I think it's a measure. Typically, it's like when you have a lens, you can look up what's the hyperfocal distance. And it's the point at which like, if I focus on, you know, five meters away, it's like at that point...

Jonathan Stephens (27:20)
Ha

Jared Heinly (27:36)
Everything is going to be essentially in focus. It's kind of telling you, think at what distance does all of a sudden everything look like it's in focus or at what point, what setting should you set to maximize the clarity of your entire scene across all depths? ⁓ It's almost like focusing at infinity. Yeah.

Jonathan Stephens (27:57)
Mm-hmm. It's kind of like when you take those landscape photos. Yeah,

yeah, exactly. The same thing you're saying. Either the camera has like the landscape symbol or the infinity symbol saying, hey, if everything's far away, it's all gonna be sharp. But if you have one thing in front of you, it'll probably focus on that, and then it's kind of the inverse. So I always tell people, if you can do that and get far enough away from the objects that it all is in the hyperfocal range, it's gonna look a lot better than if you try to have

Jared Heinly (28:08)
Yeah. Yeah.

Yep. Yep.

Jonathan Stephens (28:23)
something close in the foreground. It wants to either make that blurred out, or make that in focus and the background blurred out. There's something to do with this fisheye lens where I think the hyperfocal distance is like a couple feet from you. So unless you're right in front of something, it's all gonna be exposed, or within the same focus.

Jared Heinly (28:31)
Absolutely.

Yeah, yeah, yeah. Because that depth of focus, like what amount of the scene is in focus, changes. Okay, you can't see this if you're on the podcast, so I guess what I'm trying to say, I'll do it verbally then, is if I'm focusing on something a foot away from me, it may only be plus or minus half an inch that's in focus.

But then if I focus on something 10 feet away from me, it might be plus minus 6 inches or plus minus a foot. So it's like the farther away you are focusing on something, the greater range of things in a real distance that will be in focus. And so when you're focusing on something a mile away, all of a sudden it's like, wow, now everything's in focus. Because that depth of focus has now increased to encapsulate or capture your entire scene.
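
(The usual thin-lens approximation for the hyperfocal distance the hosts are describing is H ≈ f²/(N·c) + f, with focal length f, f-number N, and circle of confusion c. A tiny calculator, with example values only:)

```python
# Hyperfocal distance: focus here and roughly H/2 through infinity is acceptably sharp.
def hyperfocal_mm(focal_mm, f_number, coc_mm=0.03):
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

print(f"{hyperfocal_mm(16, 8) / 1000:.2f} m")   # e.g. a 16 mm lens at f/8, full-frame CoC
```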

Jonathan Stephens (29:29)
Mm-hmm.

Yeah,

⁓ it's kind of a term that people don't run into, but they probably intuitively know that if you're far away from everything, things can all snap into focus, but you can look that up for your camera and for that lens. ⁓ What that distance is that if everything is beyond this distance, it's all going to be in that range then. And to me, that's a really helpful tool if you're trying to get the whole scene in focus. And I've also noticed even in photogrammetry where I don't care about the background, but.

Sometimes the background is something for it to track over time in images. so having that somewhat in focus too will help. Sky doesn't really help, but maybe there's some buildings or something in the background with unique geometry that you can track. And so if you watched our last episode, we kind of talked about that. It's like, you have these unique different patterns and features that we wanna capture as we go along. I've even seen that when people will try to put... ⁓

Jared Heinly (30:37)
Yep.

Jonathan Stephens (30:42)
like an object on a turntable, or the opposite, put an object in a room and they walk around and they're trying to just do the object, and the background is like all white walls or something. Like, I'm not getting it to work. And I realized, well, the object itself doesn't have a lot of unique features, nor does the background. So I've now been paying attention when people are showing these volumetric captures. I noticed they have not just the cameras, because all the cameras are the same. They have, say,

Jared Heinly (31:00)
Mm-hmm.

Jonathan Stephens (31:10)
50 of the exact same camera. Well, that's not very unique, but what they'll do is they'll have a bunch of April tags, which look like, if you're listening, it looks like a bunch of ⁓ QR codes and they have unique ones all around the inside of the capture rig. So it is giving it something unique to register between the cameras. And it's like, that's smart. ⁓ Because the thing in the middle, especially if that's moving, like they're trying to do a video, things like that. ⁓

Jared Heinly (31:31)
Yeah, yeah.

Jonathan Stephens (31:38)
that thing in the middle might be moving. So they need the background to be unique, but a bunch of cameras aren't very unique when they're all the same, at least. So, we're getting in the weeds here, but these are really useful because you have really big sensors, a bigger dynamic range. That means you can get more brights and darks in one shot. There's all these raw options, all these different profiles you can put on them after the fact to

correct them and get them nicely exposed through different software. The lenses are great. I really like these high distortion fisheye lenses as I'm starting to see some of this reconstruction technology come out that actually supports them. So why wouldn't I be able to just run around and use one of these, Jared? Why does distortion matter? Why does that break things down? Whoops.

Jared Heinly (32:34)
Yeah,

it makes it harder for the reconstruction algorithms a lot of times, because a lot of times in a 3D reconstruction, we use a technique called the pinhole camera model, which treats the camera as a single point in space. And then every pixel in the image is just a ray. So a line segment, but it's called a ray because it can go out into infinity. So it starts at that little point, that pinhole,

Jonathan Stephens (32:47)
Mm-hmm.

Mm-hmm.

Jared Heinly (33:03)
and then shoots out through the pixel of the image and then it goes out to infinity. That's just a straight line the whole way. And so that geometry is really easy to model. Going from one pixel to the next, it's just straight lines in slightly different directions. Where distortion becomes problematic is now when I go from pixel to pixel, it's no longer just the same straight line. It's...

Jonathan Stephens (33:10)
Mm-hmm.

Jared Heinly (33:30)
Oh, well, this pixel is actually shifted a little bit in a different direction. And so especially when you have that fisheye lens, those pixels along the edge of the image, they're bent, they've been bent to look in very different directions. Now you can still try to model it and say, no, my light's coming in from this direction and it hits my sensor. But now there's not the same grid of uniformity that a lens without distortion

Jonathan Stephens (33:57)
Mm-hmm.

Jared Heinly (33:59)
would have provided. Because when there's no distortion, I have this nice grid of just X and Y, it's really easy for me to figure out which way the rays are pointing. But when there's that distortion, now all of sudden the reconstruction algorithms need to understand that, because it needs to understand that mapping between camera position and the color that it's seeing in these different ray directions.
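
(To make the distortion point concrete, here is a small sketch using a simple radial, Brown-Conrady-style model with made-up coefficients: the same formula barely moves pixels near the center but displaces them strongly near the edge, which is exactly the extra mapping the reconstruction has to model.)

```python
# Radial distortion demo: displacement grows quickly toward the edge of the frame.
import numpy as np

def distort(xy, k1=-0.25, k2=0.05):
    """xy: Nx2 normalized pinhole coordinates; k1, k2 are illustrative coefficients."""
    r2 = np.sum(xy ** 2, axis=1, keepdims=True)
    return xy * (1 + k1 * r2 + k2 * r2 ** 2)

pts = np.array([[0.0, 0.0], [0.3, 0.0], [0.9, 0.0]])    # center, mid-field, near edge
print(np.linalg.norm(distort(pts) - pts, axis=1))       # [0.0, ~0.007, ~0.15]
```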

Jonathan Stephens (34:19)
Mm hmm. So you need, and I have seen you do it, you calibrate these lenses and you get the distortion, probably some sort of mathematical model, to try to describe the distortion along the different X, Y coordinates of a picture on that lens. So, you know, towards the edges you'll see it, things are like warped, and towards the middle things look okay. I think of it like, from photography, think of like a map projection where

Antarctica can look huge or small depending on how it's distorted. Same with an image, right? It'll stretch. Even if you put the ultra wide on these smartphones and take a picture, sometimes you notice some people's feet and legs at the top of the frame are just a little bigger than they should be. It's just distorting the edges of the camera. All right, so I wanna keep moving this along, because there's several cameras I wanna talk about. I'd say the biggest weakness to these is that there's a learning curve, right? So you gotta learn how to use them.

Jared Heinly (34:53)
Yeah.

Jonathan Stephens (35:17)
I mean, you've got to learn the exposure triangle properly. You've got to learn how to use your camera lens. You can put everything in auto, but I suggest learning some of those different modes within a camera. I'd say another big weakness is if you're taking video and running around with it, a lot of them have what's called rolling shutter, which I think we'll talk about with the drone because that's where it really applies. But for example, the one I have here, Sony has got a really slow sensor readout on

this model. I know if you have a very expensive mirrorless or DSLR, it might have a mechanical shutter, which we'll talk about. But if you ever take a picture of, say, propellers, you'll see these pictures where they look like they're bent. That's rolling shutter. It's reading out the image sensor as something's moving and it'll warp it. And that's a no go. That's not good for photogrammetry. Again, I've used software that, I'll say,

It can compensate for that, but it's never as good as just getting an unwarped, good image. So I'd say that's a big drawback, is that some of these have big sensors with slow readouts. And if you're trying to move while you're taking photos, it's not going to be ideal.
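
(The rolling-shutter warp Jonathan describes is easy to estimate on the back of an envelope: rows are read out one after another, so anything moving sideways is sheared by roughly its speed times the readout time. The numbers below are illustrative only.)

```python
# Rough rolling-shutter shear estimate for a sideways-moving subject.
readout_time_s = 0.021            # e.g. a slow ~21 ms full-frame readout
speed_px_per_s = 2000             # how fast the subject sweeps across the frame
print(f"top-to-bottom shear ~= {speed_px_per_s * readout_time_s:.0f} px")
```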

Jared Heinly (36:29)
Yeah, yeah.

So, to summarize it, the big sensor has better capabilities, more options, but with that learning curve, there's more opportunity for you to end up with a bad result.

Jonathan Stephens (36:37)
Mm-hmm.

Yes, yes, you can almost magnify the errors in those. But I think what they're good for is professional work. If you are, if you're getting paid for some of this 3D reconstruction, you're trying to make a beautiful architecture preservation or progress photos for some construction thing, then you're trying to reconstruct it. And you want it to look as good as possible and be able to work with that dynamic range being outside or inside. It's your best. It's your best option.

Jared Heinly (36:49)
Absolutely.

Jonathan Stephens (37:11)
But if you're at that point, my hunch is that you've already viewed those cameras and you're already on that learning curve towards the I know how to use it really well. ⁓ And there's a lot of people out there. I think I spent time on LinkedIn. There's some people on LinkedIn that are just incredibly knowledgeable. And I guarantee if you reach out and you ask them a question or two, they'll point you in the right direction towards settings for specific cameras and things like that. And we don't really use them much at work. We're out there reconstructing.

images fast, scenes fast. So, you know, these are slow, not fast. So, all right, I'm gonna move on. This one I'm not gonna talk too long on, but action cameras, because I've seen some really cool reconstructions from action cameras. So think GoPro or Insta360. I don't have one of those on my desk, I don't have a prop, but I do have a couple. Right off the bat, I'd say the big pros of those, again, are they have these big wide fields of view, which is great,

so you can capture a lot in one image, but also they're really easy to attach to things. I can attach them to a car, I could attach them to a robot, I can attach them to myself and use them as, what do you call it, like something that's just always there capturing things for you if you want. And I think another big cool thing about GoPros is they have an open API, so I can control up to, I forgot how many, you can control like a hundred GoPros

with an API through a low energy Bluetooth connection thing or whatever it is. They have different limits, I think; through wifi and through low energy Bluetooth it's fewer cameras. But the fact that I could get, we'll say, 50 GoPro cameras all capturing images or video, all time synced through an API, is a big deal. So there's a lot of reasons why these cameras are useful. But to me, what's the biggest drawback then? Why don't we all just run around with a GoPro?

Jared Heinly (38:41)
That's awesome.

Yeah, I I was gonna say too, just with that wide field of view, a lot of times that comes with the distortion like we just talked about, you know, with that fisheye lens. Because a lot of times in action, photography action videos, know, sports videos, you know, wanna see that wide field of view. You wanna see wherever that person is operating. Because again, like you said, it's, you might just set this up to capture and just ⁓ record whatever, you know, you're seeing, you know, and... ⁓

Jonathan Stephens (39:17)
Mm-hmm.

Mm-hmm.

Jared Heinly (39:38)
A lot of times with Wide-Field-of-View comes distortion. So that is a downside for 3D reconstruction.

Jonathan Stephens (39:44)
But Jared, I just clicked

the button and undistorted it all. Problem solved. Or problem created.

Jared Heinly (39:48)
Problem solved, yeah. Yeah, both. Like

it can help. That, again, is the camera's in-built lens undistortion. Depending on what it is, it most likely just has some hard coded undistortion parameters. The camera manufacturer knows that, yeah, most of their lenses have a certain amount of distortion in them, and so it can just do some math and map those pixels

to make it look like there's no distortion. Now, by undistorting it, a lot of times, yeah, you'll lose some of that field of view. It's gonna crop off some of those outermost pixels that would have been highly curved or distorted. You can also introduce weird artifacts where, if that manufacturer's undistortion formula doesn't exactly match reality, well, now you're gonna be warping the pixels in yet a different way. And so you may...

Jonathan Stephens (40:29)
Mm-hmm.

Jared Heinly (40:45)
get close, know, undistort the pictures, but now there may be some other kinds of weird distortion in there ⁓ where pixels have been slightly shifted, you know, based on sort of a nice grid mapping. Some may be shifted left and right and up and down and, you know, stretched and squashed in subtle ways. Like, not to a human, the photo looks great, but to that 3D reconstruction algorithm, which is trying to get pixel perfect alignment and estimation of the parameters, ⁓

Jonathan Stephens (41:02)
Yeah.

Jared Heinly (41:13)
It can sometimes complicate that.
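
(Software undistortion of the kind being discussed is a short call in OpenCV once you have intrinsics and distortion coefficients; the values below are placeholders, and, as Jared warns, if they don't match the real lens you just trade one warp for another.)

```python
# Undistort an image given (assumed) intrinsics K and distortion coefficients.
import cv2
import numpy as np

img = cv2.imread("frame.jpg")                                        # placeholder image
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.30, 0.09, 0.0, 0.0, 0.0])                        # assumed k1, k2, p1, p2, k3

cv2.imwrite("frame_undistorted.jpg", cv2.undistort(img, K, dist))
```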

Jonathan Stephens (41:15)
Mm-hmm. I know there's more than one way to stabilize an image with software and some are more destructive than others. Like the basic just cropping and trying to keep it all like closer and keep that shake is okay. Yeah.

Jared Heinly (41:26)
Yeah.

Well, yeah, like motion stabilization,

talking about image stabilization techniques, then, yeah, that's going to change what part of the sensor it's cropped from. And now you're moving over pixels that came from different parts of that lens. And so there's different amounts of distortion in different parts of that lens. And there's a lot of weird effects, and especially when you throw in what I liked you brought up before about the ⁓ rolling shutter and reading across those pixels out at different points in time across the sensor.

Jonathan Stephens (41:37)
Mm-hmm.

Jared Heinly (41:57)
Now this throws even more variables into the mix.

Jonathan Stephens (41:59)
Yeah.

And I tend to find these aren't the best. Again, they're just like an iPhone or sorry. I keep saying iPhone because that's what I have, but a smartphone, they don't have the greatest dynamic range. I know they're getting better and better just like a smartphone. A lot of that's kept up as well. ⁓ But they're yeah, they're highly distorted. They're just they're just easy to put a lot up. And if you're trying to do an array and you're not going to be moving around with it. And ⁓ I've seen some pretty cool capture arrays with.

Jared Heinly (42:12)
Mm-hmm.

Jonathan Stephens (42:28)
They'll use 19 or 20. There's one of a guy, it was a student research project, where he's acting like a bartender. And I think they only used 19 GoPros, all time synced, and he's taking video. And then they've used either stills to make just 3D scenes, or they've done dynamic 3D reconstructions off of that. It's a data set now, an open data set they made at one of these universities. And it's great because they just used 19 GoPros, which was relatively inexpensive compared to buying

19 iPhones or 19 DSLRs. And they all came synced. So they're kind of like a fringe camera with fringe benefits for what they can help with. But I would say, yeah, they're great for creating quick arrays, though I wouldn't call them my go-to camera. So, anything to add on top of those?

Jared Heinly (43:24)
No, I think you touched on it. I think you touched on it.

Jonathan Stephens (43:25)
Yeah. All right.

So the next one is a drone. Everyone thinks these are fun. I have this tiny one. I mean, I think this is two models old now. They come out with a new one every other year. This is the DJI, I don't know, it's a small one. So if you're listening, I'm holding up one of their Mini Pro drones, and it's got an interesting camera on it. So if I pull off this giant clip, it comes with a gimbal, which gives you this nice stabilization,

but also on this gimbal, it can turn the camera from horizontal to vertical. So you not only have one orientation, you have two orientation options from that camera. And I'll talk about why that's a big pro, but let's start with the biggest pro of this: you're just giving your camera wings, and it's stabilized. So not only can it go somewhere you could never get to,

Jared Heinly (44:16)
Yeah. Yeah. Yeah.

Jonathan Stephens (44:24)
you know, unless you, what, get a helicopter or an airplane, you're never going to get the top of a building, things like that, at a distance. So drones are perfect. What's the other big pro of a drone off the top of your head, Jared? I can think of several.

Jared Heinly (44:35)
Mm-hmm.

Oh man, okay, yeah. So you can move it around, and a lot of times, contrasted back to the smartphone, usually the drones have a bigger sensor. That's nice. Now again, this might be some niche benefit, but drones, especially DJI, sometimes embed extra metadata in the images. And so if you take a photo, yes, there's standard EXIF, but then also there's

Jonathan Stephens (44:48)
Mm-hmm.

Mm-hmm.

Jared Heinly (45:07)
I'll call it extended EXIF, XMP. There are some other extra tags that DJI embeds in their imagery. That can be nice. That'll tell you sometimes, well, what was the orientation of the drone? What was the orientation of the gimbal? And you can use that to enhance the reconstruction, adding constraints.

Jonathan Stephens (45:23)
Mm-hmm.

you can use the spatial mode in like COLMAP, or, you know, leverage the GPS coordinates to speed up the initial structure from motion reconstruction. So there's some benefit to even just having GPS tags that are somewhat correct.
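
(The "spatial" idea is simple to sketch without tying it to any particular tool: use the geotags to only attempt matching between images taken near each other, instead of matching every image against every other one. The coordinates are placeholders and the projection is a crude local approximation.)

```python
# Pick candidate image pairs for feature matching using GPS geotags.
import numpy as np

def nearby_pairs(latlon_deg, max_dist_m=50.0):
    """latlon_deg: Nx2 array of (lat, lon) in degrees; returns index pairs within max_dist_m."""
    ll = np.radians(np.asarray(latlon_deg, dtype=float))
    xy = ll * 6_371_000.0                      # to meters (crude local equirectangular)
    xy[:, 1] *= np.cos(ll[:, 0].mean())        # shrink east-west distances by latitude
    pairs = []
    for i in range(len(xy)):
        d = np.linalg.norm(xy[i + 1:] - xy[i], axis=1)
        pairs += [(i, i + 1 + j) for j in np.flatnonzero(d < max_dist_m)]
    return pairs
```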

Jared Heinly (45:39)
Yup.

Yep. Yep.

Yeah, yeah. And usually those will be more accurate than what your smartphone would have given you. The smartphone GPS is usually within a few meters. Now the drone's GPS is similar, like it's still using the same GPS constellation of satellites. But a lot of times the drone is fusing its inertial sensors with its ⁓ understanding of how is it moving through the world to report a GPS coordinate that's more accurate, it's a little bit more filtered.

Jonathan Stephens (45:56)
Mm-hmm.

Mm-hmm.

Yep.

Jared Heinly (46:20)
⁓ and so.

Jonathan Stephens (46:21)
In my recollection of my geospatial days, being up higher, again, it's like you're seeing the same satellites, but you're not bouncing anything off a building or anything like that. So you have a clearer view of all the satellites. that, I mean, if you're up there at 200 feet versus at the ground, I've done GPS with the the rover rods and you go right next to a building and it's going to misread your position just because of bouncing off the building, things like that.

Jared Heinly (46:28)
that's true.

That's true.

Jonathan Stephens (46:48)
So you have a clear signal up top. It's a big benefit. And then I'd say, you're seeing the extended EXIF or whatever you want to call it on the enterprise level of these drones. Some of them will even tell you the camera distortion model; they have it embedded. So you don't even have to do that yourself. Instead of you calibrating it, they tell you. They can even undistort the images for you if you want. So you get all this great data.

Jared Heinly (46:56)
Yeah.

Jonathan Stephens (47:17)
But you pay for it. If you're doing professional work, though, it can be really helpful for getting a really quality output. I'd say another...

Jared Heinly (47:24)
And DJI,

like on that note, I've looked at a few, you know, at that extended EXIF data for some of the DJI drones. And it does look like DJI is doing sort of a per-drone calibration. Now again, it could be a small sample size. I haven't looked at thousands or hundreds of these things, but at least out of the multitude of data sets that I've looked at for a particular drone, it's like each one had a slightly different

Jonathan Stephens (47:40)
Mm-hmm.

Mm-hmm.

Jared Heinly (47:53)
set of distortion parameters. And so that is nice, you know, and that it is, it does come with a base set of calibration parameters.

Jonathan Stephens (48:00)
Yeah,

I'd say another big pro to these is that, again, if we're going up to the enterprise level, so you're doing this for professional work, enterprise sounds expensive, but they're actually probably in the same price range as just a mirrorless camera. Nowadays you can start in the enterprise world at like $3,000 to $4,000. That may be higher or lower depending on where you live and things like that, but you can get mechanical shutters then, which

Jared Heinly (48:30)
Yes.

Jonathan Stephens (48:30)
is where we're getting back to that rolling shutter. We get back to that rolling shutter problem where things get warped. You have a mechanical shutter or a global shutter, I don't know if those are the same or equivalent, but you're basically reading out the whole sensor at once. So you're not giving it time to warp the images as the camera is moving and taking photos. Is that correct? I'm not sure if global and mechanical are the same, but they achieve kind of the same results.

Jared Heinly (48:52)
Correct.

Yeah.

Jonathan Stephens (48:59)
It doesn't really matter.

Jared Heinly (49:00)
Yeah, I

was gonna say, 'cause I think I've seen edge cases where it's like, it's a mechanical shutter, but they're still relying on, yeah, yeah. Like the shutter in my DSLR, you know, it's a shutter, it goes up and down. That's a mechanical shutter. Yeah, it's a curtain, versus like an iris that opens up, versus the curtain. And so there, that is a mechanical shutter,

Jonathan Stephens (49:17)
Mm hmm. It's a curtain shutter versus, whatever the other one is, the one that's like an iris. Yeah.

Jared Heinly (49:29)
But behind the scenes, they start and stop the readout of the image intensity globally. When that curtain drops, they can start reading pixels, and then they can raise it and stop reading pixels. Again, yes.

Jonathan Stephens (49:38)
Mm-hmm.

Yeah, you're looking for that global shutter. I know like RED cameras were doing that a long time ago, they've been doing that for a long time. Some of these higher end professional DSLR and mirrorless cameras also do that, where they're reading out the whole sensor at once.

Jared Heinly (50:00)
Yeah. And

for video, that's hard. Like for video, I don't even know. I haven't looked at this, yeah, like maybe Red does it, but like for video, having true global readout, I think is hard. But like a lot of times, yeah, but with the photos, like if you're taking photos, yes, absolutely. There are...

Jonathan Stephens (50:11)
Mm-hmm.

I think the

cinema cameras start to do that, where you're not paying 5,000, you're paying 20,000. But you're probably not buying that for 3D reconstruction. I think RED was the cheapest, which is why I bring that up. I do know that Sony, I think, and don't quote me on this, but one of their mirrorless cameras, I think it was their A7 Mark III, they had

Jared Heinly (50:19)
Yeah, that's... That's true. That's true. That's true. No.

Yeah.

Jonathan Stephens (50:42)
reduced the sensor readout from like 21 milliseconds to eight milliseconds. And then on the Mark IV, they went back up to like 20, because I think that sensor was really expensive and they found that people really weren't willing to pay for it; like, that wasn't important to them. Especially because you can compensate for a lot of rolling shutter in software now and things like that. But yeah, that was a really expensive sensor they put in that one model of camera, and it limited, not completely, but it really reduced, that rolling shutter effect by having a quick

Jared Heinly (50:47)
Whoa.

Jonathan Stephens (51:11)
readout. It was really expensive for them to put in that camera. So, all right, so drones are great. You can fly them, and I'd say the programmability of them is almost the biggest benefit right there. Even the cheapest of them, I think for like $500, which is not that cheap, but shoot, a drone for $500 won't crash. It'll practically fly itself, and you can tell it to do just

Jared Heinly (51:12)
Yeah, that's fast.

absolutely.

Jonathan Stephens (51:41)
basic programmable things like a loop around a point of interest, or fly a straight line, things like that. But then as the price goes up, I believe, in a little bit you can start to do things like fly a gridded pattern, stopping at every photo or keeping it moving. Some of these higher-end ones now, you can capture a lightweight 3D model of an object first, and then you'll say, okay, now come up with the optimal flight path, and it'll capture that for 3D reconstruction.

Jared Heinly (51:57)
Yeah.

Jonathan Stephens (52:12)
I mean, that's the greatest thing: I can say I want 70% forward overlap and 70% side overlap on every photo, and it will pretty closely nail that every time. Versus a human, it gets messy. You've got to take a photo, take a step or two, take a photo, and just kind of intuitively figure that out, or take a video and then extract a bunch of image frames, which has some pluses and minuses to it, taking a video versus still images. So.
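The overlap-to-spacing math the flight planner is doing under the hood is simple enough to sketch. A minimal version, with an assumed altitude and field of view just for illustration:

```python
# Sketch of the flight-planning math a drone app does for you (illustrative numbers only).
# Given altitude, the camera's field of view, and a target overlap fraction, compute how
# far apart consecutive photos (and adjacent flight lines) should be.
import math

def footprint_m(altitude_m: float, fov_deg: float) -> float:
    """Ground distance covered along one image axis at nadir."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)

def spacing_m(altitude_m: float, fov_deg: float, overlap: float) -> float:
    """Distance between triggers (or between flight lines) for the requested overlap."""
    return footprint_m(altitude_m, fov_deg) * (1.0 - overlap)

# Assumed example: 50 m altitude, a camera covering roughly 73 x 53 degrees,
# 70% forward and 70% side overlap. Which axis is "forward" depends on how the
# camera is mounted, so treat the pairing below as an assumption.
print(round(spacing_m(50, 53, 0.70), 1), "m between photos along track")
print(round(spacing_m(50, 73, 0.70), 1), "m between flight lines")
```

Hand-flying rarely hits that spacing consistently, which is exactly the point being made here.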

Jared Heinly (52:40)
I'm so glad you brought up that programmability, because that is such a big benefit. We had a whole episode on it, and we touch on it every few episodes, talking about image capture best practices: like you just said, how much overlap, when I take a photo, how far should I move, how much overlap should there be between those two photos. I wanna make sure I don't rotate in place. There are all of these things that you should be thinking about when you're capturing images for 3D reconstruction.

Jonathan Stephens (52:47)
Okay.

Mm-hmm.

Jared Heinly (53:09)
But with the drone, when you can program its flight pattern, that abstracts all of that away and does it in a nice, consistent manner. And so, yeah, that's a really great benefit.

Jonathan Stephens (53:17)
everyone

always asks, when I show a really cool 3D reconstruction or a really cool Gaussian splat scene, what camera did you use? And I'm like, it's just a drone. That's all that matters, because I programmed it, and it was either taking continuous pictures at like five miles an hour, where you had almost no distortion, or it was paused and taking photos, whatever it was. I always made sure it was moving really slowly and there was no shake. I didn't have a human shaking the camera between shots. And then,

Jared Heinly (53:27)
Yeah, yeah.

Jonathan Stephens (53:47)
one thing I wanna say is a big deal is shooting portrait on these versus always wide. I know a lot of people also like to capture, say, a building, or something that's kind of vertical in nature rather than wide. You can actually be a lot closer to that object. Let's say in a Gaussian splat, you want to get the entirety of the object in one shot if you can, because if you don't, it can cause some problems. In photogrammetry, it's not as big a deal, but...

Let's say I'm trying to capture something that's more vertical than wide. If I shoot in portrait, the arc in which I have to fly around it can be a lot closer while still capturing the whole thing in one shot. Whereas if it's landscape, I have to get much further back, because the vertical field of view is now narrower. And so that can make a huge, huge difference right there as well. So there's a little bit more flexibility now with these. I'm noticing more and more of them have these rotating gimbals, or even now, like the

Matrice, or the Mavic 3 Pro, whatever their starter enterprise one is, will have a telephoto and an ultra-wide. They have more than one lens on there, like an iPhone or a Samsung where they have more than one. So you have all this flexibility for what you're trying to do. So, yeah. They're good. They're good for capturing big scenes, buildings, things you can't get around. I'd say if
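The portrait-versus-landscape point is easy to see with a quick standoff-distance calculation. A minimal sketch, where the subject height and the field-of-view numbers are assumptions rather than any specific drone's specs:

```python
# Sketch of why portrait orientation lets you fly closer to a tall subject.
# All numbers are assumptions for illustration, not a particular camera's specs.
import math

def standoff_m(subject_height_m: float, vertical_fov_deg: float) -> float:
    """Distance needed to fit the subject's full height in the frame."""
    return (subject_height_m / 2.0) / math.tan(math.radians(vertical_fov_deg) / 2.0)

building_h = 30.0           # assumed 30 m tall facade
landscape_vfov = 47.0       # long image axis horizontal, so the vertical FOV is narrower (assumed)
portrait_vfov = 66.0        # rotate the camera 90 degrees: the long axis is now vertical (assumed)

print(round(standoff_m(building_h, landscape_vfov), 1), "m back in landscape")
print(round(standoff_m(building_h, portrait_vfov), 1), "m back in portrait")
```

With these assumed numbers, portrait lets you sit roughly a third closer to the same facade while keeping it fully in frame.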

Jared Heinly (54:58)
Yeah.

Yeah, they're becoming very powerful, very powerful platforms.

Jonathan Stephens (55:12)
I'd say there's never a bad day buying a drone if you wanna get into it, because you can also have fun with them. But also know that you can't just fly them anywhere, so you have to know your local laws. Yes. All right, we've been here almost an hour, so let's just speed through some of these special cameras, because I don't think people use them as much. But I have one on my desk here. It's a 360 camera. They stopped making this one. I liked it because it had a one-inch sensor, which was a big deal.

Jared Heinly (55:18)
Yeah.

Jonathan Stephens (55:42)
but there are new ones like the X5. This is an Insta360, okay, if you're listening: it's an Insta360 ONE RS 1-Inch 360 Edition, their one-inch sensor one. They stopped making it, but their X5 now is getting better results with a smaller sensor and just computational photography. And so you're getting a 360 image, which they stitch together from the two spherical lenses and

Jared Heinly (56:10)
to have the two hemispheres

and yeah.

Jonathan Stephens (56:12)
Yeah, and so those

are really... the big pro to this is that I can capture an area with a lot less imagery. So the time to capture a scene is really reduced, but I'm in the picture. I can put it on a helmet and put myself underneath it. If you're listening, I'm actually putting it on top of my head. I have a selfie stick which disappears in the imagery, but I'm still in it. So you either have to mask yourself out, because by capturing everything, you're in it.

Or you have to dangle this thing from a drone or put it on a robot, but then, I mean, come on. I like them. I love playing with one. I found out that if you're outside it's hard to get a lot of parallax when everything's really far away, so you've got to really move to get that parallax. But if I'm in a room, they're really good. So I think that's where they excel, even in a large warehouse room where there are a lot of things that are

relatively close. But if you're outside and everything's really far, you've got to make a lot more movement than you think to get a parallax shot, which, as we talked about in the last episode, is kind of paramount.

Jared Heinly (57:24)
And I'll harp on that too. I just think the greatest strength, for me, is being able to capture so much imagery in a single pass, and even more so, like you said, indoors, in constrained environments where I'm trying to capture a limited space, an environment where it's hard for me to move around, move side to side, rotate around. You know, if I have just a normal 60-degree field of view or wide-angle camera,

I've got to do a lot of arcs back and forth and move in and out to capture all the surfaces of that room or that environment or those tight quarters. Whereas with that 360 camera, you just get in, you get out, and it's seen everything that was there. Yeah.

Jonathan Stephens (57:54)
Mm-hmm.

Yeah, yeah.

And then I run into, excuse me, I run into sun issues. So if I'm outside and it's sunny, I get sun flare. It'll capture all the little smudges or anything on that lens; the lens is really exposed. So they're finicky, but they're cool because they're kind of like your shortcut to fast capture. Again, though, the images are all super distorted. Most 3D reconstruction software won't take them natively. You've got to

cube map them, or there are different techniques.

Jared Heinly (58:38)
There are some algorithms. I think OpenMVG supports processing spherical images, that 360 image, natively. You can do that through algorithms, or, like you said, by cube mapping it: rendering pinhole-style cameras, rendering normal-looking photos from that 360 footage.
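For readers who want to see what "rendering pinhole views from a 360 image" actually involves, here is a minimal numpy sketch. The yaw/pitch/FOV values, image sizes, and nearest-neighbor sampling are all simplifying assumptions; real tools interpolate and handle seams more carefully.

```python
# Minimal sketch: render a perspective (pinhole-style) view from an equirectangular
# panorama. For every output pixel, compute its viewing ray, convert the ray to
# longitude/latitude, and sample the panorama there (nearest neighbor for brevity).
import numpy as np

def equirect_to_pinhole(pano: np.ndarray, out_w: int, out_h: int,
                        fov_deg: float, yaw_deg: float = 0.0, pitch_deg: float = 0.0):
    ph, pw = pano.shape[:2]
    f = (out_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # virtual focal length in pixels

    # Ray directions for each output pixel; camera looks down +z, x right, y down.
    xs, ys = np.meshgrid(np.arange(out_w) - out_w / 2.0,
                         np.arange(out_h) - out_h / 2.0)
    dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays to the requested viewing direction (pitch applied first, then yaw).
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)], [0, 1, 0], [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(pitch), -np.sin(pitch)], [0, np.sin(pitch), np.cos(pitch)]])
    dirs = dirs @ (Ry @ Rx).T

    # Direction -> longitude/latitude -> panorama pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])            # -pi .. pi
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))       # -pi/2 .. pi/2
    u = ((lon / (2 * np.pi) + 0.5) * pw).astype(int) % pw
    v = np.clip(((lat / np.pi + 0.5) * ph).astype(int), 0, ph - 1)
    return pano[v, u]

# Usage sketch: four horizontal 90-degree views from one (placeholder) 6K panorama;
# adding straight-up and straight-down views would complete a cube map.
pano = np.zeros((3000, 6000, 3), dtype=np.uint8)
views = [equirect_to_pinhole(pano, 1024, 1024, 90.0, yaw_deg=y) for y in range(0, 360, 90)]
```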

Jonathan Stephens (58:43)
Mm-hmm.

Yeah.

Then you're going through

a lot of steps. You're stitching together distorted images, which is gonna have some inherent problems, and then you're projecting flat 2D images out of a 360. So you can run into some problems. I wanna put that out there. Play with them. I thought I had one more thing I wanted to add on top of these that was really cool. It's escaping me, but yeah, yes.

Also, this is a 6K camera, 6K video. Sounds like a lot of pixels, right? I mean, you're taking 4K on a normal camera, and it takes a 21-megapixel still, so it sounds like a lot of pixels. But what you don't realize is that you're stretching those pixels across a 360-degree field of view versus, you know, a very small field of view. So your pixel density is a lot smaller too.
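A quick back-of-the-envelope on angular pixel density makes the gap obvious; the "normal camera" numbers below are assumptions chosen only for comparison.

```python
# Quick sanity check on angular pixel density (illustrative, assumed numbers).
# A "6K" equirectangular frame spreads ~6000 px over 360 degrees of longitude;
# a normal camera spreads its width over a far narrower field of view.
pano_px_per_deg = 6000 / 360        # ~16.7 px per degree
normal_px_per_deg = 5472 / 84       # e.g. a ~20 MP still, 5472 px wide, through an 84-degree lens
print(round(pano_px_per_deg, 1), "px/deg for the 360 camera")
print(round(normal_px_per_deg, 1), "px/deg for the conventional camera")
```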

If you're doing video specifically on these, you're probably not going to get a really high-resolution output, which is fine. I've found that it works really well for getting point clouds of things where I don't care about getting perfect textures, but it will not look great if you're trying to get good texture and things like that. All right. The next one I had was fisheye cameras, but we already talked about that. You can get dedicated fisheye cameras, which are really like the GoPros, things like that. They have a huge field of view.

I think they're really useful because you can put them on robots and enable them to see, and, you know, there are these cool SLAM algorithms that use that wide field of view. So if they're trying to figure out where they are as they're moving around, those are great. I would love to get a robotics researcher on here and talk about that sort of camera system for sensing. Yeah.

Jared Heinly (1:00:47)
Totally. Yeah, like the task of perception, like

what setup, what camera system, what algorithms you're doing for real-time robotic perception versus what we've kind of been talking about, which is more photogrammetry, 3D reconstruction. Yeah, different things. That would be fun.

Jonathan Stephens (1:01:02)
Yeah. OK,

I have another one here. If you can't see it on the screen, it's an OAK-D Lite sensor. It's the size of, what is it like, I don't know, slightly bigger than a Kit Kat bar. I mean, like one of the bars within a Kit Kat candy bar, you know, one of the break-me-off parts. It's probably twice as wide, maybe. And it's got a left and right camera, it's hard to see in here, it's got a left and right camera and then a middle camera. The middle camera, I believe, is a 4K

RGB camera, and then the left and right cameras are a stereo pair. So the camera already knows how far apart those are, and they are black and white, there's no color out of those. But I get an RGB-D output from this, so I get red, green, blue, and a depth. But depth only up to a certain distance, because, as we've talked about, there's a baseline, and if this were, say, three times as wide, I could probably get further back. But it can't. It's like

your eyes: you only have depth perception to a certain point, and past that you really just know depth based off of your brain knowing what things look like far and near. So is this a stereo camera or an RGB-D camera? I don't know. I didn't know which category to put this in, because I think some RGB-D cameras have an actual active sensor, like an iPhone. RGB-D mode there is like you have a camera and then you have the lidar,

Jared Heinly (1:02:11)
It's both.

Jonathan Stephens (1:02:21)
and you can get an RGB-D image out of that. So this is a stereo camera, which is really cool. And this one even has a heat sink on the back because it's got onboard compute. So it can do onboard image recognition and OCR, all those different things, right on the device, and then it sends the post-processed data back to you. So these are really cool. They're really special. The big pro to something like this is, again, robotics,

or maybe it's attached to a robotic arm for sensing, and it needs to know something relatively close for grasping objects and detecting what the object is. At this point, you're probably a computer vision researcher or robotics engineer and you're going to be diving into this. This one connects over, I'm not trying to sell these, but I promise, this one is USB-C. Some of these will be PoE, so they're power over Ethernet. So there are just different things you can do with

Jared Heinly (1:03:17)
Mm-hmm.

Jonathan Stephens (1:03:20)
these kinds of specialized cameras, and you can just put them on things, and they're so small. I mean, it's crazy. It does have a computer on it, I guess, but it's not like a Raspberry Pi on board where I can put an operating system on it. So Jared, do you have any thoughts on those kinds of cameras, what you like them for?

Jared Heinly (1:03:23)
Yeah.

Yeah.

I'd say, to me, I liked what you said about the robotics applications. To me, coming from a research background, sensors like that accelerate research. And so when the Microsoft Kinect, the Xbox Kinect sensor, came out, research labs were buying those things up, because now, instead of having to spend time implementing stereo depth estimation algorithms based on imagery,

you can just buy a single off-the-shelf sensor that gives you depth for free. And now, as a researcher, you can spend your time figuring out, assuming we have depth, what do we do with it? And so you can kind of skip over all of those algorithmic challenges and just start with depth for free and run with it. And so, yeah, it's really nice for that research space, just being able to get data for free and run with it.

Jonathan Stephens (1:04:20)
Mmm.

I've noticed in a lot of research, I love to dive into research papers, but I've noticed a lot of them will say, start with the RGB-D data provided. And it came from a sensor like this. Or, I've noticed, again, we can cheat our way into depth now using these depth-anything models, using machine learning to just estimate the depth of the scene from a bunch of images or video. So it's like, all right, you're skipping the structure from motion to get the depth in these images.

Jared Heinly (1:04:37)
Yeah.

Jonathan Stephens (1:04:57)
That's fine with me. But I notice the depth isn't usually coming from these sensors; they just have it. And I guess, like you said, the Kinect sensor, the Intel RealSense sensor, those have more of an active sensor and a camera pair. So there are all kinds of different ways, I guess, to get depth. So they're really useful. I like them for research. I definitely see a lot of them. But you're not buying one of these to do photogrammetry. You're buying one of these to

Jared Heinly (1:05:14)
Yeah, different depth sensing technologies,

Jonathan Stephens (1:05:26)
know the depth of a scene and sense it. Machine perception, I think, is more what we're talking about here. Okay, so that's what they're good for. The big pros and cons: I'd say the big cons on these guys are that they're usually lower resolution, and

they're really single-use. You're probably buying one because you know what you're gonna do with it. I bought this off a Kickstarter campaign because I just wanted to know how they worked. And so this one was, I think, $72 on Kickstarter plus shipping, years ago. But you can buy these, I mean, these are cheap. You can still buy one of these for under $200, or I think there are different brands for like $100. So you'll usually have multiple ones of these for different things.
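Circling back to the baseline point from a moment ago: the usable range of a stereo pair falls out of the classic depth-from-disparity relation. A minimal sketch, with an assumed focal length and baseline rather than any particular sensor's specs:

```python
# Sketch of why a stereo pair's useful depth range is tied to its baseline.
# Classic relation: depth Z = f * B / d, where f is focal length in pixels,
# B is the baseline in meters, and d is disparity in pixels.
def depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

f_px, baseline = 450.0, 0.075          # assumed: 450 px focal length, 7.5 cm baseline
# If the matcher can only resolve disparity to about one pixel, that quantization
# step is what caps the usable range:
for d in (60, 10, 2, 1):
    print(f"disparity {d:>2} px -> {depth_m(f_px, baseline, d):6.2f} m")
# Depth uncertainty per pixel of disparity error grows roughly with Z^2 / (f * B),
# and the depth reached at one pixel of disparity scales linearly with the baseline,
# which is why a rig three times as wide could register depth much further out.
```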

Jared Heinly (1:05:49)
Yeah.

Jonathan Stephens (1:06:14)
Okay, so we only have two more cameras. Why don't we talk about event cameras? I don't even know if we need to go over that again, Jared. That was really... oh, well, so tell us again what an event camera is, if you remember.

Jared Heinly (1:06:28)
Yeah, so event

camera is totally different than, well, it's totally different than other cameras. So normally, when we've been talking about cameras up until this point, we're talking about a sensor, a lens, and then the output you get from that is a color image, a 2D grid of pixels that have color information. And we as humans, we love that. But what those cameras, these cameras, are kind of bad at

Jonathan Stephens (1:06:46)
Mm-hmm.

Jared Heinly (1:06:58)
is perceiving really fast motion, really fast change. Sure, you can say, oh, I can set my shutter speed to a thousandth of a second. Great. But what if what I'm trying to observe is moving at a ten-thousandth of a second or a hundred-thousandth of a second, really, really fast? That's where event cameras start to come into play. An event camera still has a sensor and a lens, you know, it's looking at a grid of locations

Jonathan Stephens (1:07:12)
Mm-hmm.

Jared Heinly (1:07:26)
in front of the camera, but what you get is not color; you get changes in intensity. So when a pixel changes enough, in terms of, it was black and now it's gray, or it was gray and now it's white, you get an event. That pixel all of a sudden turns from off to on. And so your camera gives you this asynchronous stream of pixels turning on and off, telling you where motion is occurring.

Jonathan Stephens (1:07:32)
Mm-hmm.

Jared Heinly (1:07:54)
And maybe how much... you don't even know how much motion; you just know, for that pixel, how much its intensity changed, or that it changed at all. So very different. You don't get any pretty picture to look at. It's more like a robotics-style sensor: it tells you which pixels are changing in the image, and it lets a robot respond really rapidly to what it's seeing. So really special.
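To make the output format concrete, here is a toy sketch that fakes an event stream from two ordinary grayscale frames. A real event camera emits each event asynchronously with microsecond timestamps; the threshold, timing, and frame-differencing here are simplifying assumptions purely for illustration.

```python
# Toy sketch of what an event camera outputs: instead of full frames, a stream of
# (x, y, timestamp, polarity) events whenever a pixel's log intensity changes by more
# than a threshold. Here we approximate that from two conventional frames.
import numpy as np

def events_between(frame0: np.ndarray, frame1: np.ndarray, t0: float, t1: float,
                   threshold: float = 0.2):
    """Return a list of (x, y, t, polarity) for pixels whose log intensity changed."""
    log0 = np.log(frame0.astype(np.float32) + 1.0)
    log1 = np.log(frame1.astype(np.float32) + 1.0)
    diff = log1 - log0
    ys, xs = np.nonzero(np.abs(diff) > threshold)
    # A real sensor emits these asynchronously; we stamp them all with the midpoint
    # time just for illustration.
    t_mid = 0.5 * (t0 + t1)
    return [(int(x), int(y), t_mid, 1 if diff[y, x] > 0 else -1) for y, x in zip(ys, xs)]

# Usage sketch: one pixel brightens, one dims, everything else stays quiet.
a = np.full((4, 4), 50, dtype=np.uint8)
b = a.copy()
b[1, 2] = 200        # brightening pixel -> polarity +1
b[3, 0] = 5          # dimming pixel     -> polarity -1
print(events_between(a, b, t0=0.0, t1=0.001))
```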

Jonathan Stephens (1:08:01)
Okay.

Mm-hmm.

Okay, so those are really special. I'm guessing they're probably

even less likely you have one of those than one of the stereo sensors.

Jared Heinly (1:08:25)
Yeah, yeah, you'd have to be a

researcher to have your hands on one of those.

Jonathan Stephens (1:08:30)
Okay, and I do apologize if you hear my cat meowing in the background. And then, okay, I have one last camera. Well, I'm taking two types of cameras in one, but they're kind of achieving the same thing: I had thermal and multispectral. So what they're doing is they're not looking purely at the RGB spectrum of light that the human eye can perceive; they're looking at ultraviolet, they're looking at

Jared Heinly (1:08:34)
You

Jonathan Stephens (1:08:59)
red edge, I mean, there are all these different bands. Okay, I have seen reconstructions of things. I've seen, I think, a NeRF once made with thermal cameras. It is possible, but to me it doesn't really make a lot of sense, though maybe I'm mistaken about what they're really used for. Do you ever see those used much in 3D reconstruction?

Jared Heinly (1:09:22)
Not per se. Because a lot of times, I think, like you said, if I have a thermal camera, a lot of your challenges there are just the resolution of the data you're going to get from it. I think either hardware or, a lot of times, just regulations limit the resolution of the imagery you can get from a thermal camera. And so you're operating on 320 by 240, you know, far less than a megapixel for any sort of thermal imagery, compared to normal visible-light RGB.

Jonathan Stephens (1:09:32)
Mm-hmm.

interesting.

Jared Heinly (1:09:52)
Yeah, you've got multi-megapixel, tens of megapixels to work with, and so that's easier data to operate with.

Jonathan Stephens (1:09:54)
Yeah.

And

I think too, a lot of times with thermal cameras, you really need to know what you're doing when you get into those. Hopefully you know someone who does, because thermal cameras are going to see a thermal range and they have a sensitivity to change. So a pixel isn't necessarily a degree of change in temperature or something; it's got some sort of range threshold. So a lot of times they'll only even pick up things if they're over a certain temperature, things like that. And I've played with, I've played with

again, coming from a geography background at Emory in my university days, we looked at multi- and hyperspectral imagery from satellites, things like that, and from, well, probably back then it wasn't drones, it was probably a manned airplane. But that's cool, because they can stitch together a bunch of images and make orthomosaics of different spectrums of light and then be able to know, perhaps, the vegetation health of something. If something is dying, you would even know

what specific plant species things were, based off of the color or the reflectance of different bandwidths of light coming off of them, which you couldn't necessarily see, but you would get a signature. So everything had a unique signature. So you could say, this is all grassland and there are deciduous trees over here, and even what type of trees and types of grass. So it's more, again, for academics and science and things like that. But I'm not seeing 3D reconstruction off of those.

Jared Heinly (1:11:23)
video.

Jonathan Stephens (1:11:26)
Again, I thought I'd mention them. They're quite interesting cameras and they can do some cool things. And there's an economy-of-scale issue too: they're not making a lot of them, so they're really expensive. It's probably expensive to manufacture them when you're not making millions at scale. So, all right. I think we covered all the major cameras. I'm sure there are some we missed; someone's going to comment on this episode and say, but what about this camera, what about that camera? But

I think that covers it all. Well, thanks, Jared, for jumping on this episode. One thing I had on my notes: at the end of the day, when you're choosing which camera to use, I'd say if you're just new to the 3D reconstruction journey, start out with a camera you already own. Hopefully that's a smartphone, because that one, again, is a low barrier to entry for getting good-enough images. If you're

Jared Heinly (1:11:59)
Absolutely.

Jonathan Stephens (1:12:23)
already doing professional photography for fun and you want to move up into 3D reconstruction of meshes or Gaussian splats and things like that, I highly suggest you start with your smartphone, figure that out, then move to that camera, because you're just adding layers of complexity, even if you already know how to use those types of cameras. And then drones: if you have a drone, play around with the capture patterns first. I think only rarely have I ever freeform flown something and captured images.

I typically just stick to the pre-programmed flight paths. And then the rest of the cameras, yeah, you're probably a researcher at that point, or using 360 cameras. There are great YouTube resources; if you go look on YouTube for how to map things with those, you'll find there's only a handful of videos, but they're all very useful. So, anything you want to add on top of that? That's typically my...

Jared Heinly (1:13:14)
That's great advice. The only thing I'm coming back to now, I could have said this at the beginning, is, well, what is a camera? At the end of the day, we've talked about all of these things, and a camera is just an angular light sensor. It's telling you, from this camera's perspective, in this direction, I saw this color of light, or this frequency distribution of light coming toward me. And so all these different cameras are just different ways to map

Jonathan Stephens (1:13:37)
Mm-hmm.

Jared Heinly (1:13:43)
that hemisphere, or that 360 degrees, or whatever subset of viewing directions it is, down to a set of pixels that we can then use and represent later. So, yeah.
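That "directions to pixels" idea, and the lens discussion that follows, can be captured in a few lines. A minimal pinhole-model sketch; the full-frame sensor width and the focal lengths are assumptions used only to illustrate the wide-versus-telephoto contrast:

```python
# Sketch of "a camera is just an angular light sensor": a pinhole model maps a viewing
# direction to a pixel, and the focal length sets how wide a cone of directions the
# sensor covers. Sensor width and focal lengths below are assumed (full-frame-style).
import math

def horizontal_fov_deg(focal_mm: float, sensor_width_mm: float = 36.0) -> float:
    """Horizontal field of view for a given focal length and sensor width."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))

def project(direction, focal_px: float, cx: float, cy: float):
    """Map a 3D viewing direction (x right, y down, z forward) to pixel coordinates."""
    x, y, z = direction
    if z <= 0:
        return None                      # behind the camera, never hits the sensor
    return (focal_px * x / z + cx, focal_px * y / z + cy)

for f in (14, 20, 200):                  # wide versus telephoto, as discussed
    print(f"{f} mm lens -> ~{horizontal_fov_deg(f):.0f} degrees horizontal FOV")
print(project((0.1, 0.0, 1.0), focal_px=3000.0, cx=2000.0, cy=1500.0))
```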

Jonathan Stephens (1:13:54)
Exactly.

Sometimes it's less about the camera type and more about learning to use what you have and getting the best out of it.

Jared Heinly (1:14:01)
Yep, yep. But that's,

I love your advice. Use the camera you have because it most likely is a smartphone and that's gonna get you pretty far.

Jonathan Stephens (1:14:09)
If you've bought a smartphone in the last five years, you're going to have a fantastic camera for getting a lot done. And I didn't mention this either: if you do have a mirrorless camera, I always tell people to opt for a wide lens, at least for Gaussian splats, but also, I know, for photogrammetry. You want something like a 14-millimeter to 20-millimeter lens, so you're getting that nice wide field of view. If you're trying to get around with like a 200-millimeter lens,

Jared Heinly (1:14:14)
Yep. Yep. Yep.

Jonathan Stephens (1:14:37)
you don't get much in the frame, so you end up having to take many more photos. So I guess there are some pros and cons to different lens choices and things, but yeah: wide view, sharp images, even exposure, use the camera you have. There you go. All right. Well, thanks for joining us on this episode. Hopefully for our next episode, I'm looking to get some people on. So again, if you're listening this far, to the very end of this, and there's

Jared Heinly (1:14:55)
There you go.

Jonathan Stephens (1:15:06)
someone you know in the world of 3D reconstruction and computer vision who you think would be a good guest, we're gonna start reaching out to guests, because I think we need to start expanding back out to getting some experts. We did mention OpenMVG; we even had the creator of that, Pierre Moulon, on here in the past. So we've had some really good people bringing different perspectives, and I'm looking forward to bringing people on again like that. So anyways, thanks for joining me, Jared, and I'll see you in the next episode.