Behind The Bots

In this interview with Michael Mayer and Theresa Gresch on the Behind the Bots podcast, we learn about their open source photo management application called PhotoPrism. PhotoPrism uses artificial intelligence to automatically tag and organize your personal photo library. It can detect objects and faces in your images to make them easily searchable. With a focus on privacy, PhotoPrism runs locally so your photos never have to leave your own computer. 

Michael and Theresa discuss how they got started with the project to apply their backgrounds in physics and biology to artificial intelligence. They share some of PhotoPrism's key features like facial recognition, location mapping, duplicate detection, and more. We also learn about how PhotoPrism handles video files, the challenges of working with constantly evolving formats, and their future goals like simplifying the setup process. Michael provides his perspective on the exponential growth of generative AI and ethics around synthetic media. Overall, an insightful look into an AI-powered tool that can help you gain control over your personal photo collection.

PHOTOPRISM.APP

https://www.photoprism.app/
https://github.com/photoprism/photoprism
https://twitter.com/photoprism_app


FRY-AI.COM

https://www.fry-ai.com/subscribe
https://www.youtube.com/@TheFryAI
https://twitter.com/lazukars
https://twitter.com/thefryai

Creators & Guests

Host
Ryan Lazuka
The lighthearted artificial intelligence journalist. Building the easiest-to-read AI email newsletter and daily Twitter threads about AI.

What is Behind The Bots?

Join us as we delve into the fascinating world of Artificial Intelligence (AI) by interviewing the brightest minds and exploring cutting-edge projects. From innovative ideas to groundbreaking individuals, we're here to uncover the latest developments and thought-provoking discussions in the AI space.

Michael Mayer: Yeah, so I have a background from the university side in physics, but I grew up around the computer science department in southern Germany. Ever since, I've been working in IT, and AI was just something I didn't have a lot of experience with, because I work mostly on business applications, I would say web services, mostly for enterprise customers.

And so those were pretty boring, so to say. And so I wanted to work with more modern technologies, try new things, and then also apply my mathematical background to it. And so that was the ideal combination.

Theresa Gresch: Yes, for me, I also have a scientific background. I originally studied biology and did my master's in neurobiology. This is also where my interest in AI comes from, because they have some parallels. And yes, during my studies, I realized that the working conditions in research are not really what I wanted for myself, and I had a strong interest in technical things. So I started to work as a QA engineer after my studies, then switched to software product management and worked a couple of years for agencies and startups before we started working on PhotoPrism.

Hunter Kallay: Very cool. So you mentioned PhotoPrism. That's the project that you've been working on. Can you tell us a little bit about what it is? What does it do?

Theresa Gresch: Yes, PhotoPrism is a privacy-friendly and user-friendly AI-powered photo management app. The idea is to give our users a tool that respects their privacy and puts them in control of their data. And we use AI to make it easier to manage large amounts of photos. So after you tell the software where your photo library is located, a lot of things happen automatically in the background.

So we detect faces. There's an image classification that adds labels to the images. We read a lot of metadata out of the image files themselves, but also out of file names or folder names.

Then we have reverse geocoding that adds additional metadata based on the location where the photos were taken. There's a duplicate detection that makes sure that people don't have the same image twice in the library. Files can be related to each other. For example, if you take burst photos, a lot of photos of the same scene, or you have a raw photo and different edits of it, such as a black-and-white or a sepia version, those files get stacked automatically. And yes, we do collect a lot of data around these files and use it to display the files to users in different ways. Normally, if people organize files, a lot of them do it in the Windows Explorer or the timeline from Apple, where you can just browse your files by folders.
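Duplicate detection like this is commonly done with perceptual hashes. Below is a minimal Python sketch of one such approach, a difference hash; it is not PhotoPrism's actual implementation, and the 9x8 grayscale grid input is an assumption (real code would first decode and downscale the image).

```python
# Sketch of perceptual-hash duplicate detection, one common approach to the
# "same image twice" problem. Assumes each image is already decoded and
# downscaled to a 9x8 grid of grayscale values (the resize step is not shown).

def dhash(pixels):
    """Difference hash: each bit records whether a pixel is brighter than
    its right-hand neighbour. 9 columns x 8 rows -> 64 bits."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def looks_like_duplicate(pixels_a, pixels_b, threshold=10):
    """Hashes within a small Hamming distance suggest the same scene,
    even after re-encoding or mild edits."""
    return hamming(dhash(pixels_a), dhash(pixels_b)) <= threshold

# A synthetic grid and a slightly brightened copy hash identically, since
# brightening every pixel by the same amount preserves neighbour ordering.
grid = [[(x * 31 + y * 17) % 200 for x in range(9)] for y in range(8)]
brighter = [[v + 3 for v in row] for row in grid]
print(looks_like_duplicate(grid, brighter))  # True
```

Stacking raw files with their edits works on a different signal (matching base file names and capture times), but near-duplicate grouping of re-encoded images is where a hash like this helps.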

In PhotoPrism, we have different pages. You have a search across all of your files. Then you have a section where your files are sorted by the objects in them. You have a section where you can view all photos that a certain person is in. You have a map where you can browse photos by location. We also filter out non-photographic or low-quality content, such as memes or screenshots you have on your phone.

Michael Mayer: Yes. Yeah, so I would say the original use case, or the first goal we had, was that the software works for our own pictures. Because we both, especially me, have lots of pictures on our disks, completely unsorted. We figured it's faster to write the software than to sort them manually. And for her it was, I'd say, maybe the cat pictures, so that she could find all the pictures with her cats in them.

Ryan Lazuka: Yeah, it's awesome. I run into this problem all the time on the iPhone. The iPhone's been out since, I don't know, I forget, probably 2010. Everybody's got so many pictures, whether in iCloud or locally on their phone, and it's so hard to find anything. If you have to find something that's six years old, you have to scroll forever. Now you can search for some things on the iPhone, but not compared to what you guys do in PhotoPrism.

So it makes it really convenient to be able to go in and search for all cat photos or all photos of pizza or all photos of a particular person. So that's really cool. Is that sort of the motivation? What motivated you guys to start building the project?

Theresa Gresch: Different things. So I wanted to do something with AI and find the cat pictures.

Michael Mayer: I guess that was really the main thing. So the first recognition we did was for finding the cat pictures. And beyond that, the software supports many different labels.

But this was the first kind of use case slash test case we had. And what's also important, besides the privacy aspect, is that with our software you can just point it to any directory and then search all the contents without having to upload them to the cloud. Many people think it's more convenient if you have a cloud search or a cloud application. But if you have a lot of pictures on some local disk, for whatever reason, it's not very convenient if you first have to upload all of them to the cloud, because that could take hours or days or maybe even weeks, depending on how much you have. So it's much faster to do a local search.

Hunter Kallay: You mentioned privacy. One of the things people are scared of is uploading their pictures onto these sorts of websites. They're nervous about it: who's going to see my pictures? What measures do you take to protect privacy in PhotoPrism?

Theresa Gresch: So first of all, the user decides where his photos are stored. He can install PhotoPrism on his private server, on a server he rents from a hosting provider, or on his own computer. And photos never leave this location the user chooses. The artificial intelligence we use, such as image classification and facial recognition, runs completely locally. So the files never leave the system and no data is sent anywhere.

Michael Mayer: And that's also what makes it fast, because for the recognition, we don't have to send the picture data to the cloud, where you'd have a round trip and the indexer stalls and needs to wait for the data. You could of course do it in some asynchronous way, where you first index the files and then update them in the background, or something like that. But this way you can do everything locally. And when a file is indexed, it's actually completely indexed; it's not like some data is missing and you need to do another scan and another scan and wait for some background worker until everything is done.

Hunter Kallay: Sounds like a great way to organize your photos. I was just on a few longer plane rides and one of my hobbies on plane rides is trying to sift through my phone to see what photos I like, what photos I ignore and all that. So I'm like, well, I forgot about this picture or something like that. So I really like this tool. I think it's very cool. When did you launch this platform?

Theresa Gresch: So we started working on it in 2018. And I think we had the first stable release at the end of 2020.

Ryan Lazuka: So you were in on this before the whole AI hype took off last year? Exactly.

Michael Mayer: Yes. We're doing this for quite a long time.

Michael Mayer: These types of models haven't existed for very long. But these were the first usable models that you could also use for free. They were not completely proprietary.

Michael Mayer: And they're just getting better. So our plan is also to update the models from time to time, so users get additional features that can detect additional types of pictures.

Ryan Lazuka: Cool. And one thing people can do right now: if you go to photoprism.app, there's a demo that shows you exactly what it looks like if you had it on your computer. And it's really cool to play around with. So you can go there right now, photoprism.app, and play with it live on their website. It's just a demo, but it will show you what it would look like on your computer.

One thing, back to the privacy point, is that people don't realize, and I don't want to scare anyone, but if you have all your pictures up in even iCloud, something you think is secure via Apple, if your account got compromised, someone could get in and take all your pictures pretty easily. I just heard a story about a big-wig guy in the tech space in Silicon Valley. He said that right now, if a hacker really wanted to get into your Apple account, they could, because there's a method that hasn't been patched yet: a setting meant for disabled people that hackers are exploiting. So you've just got to be careful any time you put anything on the cloud; if your account gets hacked, it could mean real trouble for you. And with this, everything's local, so you don't have to worry about that, which is super nice.

Michael Mayer: Yeah, it's actually the same thing with Google. Something with their authentication tokens is currently being exploited. From what I know, it started spreading on YouTube this week, and then they're infecting additional accounts with it and posting links there. And besides this, it's also possible that your account gets blocked by Google because they think you have, whatever, pornography in your account. And then you might lose access to all your pictures forever. And they don't do anything; they don't provide any kind of personal support for private users, so you're just lost. Right.

Ryan Lazuka: Right. With Google, God forbid you ever had problems with your account, there's no one to call to help you through it. I mean, there's no Gmail support or Google account support.

There is if you have a paid tier, but even then, you know, it's not good. So if you had problems, like you said, if your account got flagged because they're scanning all your pictures and they think you're doing something mischievous, which you really aren't, your pictures could be gone forever. It's kind of scary, but it's cool that tools like these exist. Now, PhotoPrism is an open source project, correct? How do people go about using it? Can you go through the basic steps of getting it installed on a computer?

Theresa Gresch: So we have different ways to install it. Some NAS providers have apps in their app stores; that's the easiest way. The other way is through Docker Compose. We have step-by-step instructions on our homepage, and you can follow them to install it on your server or computer.

Hunter Kallay: I wanted to ask you more generally about AI and photography and imaging. Over the last few years, and last year particularly, we've seen deepfake images being created and different advancements in AI photography. What's your general opinion on that sort of thing? What direction do you think it's headed?

Michael Mayer: Everything is happening so fast. We also have an OpenAI paid account where you can try their different models. It's amazing. I'm not sure if we want to get involved in this generative AI thing, because there's so much potential for abuse, obviously. You could create pornography of friends or something like that, or of kids you have pictures of, or of politicians. I think it's best to leave that to the big companies, who have a whole department that can take care of these issues.

Because otherwise, I think it could be easy to get sued, and then you lose a lot of money over something you gave away for free at the end of the day. It's just not worth it, I think.

Hunter Kallay: Yeah, it's kind of amazing and scary at the same time. Ryan and I use some different image generation apps; we've experimented with Midjourney and DALL-E. They're so fascinating, because I can create anything that I want, but at the same time they can be dangerous, and it's all in how you use them.

It's about what purpose you're using them for. Yeah, I'm kind of the same way as you, Michael, just sitting back and watching this stuff unfold. It's very interesting to see what's going to happen with all this image generation in the next couple of months and years.

Michael Mayer: Yeah, also, I worked at a newspaper when I finished school, just for a few weeks. But what I learned is that the pictures you take should document reality in a proper way, and there are ethics attached to that. Using AI to create realistic photos just doesn't seem right to me.

Hunter Kallay: Yeah, there was a report that just came out on Monday, and it said that one of the biggest worries people have about AI's impact, even on the economy, is its ability to create pictures and content that can mislead mass amounts of people. People can make political decisions, business decisions, or personal decisions based on images they see, or content they see, or generated articles that aren't true. That can be very dangerous, because a lot of these images look so real. The content seems so real, and it's almost impossible at times to tell whether a picture is AI generated or not.

Theresa Gresch: That's what I was going to say. I'm one of those, too. The differentiation between AI and real photos.

Michael Mayer: Yeah, and by the way, it's also bad for developers, especially open source developers who don't have that many resources. I just heard from the author of the curl library and the curl command-line tool that he's receiving lots of bug bounty requests, bug reports for security issues.

Many of them are made up now. People ask ChatGPT, find a security issue and write a nice bug report for me, and then they try to get the bounty for it. It creates so much additional work, and you cannot recognize it just by looking at it.

Ryan Lazuka: So people are creating issues on GitHub that are fake? Yes. Wow.

Theresa Gresch: There are bug bounty programs. For example, there are open source projects that are also used by Microsoft or Google, and when there's a security issue that impacts them, they pay money to get it fixed. So that drives people to report these kinds of security issues, or to try to find them, and if they're not smart enough, they use AI for it. We also thought about having some issue bounties or bug bounties, but that's one of the reasons we decided not to: so that we don't get these fake reports. Sure.

Ryan Lazuka: That's one of the things people who aren't familiar with GitHub don't realize. If you have a project on GitHub, one of the really hard things to keep up with is issues, especially if you're a popular project, because a lot of the issues might be from an end user who doesn't know much about the project and is creating an issue that really isn't an issue. It's hard to keep up with those as an open source developer. One of the fascinating things about images that people don't know, and I'm sure you guys know a lot about this, is that when you take an image, there's a lot of metadata on it. What is it called? EXIF data? Is that how you say it?

Michael Mayer: Yeah, there are all types of metadata, but EXIF is the traditional one. It's been around since the '90s; it started with digital cameras, from what I know. And then there's XMP, which is a newer format that can also be included. So one of the things our software does is extract the same fields, or similar fields, from different types of formats and then merge them, or figure out which is the best information to use.

Theresa Gresch: This is one of the challenges, because there's no real metadata standard. Every tool that creates photos, every vendor, writes to different fields. There are common fields that are used a lot, but it's not the same across vendors, and they're sometimes used for different purposes.
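Merging the same logical field from several sources, EXIF, XMP, or the file name, and picking the most trusted value can be sketched as a simple priority lookup. This is an illustration only: the field and source names below are hypothetical, not PhotoPrism's real schema.

```python
# Illustrative sketch of merging one logical field from multiple metadata
# sources. Sources are ordered from most to least trusted per field; the
# first non-empty value wins. Keys here are hypothetical examples.

PRIORITY = {
    "taken_at": ["exif.DateTimeOriginal", "xmp.CreateDate", "filename"],
    "title":    ["xmp.dc:title", "exif.ImageDescription", "filename"],
}

def merge_field(field, sources):
    """Return the first non-empty value following the field's priority list."""
    for key in PRIORITY[field]:
        value = sources.get(key)
        if value:  # skip missing or empty entries
            return value
    return None

sources = {
    "exif.DateTimeOriginal": "",            # camera left this blank
    "xmp.CreateDate": "2020-07-14T09:30",   # sidecar file has it
    "filename": "IMG_20200714_093015.jpg",
}
print(merge_field("taken_at", sources))  # 2020-07-14T09:30
```

The empty EXIF entry is skipped and the XMP value wins; only if both were missing would the file name be used as a last resort.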

Ryan Lazuka: Yeah, so for people who don't understand: basically every single phone and camera out there, when you take a picture, puts this metadata in there automatically unless you turn it off? Is that true? Is that how it works?

Theresa Gresch: Yeah, it depends on your phone settings. For example, you can turn off things such as saving the GPS coordinates, and other cameras can do that too, depending on which device you use and its settings. But normally, at least the date taken, the camera device, and which lens was used are saved in some way.

Ryan Lazuka: One of the fascinating things I learned last year is that there are apps out there that let you upload a photo; I think they're called EXIF apps, for lack of a better term, because I don't know the exact name. On the iPhone you can download these EXIF apps and pull up a photo in there, even a photo someone else sends you, and depending on whether that user had the EXIF data turned on or off in their iPhone settings, you can find out where they took that picture. So it's kind of like tracking, in a way a lot of people don't know about, which is crazy. For example, I've played this game with my parents the last couple of years where they would send me a picture of where they were, like a photo in the woods, because they like to go on hikes in the woods around us in Cleveland, Ohio, where I live. And every time, I'd immediately put it up in the EXIF app, find out where they took the picture, and reply back, you're at this park. And they were just blown away.

They're like, how do you do this? You're stalking me? Are you tracking my phone? I never told them how I actually did it, but it was always fun to be able to guess where they were. So there are ways to find out where a picture was taken, depending on whether the user had that data turned on or off in their iPhone settings. Is there anything else in that EXIF data that's interesting for the end user to know? Any specific data tied to pictures that someone watching this podcast might not know about?

Michael Mayer: Serial numbers. The usual smartphones, the pictures they take don't have serial numbers in the metadata, at least most of them. But regular cameras, especially professional cameras, usually have a serial number for the body and for the lens.

So you can identify them. And if you know who a number belongs to, then if you lost your lens or it got stolen, you can basically search the internet for somebody posting a picture with the same serial number, for example. And of course, anyone can check whether a picture was taken with the same lens or not.

The only thing is, it could of course be faked, because right now there's no signature that would guarantee that the data you find is correct. But this is actually something the big vendors, like Canon and Nikon, I guess, are working on right now: a digital signature in the picture, just like a web server certificate when you open the website of your bank, so you know it's the bank you're talking to. So when you take a picture with one of those cameras that support it, you have a digital signature, and that guarantees it's a real photo taken with, say, a Canon camera, and that the metadata, like the serial number, but also the coordinates or the timestamp, is correct.

Ryan Lazuka: Awesome. So is it something that can't be tampered with by someone?

Michael Mayer: At least not easily. I don't know how they make it; I guess they have it encrypted somewhere in the camera. Obviously you need some kind of signing key for this, and I guess it's in the hardware, so you cannot easily extract it.

Hunter Kallay: I was just going to say, Ryan's probably the better one to talk about this, but I'm curious about the tech side of your PhotoPrism platform, the underlying tech stack and what that looks like, particularly with your search, because it seems like you can search for different things in your photos. Is the AI trained on image data? How does that sort of thing work?

Michael Mayer: So on the tech side, we're using Go in the back end. But the model we use is a pre-trained model. It was created for a competition originally, so it's trained on a standardized image set, because you always need some training data, and it also uses standardized terms; those are called ImageNet and WordNet. In AI, everything is kind of standardized so that you can compare the performance or connect models with other models, and you don't need to start from scratch and figure out how it's working or what kinds of images it can classify. This is pretty much standardized, at least for the models that are published on the internet. Cool.

Ryan Lazuka: So your tech stack is a combination of AI and the metadata? Are those the two most important factors in how you find a cat, for example, in your pictures?

Michael Mayer: Yeah, it's all glued together in the back end. It's a server application, written in a programming language maintained, and originally developed, by Google. They're also using it for many of their back-end services. The good thing is it's very fast compared to, let's say, PHP or whatever else you could use; also faster compared to Python or other scripting languages.

The downside, however, is that it wasn't originally meant to be used this way. Usually these types of applications are written in either Python or JavaScript, from what I know or from what I've seen in practice. I think we're among the only ones who use the Go programming language, but it was important to have very clean source code.

It's very well testable, and security is very important to us, and performance is very important to us. So we made this trade-off, but it also meant a little bit of additional work, because it didn't easily work with everything we wanted to do with all the models.

For example, we had to compile our own version of the library, and then we also contributed to the Google project, to the TensorFlow project, so that others can use it this way too.

Ryan Lazuka: Is it just you two working on the project, Michael? It sounds like you're doing most of the coding. Is that true?

Theresa Gresch: Yes, the core team is the two of us, and Michael is the developer. I do product management and QA, but I'm not developing.

Ryan Lazuka: You get the fun job of saying, hey, this is wrong, fix it for me. Say I upload 100 of my personal pictures from my phone to PhotoPrism on my local machine. Does your software look at each image and identify objects in each one as they're being uploaded? How does that work?

Theresa Gresch: So this is a process we call indexing. It runs over the pictures, and during this indexing all of these things happen. The AI model does the image classification, adding labels like dog, cat, nature. So it's not real object detection; it's not detecting single objects like a wheel or other small objects. It's more of a classification. Facial recognition is also done on the images. So yes, every image is analyzed.
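An indexing pass like the one described, one walk over the library that runs every analysis step per file, might look roughly like the sketch below. The `classify` and `read_metadata` functions are stand-ins returning fixed values, not PhotoPrism's real API.

```python
# Rough sketch of an indexing loop: walk the library once and run every
# analysis step per file. The step functions are hypothetical stand-ins.

from pathlib import Path

def classify(path):
    # Stand-in for the AI model: scene-level labels with probabilities.
    return {"cat": 0.72, "window": 0.18}

def read_metadata(path):
    # Stand-in for EXIF/XMP/filename extraction.
    return {"taken_at": "2020-07-14"}

def index_library(root):
    """Build an in-memory index mapping each photo path to its analysis."""
    index = {}
    for path in sorted(Path(root).rglob("*.jpg")):
        index[str(path)] = {
            "labels": classify(path),
            "meta": read_metadata(path),
        }
    return index
```

Because every step runs inside the same pass, a file is either fully indexed or not indexed at all, which matches the point Michael made earlier about avoiding background workers and repeated scans.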

Ryan Lazuka: Okay. So you say objects aren't detected; it's indexing instead? How does it know what's in the image?

Michael Mayer: Basically, there are different types of models that can be applied to image data. There are these general classification models that look at a picture as a whole. It could be, say, a cat in front of a window. There could also be other things in the picture, but they might not be important for the image as such. And then it gets classified as, like, a cat.

You get probabilities, like 60% cat and 20% window, or something like this. Whereas for the models that do object detection, the original main use case was self-driving cars. That's a completely different use case, because the cars need to see small objects at the end of the road.

So they know they need to brake because there's a person. These types of models, at least the ones you get for free, are usually trained on objects related to this self-driving use case. So typically they detect rocks, traffic lights, people, other cars, bicycles. Those are the objects they usually recognize. Okay.

So it's completely different from what you guys are doing. Yeah. We thought about adding this too, but the main issue we had was with the free models that were available, at least at that time.

So we're going to check this again when we have a bit of time. But back then it was mainly for self-driving cars, and it didn't look like something that would provide good value to our users.

Theresa Gresch: Because it's not really what you expect. If you have a scene with maybe your kids in front, and somewhere in the background there's a traffic sign, you want to find the photo based on your kids, not because you're searching for a traffic sign. That's not how you would try to find this photo.

Ryan Lazuka: Okay. So your software looks for what it thinks is the most important thing in the photo and then, for lack of a better word, tags that? Exactly.

Theresa Gresch: That's the reason why we chose image classification over object detection. We plan to use additional or better models in the future, because as you say, it's a space that's evolving a lot. Sure. But we also need to take into account which models we can use, because there are restrictions due to the hardware of the servers our users run. You cannot use very large or resource-intensive models, because most people don't have the hardware to run them. So it's always a trade-off.

Michael Mayer: Yeah. And it also took us, I think, at least half a year, I would say, until we had a feeling, for the model we currently use, for what the probabilities mean, because we have different thresholds for each category, each kind of tag. For example, the cat pictures might be reliable.

When the model says 30%, it might be very reliable. But trees and windows, and different types of animals, they all have different thresholds where a human would say, okay, this is what the picture is about. So you cannot just say, okay, if the probability output is 50%, then we label it as that. Instead we say, okay, this is something very distinct, and that works really well.

A 10% probability is enough; at least we add it as a keyword so that you can find it. But then there are generic things like windows or doors, and generally I don't want my pictures to be about doors or windows, even if there's one in the scene. So those generally have higher thresholds and get ignored. If there's a 10% or 5% chance that there's a window, we just ignore it.
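The per-label thresholds Michael describes could be sketched like this; the threshold numbers are purely illustrative, not PhotoPrism's actual values.

```python
# Sketch of per-label confidence thresholds: distinctive labels are accepted
# at low probability, generic ones need much more evidence. Values are
# illustrative only.

DEFAULT_THRESHOLD = 0.5
THRESHOLDS = {
    "cat": 0.3,      # distinctive: 30% is already reliable
    "window": 0.9,   # generic: effectively ignored unless dominant
    "door": 0.9,
}

def accepted_labels(probabilities):
    """Keep only labels that clear their category-specific threshold."""
    return {
        label: p
        for label, p in probabilities.items()
        if p >= THRESHOLDS.get(label, DEFAULT_THRESHOLD)
    }

print(accepted_labels({"cat": 0.35, "window": 0.20, "tree": 0.55}))
# {'cat': 0.35, 'tree': 0.55}
```

Here the cat clears its low threshold at 35%, the window is dropped despite a 20% score, and the tree passes on the default cutoff.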

Ryan Lazuka: Yeah. Do you have a list, like a database, of things that you guys put less emphasis on, like doors and windows? Is that hard-coded in the code?

Michael Mayer: It's actually something, we were kind of the first ones to do this, and other open source developers took it from us, because it was published on GitHub. Some of them asked us if they could use it, and of course. So you will also find the work we did in other open source projects now.

Ryan Lazuka: Awesome. Awesome. And, I didn't check this out, is there a paid tier? Are you guys a for-profit company that just has open source code in the background? If you're looking to make it profitable, how are you looking to do that?

Michael Mayer: We originally had everything for free, everything open source. But in the meantime we learned that this is not sustainable. So what we have now is, I would say, 98% free and open source, and then we have memberships, two membership tiers we call Essentials and Plus, where users can sign up to support us. In return, for example, they get additional map types that we pay for: commercial maps that we give them access to, like maps where, if you're living in the mountains, you can see the height of the mountains where you took the pictures, so not everything is flat. Awesome. Stuff like that. And then also some additional admin user interface, so you don't have to use the terminal to do certain things.

Ryan Lazuka: Okay, which is very awesome: everything's open source, and maybe there'll be extra features on top of the open source project that will be paid tiers. But the vast majority of your project is open source and free, which is awesome for the end user.

Michael Mayer: Exactly.

Michael Mayer: Especially everything that makes us kind of unique, and especially all the machine learning, everything is open source.

Ryan Lazuka: Very cool. Now, is it just you two working on it, or are there other developers helping you out for free? Since it's an open source project, do you get pull requests fairly often? How does that look?

Theresa Gresch: So we look at pull requests, and we do have, I would say, a handful of contributors who regularly, maybe two or three times a year, do something. We don't get a lot of one-time contributions, but there are people who answer questions in our chats, or community members solving other users' specific problems, or people who contribute good translations. Initially we do all new translations with DeepL, for example, but if people want to improve something in their native language, they can improve it.

Ryan Lazuka: Out of all the people you've talked to who use the product, do you see any similar use case examples? I mean, everyone wants to use it for privacy; that's an obvious one. Are there any other things that people are using PhotoPrism for that the average person might not think about?

Michael Mayer: For example, somebody asked on Twitter, or X as it's called now. I think he's a trader and manages his charts with it.

Theresa Gresch: So, screenshots of his charts.

Michael Mayer: Yeah, I think there are also many artists. Some of them do art, something you cannot host on Google, or you might get in trouble. I haven't seen the pictures, but they said they maybe cannot use regular cloud hosting because all the filters go wild, so they need to self-host it, even though it's just drawings or something. They cannot use cloud services for it.

Ryan Lazuka: Well, yeah, there's a fine line. Someone might have artistic nude pictures or something like that, that Google might not want, but that are totally fine. So that may be an example of something people could use on their local computer with PhotoPrism, but not with Google Drive or Google Photos. So what's the next step for this project? Do you have any goals over the next year or two?

Theresa Gresch: So we definitely want to grow our team. Currently it's the two of us, and developers are expensive. As we're growing organically, it just takes time until we can afford to hire someone to support us on a regular basis. That's one goal. Then we have a lot of features in the pipeline. And another big point is simplifying the installation, because currently you kind of need to have some technical skills and you need to have a server, and this is something we want to simplify so that more users can use the software.

Michael Mayer: Yeah, absolutely. Ideally, you can just download a binary, also on Windows, and just install it. We also want to provide our users, or at least our members, with their own subdomain and a certificate, so that they can securely share their pictures over the internet. This is possible right now, but it's not fully automated, so you need at least some technical skills to get it done. And we think that once this is automated and you just need to click, the user base would also grow, essentially.

Ryan Lazuka: So you'd be able to share a picture from your local computer with somebody else across the world via a link, and that link would point to your local computer. Is that how it would work?

Michael Mayer: Yeah, we could for example provide a proxy service, similar to a VPN, where the other person connects to our server and we act as a proxy and make the pictures available. Yeah, awesome.

Ryan Lazuka: That'd be very cool. What happens if someone uploads a video? Do you index that as well, or is it just for photos?

Theresa Gresch: No, videos work as well. We detect if the video can be played natively by the browser, because you open it in your browser, and if a video format is natively supported by the browser, we don't need to do anything with the file. We just do the normal things: we extract a still image where we do the facial recognition and image classification, and we read the metadata out of the video, which is slightly different from photos. And then we can play the video. If the format of the video is not supported, we will transcode it to a format that is supported by the browser, so that the user can play it.

Michael Mayer: Yes, actually videos are a large part. Initially we wanted to focus on photography, because we don't have that many videos ourselves, but users said, okay, you have a great application, but I'm not using it without video support. So we did simple video support first, but the transcoding is so slow that we need hardware support for it. And now we're supporting all the formats and hardware transcoding with Intel, AMD, and NVIDIA graphics cards. It's a nightmare, a lot of work that we wanted to avoid initially, but we listen to our users; if they want it, then sure.
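The play-or-transcode decision Theresa and Michael describe can be sketched roughly like this. The set of container/codec pairs a browser can play natively is an assumption for this example, not PhotoPrism's actual support matrix:

```python
# Illustrative sketch: decide whether a video file can be served to
# the browser as-is or needs transcoding first. Which formats count
# as "natively playable" is an assumption made for this example.

BROWSER_PLAYABLE = {            # (container, codec) pairs assumed playable
    ("mp4", "h264"),
    ("webm", "vp9"),
    ("webm", "av1"),
}

def plan_for_video(container, codec):
    """Return 'play' if the browser can handle the file directly,
    otherwise 'transcode' so the file is converted to a supported
    format. Either way, a still frame would also be extracted for
    classification and face recognition, just like for photos."""
    if (container.lower(), codec.lower()) in BROWSER_PLAYABLE:
        return "play"
    return "transcode"
```

In a real pipeline the transcode branch would hand the file to a hardware-accelerated encoder (Intel, AMD, or NVIDIA, as Michael mentions) rather than return a string.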

Ryan Lazuka: Yeah, I'm sure that doubles the work on your end. Just the last couple of nights on my end, I've been making YouTube videos, educational ones, and I've had the hardest time. My computer just lags and drops frame rates. It's still a problem, because the encoder is a problem, because it's on a Mac. It's a nightmare. So I can only imagine how much more work that would be on your end, because it's a never-ending cycle of information you have to learn about videos. There are a lot of technical details about videos you have to deal with to implement that into your project as well.

Michael Mayer: Yeah, absolutely. What's especially complicated are the hybrid formats, like live photos, but also motion photos, which are like a live photo but on Android. These are videos that are in the same file as the JPEG image. So we developed a custom parser that opens the file, skips everything until the video starts, and then plays the video from the image file.

Ryan Lazuka: Wow, that's crazy. Are you talking about the images where, on Apple, if you hold the image down, it starts playing? Is that what that is?

Michael Mayer: Yeah, exactly. But Apple is at least so nice that they create two files, one image file and one video file, that you can open separately. Google saves everything in one file, and if you just open it regularly, it just looks like a regular JPEG image, so you cannot play it.

Ryan Lazuka: So for the iPhone, it's two separate files?

Michael Mayer: Yes, and on Android, it's just one file.

Ryan Lazuka: Got it. All right. Interesting.

Michael Mayer: So it's basically a custom format they made up for this. It's not usually supported by anything, so we had to develop our own parser for it, our own handler, because it's not something you can usually play.

Theresa Gresch: And of course it differs between vendors. The Google motion photos behave differently than the ones from Samsung, for example.

Michael Mayer: Yeah, there can also be different formats within these single files. So it's a real nightmare. But it was also fun; it was a challenge.
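The parsing idea Michael describes, skipping everything in the file until the video starts, can be sketched as follows. A Google-style motion photo is a JPEG with an MP4 appended to the same file, and one common heuristic is to look for the MP4 "ftyp" box signature after the JPEG data. Real files vary by vendor, so this is a simplification, not PhotoPrism's actual parser:

```python
# Illustrative sketch of extracting the embedded video from a
# motion-photo-style file: a JPEG image with an MP4 stream appended
# to the end of the same file. Real-world variants differ by vendor.

def extract_embedded_video(data: bytes):
    """Return the embedded MP4 bytes, or None if none is found.

    An MP4 file starts with a 4-byte box size followed by the ASCII
    tag 'ftyp', so we search for that tag and back up 4 bytes to the
    start of the box; everything from there on is the video stream.
    """
    pos = data.find(b"ftyp")
    if pos < 4:          # not found, or no room for the size prefix
        return None
    return data[pos - 4:]
```

For example, concatenating synthetic JPEG bytes with a byte string that begins `\x00\x00\x00\x18ftyp…` lets the function recover exactly the appended portion; a plain JPEG with no embedded stream yields `None`.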

Ryan Lazuka: Yeah, I bet. But stuff like this keeps coming out; there are new formats and files all the time. It's not like it's just going to end one day, right? There's probably going to be something next week that comes out that you have to implement.

Theresa Gresch: It's an ongoing challenge, yeah.

Hunter Kallay: And how many users do you have right now?

Theresa Gresch : So I would estimate around 40,000.

Michael Mayer: Yeah, we probably have many more users. But because we serve the geo data and the maps for our users, we have a bit of traffic data, so we can estimate how many instances there are. Every day there are around 40,000 active instances, or servers. On each instance there can also be multiple users, and so on. And some people probably don't have it connected to the internet, so those won't be detected at all.

Ryan Lazuka: I think how we found you guys is on GitHub; you're one of the top trending projects in the AI space. That says a lot right there. It's not easy to get a lot of stars and pull requests and downloads. In my opinion, the projects trending on GitHub are usually the best ones, because they've got the best coders behind them, and if other coders and programmers like them, that says a lot about the project. So awesome work. Thank you. Yeah.

Michael Mayer: Yeah, it's been a lot of work. We both have a scientific, technical background, so the most challenging part was actually figuring out how to make it sustainable and how to organize the community, and also being resilient when some users are not satisfied and get really, really angry. That's sometimes a real challenge when you need to focus on work or something really complicated, and then one of your users screams at you out of frustration. You just need to find a way around that, stay nice, and not let it impact your work, so you can keep doing what you're doing and don't need to take time off to cool down or something.

Ryan Lazuka: Yeah, it's got to be frustrating sometimes. I think everyone has this problem where you wake up in the morning and you're just putting out fires all day, not really getting done what you wanted to get done. It's probably like that on steroids for an open source project.

Michael Mayer: You get so much positive feedback. Basically, I would say 99% of the feedback is extremely positive. But then comes this one negative thing, or somebody who just had too much coffee.

Ryan Lazuka: Yeah, it's crazy how, even when 99% of things are positive, we tend to focus on the little negative things sometimes. So it's awesome to know that 99% of your feedback is positive and being welcomed by the community. That's great.

Michael Mayer: Yeah, I think it's just human. Usually, in a big company, technical support staff takes care of that. The difference for us is that we're doing everything on our own, so we need a way to make sure that this one thing that happens, this one task, doesn't impact what you want to do next. Definitely.

Ryan Lazuka: And when you're running any open source project on GitHub, you don't really have any time off, right? It's not like you work until 5pm and shut down; you're always working.

Theresa Gresch: We're always working. And even if we have run through all the messages in the evening, in the morning we have a lot more, because we have users all over the world, from all different time zones. In the chat, questions come in around the clock, even while I'm sleeping.

Ryan Lazuka: Is there anything other than photoprism.app that you guys would like to promote? Now's the time to do it.

Hunter Kallay: So be sure to check out photoprism.app. Very cool. They've got a demo on there that you can check out; if you haven't already, we'll put the link underneath. But then also subscribe to Ryan's and my weekday newsletter at fry-ai.com to get the latest in AI news along with some cool tools and community engagement. And also subscribe to Behind the Bots, where you can see all kinds of cool interviews with different developers.