The Secret Sauce

In this episode, Marcus Eagan, a seasoned open source contributor and former employee of MongoDB, joins the show to discuss his journey in the world of open source and his current projects. Marcus shares his experiences working on various open source technologies, including Lucene, Elasticsearch, and MongoDB. He also delves into the emerging field of vector search and its significance in the AI landscape. The conversation explores the challenges and opportunities of open source collaboration, the impact of commercialization on open source projects, and the importance of open source in driving innovation and transparency. Join us for an insightful discussion on the power of open source and its role in shaping the future of technology.

What is The Secret Sauce?

Come join us as we discuss everything open source with guests that are pillars in the industry. Welcome to The Secret Sauce.

Bdougie:

Welcome back to the show. We're about to learn the secret sauce. Excellent. Marcus, thanks so much for the drive up. Are you in the South Bay?

Marcus Eagan:

I am. Palo Alto.

Bdougie:

Palo Alto. Okay. Mostly. Stuff happens down there. That's for sure.

Marcus Eagan:

Yeah. Definitely.

Bdougie:

Excellent. Yeah. So we have conversations with folks in open source about their journey, what they're working on. And I'm actually curious about what you're working on now because you've only just recently left Mongo.

Marcus Eagan:

Yeah. What I'm working on now is not something I talk to people about a lot, but what it is, essentially, is a way to protect people from a new wave of threats, emerging threats. There are lots of new paradigms in computing that are going to challenge the existing systems, and I think that we are finding our way. And it'll definitely be open source. That's for sure.

Bdougie:

Okay. Cool. I'm looking forward to when that comes out. I'll definitely be watching that. But I did wanna talk about your journey. So could you quickly intro who Marcus is and, like, tell us about your background?

Marcus Eagan:

Yeah. So I grew up in the city of Detroit. I learned coding from this program called DAPCEP, which was like...

Bdougie:

It's like a boot camp?

Marcus Eagan:

Yes. Well, no. It's like a pre-college engineering enrichment program for, like, 11-year-olds.

Bdougie:

Okay. Yeah. Pre-college 11-year-olds. That's definitely pre-college.

Marcus Eagan:

Yeah. For sure. I forget what the acronym DAPCEP, D-A-P-C-E-P, stood for, but I know it's something about, you know, preparing kids, just nurturing that itch in kids who want to build. And then while I was in college, I got really into open source and networking, like computer networking. And that kinda led me to security in a roundabout way, basically trying to provide Internet in Brooklyn to people for free, and that has a lot of implications.

Marcus Eagan:

And I never quite got there. I think that was sort of... it wasn't a nonprofit thing. It's something Starlink probably needs to do for the world.

Bdougie:

Yeah.

Marcus Eagan:

But I learned a lot about the foundations of networking there. And so I started a company that was pretty much a lot of open source technologies pieced together, which is what a lot of companies are.

Bdougie:

Yeah. I mean, modern-day development is, you know, find out if someone has solved a problem, so you can start solving the harder problem.

Marcus Eagan:

Right.

Bdougie:

Exactly. So, like, what sort of open source did you use to attempt to solve the problem?

Marcus Eagan:

So we were using OpenWRT, which is a Linux distribution targeting embedded routers. And so a lot of people use it. A lot of people, hobbyists mostly, flash their routers using OpenWRT. And there's a lot of support from the hardware community, the hardware companies, because they want this software to work on their chips, on their devices. And so we used OpenWRT, and we used Suricata.

Marcus Eagan:

Suricata was the multithreaded implementation of Snort. A lot of people know Snort in the world. It's pervasive at this point. And with Suricata, people thought it was a little ridiculous to try to build this with OpenWRT because it was designed for 1 gigabit per second throughput. And when we got started, most people in the United States at least didn't have more than maybe 20 or 50 megabits per second of throughput on their home network.

Marcus Eagan:

But then shortly after we got started, Google Fiber rolled out 1 gigabit per second in Kansas City, somewhere in Texas, maybe Atlanta. I mean, it just started to pop up, and then other providers also were releasing 1 gigabit per second and laying fiber. And so then it didn't seem so silly.

Bdougie:

Yeah. Yeah. Yeah. And you mentioned, OpenWRT. I always said open wert is the thing I say out loud.

Marcus Eagan:

A lot of people say open wert.

Bdougie:

Yeah. It's funny, like, when you see an open source project, you see the things, like, spelled out. And you have your own version in your head because everything's so async, you just communicate through forums and through comments and stuff like that. So I laughed in my head when you said OpenWRT, and I'm like, open wert.

Marcus Eagan:

Yeah. It kinda reminds me of MySQL. A lot of people say MySQL. Yeah.

Marcus Eagan:

Yeah. But most people say MySQL, but then, like, people who are just reading it, they may not know. They may not be in the crowd where people are talking about this. I just try to be flexible on pronunciations.

Bdougie:

The pedantics in how you say things in open source. Like, as long as there's, like, vowels in there, let's all focus on the problem, which is, let's use the stuff.

Marcus Eagan:

Right. Right.

Bdougie:

Yeah. So, you were building for, I guess, getting Internet in places like Brooklyn. I know you from Mongo, and I know you from search and your contributions in Solr. So, like, how did you get from that point to getting involved in open source on that other end?

Marcus Eagan:

That's a fantastic question because it's really roundabout. So in our company, we used MongoDB and Elasticsearch a lot. And we never paid MongoDB Inc or Elastic Inc, the companies behind those technologies. But on the other hand, later in life, I ended up going to grad school in Michigan for information retrieval. That was my focus, data science.

Marcus Eagan:

It was before NLP and transformers were super hot, and I was trying to learn these things in an academic setting. I left after a year. I wasn't there long. Didn't stay to get any degrees, but I went back to Silicon Valley because I wanted to work on open source where I thought things were happening. And so I started to work at this company, Lucidworks, focused on open source and community building.

Marcus Eagan:

And Lucidworks was the main corporate sponsor of Solr at that time, and I think they still are. And Solr shares a kernel with Elasticsearch. And what I mean by that is there's an indexing technology called Lucene. It's an Apache top-level project. A lot of people don't know about it.

Marcus Eagan:

They just use Elasticsearch or Solr because it's just an indexing library, really, in Java. And so you get the HTTP, like, CRUD capabilities from Solr and Elasticsearch as well as the distributed capabilities. And in working on Solr, I started to learn more about Lucene deeply, even though I had used it in my company and just took everything for granted, all the complexity. And because it's such a mature code base, and it works really well, my goal was to make some small incremental improvements that made it easier for people to work with it or made it safer for people to work with it.

Marcus Eagan:

And even those incremental improvements in a code base that mature can be very challenging.

Bdougie:

Yeah. Cool. Yeah. I'm sorry. Go ahead.

Marcus Eagan:

No. Is that is that a is that good? Do you want me to get to the Mongo piece?

Bdougie:

No. No. I mean, that's good. And I was gonna dig into the thing you had mentioned, that you never paid for Mongo. You never paid for all these tools.

Bdougie:

Like, how did you come to terms with that? And, like, do you think there's a problem in open source where folks will take it to a certain level where I'm gonna learn how this thing works and the ins and outs? And, like, is open source at risk at this point where people aren't paying for things?

Marcus Eagan:

Yeah. I think that's a great question. And the way I reconciled it with myself, because I did wanna be fair and do the right thing as far as I can tell, because open source is at risk, is I work on Lucene, and so Elastic, the company, certainly benefits from that, because they use it, and they don't pay me. Yeah. And that's fine.

Marcus Eagan:

And they pay... I mean, they're definitely leading in a big way in terms of keeping that project on the cutting edge. There's folks in different places also working on it, but I admire Elastic's commitment to Lucene and its investment in Lucene through employing several of the core contributors, several of the committers. But I went to work at MongoDB, and my onboarding was very short. You know? Because I knew MongoDB already.

Bdougie:

Yeah.

Marcus Eagan:

Like, if you're in a technical job, you have to do, like, this technical training when you land. Yeah. And it's like, oh, really? I have to do this stuff? Like, I already know how this works.

Bdougie:

But you had been training for the job this entire time.

Marcus Eagan:

Exactly.

Bdougie:

And that's the thing, actually. So I know you joined our Discord, and you're, like, taking a look at OpenSauced. Like, that's the thing that we're trying to instill in our users is, like, go build with the builders. And, like, there's no promise for a job at the end of the day. But, like, I think there's an opportunity for folks to come out of boot camp.

Bdougie:

You need to level up to whatever the modern stack or the modern technologies is. Open source is that pathway in that journey.

Marcus Eagan:

Definitely. You know, a lot of people ask me, like, well, how did you get into this job, or how did you get into this company? It's like, I was just working on the technologies that they were using, and so they gave me a call. Yeah. And

Bdougie:

But, I mean, that can't be overstated because, like, you're wearing the Steph Curry jersey. Like, I do love this, so I may get it wrong, Chinese

Marcus Eagan:

Yeah. That's true. Characters.

Bdougie:

Yeah. So, like, shout out to the Bay because it's representation of all cultures. So Right. Exactly. Nice nice representation.

Bdougie:

But, like, if you wanna get to the NBA, the thing you're gonna do is go practice and get in front of people who are gonna scout you and pay attention to you. So that pathway is structured where you go to college, you play. And then from college, you get drafted. And if you don't get drafted, you go play in Europe or somewhere else or Canada. But when it comes to engineers, that's disjointed.

Bdougie:

Like, there's a pathway in college, but what a lot of people don't realize is there's a pathway in open source where you can just, like, play pickup games over here in Bushrod Park, with, like, Steph Curry's name on the court, and get noticed. Pickup games in open source are, like, hey. I found an issue. I'm gonna pick this up, and I'm gonna show my skill. And then the day you walked into Mongo, and they're like, hey.

Bdougie:

You have a job. Here's the laptop. Have at it. You have that experience already.

Marcus Eagan:

Right. Exactly. And and people were like, well, why would I go work on something for free? Why would I Exactly.

Bdougie:

It's a question that gets asked all the time.

Marcus Eagan:

And it's like, well, I was introduced to MongoDB, and they hired me in less than 5 days. So that's why you should go work on open source. And

Bdougie:

We need to clip that. That's, like, so succinct. Like, that's exactly the reason, but, like, past performance does not predict future growth. So, like, your pathway to Mongo, how long ago was that when you started?

Marcus Eagan:

So I worked there starting a little more than 3 years ago.

Bdougie:

Okay. Yeah. But, like, 3 years ago is different than today. I think folks might say, oh, I'm gonna go contribute to Mongo, and I'm gonna get that job. I think where we miss is there's other projects that are where Mongo was 3 years ago that you could be a part of and see that sort of growth and scale.

Marcus Eagan:

Yeah. That's exactly right. You know, one of the challenges with open source is, like, once a company gets so big, it's hard to get involved with their projects. Even though they're open source, they're really the company's projects, and the only contributors are from the company. And so I think the best projects to get involved with are the new ones.

Marcus Eagan:

Like, the young ones.

Bdougie:

100%. Like, that's the thing. So we built this tool called hot.opensauced.pizza, and it was really for new projects. Like, as it was appearing on GitHub, it has less than 100 stars. Like, those are the times when you can learn the documentation as it's being written. Yeah.

Bdougie:

And, like, the benefit of my career is I went to React in 2014. That's great. I was able to watch the entire ecosystem grow around that first experience, and I don't need to go deep dive in, like, 7 years' worth of documentation because I lived it. Right. And I think that can't be overstated as well.

Bdougie:

Like, learn newer projects and take a chance and then help out.

Marcus Eagan:

Yeah. I think that's exactly right. Like, the stars are a tricky thing because I think to a lot of people, they signal something. But I have found, like, some of the most advanced technologies that I have seen in open source don't have a ton of stars, and these are used by really sophisticated companies.

Marcus Eagan:

And even though the project seems small and the company is small, like, there's a lot of projects out there that I think are great ones to take a look at, and people should get involved with.

Bdougie:

Yeah. 100%. Yeah. Like, there's so much opportunity there, and I'm super glad that you're sharing your story so folks can, like, take this and then apply it to their story, not sort of like the road map.

Bdougie:

Again, you don't need to contribute to Mongo and then, 5 days later, have a job. Right. Like, that's not what we're trying to preach today, but it's more of, like, be in the situation where you can get called up and have that opportunity. So I guess the question is, your time at Mongo, you focused, well, I don't know what your full focus was, but I know you eventually were on Atlas.

Marcus Eagan:

Yeah. So I my main focus was search. And any company and this has been for a while now. Whenever I take a job, I tell them I need 20% of my time to work on open source. And they're like, what?

Marcus Eagan:

Like, any open source. Like, whatever I want, because that is what makes me so valuable to you as an organization, is I have my hands in these pots. I know how other companies are doing things, what challenges users are encountering. And, like, that just needs to be a part of my time because I still want a life outside of work. I want 20% of my time to go work on open source.

Marcus Eagan:

And at MongoDB, like, they hired me and brought me in to really define a search strategy. Like, have a cohesive vision, road map, and product offering that really elevates their platform pitch. Right? And it's still early days for Atlas Search and Atlas Vector Search. Both of those, I'm responsible for.

Marcus Eagan:

A lot of what that was in the early days was bringing in the institutional knowledge of Apache Lucene that I garnered from my time at Lucidworks or my time building my own company, Nodal Security, back in the day, and working with these technologies at large scale.

Bdougie:

Yeah. So the you'd mentioned this previously about, like, the complexity of commercially backed open source projects. Like, do you feel like there's a point where projects can't take more outside contributors, or is that something that the company should always defend?

Marcus Eagan:

Yeah. This is a great question. I think that it depends on on your business. Right? Like, there there's some companies that, like, they have a symbiotic relationship with outside, companies, outside vendors, outside participants.

Marcus Eagan:

A great example of that is Canonical. Like, Canonical needs collaboration with the hyperscalers, the cloud providers. And the cloud providers need collaboration with Canonical because Ubuntu is a beloved and really awesome operating system. And, like, the chip manufacturers, their drivers have to work with Linux, and so they have to collaborate with Red Hat, IBM. But that's not the case for every company. Some companies are in a highly competitive space.

Marcus Eagan:

They work on mission critical systems, and their collaboration needs to be predictable and cohesive and scheduled. Everybody's not like that. Every project's not like that.

Bdougie:

So I guess, I don't know. You mentioned Elastic uses Lucene. Elastic had a really interesting journey when it comes to open source. And a lot of folks are familiar with, like, Amazon's approach to Elastic, and it now has a competing product because Elastic was open source. So, like, in the world of Elastic now sort of, I guess, battening down the hatches a little bit, or maybe going a little bit more closed, with less outside contribution.

Bdougie:

I don't know what the story is there today. Like, they might still be taking lots of outside contributions. I have no idea. I don't pay attention that closely. But is that another risk for these open source companies?

Marcus Eagan:

Yeah. I mean, from my perspective, the way I view this is that we're all in this together. We have to try to solve the world's problems. And someday, somebody from another planet's gonna land, and then we're all gonna be on the same team.

Bdougie:

Yeah.

Marcus Eagan:

And so, like, I don't really think about competition the same way most people do. I think we're all just in the sand having fun on the beach. And the interesting thing about Elastic, you mentioned Lucene. Right? Lucene is the kernel of Elastic, and there's a lot of Lucene contributors at Amazon.

Marcus Eagan:

Yeah. So, like, Amazon employs a lot of them. Amazon uses Lucene a lot. And so I think Amazon and Elastic have a shared fate, so to speak. Not I mean, Amazon doesn't have any faith, but, like, it's bigger than most governments.

Marcus Eagan:

All governments probably at this point. But I think Amazon's search teams, of which there are many, recommendations, like, they are in this with Elastic. And so I think competition is healthy there, and the outside contributions that go into Elastic's kernel, Lucene, are not even owned by Elastic. They're owned by Apache.

Marcus Eagan:

And so I think those are gonna continue, but there's definitely risk. But, you know, there's contributors from Lucidworks, as I mentioned, MongoDB for sure, a couple people, or at least one person. There's also people maybe at Apple that are working on this, Bloomberg, LinkedIn, Salesforce. So there's a lot of folks in the mix on Lucene, and Elastic's not paying those people, so they're doing okay.

Bdougie:

Yeah. And I think it's one of those things where, I don't know if it's specifically the rising tide raising all boats situation. But the fact that everyone has a vested interest in Lucene's advancement keeps maybe the scales at bay. Yeah. Yeah.

Bdougie:

Which I know I'm just, like, anecdote on top of anecdote at this moment. So people are probably like, please stop. But I guess what I'm getting at is, like, there's a balance of scales. And I think open source helps level the playing field in a lot of ways.

Marcus Eagan:

It does. I mean, it would be very difficult for someone to start a company that is competitive with an Amazon native service, an AWS native service, but it's possible Yeah. Because of open source. And Amazon's AWS still benefits from that because people are gonna spin up these new services in their cloud and pay them for the VMs. So it's all good.

Bdougie:

Yeah.

Marcus Eagan:

Like, everybody's gonna be fine. If you don't execute, or if you're not constantly innovating, then you're susceptible to disruption, but I think most people, or some people, are gonna be fine.

Bdougie:

Yeah. This might seem out of left field, but, like, do you feel like there's a world where Oracle comes back into the mix of people like, people use Oracle. They pay attention to the stuff that they ship, but, like, Oracle is not at the level of an Amazon or Microsoft.

Marcus Eagan:

That's right. That's right. Well, for Oracle, so Oracle's super reliable. I mean, they are the corporate sponsor of Java. Yeah. Right, which has been in the mix in this entire conversation because... And the Warriors as well.

Marcus Eagan:

That's right. And the Warriors. Because, you know, Lucene, Solr, Elasticsearch, all of those are written in Java. You know, Spark runs on the JVM via Scala. So, like, I think Oracle is fine as well, and they have this very strong open source connection thanks to the Sun Microsystems acquisition years and years ago.

Marcus Eagan:

But I've seen companies move to the Oracle Cloud Infrastructure that are serious. I mean, with serious workloads. So I think that Oracle is going to make a push. I don't know what that's going to look like, but I'm not betting against them. Just... Yeah.

Marcus Eagan:

You know, some companies are good companies, and I don't bet against them. You know? Yeah. And some companies, like Oracle, in my view, are so essential. You know, the database has a lot of staying power and switching costs.

Marcus Eagan:

I do think the switching costs are dropping, though, thanks to

Bdougie:

Yeah.

Marcus Eagan:

Open source.

Bdougie:

Yeah. Yeah. 100%. I think there's a lot of great tools that help this. I don't know if the world of multi-cloud is still a thing because I think people have been burnt a little bit by having too many cloud providers. So we are seeing some consolidation of, like, okay.

Bdougie:

We're an Azure shop or AWS, and we're back to that stage. But the world of, like, being able to switch over, I think we're now building those dev tools to see, like, a different vision or a different control plane. But so you spent your time at Mongo doing search. And you also worked on the AI tool, the vector search tool.

Marcus Eagan:

Yeah. Yeah.

Bdougie:

So, like, can you explain vector search for folks who are just catching up today?

Marcus Eagan:

Yeah. I can. And I can tell you, like, when I first authored a document and I had to defend it at MongoDB HQ in New York City, a lot of people were like, what's this? Come on. This is, like, oh, about 2 years ago.

Marcus Eagan:

Almost 2 years ago. Wow. And so, like, I saw this wave coming, and what vector search is is really just calculating the similarity of different ideas, concepts, or objects through a unified representation or unified format. So you can imagine if we represented you as a dense vector, and a dense vector is just an array of floats, and you represented me as a dense vector, and you represented this microphone as a dense vector, our arrays of floats are gonna look a little bit more similar to each other than to the array of floats that captures the microphone. And the way that those floats are derived is it's the output of transformers.

Marcus Eagan:

Large language models are the ones that everyone's talking about today. And if you wanna learn about the transformer architecture, read Attention Is All You Need. It's a paper. Yeah. 2017 probably.

Marcus Eagan:

Who who who support 2018. Support that paper.

Bdougie:

Googled it? Okay.

Marcus Eagan:

It's Google Research, which, to your question about is open source under threat, this is what scares me. It's, like, they got a lot of competition from that paper and from open source in their work. And it's like, I don't want this to slow down innovation or stifle science because we're gonna benefit tremendously from this technology. But, yeah, vector search is just, like, calculating the similarity of different people or ideas or text that's in this format. Dense vectors.
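
To make the "array of floats" idea concrete, here is a minimal Python sketch. It assumes cosine similarity as the similarity measure, which the conversation does not name, and the three 4-dimensional vectors are made-up numbers chosen for illustration, not the output of any real model. Real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, items):
    """Return item names ordered from most to least similar to the query vector."""
    return sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )


# Hypothetical embeddings: two people and one microphone (illustrative values only).
host = [0.9, 0.8, 0.1, 0.2]
guest = [0.8, 0.9, 0.2, 0.1]
microphone = [0.1, 0.2, 0.9, 0.8]

people_score = cosine_similarity(host, guest)
object_score = cosine_similarity(host, microphone)
ranking = rank_by_similarity(host, {"guest": guest, "microphone": microphone})
```

With these toy numbers, the two people score much closer to each other than either does to the microphone, so the guest ranks first. A vector search system does the same comparison, usually with an approximate index so it does not have to scan every stored vector.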

Bdougie:

Okay. And so Atlas has their vector search tool. Elastic has a vector search tool. But now with this wave of AI and OpenAI, I don't know if it's, like, when you crack open the atom and you can see all the things inside the atom. At this point, like, you have, like, a Chroma.

Bdougie:

You have Qdrant. You have, like,

Marcus Eagan:

Milvus. Weaviate, Milvus.

Bdougie:

Yeah. So, like, now there's, like, a a competition that's hyper specialized on the vector part when they have all these incumbents. They're doing search. And I think what we're seeing right now in AI is, like, when Notion ships their AI feature, all those cool AI startups from last year, it's like, okay. What's now how are you competing against Notion?

Bdougie:

So I guess my question is, like, is it still Mongo and Elastic's game for vector, or, like, should we be paying attention to these new and upcoming folks?

Marcus Eagan:

Yeah. I I that's a good question, and I I think it depends on your use case.

Bdougie:

Yeah.

Marcus Eagan:

Right? Like, should you pay... I think about it. I used the JSON example recently on Twitter.

Bdougie:

Yeah.

Marcus Eagan:

Uh-huh. So the a founder of a Postgres company was like, is there like, what's gonna be act 2 of these vector DBs now that Postgres has it and all the other databases have it. And it's like, all the other databases also have JSON, but MongoDB is a real company. It's a big company, and there's a lot of copycats, like JSON focused databases. And what I tell people is there's room for specialization at the appropriate scale or complexity of your app.

Marcus Eagan:

And so the one I'm most familiar with is definitely Weaviate. I also know Milvus pretty well, but I'm most familiar with Weaviate, and, like, that vector database is designed for really large scale. Like, if you have a lot of tenants, right, you can put almost 60,000 tenants on a single shard, on a single piece of hardware. That is something you cannot do with these Lucene-based systems. Alright?

Marcus Eagan:

It's just not gonna work. I mean, maybe, but you're gonna have out-of-memory problems and other issues. But I think that MongoDB and Elastic are great companies, so I think they're gonna be fine. And developers can use those. They can use pgvector, and then there's a class of companies or a category of use cases that really benefits from the specialization and the design considerations of Weaviate or Qdrant or Milvus.

Bdougie:

Yeah.

Marcus Eagan:

Chroma, I'm starting to understand what it is, but I think calling itself a database is a little weird for me to reconcile because I'm still learning how they are a database. I like it, though. I mean, it's a great project.

Bdougie:

Yeah. Yeah. You should reach out. Jeff's local. And, we

Marcus Eagan:

In Oakland?

Bdougie:

No. He's in SF. We actually just had him on the podcast.

Marcus Eagan:

Oh, fantastic.

Bdougie:

So talk through that whole trajectory of how they're working on that problem.

Marcus Eagan:

Yeah. I think that, again, Chroma is really friendly for Python developers, which is important. These other systems are less friendly. Maybe they're getting more friendly, but I'm excited about all of them. I think they're all going to grow.

Marcus Eagan:

I see applications and and opportunities for each of them. There's also like, Redis has a vector offering.

Bdougie:

Okay. I did not know that. Yeah.

Marcus Eagan:

And so, like, they all have a sweet spot, and you have to play around with them and try to figure out which one is the best for you. But I think they're gonna be around for a long time. It's not gonna be a winner-take-all thing, sort of like JSON.

Bdougie:

Yeah. Yeah. Exactly. Yeah. So, taking it full circle, you had mentioned you didn't pay for Mongo, and had to come to terms with that.

Bdougie:

The one thing that I do know as we were exploring Elasticsearch for for our product, like, Elasticsearch can get expensive. Like, there

Marcus Eagan:

Very expensive.

Bdougie:

Cost. So, like, they're they're fine. They're making money. Mongo definitely knows how to make money. Yeah.

Bdougie:

And I think their Atlas program, or the product itself, is a really good on-ramp into eventually spending more money. Right. With Mongo. That's tongue in cheek. But, like, seriously, open source projects figured out how to monetize the product. I guess the one thing I take a step back on is, like, the vector search tools, a lot of them don't have pricing.

Bdougie:

They're still pretty early and just coming out the gate in the last couple years. I wonder, are they price competitive with Elastic? I mean, this is all speculation. I don't know if you have an answer to this. But I imagine, like, with all this pricing for things like all these AI tools, like, we're now seeing OpenAI actually reducing their pricing recently. I don't know what the reasoning was for that announcement.

Bdougie:

I didn't know they cut, like, things in half for sure.

Marcus Eagan:

Yeah. So so there's a a few things going on. One of those things is there's definitely margin compression. Margin compression because there's so many competitors.

Bdougie:

Yeah.

Marcus Eagan:

Right? And then there's also this precipitous drop in the cost of training these models.

Bdougie:

Yeah.

Marcus Eagan:

And people are still spending a ton of money on training models and on GPUs and trying to build up their AI moat, I guess, based on their data. And the money, I think, in the short term is definitely in GPUs, but in the long term, it's gonna be on the CPUs, like inference and real-time systems. And I don't think these vector database companies need to be focused on that. Like, if you over-rotate on revenue, which is an output. Right?

Marcus Eagan:

You over-rotate on revenue, you might miss the opportunity to win over developers and gain their trust. And these systems need to be open source. They need to be auditable. This is too important a technology. People are not exactly sure how all these systems work and how these things work.

Marcus Eagan:

And so I think for the public sake, we need these tools to be open source so that they can be audited and evaluated and scrutinized.

Bdougie:

Yeah.

Marcus Eagan:

I don't trust these closed source ones.

Bdougie:

Yeah. Yeah. I mean, the closed source, black box, like, those situations are always, if not concerning, interesting, especially when your competition's, like, OpenAI, and now closed source. There are certain extraneous things that they're doing that are still open.

Bdougie:

Very little, I guess, is worth looking at. I guess GPT-3 might get open sourced again, or not again, but open sourced, because they've moved on. So let's just all learn off that thing.

Marcus Eagan:

Right.

Bdougie:

But then you look at the Llama 2 that just came out from Meta. Like, now the question is, like, wait. Now Meta is catching up on all the head start that OpenAI had. Open source is a way for them to catch up because you get adoption, interest, feedback, research much faster than, like, hey, pay me, you know, the couple cents per minute or compute or, I guess, the tokens is what OpenAI has.

Bdougie:

Pay for the tokens, and then you'll learn how to use this. Like, Llama 2 is, like, self host it, do what you wish, build it in public. Like, we're all gonna learn together.

Marcus Eagan:

Yeah. I think that's the right approach. I think a lot of people have been heavily critical of Facebook, and now Meta, because of, you know, some accusations or some of their usage of data or just people's general privacy concerns, but they are doing the right thing by taking all that information that they've gathered over the years and open sourcing the model. So I don't know exactly how this is gonna play out. I don't know if you saw on Hugging Face the uncensored Llama 2.

Bdougie:

No.

Marcus Eagan:

Oh, yeah. So some people have fine tuned these models or changed some parameters, tuned some hyperparameters, and now there's, like, uncensored Llama 2s out there, and people are... I mean, they're saying crazy things. And so we need to be, like... it's not gonna be computers versus humans. It's gonna be good computers and good humans versus bad computers and bad humans.

Bdougie:

Yeah.

Marcus Eagan:

And so we need to keep an eye on these systems and understand how they work and put all the experts together so they can see it and pick it apart, because it's just too significant, this technology is. And I'm not all LLMs. I'm not one of those people. I mean, I've been using LLMs, or language models in general, for a long time.

Marcus Eagan:

I'm not, like, on the hype train. I'm just saying pay attention, dig in. If you have one area of expertise, apply your area to this to understand.

Bdougie:

Yeah. I mean, but you spent the early part of your career in, like, understanding threat vectors and security and stuff like that, where now the federal government has a directive about folks who interact in open source or contribute to open source but are in a sanctioned company, sorry, country rather. So, like, China, Crimea, etcetera. You can't do government contracts if you're using open source that touches that situation. I forgot what the... there's a paper that went out a couple years ago, and everyone was kinda, like, circling the path.

Bdougie:

I don't actually know when the deadline is for that to be enacted. Is that something that's in your purview and, like, what you're paying attention to though?

Marcus Eagan:

Yeah. It is. I think about this a lot because, like, my most recent contribution to Lucene was around... it was something obscure related to a query parser. I can't even remember now. It's so long.

Marcus Eagan:

It's a few months ago, but the person that I worked on it with is based in Russia. And, obviously, I want to defend my country, support American allies, and defend our values. These things are really important to me. But the job of maintaining these mission critical systems that handle this large scale sort of transcends geopolitical boundaries. Like, we need all the help we can get.

Marcus Eagan:

And, like, the committer, the PMC member from the Lucene group that merged this PR, I don't think he's thinking about... I mean, I don't know. Actually, he's closer to conflict than I am. Maybe he is thinking about this a lot based on the news. But, like, he and I were just working on software.

Marcus Eagan:

We were just solving a scientific problem, and he's working with everybody. Like, everybody's working with him. He's just one of the people. And in some of the other projects, there's definitely people in these sanctioned countries, and I think, you know, we're safer in the fact that the software is open. So we know how it works.

Marcus Eagan:

They know how it works. We all know how it works. And one more thing about it is, like, one time, I made a change, like, a small documentation change, in this project called IoTDB, which I was playing around with a long time ago. And I also made the change in Chinese, but I used, like, Google Translate. I know a little Chinese.

Bdougie:

But I was gonna ask you. You know Chinese?

Marcus Eagan:

A little bit. Okay. But,

Bdougie:

Ni hao. That's that's that's what I got.

Marcus Eagan:

That's good enough. You know? And so, you know, there's people on that project who are based in China, and I think we're building these systems for the globe. Like, the whole world runs on Linux, everybody. Doesn't matter what side you're on and which, you know, nation you represent.

Marcus Eagan:

I represent America, and I love America, but I'm still collaborating with scientists everywhere. Period. Engineers everywhere. Full stop. Because I have to.

Marcus Eagan:

Because there's not that many people. There's, like, 95 Solr committers. There's, like, 30 of them that are really active. These people live in all different countries, but this project is powering everybody's world. Like, global markets.

Marcus Eagan:

Like, it's critical to global markets, critical to every health system that's digitized, which ours is only partially digitized. Like, it's critical to a lot of commerce. It's critical to research and academia. So, like, we're all in this together. It's just humanity trying to solve problems.

Bdougie:

So, yeah. Yeah. I mean, there's a lot. As, like, the one contributor to Mongo, to Solr, to Lucene, like, your pathway, there's a lot of things that you collaborate on, but also impact. And, like, you understand that you're a global citizen, that there's an opportunity for folks to have documentation in Chinese, etcetera.

Bdougie:

Researchers coming from other places. Like, I think sometimes we kinda get lost in the sauce on that and, like, forget how broad this impact is.

Marcus Eagan:

There was one story, and there's many stories like this. There was one kid, and he was a young engineer in Bangladesh, who wrote a search engine for hadiths and posted it on Reddit, and it went viral. And it was built with MongoDB. He open sourced it. He wasn't using search.

Marcus Eagan:

He was using, like, text matching, regular expressions. And I went to the GitHub and opened an issue and dropped in some code and was, like, telling him this is how you can fix some things. Then we had a Zoom call, and then he made these changes, introduced some new features, got some new capabilities, improved performance, and he wrote a blog about it. And after that, he got hired by a pretty sweet company in Amsterdam.

Bdougie:

Wow.

Marcus Eagan:

And, like, that made me really happy. That was a big thing for him. And I've had this occur with a few different people in a few different places all over the world. Like, I work with kids in Colombia, kids in Venezuela, kids in Nigeria, one kid in Zambia, I think Zambia, and just many other countries where people are, you know, collaborating in open source and taking their careers to the next level.

Marcus Eagan:

So I think that's what this is all about for me.

Bdougie:

Yeah. And I appreciate that. And I appreciate you joining the Discord just recently and offering your skill set and collaboration.

Marcus Eagan:

Yeah. I'm super excited to see what comes of it. I'm gonna try to, you know, pick up one of those good first issues.

Bdougie:

Alright. Let's do it.

Marcus Eagan:

You know, my TypeScript is a little rusty, but it's gonna have to get polished soon because I'm in a startup, and there's only 3 of us. And so that means we're all everything.

Bdougie:

Yeah. Well, we do have a Rust project, and we do have Go. Our infrastructure is built in Go. So

Marcus Eagan:

Fantastic.

Bdougie:

Yeah. Plenty of places for folks to jump in. And, like, as we see a need, we're building new projects to solve problems, to keep our core product moving forward. But every now and then, you need a little side project, a little juice into the road map. So

Marcus Eagan:

That's fantastic. I'm so excited for your company and your project. Like, I want people to stop looking at GitHub stars and thinking that's the only metric. There's a lot more to it.

Bdougie:

Yeah.

Marcus Eagan:

And there's a lot more to open source.

Bdougie:

Yeah. For sure. So thanks for the time and for the conversation. We'll wind down now. But, folks, stay sauced.

Bdougie:

The Secret Sauce is a podcast produced in house by OpenSauced, the open source intelligence platform providing insights by the slice. If you're in San Francisco and interested in being a guest on the show, find us on Twitter at @saucedopen, and don't forget to check out OpenSauced at opensauced.pizza.