Programming Throwdown

What is "The Edge"? The answer is that it means different things to different people, but it always involves lifting logic, data, and processing load off of your backend servers and onto other machines. Sometimes those machines are spread out over many small datacenters; other times they are in the hands of your customers. In all cases, computing on the edge is a different paradigm that requires new ways of thinking about code. We're super lucky to have Jaxon on the show to share his experiences with edge computing and dive into this topic!

Show Notes

00:01:15 Introducing Jaxon Repp

00:01:42 What is HarperDB?

00:08:10 Edge Computing

00:10:06 What is the “Edge”

00:14:58 Jaxon’s history with Edge Computing and HarperDB

00:22:35 Edge Computing in everyday life

00:26:12 Tesla AI and data

00:28:09 Edge Computing in the oil industry

00:35:23 Docker containers

00:42:33 Databases

00:48:29 Data Conflicts

00:55:43 HarperDB for personal use

01:00:00 MeteorJS

01:02:29 Netflix, as an example

01:06:19 The speed of edge computing

01:08:43 HarperDB’s work environment and who is Harper?

01:10:30 The Great Debate

01:12:17 Career opportunities in HarperDB

01:18:56 Quantum computing

01:21:22 Reach HarperDB

01:23:53 Raspberry Pi and HarperDB home applications

01:27:20 Farewells

Resources mentioned in this episode:

Companies

HarperDB https://harperdb.io/
MeteorJS https://www.meteor.com/

Tools

Raspberry Pi https://www.raspberrypi.org/
Docker https://www.docker.com/

If you’ve enjoyed this episode, you can listen to more on Programming Throwdown’s website: https://www.programmingthrowdown.com/

Reach out to us via email: programmingthrowdown@gmail.com

You can also follow Programming Throwdown on

Facebook | Apple Podcasts | Spotify | Player.FM

Join the discussion on our Discord

Help support Programming Throwdown through our Patreon


What is Programming Throwdown?

Programming Throwdown educates Computer Scientists and Software Engineers on a cavalcade of programming and tech topics. Every show will cover a new programming language, so listeners will be able to speak intelligently about any programming language.

[00:00:00] Patrick Wheeler: Programming Throwdown, episode 121: Edge Computing with Jaxon Repp. Take it away, Jason.
[00:00:22] Jason Gauci: Hey everybody. This is an awesome episode. I'm really looking forward to this. Edge Computing is one of these things where, when I first learned about it, I thought it was just client-side computing. I thought it was something in the browser or on your mobile device or something like that.
But when you think about it, there's actually a whole gradient between that and some backend server, right? So imagine Netflix releases a new episode that they know is going to be super popular, their most popular. Everybody starts downloading it, and that would just completely blow them up.
They also can't proactively put a two-gigabyte show on everybody's phone. They can't do that either, right? So there has to be an answer there, and Edge Computing is a big part of that answer. There's a lot of complexity around how this all works, and I'm so happy to have the head of product at HarperDB, Jaxon Repp, here to really dive into Edge Computing and learn as much as we can about this topic.
So thanks for coming on the show, Jaxon.
[00:01:24] Jaxon Repp: Thank you very much for having me. I really appreciate it.
[00:01:27] Jason Gauci: Cool. So we always kind of start this off the same way: how has COVID changed Harper and your work style? What do you feel has been the big salient point for that?
[00:01:41] Jaxon Repp: Well, we're a relatively young company, formed in 2017. We were in a coworking space, with a bunch of different little isolated spaces together, and we had finally gotten enough traction and gotten our product to where we thought, yeah. We took another round of funding and leased an office space in January of 2020.
[00:02:01] Jason Gauci: Oh wow.
[00:02:02] Jaxon Repp: One month later, everybody was like, maybe that was a bad idea. And we became a fully distributed company, in much the same way that we are a distributed product, a distributed database. After a few months we took a company survey, and every single person was more productive than they had been in the office.
I think we all realized we had a much better work-life balance. My cats are much less lonely, although being alone was kind of their jam. And we decided as a company that the money we would have spent on rent, we're going to spend on annual or biannual retreats where employees can bring their families. We'll go somewhere like Mexico, do planning meetings in the morning, and the afternoon is yours.
Because, to be honest, we're still small enough that we can pull that off. And part two is that a lot of us, at least at our company, are pretty happy to be able to draw a better line between work and the rest of life, because software development is often one of those all-encompassing things that takes over your soul and all of your available hours.
I don't know about you, but I'm not as young as I used to be. And I have kids and I would like to see them. They are not terrible people.
[00:03:14] Jason Gauci: I agree with you up until the end, but no, I think you're totally right. I think having everyone get together at some cadence, maybe every year or semi-annually or something like that, is super nice.
That can keep that bond going, and start that bond up with new folks, without coming into the office every day. I mean, Patrick and I used to have huge commutes, like an hour each way, and that just eats so much of your day. And there were so many days where I went in and left and didn't even really talk to anybody, and I was kind of like, well, what was I doing for those two hours?
So, yeah, I think you can get a lot done. You can get by without coming into the office every day. I think that's definitely something we've all taken away. So what happened with your lease? Were you able to break that lease, or how does that work? In general, I'm always fascinated with what's happened to corporate property at this point.
[00:04:13] Jaxon Repp: I think we got out of it, and I think it was because they had entire multiple-floor tenants who were trying to fight the same battle. To be honest, I believe the real estate lawyers were probably too busy fighting real battles to worry about a little startup that took a corner unit with almost no windows.
[00:04:34] Jason Gauci: Nice.
[00:04:35] Jaxon Repp: Honestly, to your point about commutes, I used to commute an hour every day.
And then I moved to HarperDB and my commute was 15 to 20 minutes, and I was like, that is so much better. And then I switched to a five-second commute from my bedroom to the living room. And there is something to be said for the decompression that comes with a commute.
[00:04:59] Jason Gauci: Oh, that's true.
[00:05:01] Jaxon Repp: From sitting there staring at your code like that kid who doesn't know why it works and doesn't know why it doesn't work.
And all of a sudden it works again and you're like, okay, I'm done. And you walk out into the living room, and kids don't respond that way. You cannot debug them. They are just a constant challenge. And to be able to be present with them after you've spent a whole day banging your head against your desk, that is a skill in and of itself.
Being able to walk out and be present and not be focused on work.
[00:05:32] Jason Gauci: Yeah, oh man. You hit on something, that's such a good point. A couple of things to riff on there. First, I started taking a walk: I started "walking to work," where I basically just walk in a circle around our neighborhood, and that's my walk to work.
And I just feel like mentally, it puts me in a different place. So far it seems to be working. Maybe there's a little bit of placebo effect there, but I feel like it's doing something. And the other thing is, even at work, there'll be a situation that's totally on fire, and you have to make some really hard decisions very quickly.
It's a zero-sum game and everyone's really upset. And then you go from that to, let's say, a one-on-one meeting with somebody who's doing an amazing job, and you have to switch gears from your debate face to someone who's extremely excited and happy and appreciative.
And then that meeting ends, and now you're with your family, which is like another dimension. So I feel like toggling between all of these different personas has been really, really difficult over video calls, whereas in a real office you at least walk from one room to the other and have time to reframe yourself over and over.
[00:06:48] Jaxon Repp: Yeah. Just sitting down to dinner and, you know, kind of crossing your hands and saying, all right, let's talk about your performance today.
[00:06:57] Jason Gauci: That's right.
[00:06:59] Jaxon Repp: It's my understanding that you spilled food on your shirt at the beginning of the day, and then you had to walk around with that stain. That's clearly not the image we want to project.
[00:07:08] Jason Gauci: Yeah that's right. You might have to go live next door.
Oh man. So, cool. It sounds like it really worked out for the best. I mean, you wouldn't ever wish COVID on anybody or any country or anything like that, but there has been a real silver lining.
I think this has been one of them: we've started to understand the working relationship better. Cool. So let's dive into Edge Computing. Initially I thought Edge Computing was just in the browser or the mobile app, right? That's definitely the extreme Edge, and there are definitely things you want to do in that space, but there's a whole bunch of stuff between that
and, let's say, an EC2 instance you have running. So walk us through what that really is: what is available there, and what can people do on the Edge?
[00:08:02] Jaxon Repp: Well, the Edge is loosely defined, to say the least. We used to think of the Edge as not quite as far out as the browser, because there are so many limitations in terms of what you can do there.
You're in a sandbox. So the next smallest compute unit that we would focus on, both at HarperDB and at my prior company, which was an IoT platform, was things like Raspberry Pis, small microcomputers, Jetson boards, stuff like that, where you can run code that handles a workload and performs smaller tasks than would be handled on a larger server up in the cloud.
What you find, though, is that that hardware is not super ready for a lot of dynamic code and workloads. Say you want to be out in a vineyard. For example, at my previous company we had an autonomous vineyard just outside of Tucson, Arizona, where we were automatically watering the vines and using compute to analyze soil moisture content, humidity, temperature, canopy, shade, infrared, and all of that stuff.
And we were still using ruggedized Raspberry Pis out there, because it's hard to find something that gives you the flexibility to install a platform or a database or whatever you might want to store or run on it, and that likewise handles the reality that it rains outside, or that it's sometimes 150 degrees when you're down there measuring temperature.
So the hardware was a huge challenge, and we were banging our heads against that at HarperDB, because we knew that distributed computing would require distributed data. And we knew the benefits of distributed computing: running an AI model at the Edge on a small or streaming data set is much more efficient than shipping it all up to the cloud, especially when you have intermittent connectivity, which is often the case out at what we call the Edge.
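The intermittent-connectivity point suggests a common edge pattern, store-and-forward: write readings locally first and push them upstream whenever the link happens to be up. The sketch below is a minimal illustration under stated assumptions, not HarperDB's replication mechanism; the `StoreAndForward` class and its `send` callback are hypothetical stand-ins for whatever uplink a device actually has.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForward:
    """Queue records locally and push them upstream only when the uplink works.

    Hypothetical sketch: `send` stands in for whatever uplink an edge node has
    (HTTP, MQTT, database replication) and returns True on success.
    """

    def __init__(self, send: Callable[[Dict], bool], max_buffer: int = 10_000):
        self.send = send
        # Bounded queue: once full, appending silently drops the oldest record.
        self.outbox: Deque[Dict] = deque(maxlen=max_buffer)

    def record(self, item: Dict) -> None:
        """Store a record locally; never blocks on the network."""
        self.outbox.append(item)

    def flush(self) -> int:
        """Send queued records oldest-first and stop at the first failure.

        Returns the number of records delivered. Undelivered records stay
        queued, in order, for the next attempt.
        """
        sent = 0
        while self.outbox:
            if not self.send(self.outbox[0]):
                break  # link is down; keep the record and retry later
            self.outbox.popleft()
            sent += 1
        return sent
```

The bounded queue is a deliberate trade-off: during a long outage the device keeps the newest data instead of running out of storage. Filtering or summarizing readings on the device before calling `record` shrinks what has to wait in the queue.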
[00:09:56] Jason Gauci: Can you describe it? I thought that the Edge was maybe the ISP or something like that. What exactly is the Edge? Is the Edge your house, or is it some server in between you and the internet, or what exactly is that?
[00:10:13] Jaxon Repp: The Edge is everything outside what I would call a major colo facility, like a major cloud provider's.
So we've partnered with Lumen, formerly CenturyLink/Level 3. They're pushing micro Edge data centers, which are still data centers, with much more server capacity than I have ever had in my closet at home. But to them, that's the Edge. And that was really the change for us: realizing that a wearable isn't the Edge for us, because we can't be installed on it, but for people who can compile apps and put them on a watch, that's the Edge. For a lot of people, where your sensors are and where you're collecting that data is the Edge. That's how you define it. So you fight the battle and find the hardware that can survive there and collect that data. But for other people, the Edge is simply: my customers have a really fast connection.
I can trust that that connection will exist. I just want to move my application closer to them, so that the round trip to the API is a millisecond instead of 300 milliseconds. That's the sort of performance I want to get. So we find that lots of people are defining the Edge differently. The people who own giant cloud data centers are definitely defining the Edge as a slightly smaller data center, slightly closer to the users, and the people who are building apps that collect sensor data and make use of it with machine learning are pushing it out further and solving those hardware challenges.
I don't think we can truly define it more than that, because tomorrow we're going to invent some new technology that's inside me.
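Part of the gap between a 1 ms and a 300 ms round trip is plain physics. A back-of-the-envelope sketch, assuming the common rule of thumb that light in optical fiber covers roughly 200 km per millisecond; real routes add detours, queuing, and server time, so these numbers are lower bounds only, and the function names are illustrative:

```python
# Rule of thumb: light in optical fiber covers roughly 200 km per millisecond
# (about two-thirds of its speed in a vacuum). Real paths add routing detours,
# queuing, and processing time, so everything here is a lower bound.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Propagation-only lower bound on round-trip time to a server distance_km away."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_server_distance_km(rtt_budget_ms: float) -> float:
    """Farthest a server can be (by fiber path) and still meet an RTT budget."""
    return rtt_budget_ms * FIBER_KM_PER_MS / 2


def sequential_wait_ms(calls: int, rtt_ms: float) -> float:
    """Total network wait when an app makes `calls` dependent API requests in a row."""
    return calls * rtt_ms


if __name__ == "__main__":
    print(max_server_distance_km(1.0))  # 100.0 km of fiber for a 1 ms round trip
    print(min_round_trip_ms(5_000))     # 50.0 ms minimum across ~5,000 km
    print(sequential_wait_ms(10, 300))  # 3000 ms of waiting at 300 ms per call
    print(sequential_wait_ms(10, 1))    # 10 ms at 1 ms per call
```

The last two lines show why latency compounds: a hypothetical app that makes ten dependent API calls waits about three seconds at 300 ms per round trip, but about 10 ms when the server is close enough for a 1 ms round trip, which by this estimate means within roughly 100 km of fiber.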
[00:11:55] Jason Gauci: Yeah, that's right. We're all the Edge. Okay, that totally puts it in perspective, 'cause I did a bit of research prior to the show. As people who listen to the show know, I'm an AI person,
so I don't have any background in full stack. What I saw in my research was things like Cloudflare's edge network and AWS Lambda@Edge, and that sounded like, as you said, just a smaller data center, and there are just a lot of them. But it sounds like the Edge is much bigger than that. Your vineyard is a great example.
In this case you have this swarm of Raspberry Pis on this vineyard, and now that's the Edge. So for Harper, for example, are you concerned with all of those different types of Edge Computing, or are you focused more on the former or the latter?
[00:12:43] Jaxon Repp: Well, as I mentioned, when I first joined HarperDB, we were very, very much focused on AI-powered classification in the mining industry.
That's a very small device: it might be a Dell Edge device, or a Raspberry Pi for a proof of concept. Let's do our calculations and provide real benefit. And we could absolutely do that. The client loves it, the result is great, and they're like, "Cool, we ran that, but we can't really run a Raspberry Pi in this hot smelting environment."
So what hardware solutions do we have? Inevitably you run into budget concerns, because if we've got 150 of those across a plant, and you're now going to spend $3,000 on each ruggedized piece of hardware, all of a sudden they're looking at their profit and saying, "Well, I can't handle that sort of CapEx.
I need this to be OpEx," which means now we're going to, what, create a HarperDB device with our product on it, incur that capital cost ourselves, and let them pay monthly so they can run it as OpEx? From a budgetary perspective, it's very, very challenging to do the true Edge. And we found that the big guys with all the money, who are absolutely pushing their giant cloud service offerings to these smaller data centers, are just as desperate to move data, compute, functionality, and capability to the Edge.
And they have much bigger pocketbooks, and they can make it happen more quickly. I mean, not a lot more quickly; we can do POCs all day long for smaller companies. But there's still a real hardware problem out there for true Edge Computing.
[00:14:36] Jason Gauci: Yeah, that makes sense. Let's step back a little bit, now that we've defined Edge Computing, which I think is super, super useful to set the frame.
So what got you into Edge Computing? Give us a bit of background on your story and what led you to HarperDB.
[00:14:53] Jaxon Repp: Sure. I was a partially reformed software developer.
[00:14:57] Jason Gauci: "Partially reformed."
[00:14:59] Jaxon Repp: Partially reformed. This is my eighth startup. I was at a communications startup where we were joining all your phone calls, texts, and emails into threads for customer service, and for the UI, I basically wrote React, literally a year before React came out.
[00:15:19] Jason Gauci: How is that possible? Was there a preview release or something?
[00:15:23] Jaxon Repp: No, no. I effectively wrote the functionality of modularized HTML components, because there wasn't anything like that. But I knew that that's what we needed.
[00:15:31] Jason Gauci: Well, isn't it amazing? It's literally great minds think alike. I guess there's this idea, and a lot of people come to the realization at the same time.
[00:15:41] Jaxon Repp: Yeah. It just seemed so obvious. And then I immediately moved on, because, well, my wife was pregnant at the time, and she kept getting more pregnant, and I didn't want to pay cash for that baby. So I had to get a job with insurance, and I worked for DirecTV, and it was one of those jobs.
I had never worked at a company where I had to wear khakis and a button-down before, and it just didn't fit. So I started looking for my next opportunity, and I found an IoT platform: a sort of low-code, drag-and-drop tool. Drag in your sensor block, then capture the data that comes off of it from port three, divide it in two, and keep a running average in your memory buffer. Fetch a threshold from a database, and if it goes above that limit, send an email. So it was a super easy-to-use platform, a lot like Node-RED, only sort of enterprise grade. And it was a great product.
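The low-code flow described here (read a sensor, divide the value in two, keep a running average in memory, compare it against a threshold fetched from a database, send an email) maps onto a few lines of code. This is a hypothetical sketch of that logic, not the platform's actual implementation; the `RunningAverageAlert` name, the window size, the threshold lookup, and the alert callback are all assumptions injected as parameters:

```python
from collections import deque
from typing import Callable, Deque


class RunningAverageAlert:
    """Hypothetical sketch of the low-code flow: scale each raw reading,
    keep a moving average in memory, and alert above a stored threshold."""

    def __init__(self, window: int, fetch_threshold: Callable[[], float],
                 send_alert: Callable[[float], None]):
        # Only the last `window` scaled readings are kept in memory.
        self.values: Deque[float] = deque(maxlen=window)
        self.fetch_threshold = fetch_threshold  # stand-in for the database lookup
        self.send_alert = send_alert            # stand-in for the email step

    def on_reading(self, raw: float) -> float:
        scaled = raw / 2  # "divide it in two"
        self.values.append(scaled)
        average = sum(self.values) / len(self.values)
        # Alerts on every reading while the average stays above the limit;
        # a production flow would probably debounce this.
        if average > self.fetch_threshold():
            self.send_alert(average)
        return average


if __name__ == "__main__":
    alerts = []
    block = RunningAverageAlert(window=3, fetch_threshold=lambda: 10.0,
                                send_alert=alerts.append)
    for raw in [10, 20, 30, 40]:
        block.on_reading(raw)
    print(alerts)  # [15.0]
```

Injecting the threshold lookup and the alert as callables keeps the block testable off-device and mirrors how each step was a separate drag-and-drop block.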
And I watched costs for any given operation drop dramatically, because we didn't have to have massively powered servers on the floor and run cables to everything. These could be wireless connections; we could use Bluetooth Low Energy if we had the ability. So there was a tremendous opportunity for us to capture and refine data and then not send every single piece of sensor data up to the cloud.
When you're in an installation with a thousand sensors, you very quickly realize how much of your pipe you're taking up sending everything up there to be analyzed. So it made sense to me from a filtering perspective. And then it was just about how I could make this easier, faster, and more stable.
What are the challenges that I have, and how can I overcome them?
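One common way to avoid sending every reading upstream is report-by-exception, or deadband filtering: forward a value only when it has moved meaningfully since the last value that was sent. This is a generic technique shown as an illustration, not necessarily the filtering the platform in the conversation used; the deadband width is an assumed parameter you would tune per sensor:

```python
from typing import Iterable, List, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp, value)


def deadband_filter(readings: Iterable[Reading], deadband: float) -> List[Reading]:
    """Keep a reading only if it differs from the last *forwarded* value by
    more than `deadband`. The first reading is always forwarded."""
    forwarded: List[Reading] = []
    last_sent: Optional[float] = None
    for ts, value in readings:
        if last_sent is None or abs(value - last_sent) > deadband:
            forwarded.append((ts, value))
            last_sent = value
    return forwarded


if __name__ == "__main__":
    temps = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.05), (5, 19.9)]
    print(deadband_filter(temps, deadband=0.5))
    # [(0, 20.0), (3, 21.0), (5, 19.9)]: three of six readings are sent
```

Comparing against the last value that was sent, rather than the previous raw reading, means a slow drift still gets reported once it accumulates past the deadband.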
[00:17:33] Jason Gauci: Makes sense. And so that company you were at, with the IoT devices, was the step before Harper? From there you went to Harper?
[00:17:43] Jaxon Repp: Correct. I was looking for a network-fault-tolerant data solution, and I found HarperDB. I think I may have actually searched for exactly that.
We had great SEO back then. I came across it and spun it up, it worked for my use case, and I integrated it by building a block for it in our platform. But I ran into some documentation issues: the Postman collection appeared to be a little outdated. So I rewrote that Postman collection and just sent them an email at hello@harperdb.io.
I'm like, "Hey, your documentation seems to be a little out of date. Here's a new file. I rewrote it so that it worked for me, so I could use the Postman collection locally." They said thanks. I continued to implement it and ran into a couple of things that I thought would be cool to have.
And I wrote to them, and eventually they just wrote me back and said, we would really like you to work for us, or it would be cool if you could consult or just help out. And I'm like, ironically, my office is about to go virtual, and I have kids at home, and I don't think that I can survive that.
So they invited me to work out of their office half time, which turned into full time. And here I am two and a half years later.
[00:18:54] Jason Gauci: That is awesome. I think it's a really great story. We get a lot of folks asking how to get into the field, and that's a perfect example. I mean, they might've looked you up, but let's assume they didn't.
They might not know your college degree or whether you went to this bootcamp or that bootcamp, but you were providing real value to them, and they knew that you knew what you were doing. So they reached out to you and got that process going. That's a real sign. I think it's an inspiration for people out there who want to get into the field: just get in there, start using things, and be a vocal part of the community. That can go a really long way.
[00:19:37] Jaxon Repp: Yeah. One of the things I've noticed when I work with newly onboarded employees is that they come in two types. There are the ones who wait to be told what to do and how to solve a problem; they always have questions. And then there are the ones who bring me two solutions to a problem they've encountered, which they've Googled or found on Stack Overflow, and say, "I don't know which, but both of these would solve the problem." All day long, that makes me do a happy dance inside, as opposed to the other. And likewise, when you're working with a product, I know that they don't want their documentation to be outdated.
Nobody wants that. I hate documentation, but I was willing to do it because I needed that Postman collection to work for me anyway. So as soon as I got it to work for me, I just sent it to them. That way nobody else has to have that problem. We have enough problems as programmers.
[00:20:30] Jason Gauci: That's right.
[00:20:31] Jaxon Repp: Let's not let them fester out there in the world.
[00:20:33] Jason Gauci: Yeah, totally. That's awesome. So you were using HarperDB as part of this IoT project, you were communicating with them, and you said, wow, this is actually a really cool piece of technology, I want to go there full time.
And you also had the virtual thing. So let's jump into how people write code for the Edge. How is that different from building a regular server, like in PHP or something like that? What makes the Edge environment different to work with?
[00:21:10] Jaxon Repp: It's a very good question, because, based on your previous question about what the Edge is, I think that's changed for me a lot. Lots of people have been programming at the Edge for a long time, and that's microprocessors programmed in low-level C and, I mean, you guys were rocket scientists, right?
You worked on things that pointed to space, so presumably you've worked on extremely resource-constrained devices. And that always felt to me like what the Edge was. The Edge doesn't do a lot, but it's very purpose-driven. It's not super dynamic, and, to be honest, once you put it on that board, it's never going to change.
I think resources have changed. The Raspberry Pi kind of opened everybody's mind to what was possible, and you've got Arduinos and all of these little things where you can add your custom code and even update it over time to continually adjust to changing workloads.
[00:22:09] Jason Gauci: Yeah, to double-click on that: our car connects to the internet.
It's not fancy, it's just a Honda Odyssey, but it connects to the internet when we get home. One day we were out driving, and I stopped at a red light and the car shut off, and I panicked. But it turned out that this was just a new feature that had rolled out, where the car literally turns off when you stop at a red light.
And then when you let go of the brake, it turns itself back on. I guess that's somehow more economical, but it just randomly happened. So we've moved on from when Patrick and I were doing embedded work, where there'd be a firmware update and you'd have to carry a briefcase with a laptop in it and a cord, go to the site, plug it in, and update the firmware.
It cost thousands of dollars to fly halfway across the world to do that. And now my car just does it, and I don't even know about it. It's just so different now.
[00:23:04] Jaxon Repp: I mean, I do feel like maybe they should send you an email telling you your car is going to shut off starting Friday.
[00:23:10] Jason Gauci: You would think, right. It just randomly started happening.
[00:23:14] Jaxon Repp: Don't freak out, but this is going to start happening.
[00:23:19] Jason Gauci: Yeah. But it shows the dramatic change. And to your point, the Raspberry Pi is also just a massive game changer, because it puts this in everybody's hands. I needed a lot of handholding to do embedded work, having no background in C or anything like that.
And now with the Raspberry Pi, you have an entire Debian install at the Edge, which gives you a ton of flexibility.
[00:23:40] Jaxon Repp: And so, as we look at where the Edge has moved, it's become more resourced. You look at AWS Lambda, and you look at our new feature, Custom Functions.
Realistically, JavaScript is... and I don't want to start a flame war, I don't want to get you guys a million downvotes, but JavaScript is an exceptionally easy language to learn. And to be honest, if sandboxed properly, it can be as safe as you want it to be.
And it can be as performant as you architect it to be. So I feel like it's certainly the future of, at least, Edge prototyping. But I'd be hard pressed to say there won't still be a use case for constrained hardware: once you figure out what you want to do with data at the Edge, to continually lower costs and perhaps solve that hardware problem permanently, you are going to be on more constrained devices.
And maybe you're going to use something like Kotlin that can compile down and run on the JVM, or something else that's going to be able to function out there but still keeps the mental clock cycles of the developer in mind: the ease of use, call it the usability, I guess.
I want it to be usable because, when we're talking about collecting sensor data and doing workloads at the Edge, right now it's so new. Everybody's been talking about Edge Computing for years, but we are collecting so much. I call it the Rumsfeldian challenge, because we don't know what we don't know.
So we'd better collect everything, and obviously transporting all of that to the cloud is not ideal. We will solve this problem, but I think it takes a lot of experimentation in the very beginning, and you need something flexible for that. So these wholly capable standalone Ubuntu environments, like a Raspberry Pi, are the ideal place for us to figure out what it is we're even going to do when we're out there.
[00:25:42] Jason Gauci: Yeah. I remember listening to Andrej Karpathy at Tesla AI Day talk about the AI for Tesla Autopilot. He was effectively saying, well, we just need to get enough data and then we'll be done, so it's really a data problem. I agree with him in principle, but practically, the challenge is that some data is more important than other data, right? For example, if you're driving on a road here in Texas with nobody else on it and a 70-mile-an-hour speed limit, just going down a straight road by yourself, that is a lot less interesting than being part of a 17-car pileup.
Right? That second thing doesn't happen very often, but when it does, it's really important to collect that data, because every single time that happens, you want to learn as much as possible. Anytime there's a black swan event, you want to learn as much as possible. But as you said, you can't collect everything all the time, or even half the time, or even a tenth of the time.
So you need something smart that's asking: is what's happening right now interesting? If it is, start collecting it; if it's not, throw it away. And that smart thing has to live on the Edge, by definition. I'm not an autonomous vehicle guy or anything like that, but just looking at Tesla AI Day, I feel like the differentiator there is whether they can do smart things at the Edge. That's going to make or break that whole idea.
And I think there are probably a hundred other examples where Edge Computing is going to make or break a lot of the next generation of tech and ideas.
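The "is what's happening right now interesting?" gate can be sketched as a simple on-device anomaly check. The version below is an illustrative assumption, not how Tesla or anyone in the conversation does it: it keeps a running mean and variance with Welford's online algorithm, which needs constant memory, and flags readings more than `z` standard deviations from the mean. `NoveltyGate`, `z`, and `warmup` are hypothetical names.

```python
import math


class NoveltyGate:
    """Decide on-device whether a reading looks unusual enough to keep.

    Tracks a running mean and variance (Welford's online algorithm) and flags
    readings more than `z` standard deviations away from the running mean.
    """

    def __init__(self, z: float = 3.0, warmup: int = 30):
        self.z = z
        self.warmup = max(2, warmup)  # need >= 2 samples for a sample variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def _update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_interesting(self, x: float) -> bool:
        interesting = False
        if self.n >= self.warmup:  # say nothing until the baseline has settled
            std = math.sqrt(self.m2 / (self.n - 1))
            interesting = abs(x - self.mean) > self.z * std
        # Design choice: every reading, unusual or not, updates the baseline.
        self._update(x)
        return interesting


if __name__ == "__main__":
    gate = NoveltyGate(z=3.0, warmup=10)
    steady = [gate.is_interesting(x) for x in [10.0, 10.2] * 50]
    print(any(steady))                # False: ordinary jitter is discarded
    print(gate.is_interesting(50.0))  # True: worth recording
```

Updating the baseline with every reading lets it follow slow changes; excluding flagged readings instead would keep rare events from diluting it. Which is better depends on the signal.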
[00:27:36] Jaxon Repp: I agree. The classic example that I always talk about is from when I was working with the oil and gas industry, on the turbines that do the processing at the refineries.
They spin at 20,000 RPM, and if something goes wrong, it goes really wrong. Those things shut down, and at peak natural gas prices, that's a million dollars a day they're losing.
[00:27:59] Jason Gauci: Wow.
[00:27:59] Jaxon Repp: For just one turbine. Right? So it shuts down and can't refine anything, and they're losing a million dollars a day, or in some cases a lot more than that.
They were collecting data from sensors and pushing it into an old-school data historian, and their resolution was one reading every five seconds. You can look at the data points leading up to a failure and say, well, there's probably something there.
[00:28:24] Jason Gauci: Yeah that's right.
[00:28:26] Jaxon Repp: And, and they had one guy, they introduced me to the one guy and he comes in and he looks at it and he's like, well, yeah, here's what happened.
And everybody else in the room, we were all looking at the exact same screen and we had no idea what he was talking about. And he's like, well, I remember once in this other place, you know, 20 years ago I saw something like this. If we didn't have sensors back then, but it was a lot like this. And he just had all of his tribal knowledge.
And he was 60 and he desperately wanted to retire, but he could not. He would get called up in the middle of the night and have to fly out somewhere in the world to analyze something that had gone wrong. And so our mission in this consulting project with this company was to provide, you know, five-millisecond resolution. But you cannot record all of that and put it into a historian, because you're going to overwhelm it, period.
So we built a system that basically kept a rolling buffer of five minutes and would wait for an anomaly to occur, wait through that anomaly, or the shutdown if that's what happened, and then capture five minutes on the other side. It would wrap that up into a package and put it into a local Edge instance of our product, which would then be transported up to the cloud for analysis.
So the key is the ability to understand what an event is and what data led up to it, and to capture that at a higher resolution than your normal, steady-state analysis at five-second resolution, where you're like, yeah, as long as the line is flat, everything's great. But as soon as that line starts to move a little bit, even infinitesimally, and you're talking about vibration in something that's spinning at 20,000 RPM, it's important to know all of those fluctuations and to be able to look at them. Not everybody has the tribal knowledge to understand that when it goes up by a fraction and down by a fraction every five seconds, what you're really looking at is a very real problem with vibration in the 24 hours leading up to this thing flying through the wall and ruining everybody's day.
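The capture scheme Jaxon describes (a rolling pre-event buffer, a trigger, then a fixed amount of post-event data bundled into one package) can be sketched in Python as below. This is a minimal illustration, not the code from that project: the class and field names are hypothetical, windows are counted in samples rather than minutes, and the anomaly rule (a fixed tolerance around the buffer's mean) is an assumed stand-in for whatever detection the real system used.

```python
from collections import deque
from statistics import mean
from typing import Dict, List, Optional


class EventCapture:
    """Keep a rolling pre-event buffer; on an anomaly, package samples from both sides."""

    def __init__(self, window: int, tolerance: float):
        self.window = window          # samples kept before and after an event
        self.tolerance = tolerance    # assumed rule: deviation from the buffer mean that counts as anomalous
        self.pre: deque = deque(maxlen=window)    # oldest samples fall off automatically
        self.post: Optional[List[float]] = None   # non-None only while capturing an event
        self.trigger: Optional[float] = None
        self.packages: List[Dict] = []  # finished events, ready to store locally and ship upstream

    def _is_anomaly(self, x: float) -> bool:
        return bool(self.pre) and abs(x - mean(self.pre)) > self.tolerance

    def add(self, x: float) -> None:
        if self.post is not None:
            # Capturing: collect the "after" side of the event.
            self.post.append(x)
            if len(self.post) == self.window:
                self.packages.append(
                    {"before": list(self.pre), "trigger": self.trigger, "after": self.post}
                )
                self.pre.clear()
                self.post = None
        elif self._is_anomaly(x):
            self.trigger = x
            self.post = []
        else:
            # Steady state: keep only the most recent `window` samples, discard the rest.
            self.pre.append(x)
```

In normal operation only the last `window` samples are ever held, which is the "throw it away if it's not interesting" behavior; high-resolution data is kept only around an event. A real deployment would write each package to the local database instead of a Python list.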
[00:30:28] Jason Gauci: Yeah. Yeah. I think that problem of, you know, am I seeing something interesting, is a really phenomenal one; I think it's a type of active learning problem. And, you know, I truly enjoy writing TypeScript, which compiles down to JavaScript.
I feel like it's a solid language. Do you think we'll get to a point where the Edge will be language-independent? Or is there something about the Edge where, if you were to add more languages, it's just a lot of work? Could you describe a little bit what this machine or VM is, the sort of box that runs at the Edge, in terms of software, and what actually is going on there?
[00:31:17] Jaxon Repp: Well, it depends. What we've found is that the opportunity in a POC is that you can put anything out there. It could be any language, and the box doesn't necessarily even need to survive the elements. So you're not really hardware-limited until you go out of proof of concept and into production.
Right? So once you figure out what your functionality is, then you start to look at what cost-effective hardware can run it, and how do I replicate this functionality out there? And from a resource perspective, will I have the benefit of, you know, an OS that supports Python?
Right? Could I run my statistics in a Python script, or could I use JavaScript because I want to do a bunch of API calls to third-party resources? What language accomplishes my goal for the proof of concept may well be different from the language that accomplishes the goal in final production.
So in the same way that TypeScript compiles down to JavaScript and Kotlin compiles down to something that runs in a JVM, I'm sure that somebody very smart is going to take some super easy language that hasn't even been invented yet, but that my children are going to learn how to write, that will compile down to C and ultimately be able to be shipped off through chip as a service.
And you're going to send me the thermostat program that my kid wrote to keep her room cooler in the middle of the summer and adjust the thermostat automatically. So I want to believe that the language shouldn't matter. To whatever extent there are tribes that adhere to and fight desperately for one programming language over another, that probably all eventually goes away.
And we all just drag and drop some boxes onto a screen and say, that's what I want to do, go make it happen where I want it to happen. Containerization is obviously a huge movement, and it has been in the cloud for applications. We certainly see that in these smaller Edge data centers; everything is containerized.
They are only pushing containers out there, and nobody's installing on bare metal. And even at the Edge, out at the vineyard, we are running little Docker containers on those Raspberry Pis. So it's entirely possible to have a Kubernetes cluster pushed out to the Edge.
K3s is a great, really, really minimalist container management platform. But again, I don't know how the code gets out to the Edge, and to be honest, I try not to care. I try to work on the part of the puzzle that I'm working on: how does it make everybody's life easier rather than more of a problem?
[00:34:08] Jason Gauci: Yep. Yep. That makes sense.
[00:34:10] Jaxon Repp: I'm trying to make it just work, and somebody with a lot more money and a lot more time, who hasn't spent as much of their life banging their head against the hardware problem, is probably going to have to solve that one, 'cause I think I may have given up on it.
[00:34:23] Jason Gauci: Yeah, that makes sense.
Yeah. I believe a lot of these Lambda functions started as sort of JavaScript front-end servers, and you have the browser running JavaScript, so maybe that's the starting point. A lot of this, the Cloudflare Edge, I think, is JavaScript only.
And that's probably just because of their pedigree, where they came from and their inspiration. But to your point, it's all run on VMs, and so it's just a matter of time before they say, look, you can point us to your Docker Hub location, which could be running just about anything, and really open it up.
[00:35:03] Jaxon Repp: It teaches you a lot as you start to move into real-life deployments where containerization is the standard. It also teaches you how important it is to build a good Docker image. I feel like premature optimization, Donald Knuth's axiom, is one of those things that can absolutely kill a company, but man, if you're going to spend the effort anyway, a good Docker container is key.
I think when I got to HarperDB, our Docker container was 350 megs, and I was like, it feels too big. It literally feels like it might be too big, given that our actual installed binary is under a hundred and all we needed was Node.js. I feel like we could do this better.
So we spent a lot of time, just recently actually, with our new release, working on that and making it, I guess the industry term is, a first-class citizen, because truly, I think it's going to be containerized applications and workloads, dynamically distributed by large service providers that provide rapid access to your application on demand.
They don't want your Docker container running out on their Edge servers all of the time. They may only want to, you know, follow the sun. And right now the best framework we have for that is a Kubernetes cluster that they can shut down and spin up, with access to persistent disks, certainly for data storage. But a lot of it is ephemeral.
[00:36:31] Jason Gauci: Yeah. Yeah. That makes sense. I had an issue recently with AWS Lambda, where I think Lambda can only be 200 megs; that's their limit. And I wanted to run some machine learning, and as anyone knows who's tried to install PyTorch or TensorFlow or any of these things, you type, you know, pip install PyTorch, and then you get this in-console progress bar telling you you're downloading like 900 megabytes. And you're like, what? But I think it's because it has all these different optimizations: if you're running on Intel hardware, there's this thing called MKL, which is some kind of linear algebra library, and if you're running on the GPU, they have that too.
And so it ends up being this massive thing that I don't know how to decompose. I think I ended up getting around that with Amazon's Elastic File System: a Lambda function mounts this file system that Amazon is just holding onto for you, and you can have a bunch of Lambda functions all using it.
But yeah, you start to hit a lot of limitations, for good reason, because you're fanning this out now. It's not just some server that could be über-powerful sitting in the Midwest somewhere; you're fanning this out to many, many different nodes, potentially all over the world. And that just creates a lot of limitations that people might not have had to deal with otherwise.
[00:37:55] Jaxon Repp: It creates a tremendous number of limitations. It also creates a lot of opportunity around the challenges of logistics. And that's the other thing that Kubernetes, for better or worse, is very good at. It's like, I have an atom, and I want this atom to do some work, and I want it to do that work here, across all of these places.
And it's very easy to script, very easy to spin up, and very easy to spin down. As the container sizes get smaller and as the Edge compute resources become more powerful, you're just going to continue to push out. I don't see any change in containerized architectures coming, because I can't imagine a better, more atomic way to send out a core piece of functionality than in a container.
I mean, would I like it to have less overhead? Sure. Would I like it to be a little less complex? Yes. But it does a great job. And that's why DevOps people are so angry all the time.
[00:38:57] Jason Gauci: Yeah, that's right. And please fill in the gaps here, but I think the way the container system works is you start with some base image, and then it keeps track of all the changes to it.
So if you have some Docker script that says, download HarperDB and unpack it, download Node.js and install it, those commands are run starting from some frame of reference. Maybe it's an Ubuntu install or something like that. And so you don't have to actually copy the Ubuntu install, because that's your base image that everyone has agreed on.
This is the Ubuntu image. You're basically copying over what you've done to that image. So walk us through this, but when you're shrinking the Docker container in this case, does it mean doing fewer things to the base image, so that there's less to keep there?
[00:39:52] Jaxon Repp: Well, there's a process called, like, a sequential build, where you could bring in the Ubuntu image and then install Node.js. But all you really need is Node.js, because Node.js was previously the only prerequisite for HarperDB, and then you install HarperDB on top.
And so rather than carry the whole Ubuntu image, because again, these are going to be installed over a Linux OS, right? That's what's running Docker, or the Linux subsystem on Windows. So you've got Linux. You don't need all of Ubuntu, but you definitely need Node.js, because we require that.
So ultimately you want to install Node.js, and there are Node.js-based images, and then you can install HarperDB on top of that. And then, in our case, because we persist to disk, we're not just reading inbound streaming data; we need to persist your data and your config.
On container start, the first time, we reach out to that persistent disk and set up all the files that we need: set up your config, set up your data store, set up your data files, and then ultimately that becomes your install. So it's not plug and play, 'cause otherwise we wouldn't be able to persist any data, but it is as quick as it can be that first time.
And then if it were to shut down and start up, it will look at its persistent disk, its file mount basically, and say, oh, well, all of those installed files are there, so I'm good. I'm just going to spin up the APIs that HarperDB has and wait for somebody to try to talk to me.
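A toy model of the layering Jason and Jaxon describe: an image is an ordered list of layers, each layer records file changes, and layers are stored by content hash so a shared base is kept only once. This is a simplified sketch, not how Docker actually stores layers (real images use tar archives and whiteout files); the class name, paths, and contents are made up for illustration.

```python
import hashlib
import json
from typing import Dict, List, Optional

Layer = Dict[str, Optional[str]]  # path -> file content; None marks a deletion


class LayerStore:
    """Toy content-addressed layer store: identical layers are stored only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, Layer] = {}

    def put(self, layer: Layer) -> str:
        digest = hashlib.sha256(json.dumps(layer, sort_keys=True).encode()).hexdigest()
        self.blobs.setdefault(digest, layer)  # already present -> nothing new to store
        return digest

    def materialize(self, image: List[str]) -> Dict[str, str]:
        """Apply layers bottom to top to get the filesystem a container would see."""
        fs: Dict[str, str] = {}
        for digest in image:
            for path, content in self.blobs[digest].items():
                if content is None:
                    fs.pop(path, None)  # hidden from the container, but still stored in the lower layer
                else:
                    fs[path] = content
        return fs
```

In this model, a layer that deletes a file only hides it in `materialize`; the base layer holding that file is still stored and shipped. That is why starting from a smaller Node.js-based image, as Jaxon describes, shrinks an image more than deleting files in a later step.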
[00:41:32] Jason Gauci: Got it. I see. And if it reads those files and it says, this is version eight of HarperDB, but I'm on version nine, it has some migration logic and all of that.
[00:41:40] Jaxon Repp: Exactly.
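The first-start behavior Jaxon describes, plus the version check Jason asks about, can be sketched like this. It is an assumed illustration, not HarperDB's actual startup code: the file layout (`config.json` and a `data` directory), the integer version numbers, and the `migrate` placeholder are all hypothetical.

```python
import json
from pathlib import Path

CURRENT_VERSION = 9  # hypothetical version of the software baked into this container image


def start(data_dir: Path) -> str:
    """Run on every container start against a mounted persistent volume."""
    config_path = data_dir / "config.json"
    if not config_path.exists():
        # First start: the volume is empty, so lay down config and data files.
        (data_dir / "data").mkdir(parents=True, exist_ok=True)
        config_path.write_text(json.dumps({"version": CURRENT_VERSION}))
        return "installed"
    config = json.loads(config_path.read_text())
    if config["version"] < CURRENT_VERSION:
        # Volume was written by an older release: migrate one step at a time.
        for v in range(config["version"], CURRENT_VERSION):
            migrate(data_dir, v, v + 1)
        config["version"] = CURRENT_VERSION
        config_path.write_text(json.dumps(config))
        return "migrated"
    return "ready"  # files already present: just start serving the APIs


def migrate(data_dir: Path, old: int, new: int) -> None:
    # Placeholder: a real migration would rewrite data files into the new format.
    (data_dir / "data" / f"migrated_{old}_to_{new}").touch()
```

Because every step checks what is already on the mounted volume, the same image can be stopped and restarted against that volume without reinstalling or losing data.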
[00:41:41] Jason Gauci: Got it. Cool. That makes sense. So let's dive into databases now. I think we've given a really good overview of Edge Computing, and you can kind of see how this follows. You have all these machines running, let's say in the vineyard, and they want to do things without having to phone home.
So they don't want to have to give all the data to some server, which could be a thousand miles away. They want to do some processing locally, a lot of processing locally, and then just send back, you know, the most important things. And so to do that and to coordinate that we need to have some centralized place where we can have information.
So to use the vineyard as an example, maybe we want a centralized place where we store what we consider to be anomalous temperatures, and that could change as the season changes. And so we want to keep some information in all of these Raspberry Pis so that they're all on the same page.
And they can make decisions as a unit, right? And so what you'd end up having to do, if you were to write this by hand, is a lot of message passing. I wrote MAMEHub a long time ago, which is a peer-to-peer kind of video game thing.
Anyone who's ever done peer-to-peer knows how hard it is. Getting two Raspberry Pis to talk to each other, unless they have public IP addresses, is super difficult, and just having a mesh network, even a mesh network of public computers, is really difficult. So walk us through what HarperDB is. How does it solve this problem, and why is it able to do what it does?
[00:43:32] Jaxon Repp: Sure. So HarperDB was built by developers. I love the phrase "by developers, for developers." It feels like every product is that, really, isn't it?
[00:43:41] Jason Gauci: Well, maybe not for developers, but definitely by developers. You got half of it.
[00:43:46] Jaxon Repp: Right? You gotta have that. Ultimately it was built to solve a lot of the pain points that we found when we were building distributed applications. So I know that I have workloads that I want to run on disparate devices. I know that I'm going to collect some sensor data. I know that I want to run some calculations on that.
I know that sometimes the sensor data comes in more quickly than I can run those calculations, and sometimes it comes in less quickly. I need to persist that in some way: I'm writing it to a file, or I'm holding it in RAM, except I lose power, and now all of a sudden I've lost all the data I was holding. Or I was able to make my calculation, and now I've reduced that stream of data to the 10-minute running average that I really want. And now I want to transmit the next data point in that 10-minute running average off to the server that's going to analyze all of the 10-minute running averages across all of the sensors I'm collecting from, except I lost my network connection.
So now I need to build a buffer to hold that, and oh, wait, somebody shut off the power again, so I lost my buffer. So that's a giant pain, and HarperDB is designed to push that data storage out to the Edge. So you can collect your data from the sensor with a process, and then you can simply put it into the database.
And it is persistent, it is ACID-compliant, and you know that it's been stored. Then you can have a second process that pulls those records out and aggregates your 10-minute averages. It runs every 10 minutes and puts the result into a second table, and that table is now persisted.
And it is there, and we know that we have it. And if you now want to move that data over to, say, the cloud node for analysis, we have what we call clustering, which is not traditional database clustering; it's our bidirectional, table-level data replication. So you don't have to replicate an entire database with HarperDB.
You can literally choose, within a schema, which tables go in which direction. I can publish a table up. I could subscribe to, say, a thresholds table that might bring the thresholds for an alert down to the Edge. And then when I create my 10-minute running average, I can publish that table up to the cloud.
So my application gets a lot simpler, because I only need to make localhost calls. I don't need to worry about network connectivity. I don't need to worry about holding an in-memory buffer. I don't need to worry about what happens if the power goes off, 'cause I know it's persistent. I don't need to worry about Wi-Fi going out, or a mesh network collapsing for a few seconds because somebody kicked a power cord. It's there.
And when it gets plugged back in and HarperDB boots back up, it's going to say, oh, I've got these messages that I have not sent. I'm going to send them now. So really, if you look at what HarperDB does, it allows you to simplify your programming by just sitting there being an always-on, always-connected data fabric. You can move your data wherever you need, you can do operations on it, and you don't need to move all of it. Ultimately it reduces your application code to just making local calls. So to some degree it also allows you to slim down that box a little more, because you don't need your application code to be making calls out to third-party APIs.
You could have a cloud server making those calls, putting the results into HarperDB, and then subscribing the results of that data back down to a third-party API call table. And so I don't need to make those calls from the Edge.
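The edge workflow Jaxon walks through (record readings locally, roll them into 10-minute averages in a second table, and replicate only the published table once the network is back) can be approximated with Python's built-in `sqlite3`, which is also a transactional, persistent local store. This sketch is not HarperDB's API: SQLite stands in for the local HarperDB instance, the table names and the `PUBLISHED` setting are hypothetical, and the `send` callback stands in for real replication. Subscribing to a table such as thresholds would be the mirror image: apply incoming rows instead of queueing outgoing ones.

```python
import sqlite3
from typing import Callable

WINDOW = 600              # seconds per average (10 minutes)
PUBLISHED = {"averages"}  # hypothetical: tables replicated up; "readings" never leaves the node


class EdgeNode:
    def __init__(self, path: str = ":memory:"):  # pass a file path so data survives power loss
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute("CREATE TABLE IF NOT EXISTS readings (ts INTEGER, value REAL)")
            self.db.execute("CREATE TABLE IF NOT EXISTS averages (bucket INTEGER PRIMARY KEY, mean REAL)")
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS outbox "
                "(id INTEGER PRIMARY KEY AUTOINCREMENT, tbl TEXT, payload TEXT)"
            )

    def record(self, ts: int, value: float) -> None:
        # Process 1: persist each sensor reading as soon as it arrives.
        with self.db:
            self.db.execute("INSERT INTO readings VALUES (?, ?)", (ts, value))

    def aggregate(self, now: int) -> None:
        # Process 2: fold completed 10-minute buckets into the averages table and, in the
        # same transaction, queue them for replication if that table is published.
        cutoff = now - now % WINDOW  # start of the still-open bucket
        with self.db:
            rows = self.db.execute(
                f"SELECT (ts / {WINDOW}) * {WINDOW} AS bucket, AVG(value) FROM readings "
                "WHERE ts < ? GROUP BY bucket ORDER BY bucket",
                (cutoff,),
            ).fetchall()
            for bucket, mean in rows:
                self.db.execute("INSERT OR REPLACE INTO averages VALUES (?, ?)", (bucket, mean))
                if "averages" in PUBLISHED:
                    self.db.execute(
                        "INSERT INTO outbox (tbl, payload) VALUES (?, ?)", ("averages", f"{bucket}:{mean}")
                    )
            self.db.execute("DELETE FROM readings WHERE ts < ?", (cutoff,))

    def flush(self, send: Callable[[str, str], bool]) -> int:
        # Send queued changes in order; stop at the first failure and retry on the next call.
        sent = 0
        for row_id, tbl, payload in self.db.execute(
            "SELECT id, tbl, payload FROM outbox ORDER BY id"
        ).fetchall():
            if not send(tbl, payload):
                break  # network is down: leave the rest queued
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

The outbox row is written in the same transaction as the average, so a power cut can't leave one without the other. Delivery is at-least-once: if power fails after a send but before the row is deleted, that change is sent again, so the receiving side should treat repeats as overwrites (as `INSERT OR REPLACE` keyed on `bucket` would).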
[00:47:33] Jason Gauci: Wow. That's cool. So how does the developer handle conflicts? Right. So the power goes out. You say that the rolling average is X, but because the power went out, someone else got there first, and I think the rolling average should be Y. The power comes back on, and now you have a conflict. Is there some API to handle that? Or is there something about the way the transactions are specified so that there's always a logical way that it gets resolved? How does that work?
[00:48:02] Jaxon Repp: In most of the instances that I've been describing, a node of HarperDB with a piece of compute and maybe some sensors hanging off the side of it, there won't be a conflict, because the rolling average that I'm calculating is based on the sensors attached to this particular node.
So there may be a thousand nodes, but each rolling average is going to be tied to that node's sensors, so in this case there wouldn't be a conflict. However, look at other applications where perhaps my Edge unit is not a Raspberry Pi in a field collecting sensor data, but instead an Edge node in one of those smaller data centers. A user logs on, and because of their IP address and their location, they're assigned to that one, and they input some data into a form. That is immediately replicated up to the cloud, which powers the massive UI for the core application. However, somebody else is elsewhere, and they may have entered a number into that value a little bit after, but their network connection was a little bit faster, and they get it there first.
So the way HarperDB handles that is we have timestamps, and we look at a unified time server and compare this timestamp versus that timestamp: last writer wins. And we can overwrite that, but there are often times where that will cause a further conflict.
Ultimately it's an age-old problem in distributed computing: the conflict between data arriving over a slow versus a fast network connection. And so for our next version, or maybe two versions away, we're looking at CRDTs, which are conflict-free replicated data types.
They have a bunch more metadata associated with them, and they can therefore make comparisons and say, I know you wanted to do this, but I have to handle this transaction first, even though it came in later. So while I have persisted you, I'm going to unwind you, run this, and then run you again.
And the end result is going to be what was intended. Right now, you can do that with HarperDB simply through intelligent architecture. But if our motto is that it should just work, "simplicity without sacrifice," which is our tagline, ultimately we should handle that for people automatically.
So we always have an eye on that. We're new enough that we're working with customers, and we help them architect these solutions, because distributed computing is a challenge for a lot of people to whom it is new. We're not the only subject matter experts, but we feel like we have a good handle on how we can architect around some of the limitations of existing solutions. And we're always looking forward to try to figure out what the best long-term solution is going to be.
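Last-writer-wins resolution by timestamp, as Jaxon describes it, fits in a few lines. A minimal sketch, assuming every write carries a timestamp from a shared time source; the `node_id` tie-breaker for equal timestamps is an assumption added here so that all replicas pick the same winner, and the names are illustrative rather than HarperDB's.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Write:
    value: Any
    timestamp: float  # assumed to come from a shared, synchronized time source
    node_id: str      # assumed tie-breaker so every replica picks the same winner


def lww_merge(current: Write, incoming: Write) -> Write:
    """Last writer wins: the later timestamp survives, whatever order the writes arrived in."""
    return max(current, incoming, key=lambda w: (w.timestamp, w.node_id))


def apply_write(store: Dict[str, Write], key: str, incoming: Write) -> None:
    store[key] = lww_merge(store[key], incoming) if key in store else incoming
```

Replicas converge on the same value regardless of arrival order, but the losing write is silently discarded, which is the kind of further conflict Jaxon mentions.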
[00:50:59] Jason Gauci: Yeah. I remember reading about this with Bitcoin, where it's called the double-spend problem: basically two people who are in different geographic regions, or maybe a better way of saying it, who are somehow far apart in internet space, can both spend the same money at the same time. And then it might take a really long time for that to get resolved. And until it's fully resolved, if someone actually executes that spend, now you've both been able to buy a coffee with the same money or something like that.
And again, I'm not a crypto expert either, but I think what's going on there is that it becomes kind of a popularity contest, where there's this big battle over who is right, and eventually there's a consensus. And so I would imagine, if you're using HarperDB for an e-commerce checkout shopping cart or something like that, you might run into some of these issues.
[00:51:53] Jaxon Repp: Yeah. I think with the existing paradigm of a lot of applications, and serverless functions in general, you're going to push the API out further and further, closer to the Edge, because you want to get low response times. You want to get that request. But then for most architectures, at least currently, there's one master database, because that's how you solve that problem.
And it's a giant vertically scaled instance that costs hundreds of thousands of dollars a month sitting in Oregon. And ultimately you're going to overwhelm it with a thousand different servers running your serverless Lambdas that are all going back to the same place. And we did a proof of concept with a large social company.
And if you were in Buenos Aires and you hit their API, the ping to the endpoint, the connection, was almost instantaneous, 'cause they had a Lambda running in a data center in South America. But the data that would come back, your friends list, took sometimes upwards of 11 seconds, because it was all the way back in Seattle.
We ultimately realized that our benefit was that we can handle pushing the data out to the Edge, and with our new custom functions, we can handle Lambdas that are at the Edge also. So your data is right next to the Lambda that's trying to access it. And then we handle moving the data around, and the data that we move around and replicate to all the other instances in a globally replicated cluster of HarperDB is the transaction. It can be as large as the initial operation, but it can also be smaller, because it might not change everything. At the end of the day, we can move less data around, and we can move it on pipes that we control, because we understand the internal IP addresses, which are going to be faster than traditional external IP addresses.
And it becomes a homogeneous dataset with very, very low latency for everybody who's interacting with it. And now literally the only challenge that remains is to make sure that multiple actors acting on data at the same time are resolved correctly. So we say we are ACID-compliant at the node, and we are eventually consistent.
So right now we can't do, for example, financial services, right? We're not going to solve that double-spend problem. But there are a lot of places where that's not critical; social media is certainly one of those. And we're working on a solution that would make us able to solve that problem.
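For comparison with last writer wins, here is one of the simplest CRDTs, a state-based PN-counter: each replica counts only its own increments and decrements, and merging takes the per-replica maximum. This is a generic textbook example, not the design HarperDB is evaluating, and the class and method names are illustrative.

```python
from typing import Dict


class PNCounter:
    """State-based PN-counter CRDT: each replica only ever grows its own slots."""

    def __init__(self, replica_id: str):
        self.id = replica_id
        self.inc: Dict[str, int] = {}  # replica id -> total increments seen from that replica
        self.dec: Dict[str, int] = {}  # replica id -> total decrements seen from that replica

    def add(self, n: int) -> None:
        table = self.inc if n >= 0 else self.dec
        table[self.id] = table.get(self.id, 0) + abs(n)

    def value(self) -> int:
        return sum(self.inc.values()) - sum(self.dec.values())

    def merge(self, other: "PNCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how often or in what order they sync.
        for mine, theirs in ((self.inc, other.inc), (self.dec, other.dec)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)
```

Merges can happen in any order and any number of times and the replicas still converge, which is the eventual consistency Jaxon describes. It also shows the limit he points out: nothing here stops two replicas from both decrementing the same balance, so a counter like this does not prevent a double spend on its own.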
[00:54:28] Jason Gauci: Very cool. So we talked about mining equipment and some of these really specialized environments. What about, here's a good example, what if someone is just building an email app, an email iPhone app, right? They would want to have access to their emails.
Obviously the server has a copy of their emails. It's kind of caching, but it's also really more like a database. I mean, you can imagine someone wanting all of their emails on their device, right? So could someone use HarperDB for something like that, a more consumer-facing case, running an instance of HarperDB on their consumer device? Is that part of the use space for it?
[00:55:10] Jaxon Repp: Absolutely. We need Node.js, so we're not going to run on an iOS device. We were able to use UserLAnd, which is an Android app that actually installs a Linux subsystem, a full Ubuntu copy, and we could run it there. It was not a recommended implementation, but you certainly can do it.
You can get it running on an Android tablet. I built a vehicle telemetry app on a tablet. It was completely self-contained, and it would store local data in HarperDB. Then, when the tablet came within Wi-Fi range of the office, it would replicate that data into the cloud, and you would see the vehicle, its path, and any violations of its thresholds immediately represented.
So if it had cell service, it would be doing that in real time. If I shut off cell service, it would still collect that data and still persist it, and when it had the network connection, it would push that up. So it's very possible to persist that data without maintaining that connection.
And I think I forgot what the literal question was.
[00:56:16] Jason Gauci: Oh, the question was, could you run HarperDB on an iPhone if you're building some app that needs a window of the data locally? So imagine I'm building an email app. I go into airplane mode. I still want to see my emails. I delete a few, I come off airplane mode, and it needs to sync all of that. I would put that in the hard category in terms of doing it correctly, so that I don't delete the wrong email or have a double delete or something. And so it'd be amazing, and there might be, I haven't done a survey on this, if there was some technology out there where I could just use some library and I would have, not snapshots, but some slice of the data locally on my phone.
And it would take care of everything else, which sounds like what HarperDB is doing. And that's when you brought up the restriction around Node.js and all that.
[00:57:08] Jaxon Repp: Yeah, and there are pure client-side, browser-level JavaScript libraries that can accomplish a lot of what we do.
They'll make use of IndexedDB as an underlying key-value store. We have an underlying key-value store that we use called LMDB, which is Lightning Memory-Mapped Database. It's extremely fast, very performant, and written in C, but obviously it's just a key-value store, so it doesn't have all of the properties that you'd want in a database, like SQL querying and indexing and stuff like that.
So we've built all of HarperDB's functionality on top of that, but underlying it is a key-value store. So could we, if we had unlimited time and resources, replicate all of that into just a client-side library that you could include in a browser app, have it sync data down from a cloud, and be completely performant and standalone?
And if the browser on your phone then reconnected to a network later, could it execute the exact sort of syncing that HarperDB does currently from, say, a Raspberry Pi or a smaller data center Edge node? Absolutely. You could 100% do that, and there are a few solutions that do that.
The challenge is that they maintain those subscriptions, and maintaining those subscriptions is expensive on the server. Continually syncing that data back and forth and holding what is, in effect, a socket open so that you can subscribe to a specific query from, say, a server-side entity is very expensive.
Subscribing to a table is a lot less specific, so you're going to have a lot fewer individual subscriptions. But once you start getting into query-level subscriptions, it can become very expensive. MeteorJS was a great platform that did that. It was built on top of MongoDB, and it looked at the transaction log to figure out what real-time data needed to be pushed down, but it was incredibly resource-inefficient.
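Jaxon's point that LMDB is "just" a key-value store, with HarperDB's querying and indexing built on top, can be illustrated with a small secondary index. In this sketch, assumed for illustration only, a Python dict stands in for the key-value store and each indexed attribute gets its own map from value to primary keys; it is not how HarperDB lays out its data in LMDB, and the class name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


class IndexedTable:
    """Records live in a plain key-value map; each indexed attribute gets its own map."""

    def __init__(self, indexed: List[str]):
        self.kv: Dict[str, Dict[str, Any]] = {}  # primary key -> record
        self.indexes: Dict[str, Dict[Any, Set[str]]] = {a: defaultdict(set) for a in indexed}

    def put(self, pk: str, record: Dict[str, Any]) -> None:
        old = self.kv.get(pk)
        if old is not None:
            # Drop stale index entries before overwriting the record.
            for attr, idx in self.indexes.items():
                if attr in old:
                    idx[old[attr]].discard(pk)
        self.kv[pk] = record
        for attr, idx in self.indexes.items():
            if attr in record:
                idx[record[attr]].add(pk)

    def where(self, attr: str, value: Any) -> List[Dict[str, Any]]:
        # Index lookup instead of scanning every key; assumes `attr` is indexed.
        return [self.kv[pk] for pk in sorted(self.indexes[attr].get(value, ()))]
```

Keeping the index correct on updates is what makes this more than a hash map: overwriting a record must remove its old index entries before adding new ones, or queries would return stale matches.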
[00:59:12] Jason Gauci: Oh, interesting. I was wondering, because I remember when MeteorJS came out, I did try the demo. I think we talked about it on the show years ago, and it looked magical. It looked like, okay, I have this slice of user data and I just want it to exist over here, and it just magically worked. But then it never took off, and it sounds like maybe this is why: at scale, it just fell apart.
[00:59:39] Jaxon Repp: It was magical. It was truly magical. It's just that as soon as that group started to move to include other databases, they realized how incredibly challenging that was, because they had integrated it so closely.
And so they ended up building an entire library that moved away from MeteorJS and ultimately became Prisma. That's what it was: Prisma.io.
[01:00:05] Jason Gauci: Oh yeah. I've heard of that too. Yup.
[01:00:07] Jaxon Repp: Yeah. So that was the next iteration of how we sync data between a client and a server in a more efficient way, without necessarily overwhelming the server with individual subscriptions. And they're all great use cases, and they are truly magical for users, but they become incredibly resource-intensive. So we are focusing less on the long tail of simplicity and more on providing the bulk of the functionality we can within what we know to be the limits of data replication between every single client on earth and one central data store. Because obviously the other challenge is, if I give you access to every single piece of data, then you could update that data, and now I have a billion clients that are all trying to resolve who did what, when, what was your network timestamp, who came first, what's the right answer. And then you'll never get into financial services, which, as you know, is where all the money is.
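A rough way to see why query-level subscriptions get expensive, as Jaxon describes for MeteorJS: a table-level subscriber receives every change with no filtering, while each query-level subscriber needs its own filter evaluated on every write. The `Broker` class below is a hypothetical toy model for counting that work, not how Meteor or HarperDB implement subscriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[dict], bool]


class Broker:
    """Toy model of the server-side work needed to fan out one write to subscribers."""

    def __init__(self) -> None:
        self.table_subs: Dict[str, List[str]] = defaultdict(list)
        self.query_subs: Dict[str, List[Tuple[str, Predicate]]] = defaultdict(list)
        self.predicate_checks = 0  # counts per-subscriber filter evaluations

    def subscribe_table(self, client: str, table: str) -> None:
        self.table_subs[table].append(client)

    def subscribe_query(self, client: str, table: str, predicate: Predicate) -> None:
        self.query_subs[table].append((client, predicate))

    def publish(self, table: str, row: dict) -> List[str]:
        # Table subscribers get every change: no per-client filtering.
        recipients = list(self.table_subs[table])
        # Query subscribers each need their own predicate evaluated against the row.
        for client, predicate in self.query_subs[table]:
            self.predicate_checks += 1
            if predicate(row):
                recipients.append(client)
        return recipients
```

With 1,000 query subscriptions on a table, every write to that table costs 1,000 predicate evaluations on the server, while table subscribers add no per-subscriber filtering at all. Scale the subscriber count toward every client on earth and that difference dominates.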
[01:01:10] Jason Gauci: Yeah, that's right. It's closer to the money supply. So it sounds like the center of mass for HarperDB is, just using Netflix as an example: Netflix wants to push its most popular videos to the Edge so that you don't have to go all the way to Los Gatos, or wherever Netflix's data center is, to get that video.
Right. And so you can imagine all over the world there's a ton of these small data centers hosting whatever the most popular Netflix videos are. So you have this cache, and people will go to the server, and the server will say, oh yeah, I have that video; it's one of the super popular videos for your region, here it is. Or, oh, I don't have this really esoteric video about leopards or something; I'm going to have to go to the main data center and fetch that. But anytime you write any kind of logic, or really do anything with a computer, you're going to want to keep some records, right?
You're going to want to keep track of how many people watched each video. Now, every time someone goes to watch a video, you could phone home to the main server. But then you hit a whole bunch of other issues, as we talked about, with that main server getting bombarded with tons of requests all the time, and it doesn't scale. So what HarperDB could do is sit on these Edge nodes and collect all of those statistics, so that Netflix knows what videos are the most popular tomorrow and can keep that fresh. All of that gets replicated as all these machines are ticking up this histogram of videos. And then at some point, maybe at the end of the day, someone or some process at Netflix can get a copy of this database that all these Edge nodes are sharing, read it, and learn some intelligence from it. Did I explain that use case pretty well? Or is there anything you'd add?
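Jason's view-count example maps naturally onto a grow-only counter (G-Counter), a standard conflict-free replicated data type. This is a swapped-in textbook technique for illustration, not a description of how HarperDB replicates data, and the class and node names are hypothetical. Each edge node increments only its own slot, and merging takes the per-node maximum, so replicas converge no matter how often or in what order they sync.

```python
from collections import Counter


class GCounterHistogram:
    """Per-video view counts as G-Counters, one slot per edge node (illustrative)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        # video -> {node_id -> views counted by that node}
        self.slots: dict[str, dict[str, int]] = {}

    def record_view(self, video: str) -> None:
        # A node only ever increments its own slot.
        per_node = self.slots.setdefault(video, {})
        per_node[self.node_id] = per_node.get(self.node_id, 0) + 1

    def merge(self, other: "GCounterHistogram") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so repeated or out-of-order merges are harmless.
        for video, theirs in other.slots.items():
            mine = self.slots.setdefault(video, {})
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)

    def totals(self) -> Counter:
        return Counter({v: sum(per_node.values()) for v, per_node in self.slots.items()})


east, west = GCounterHistogram("east"), GCounterHistogram("west")
for _ in range(3):
    east.record_view("leopards")
west.record_view("leopards")
west.record_view("cooking")
east.merge(west)
west.merge(east)
east.merge(west)  # merging again changes nothing
print(east.totals().most_common())  # [('leopards', 4), ('cooking', 1)]
```

The end-of-day job Jason mentions would then only need to read `totals()` from any one replica.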
[01:03:11] Jaxon Repp: You did, and I'd go one step further and say you would run a machine learning model to actively compress all of the individual data points that come through a Netflix UI or user experience. I might hover over a movie. I might watch the trailer for it, and maybe I'm only halfway through the trailer. If you've ever thumbed through your Netflix queue, gone past a row of films and gone back up, you'll see the cover art change for films as they test different cover art to see if you'll click on it.
So a lot of these decisions are simply, we want to A/B test this thing automatically. But at some point somebody is going to realize that there's an advantage to one of those covers over the other. At that point, that is going to become a policy that is rolled down to every single client, saying: this is the best cover for this, this is what gets people to click on it. Or, based on this profile and the demographic data that we've classified, we're going to show this cover, and we're going to run a machine learning model that classifies all of our users into one of three archetypes.
And the cover art is defined by that archetype. All of that happens at the Edge. Aside from larger aggregation, or maybe strategy built on that knowledge, which will happen in the cloud, most of it you want to happen out there. Otherwise you run into the same problem everybody ran into before distributed computing was even a thing, which is: my God, we need the server to be literally the size of the planet.
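The cover-art testing Jaxon describes can be sketched as deterministic bucketing plus a click-through comparison. Everything here is hypothetical: the hash-based assignment, the arm names, and the rule of promoting the arm with the highest click-through rate are common A/B-testing conventions, not Netflix's actual system. Hashing the user ID means every edge node assigns a given user to the same arm without asking a central server.

```python
import hashlib

ARMS = ["cover_a", "cover_b", "cover_c"]  # hypothetical cover-art variants


def assign_arm(user_id: str, experiment: str) -> str:
    """Deterministically bucket a user; any node computes the same answer."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return ARMS[int.from_bytes(digest[:8], "big") % len(ARMS)]


def pick_winner(impressions: dict[str, int], clicks: dict[str, int]) -> str:
    """Promote the arm with the highest click-through rate.

    A real system would also test for statistical significance before
    rolling the winner out as a policy to every client.
    """
    return max(impressions, key=lambda arm: clicks.get(arm, 0) / max(impressions[arm], 1))


print(assign_arm("user-42", "leopard-doc-cover") in ARMS)  # True
print(pick_winner({"cover_a": 100, "cover_b": 100}, {"cover_a": 5, "cover_b": 9}))  # cover_b
```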
[01:04:57] Jason Gauci: Right, yeah. That makes sense. I think there's also this study, and I'm sure you're more familiar with it than I am, but I think Google did a study back in like 2011 that said basically for every millisecond it takes their site to load,
their product takes a significant hit, or maybe it was every 10 milliseconds. So there are real economic advantages. And it's one of these things that's probably subconscious: you're not sitting there looking at your watch saying, oh, this took 80 milliseconds, but subconsciously the product takes a hit for every 10 milliseconds it takes to return a result. So anything you can push to the Edge can turn into real dollars and cents.
[01:05:43] Jaxon Repp: Exactly. And ultimately we say that, at least in gaming and in computing, 16 milliseconds is what a human being can perceive as a delay. So you want it to be down at 16 milliseconds. I mentioned a case study earlier where users in Buenos Aires were spending a few milliseconds connecting to a local API, but then the data would take anywhere between 300 milliseconds and 11 seconds to bring back a friends list. Your friends list doesn't change all that often, so when we started running our tests with our custom functions, we'd replicated the data right out to where the endpoint was.
And you were seeing response times of five to 10 milliseconds. We knew that under load we would see that go up, but our objective was under a hundred milliseconds, and we beat that easily, which you were never, ever going to do if all of the data still lived in Seattle.
[01:06:44] Jason Gauci: Yep, totally makes sense. Cool. I think we covered a ton of really good material here. I think we opened all the bookmarks, which is good. Let's jump into HarperDB as a company. So what's something that's kind of unique about HarperDB? It could be the way you plan your off-sites, it could be the layout of the office. What's something that, when you showed up at Harper or through your tenure there, has really made Harper stand out in terms of the work environment?
[01:07:19] Jaxon Repp: Well, I think if, if you, if you go to the site, you'll see our logo is a dog. Harper is actually our CEO's dog's name.
[01:07:27] Jason Gauci: Oh, wow. Okay.
[01:07:28] Jaxon Repp: All of our demo datasets, if you go to our Postman collection at docs.harperdb.io, you'll see we have a ton of demos, and our demo datasets are all the dogs owned by people in the office, plus a breeds table.
So you can do a join of those datasets. All of our demos are based on the concept of dogs. And at the end of the day, it's about somebody who is hopelessly loyal to you, always there, and ultimately they make your life better. That is the driving architecture behind every employee we hire and every feature we look at on our feature vote board, where we ask: do enough people want this?
Is it going to make people's lives better? A lot of us are multidisciplinary software guys, so we've seen lots of problems over time. To be honest, this product was built to solve problems that the founders were having in a specific application at their former company. But it solves a lot of problems that I've had too, and there's no limit to the problems you face as a programmer.
We call our approach collapsing the stack. So we now effectively have Lambda functions, or, old school, you might call them stored procedures, but they're written in JavaScript, they're super easy to deploy, and they make your life easier and better. And hopefully you can spend less time working on that and more time outside playing with your dog, which is all they really want.
[01:08:59] Jason Gauci: Yeah. Do you let dogs in the office? This is a great debate. I've worked at places where dogs were in the office, and I never had an issue with it, though definitely some people didn't like it. And I've worked at places where dogs were banned, and people really didn't like that either. What's Harper's take on dogs at the office?
[01:09:15] Jaxon Repp: Well, when we had an office (that's true too, we don't anymore), dogs were absolutely welcome. The irony is that Harper was not a nice dog. If Harper wanted to come to the office, no other dogs could come in, but otherwise you could. We talked about conflict-free replicated data types; ultimately there were also conflict-resolution dog types, where certain mixes of dogs were allowed, and we definitely knew that you can't bring this dog because they will not get along.
[01:09:50] Jason Gauci: Yeah. You need operational transforms for dogs. This person has to get transformed across the hallway or something.
[01:09:57] Jaxon Repp: Right. The SQL query: WHERE NOT IN.
[01:10:00] Jason Gauci: Yeah, that's right.
[01:10:01] Jaxon Repp: WHERE dog NOT IN "disagrees with this dog." Yeah.
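The joke is a real query pattern. Here is a minimal sketch using Python's built-in `sqlite3` module; the `dogs` and `conflicts` tables and the dog names other than Harper are hypothetical, loosely echoing the dog-themed demo data mentioned above rather than HarperDB's actual schema.

```python
import sqlite3


def dogs_allowed_with(conn: sqlite3.Connection, present: str) -> list[str]:
    """Return the dogs that may come in when `present` is already in the office."""
    rows = conn.execute(
        """
        SELECT name FROM dogs
        WHERE name != ?
          AND name NOT IN (SELECT other FROM conflicts WHERE dog = ?)
        ORDER BY name
        """,
        (present, present),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dogs (name TEXT PRIMARY KEY);
    -- each row means 'dog' does not get along with 'other'
    CREATE TABLE conflicts (dog TEXT NOT NULL, other TEXT NOT NULL);
    INSERT INTO dogs VALUES ('Harper'), ('Penny'), ('Kato'), ('Biscuit');
    INSERT INTO conflicts VALUES ('Harper', 'Penny'), ('Harper', 'Kato');
    """
)
print(dogs_allowed_with(conn, "Harper"))  # ['Biscuit']
```

The `NOT NULL` constraints matter: if the subquery could return a NULL, `NOT IN` would filter out every row.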
[01:10:05] Jason Gauci: I think in the last episode we were talking with the CEO of Pinecone, which is a vector database, and Patrick was bringing up R-trees. This would be a perfect example, where we could have rectangles for each dog's zone of influence, and if we get an overlap, that throws an alert or something.
[01:10:26] Jaxon Repp: Yes. The Venn diagram of dogs that don't get along, and it's just a circle.
It's just, we can't put all these dogs in one room. It's just too many dogs.
[01:10:39] Jason Gauci: Okay, so it's distributed. Are you hiring interns or full-timers, and where are you hiring? What kind of people are you hiring? Kind of walk us through it. I'm sure you have a careers page people can check out, but just at a high level, what is HarperDB looking for on the engineering side? What kind of persona are you looking for?
[01:11:06] Jaxon Repp: We just had our first hiring round in a couple of years. We built out all of our core functionality, and I think we got through what I'd call the first versions of this, where we were figuring out what the thing is supposed to do.
How is it supposed to work? And what is our technical debt left over from that learning process? We've cleaned that up. So now we're out there looking for a new full-stack developer, a designer, and an infrastructure developer. Once the product is sound, we want to increase the size of the team
that's helping build cool new features. But right now our features make it super easy to deploy, and we think it meets the needs of most of our customers. Now the bulk of the challenge becomes the services layer that we put in place to help big customers solve their architecture problems, because distributed computing is a new paradigm for many of them.
So an infrastructure developer is somebody familiar with taking a Kubernetes cluster and extending it across public and private clouds, figuring out how to make it work with Edge devices, and scripting all of the inter-node connectivity. So we're looking for very smart people in DevOps, and a full-stack software engineer. Node.js is what we're written in, so Node.js is a prerequisite there. We are also not solving what I would call traditional programming problems. We're in a very, very specific space. So we're not looking for extremely experienced programmers. We're looking for people who get it and understand that the goal is to build something that's a joy to use.
And as such, there might be a little more heavy lifting on our side, so that there's a little less heavy lifting on the part of developers who are using our product. To that end, we really like to be able to take somebody in and make sure that they care what the customer thinks.
Because there are a lot of developers who want a functional spec, and they want to build according to that spec, and then they want to check out and say, "I met the objective." And I'm like, guess what? The objectives are going to change every day. But there's one core principle: they'll only change if it makes the product easier to use, more stable, smaller, tighter, faster, whatever.
And the other part is that you're free to bring silly suggestions to the table, just as much as our CTO or myself, or our director of marketing, who's out there on dev.to reading all the articles and all the feedback on our blog posts. And he's like, you know what, everybody hates this thing.
They talk about how we don't solve it, but nobody solves it, and everybody hates it. Maybe we should look at that. And that's just as valid an idea as the idea that our clustering engine should perhaps be rewritten in a lower-level language so that it's faster.
[01:14:17] Jason Gauci: Yeah, totally makes sense. So the job isn't just inverting a binary tree or solving some really tricky dynamic programming problem or something like that. That might be something you have to learn as a rite of passage, but it's not the job.
[01:14:34] Jaxon Repp: Yeah. I mean, you'll totally, 100% do that.
But we've got so much of the core written that at this point our data model, our indexing, and all the things we do are really, really solid. I think we'd love somebody to become familiar enough with it that our CTO could take a day off, because that'd be nice every once in a while.
Right. But I think the other part is just the flexibility to say: I don't know the answer, but also nobody knows the answer. So let's figure out a way to write it two or three or 10 times, test it all, and figure out what the right answer is right now. One of the things I realized is that the most authoritative paper on resolving conflicts in distributed computing was written in like 1984 by a woman at Microsoft.
That's the paper that all of the articles eventually go back to and cite as the primary influence. We've known it was a problem for a very long time, and people still end up at that article because we have not solved it.
[01:15:47] Jason Gauci: Right. Yeah, I don't know if that's the same as Paxos. I've heard the name Paxos a lot. I think that's a way to do leader election and resolve conflicts, and that paper, at least in my circles, seems to come up a lot. But what everyone tells me, and I'm sure you've seen this too, is that it's great in theory; everything's great in theory. Then in practice, you have to find the right corners to cut so that something doesn't take three months to become consistent, and, on the flip side, doesn't have massive errors. Playing that game, I think, comes down to what the customers really value. At the end of the day, that's what really matters.
[01:16:26] Jaxon Repp: I think somewhere down the line, the idea of Raft consensus, leader election, all of that will fade away, and the data itself will contain bits of metadata that allow you to have a leaderless distributed system. Inherently, all of the information each node needs to know about what to do is present for it to execute, rather than having a central broker that directs traffic. You could have a cluster, or two leaders, or failover, whatever, but inevitably it's going to be a single point of failure. If you wait for that one person to make that decision, you're going to have lots of people doing things, and it won't scale. So my thought is that ultimately the system will be able to decide, in a deterministic manner, by itself, just based on the data itself. And that has some interesting applications down the road for quantum computing, where you can make probabilistic determinations across massive datasets.
Obviously databases are supposed to be deterministic, and there's a lot of debate about whether quantum computing could ever be used for data persistence or data logic. But I think there's a tremendous opportunity there to find the lowest-energy solution, or the probable lowest-energy solutions, for a query. It just requires a lot more qubits than we have right now, but 10 years down the line, man, my patent's going to be awesome.
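One well-known way to get the leaderless, deterministic behavior Jaxon describes is last-writer-wins: each write carries its own metadata (here a timestamp and the writing node's ID), and every node applies the same comparison, so replicas converge without consulting a leader. This is a standard technique offered for illustration, not HarperDB's replication protocol. The plain integer timestamps are an assumption; real systems often use hybrid logical clocks to tolerate clock skew.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: int  # assumed roughly synchronized clock, e.g. milliseconds
    node_id: str    # tie-breaker so equal timestamps still resolve the same way everywhere


def wins(a: Write, b: Write) -> Write:
    # Same rule on every node: later timestamp wins, ties broken by node ID.
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


def apply_all(writes: list[Write]) -> Write:
    """Fold writes in whatever order this node happened to receive them."""
    current = writes[0]
    for w in writes[1:]:
        current = wins(current, w)
    return current


writes = [Write("red", 100, "tokyo"), Write("blue", 105, "ohio"), Write("green", 105, "berlin")]
# Every possible delivery order ends in the same state, with no leader involved.
outcomes = {apply_all(list(order)) for order in itertools.permutations(writes)}
print(outcomes)  # {Write(value='blue', timestamp=105, node_id='ohio')}
```

The trade-off is the one discussed in the conversation: convergence is guaranteed, but a concurrent write can be silently discarded, which is why something like a payment still needs a stricter path.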
[01:18:08] Jason Gauci: Yeah. And to the point you brought up earlier, there is a double-spend problem, and there are very specific niche cases where you do have to spend that time and you do need that perfect answer. But the vast, vast majority of the time you don't, and in all of those instances you can use Edge Computing.
You can use things like HarperDB and all these Edge Computing services that make it easy to deploy to the Edge, Docker, all the things we talked about; they will be extremely important. And then that one time when you actually click the checkout button, it can go to the server and take a long time.
And people kind of expect that: okay, if my credit card is going to get charged, I expect to wait a little bit. So you can have the best of both worlds just by being smart about when to use A or B, and both of them are extremely important.
[01:19:06] Jaxon Repp: Exactly. That's the challenge: you want it to be as fast as possible, but not too fast.
[01:19:14] Jason Gauci: Yeah, that's right. Yeah. Yeah. As fast as possible without hubris, right?
[01:19:20] Jaxon Repp: That is the Sisyphean struggle, right?
[01:19:23] Jason Gauci: Yeah.
[01:19:23] Jaxon Repp: We push the rock up the hill every day.
[01:19:26] Jason Gauci: Yeah, very cool. So let's jump into how people can reach you, how people can learn more about HarperDB, and what some good resources are for folks out there.
And alongside that, we have a lot of folks in university who love to try out different technology. Is there a way they can try out HarperDB for free? Is there a permanent free tier? It'd be great to cover some of those bases.
[01:19:55] Jaxon Repp: Sure. Our URL is HarperDB.io. On there, we have a docs tab, which will teach you everything you need to know, starting with getting started. You can install HarperDB locally. It just requires Node.js and npm, and you can just `npm i -g harperdb`, super easy. We also have a management studio, which works even with your local instances, because obviously your browser is capable of making local network connections.
You can manage local, cloud, and other instances that you might've installed through our studio, which allows you to connect them to each other, set up inter-node data replication at the table level, pub and sub, as well as use our new custom functions feature, where you can hang your Lambdas basically off the side of HarperDB on their own API endpoint.
So you can not just use our operations API, but set up something with third-party authentication that makes a query, perhaps inserts some data, then runs another query, calculates an average, and inserts it into a time series. It's a super cool piece of functionality: you can then package up a project, click a button, and send it to any of the other HarperDB instances in your organization.
So it's very easy to deploy these as well. That system is backed by our own AWS-hosted collection of Lambdas. Obviously, if you've worked with Lambdas before, you know that writing them, deploying them, and getting them out everywhere is not always the easiest. So we took note of that and tried to make it easier.
We think we accomplished it. Within our studio, we also have the ability to spin up HarperDB Cloud instances, our database-as-a-service product, which is your own EC2 node with HarperDB running on it. We have a free tier there, and there's also a free tier for locally installed instances.
And you can effectively network all of these things together and watch the data move around. Run your local instance on a Raspberry Pi, collect some sensor data, and watch it get replicated up. It's really, really easy to set up a very comprehensive distributed computing application in only a few minutes using HarperDB and the studio.
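The flow Jaxon outlines (query, insert, query again, compute an average, write it to a time series) can be sketched as a plain function. To avoid inventing HarperDB's Custom Functions API, this sketch uses a hypothetical in-memory `Store` class, skips the third-party authentication step, and treats the table names and handler signature as illustrative assumptions.

```python
from statistics import mean
from time import time


class Store:
    """Hypothetical stand-in for a database: each table is a list of dicts."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {"readings": [], "averages": []}

    def insert(self, table: str, record: dict) -> None:
        self.tables[table].append(record)

    def query(self, table: str, **match) -> list[dict]:
        return [r for r in self.tables[table] if all(r.get(k) == v for k, v in match.items())]


def record_reading(store: Store, sensor: str, value: float) -> float:
    """Hypothetical handler: insert a reading, recompute that sensor's average,
    and append the average to a time-series table."""
    store.insert("readings", {"sensor": sensor, "value": value})
    values = [r["value"] for r in store.query("readings", sensor=sensor)]
    avg = mean(values)
    store.insert("averages", {"sensor": sensor, "avg": avg, "ts": time()})
    return avg


store = Store()
record_reading(store, "porch", 20.0)
print(record_reading(store, "porch", 22.0))  # 21.0
print(len(store.tables["averages"]))  # 2
```

Keeping the whole sequence in one handler that runs next to the data is what saves the round trips a client would otherwise make for each step.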
[01:22:09] Jason Gauci: That's super, super cool. So folks at home, most people have a Raspberry Pi. We've been telling people to buy Raspberry Pis for, what, half a decade or something. So you have a Raspberry Pi and you have a computer; you can run HarperDB on the Pi and HarperDB on the computer. And then whenever you insert foo into table bar, it just shows up on the computer, which is pretty cool.
There's a lot of really fun stuff you can do with that. For example, we're working on a Raspberry Pi to control a water fountain; that's the latest project the family has been doing. So we could just have a database that says when the water fountain should turn on, or whether it should be on right now.
And then from our computer, we could just change a value in that database and, boom, the water fountain shuts off. So there's a whole bunch of really fun stuff you can do with this. Then, as you learn that technology and go to a company that's moving a lot of bits and needs to do things at the Edge for all the reasons we talked about, you'll have that experience. You'll be ready to go, and you'll have a leg up there.
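Jason's water-fountain idea boils down to the Pi polling a setting that the desktop can change. In this sketch, `sqlite3` stands in for the replicated table, and `set_pump` is a hypothetical placeholder for whatever GPIO or relay code actually switches the pump. In the setup Jason describes, the row written on the computer would reach the Pi through HarperDB replication rather than a shared connection.

```python
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT OR IGNORE INTO settings VALUES ('fountain', 'off')")


def set_fountain(conn: sqlite3.Connection, on: bool) -> None:
    """Run on the computer: flip the desired state."""
    conn.execute("UPDATE settings SET value = ? WHERE key = 'fountain'", ("on" if on else "off",))


def fountain_should_run(conn: sqlite3.Connection) -> bool:
    """Run on the Pi: read the desired state."""
    (value,) = conn.execute("SELECT value FROM settings WHERE key = 'fountain'").fetchone()
    return value == "on"


def set_pump(on: bool) -> None:
    # Hypothetical placeholder: real code would drive a GPIO pin or relay here.
    print("pump", "on" if on else "off")


def poll_once(conn: sqlite3.Connection, pump_on: bool) -> bool:
    """One polling step: only touch the pump when the desired state changed."""
    desired = fountain_should_run(conn)
    if desired != pump_on:
        set_pump(desired)
    return desired


conn = sqlite3.connect(":memory:")
init(conn)
state = poll_once(conn, pump_on=False)  # nothing printed; already off
set_fountain(conn, True)
state = poll_once(conn, state)          # prints "pump on"
```

On the Pi, `poll_once` would run in a loop with a short `time.sleep` between calls.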
[01:23:17] Jaxon Repp: Absolutely. I wrote a thermostat program for my own house. I have old fan coil units that have either hot water or cold water going through them. So if you want it cold, you need to know what the temperature of the water is.
Because you don't want the unit to just turn on. So you need this piece of data. I wrote it on a Raspberry Pi with a little seven-inch monitor on top of it, and it runs HarperDB, which stores the data, and it does a little predictive temperature curve. It makes third-party calls to the weather service. If there's cold water in there and it knows it's going to be hot later, it may turn on the air conditioning a little early. So it's a super simple system that actuates the power button on any given fan coil unit in the house based on which way that room's window faces, and whether it's time for it to be colder in here now or it can wait until later in the afternoon. It's a very, very simple proof of concept, but a commercial thermostat was never going to meet my needs.
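Jaxon's thermostat logic can be sketched as a small decision function. The thresholds, the "pre-cool when a hot day is forecast or the room faces the sun" rule, and all parameter names are assumptions for illustration, not his actual code; the forecast value would come from the weather-service call he mentions.

```python
from dataclasses import dataclass


@dataclass
class RoomState:
    room_temp_c: float
    target_temp_c: float
    coil_water_temp_c: float  # a fan coil unit only cools if the loop carries cold water
    forecast_high_c: float    # from a weather-service call
    sun_facing_now: bool      # e.g. a west-facing room in the afternoon


COLD_WATER_MAX_C = 15.0     # assumed: water below this counts as chilled
PRECOOL_FORECAST_C = 30.0   # assumed: a hot day worth cooling ahead of
PRECOOL_MARGIN_C = 1.0      # assumed: how far below target to pre-cool


def should_run_fan(s: RoomState) -> bool:
    if s.coil_water_temp_c > COLD_WATER_MAX_C:
        return False  # the fan would just blow warm air
    target = s.target_temp_c
    if s.forecast_high_c >= PRECOOL_FORECAST_C or s.sun_facing_now:
        target -= PRECOOL_MARGIN_C  # start a little early, before the heat arrives
    return s.room_temp_c > target


print(should_run_fan(RoomState(23.5, 24.0, 10.0, 33.0, False)))  # True: pre-cooling
print(should_run_fan(RoomState(23.5, 24.0, 10.0, 25.0, False)))  # False: can wait
print(should_run_fan(RoomState(26.0, 24.0, 20.0, 33.0, True)))   # False: no cold water
```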
[01:24:22] Jason Gauci: Yeah. Wow, that's super cool. And this way they can all talk to each other and be aware of each other.
So if one of them is going all out, the others know that, okay, maybe the temperature's going to drop, and they can kind of bootstrap off of each other.
[01:24:37] Jaxon Repp: Exactly.
[01:24:38] Jason Gauci: Yeah. Very cool.
[01:24:39] Jaxon Repp: Just trying to save two dollars.
[01:24:42] Jason Gauci: That's the engineer thing, right? It's like, well, I could purchase this product for $99, or I could purchase a Bugsnag subscription for $10 a month, but I think I'll write my own and spend three years on it.
[01:24:55] Jaxon Repp: Write it from scratch under the guise of someday I'll productize this and I'll get all that money back.
[01:25:01] Jason Gauci: Yeah, that's right. It never works. Cool. Jaxon, it was so awesome having you on the show.
Patrick and I have learned a ton about Edge Computing from you, and we really appreciate it. I'm sure folks at home have learned a bunch too. If you want to reach HarperDB, they're on Twitter, and we'll post a link to their social media. If you've built something cool, @ them on social media and show off what you've built.
I think they'd love to see that. We'll also post the site and everything else. Thank you so much for coming on the show. I really appreciate it.
[01:25:37] Jaxon Repp: You're welcome.
[01:25:38] Jason Gauci: Cool. And for everyone out there, thanks for subscribing to us on Patreon and checking out Audible on our behalf. We really appreciate that. We'll catch everyone in a couple of weeks. See you later.
[01:26:08] Patrick Wheeler: Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, and transmit the work, and to remix and adapt the work, but you must provide attribution to Patrick and me, and share alike in kind.