Programming Throwdown

What are hash maps, and when do you need them? What are examples of hash functions? And can you bring a water calculator to your SATs?

Show Notes

In this duo episode, Jason and Patrick introduce us to the world of hash maps, from buckets and hash functions, to differences between open and closed addressing, to minimal perfect hashes and locality sensitive hashing. A familiarity with hash maps is an oft-overlooked but highly sought-after skill, and it can be a valuable asset for those eyeing a career in programming.

Along with the main topic, Jason and Patrick also talk about some of their latest interests: books, gadgets, tools and games.

This episode touches on the following key topics and ideas:

00:01:27 Playing games with Oculus Quest: Acron, Racket: Nx, Gorn, Superhot 
00:11:05 News: “I Made a Water Computer” by Steve Mould
00:14:56 colinfurze
00:15:52 News: Comprehensive guide to Attention Mechanisms
00:21:53 News: Starship SN15
00:25:18 News: MailSync now Open source (GPL)
00:28:34 Jason’s Book of the Show: Elon Musk
00:32:04 Patrick’s Book of the Show: Ready Player Two
00:33:40 Jason’s Tool of the Show: Datadog
00:38:44 Patrick’s Tool of the Show: I Expect You to Die
00:40:30 Escape rooms
00:45:39 Sudoku
00:48:35 Hash maps: the promise and idea
00:50:59 Hash Functions
00:52:34 Examples of hash functions: Cryptographically Secure and Non-Crypto
01:01:05 Load Factors
01:03:43 Open vs Closed Addressing
01:15:10 Minimal Perfect Hash
01:16:25 salts
01:19:00 Locality Sensitive Hashing

Resources mentioned in this episode:

Tools

Books
  • Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future by Ashlee Vance 
  • Ready Player Two by Ernest Cline

Gadgets

Games

Videos:

Articles:

If you’ve enjoyed this episode, you can listen to more on Programming Throwdown’s website: https://www.programmingthrowdown.com/

Reach out to us via email: programmingthrowdown@gmail.com

You can also follow Programming Throwdown on 
Facebook | Apple Podcasts | Spotify | Player.FM 

Join the discussion on our Discord
You can also help support Programming Throwdown through our Patreon
★ Support this podcast on Patreon ★

What is Programming Throwdown?

Programming Throwdown educates Computer Scientists and Software Engineers on a cavalcade of programming and tech topics. Every show will cover a new programming language, so listeners will be able to speak intelligently about any programming language.

Patrick Wheeler: Programming Throwdown Episode 116: Hash Maps. Take it away, Jason.
[00:00:21] Jason Gauci: Hey everybody, we're here with a duo episode. We've moved on to 2 episodes a month. I'm sure people have noticed by now, which is a really, really cool. And I probably talk about that too much, but I just can't tell you how excited I am that we have an actual producer and we can up the amount of content that we get out to everybody, which is really cool.
[00:00:43] Patrick Wheeler: Exciting!
[00:00:44] Jason Gauci: Yeah! So this episode, we're going to talk about hash maps, which is really cool. So before we get into the actual show, I just have like a pre-show, I wasn't sure if hash maps include hash tables or like, what the umbrella term is. I think hash maps kind of cover everything.
[00:01:05] Patrick Wheeler: I think they include, I know hash maps and hash sets are basically the same thing.
[00:01:09] Jason Gauci: Yeah, that's true.
[00:01:10] Patrick Wheeler: But then hash table? I mean, does that just mean you're using an array? I don't...
[00:01:15] Jason Gauci: Yeah. I don't know either. We'll have to all look it up. We'll look it up. But yeah, our show is going to be all about hashing and all the different things you can do with hashing and all the clever stuff that it unlocks. So we're really excited about that.
[00:01:27] And I actually, another thing I'm excited about is I got an Oculus Quest, one of the Quest 2s. And I think I've, I think in our very first show, I think I talked about, or maybe it was one of the first episodes, how I got a cell phone, and how that was kind of a big deal. And I was such a, or smartphone, sorry, a smartphone. I was a late adopter to the whole smartphone thing.
[00:01:48] Now I'm also a late adopter here, so I feel like I've been consistent. But I got a Quest 2. So I got Beat Saber and a lot of things that everyone knows about. And it's been fun. I've been exercising a lot with it.
[00:02:00] One of the things I was able to do that I didn't expect is, I've had a racing wheel. So it's like a USB wheel that actually clamps to my desk and it comes also with foot pedals. And so I'm really into racing games. And so I can play like F1 and Project Cars and Dirt, and these other racing games with the actual steering wheel. And I've had that for a little while.
[00:02:22] But I actually was able to play a VR racing game with the wheel and the Quest. So the quest has this thing called Oculus Link. And so you can connect the quest to the computer and use it, have your computer drive it kind of like the Rift. So you can use the Quest in this way.
[00:02:40] And so with that whole, like, Frankenstein kind of set up, here with the wheel and the computer and the link and the Quest, I was able to actually do a racing game with like physical controls and VR. And it was awesome. (laughter) Like, it was really, really fun. I was racing go-karts and you could actually, like, as you pass someone, you could look or if someone's behind you and you want to see closely, or you could turn your head and like, look at them and then turn back.
[00:03:05] It was really fun. And even just the feeling of like, you kind of like instinctively lean when you turn the steering wheel. And so when I lean my head actually leans and the view changes and it was, it was super, super fun. I have to admit it was a total blast.
[00:03:20] Patrick Wheeler: Yeah. I also got an Oculus Quest 2. There's some, like, I don't use Facebook personally, like nothing against Facebook. It's just, it's not something that I engage with regularly. So it was a little awkward that it's so heavily linked between sort of like Facebook, obviously, which you can go read about all on the internet and people have very strong opinions.
[00:03:38] Jason Gauci: Yeah. So I just set this up. I could talk to this. When I started it, it said, Hey, do you want to create a Facebook account or an Oculus account? And I said, I'll create Oculus account and try and keep things separate. So that way I can use Key Pass. I have a separate password, et cetera. So I went through like the whole signup flow and then at some point really early on, I think it's like I went to get my first game or something.
[00:04:00] They're like yeah, actually about that, you're going to have to link these 2 accounts and the one you just created is going to get deleted in a year. I was like, what? Yeah.
[00:04:10] Patrick Wheeler: Aside from that, I also have enjoyed it. My kids have really enjoyed it. We do it in small, small doses. And we cast from the headset to the TV, so other people can watch.
[00:04:20] Jason Gauci: Wait, what? Wait, wait, how do you do that?
[00:04:22] Patrick Wheeler: There's, there's an option, like a cast option on the main, main menu. So you can either from your phone, if you have it linked to your phone, you can just watch it on your phone or tablet. But then you can also send it if you have a Chromecast.
[00:04:34] I think it also might support the, what is the, Apple TV Airplay? I've not tried that yet, but yeah, it's a pretty cool experience. I actually have the tool of the show. I won't spoil. It is an Oculus quest game that I put as tool of the show before we decided to talk about VR at the beginning. So I'll leave that for them, but there's been a couple of games I really enjoyed being able to play and being immersive.
[00:04:57] One that was really different was I think it's called a crime, which is like a permutation of the letters of acorn, about a tree and not stealing squirrels. And the interesting thing is that the person wearing the VR goggles is a tree that can throw sort of, I guess like bombs and balls and whatever at squirrels that are running towards it, trying to eat the nuts that are at the base of the tree and take them back to their sort of base.
[00:05:27] But instead of having only bots do this, you can have other people on their phones download the game and take control of the squirrels. So they become the squirrels, and you take turns, or at least we do, we take turns, who's the tree, you have giant tree limb arms, and you're picking up these things and throwing them at the squirrels.
[00:05:45] And you know, the other people in the room are playing with you right there, the squirrels trying to come and steal your nuts.
[00:05:51] Jason Gauci: Whoa, that sounds awesome.
[00:05:52] Patrick Wheeler: And so, that's like, super cool. You know, just like an experience that you wouldn't otherwise have.
[00:05:57] The only kind of like downside that I've gotten is, one is like the battery life isn't super long. You can easily, taking turns with everyone, eat up the batteries. And then the second thing is you actually need a fairly large space for the games that don't have you stationary. For the games where you actually like move or walk around the space, which I find to be kind of like the most immersive and most cool. You actually need a fairly decent space that's clear of objects. So like a living room or something, you need probably a good, I don't know what they recommend, like 8 feet by 8 feet or 10 feet by 10 feet.
[00:06:29] But it's actually surprisingly difficult, at least was in my house, to like find such a space. And so we had to kind of like make an arrangement where we move some of the furniture to be able to have a big enough space to really fully enjoy it.
[00:06:42] Jason Gauci: Yeah. I was playing this game called I think it's called Racket RX, but it's, it's effectively like a, it's like solitaire, solitaire tennis, where you're surrounded, your encased in like a sphere. And so you hit the ball off the sphere at different points to hit various targets and stuff that are like overlaid on the sphere, right? And so you're, and the harder you swing, well, it's not, if you swing hard enough, then the ball will actually skate along the surface. So for example, if you hit it hard enough, then you don't have to, like hit the target directly. You just have to be kind of leading the target and the ball will kind of go through it.
[00:07:21] And so it kind of encouraged you to swing, to swing a little bit hard. I don't think the harder you swing the better it is, but you have to swing hard enough. And so I had drawn my guardian, which is what they call the sort of border you set up before you start playing. I had drawn it kind of right up to the wall and not really thinking about it.
[00:07:39] Patrick Wheeler: Oh no.
[00:07:41] Jason Gauci: Yeah, by the time the guardian warned me like a split second later, I'd like punch the wall. It wasn't a big deal, but, but yeah, you easily like kind of lose track of where you are spatially. And I was actually, that was one of the things that surprised me. I've only played it for a few days now, but surprised me was how often the guardian is actually warning me, like how often I get far away from where I was originally standing.
[00:08:04] I also got a game called Gorn, G O R N, which is a, it's like a Roman Coliseum. And there's a bunch of like weapons on the ground. And so you actually reached down to the ground, pick them up and, and you have to sort of fight other people in this Colosseum. And it is, I mean, all of these games are a huge, huge workout. I mean, I was sweating afterwards. It was a lot of fun.
[00:08:30] Patrick Wheeler: It feels like that movie, Wall-E, like all the people like riding around in their chairs and drinking their Slurpees. Like we are, we can't go outside and exercise, it's too much work. So we're going to put on these goggles and stay inside and exercise. It's just kind of, we're getting there.
[00:08:47] Jason Gauci: It's pretty wild. I mean, it's, it's yeah, it's interesting because, it's definitely more of a workout than playing like Quake or something. So in a sense, it's like a step in the right direction as far as not ending up like a person in Wall-E. (laughter) But yeah, you're right. It is, actually there is a, I saw a game in the store that said VR VR, where I guess you go around and putting on VR goggles in VR. (laughter)
[00:09:13] Patrick Wheeler: So there is, I think it's this, Superhot. So I think Superhot, which I believe is available.
[00:09:18] And we should say there's lots of VR goggles out there. I mean, these are, I think this one's gaining momentum towards being like, kind of more commonplace than just early adopters?
[00:09:28] Jason Gauci: Well, I think the Quest is the only one that's self-contained, right? At least that I know of.
[00:09:32] Patrick Wheeler: I think there are others, but they're not at the same performance. Like they were much cheaper ones further back, like years ago. Again, we're late to the party.
[00:09:39] Jason Gauci: Yup. Yup.
[00:09:40] Patrick Wheeler: So, in Superhot, the, which is an interesting game, they have I believe it's, there is a scene where you reach up and pull down VR goggles onto your head to use your controllers, to reach up and grab it and pull it down onto your head. And it's sort of that same surreal thing. Wait, I have on VR goggles and I'm reaching for VR goggles in VR to put on my VR gog-- ugh. (laughter)
[00:10:01] Jason Gauci: That's amazing.
[00:10:03] Patrick Wheeler: I look forward to all of these little cheeky puns in these games as people start adopting these more and more.
[00:10:10] Jason Gauci: Yeah. The thing I did just for the experience. Yeah, I don't think I would do it again, but it was actually pretty interesting was, I watched an episode of Family Guy in VR. And so you're just like, in this, I think you're in like in some kind of a tree house or something, whatever that environment is when you start the Quest for the first time. You're, you're in this environment and there's basically just a giant, enormous TV, that's just suspended in midair and is playing Family Guy.
[00:10:37] And there's something just interesting about that experience. Like, I don't think I would do it again. It's like you guys will just watch the TV of course, right? But there's just something interesting about that. Part of it too is, I was just surprised how comfortable it was. That really shocked me, actually. I think that having the strap that goes over your head and, and just getting all those straps tighten the right way, actually made a really big difference.
[00:11:05] Patrick Wheeler: I think it's a time for news.
[00:11:08] Jason Gauci: Go for it. What's your news.
[00:11:09] Patrick Wheeler: All right. My first one is a YouTube video called The Water Computer, or I guess it's called, I Made a Water Computer by Steve Mould. Which, I guess he's a fairly famous YouTuber. I keep coming across YouTubers who have like a million subscribers and I'm like, I never heard of this person before.
[00:11:25] Jason Gauci: Yeah. I wonder what the percentile is there. Like I bet there's, there's, there's gotta be over a hundred thousand, maybe.
[00:11:32] Patrick Wheeler: Everyone's probably groaning like, oh, of course. I already saw this video. I knew it. Well, I didn't. (laughter)
[00:11:37] Jason Gauci: I never heard of it. So...
[00:11:39] Patrick Wheeler: I came across this video by Steve Mould saying, I made a water computer, and he is doing it with another sort of like Math YouTuber, Matt Parker, which Matt Parker, I've watched his videos before, so I'm not sure, I watch all of them, but I'm not sure how I missed this.
[00:11:54] Anyways, the guy has an interesting mechanism for building logic gates out of sort of plastic and doing the computation, instead of electricity, you use water. And I decided it was pretty cool. So in college I took some amount of classes in digital logic, these kind of things about building up an entire CPU out of just gates, it's pretty common in a lot of electrical engineering or computer engineering curriculum.
[00:12:20] So not unique, but I always just come back to it as like a sort of very enjoyable way to, to kind of think about these like individual gates and how that they grow up, up, up, and eventually you're playing VR on a headset.
[00:12:33] Jason Gauci: So just to be clear like, this doesn't mean a waterproof computer. This is literally a computer that uses water instead of electricity.
[00:12:40] Patrick Wheeler: Oh, okay. So yeah, I was about to get, (laughter) thank you for making sure to clarify that. Yeah, so this is like a class of people who have done, I guess what you'd call like physical computing, where instead of having Silicon logic gates, or it used to be, I guess, vacuum tube gates, you actually make anything that by having 2 physical objects, and then if both of them are in the quote unquote, one state, then you know, and you have an end gate, then you get some sort of output.
[00:13:08] So I've seen them built of Legos, of a Tinkertoys, of now somebody with flowing water. These are always just like super fascinating to me. And in theory, I mean, there's some nuance to using water for doing your actual computation, but in theory, you could build up a fully functioning computer using this.
[00:13:27] I mean, it would get unreasonably large, I mean, modern silicon has an enormous amount of transistors in it. So you would end up, it wouldn't be practical, but it's just a fun, thoughtful way to see the kinds of things people are able to do.
[00:13:41] And so he actually builds something they can add to, oh, I don't have in front of me, 2 4-bit numbers or 2 5-bit numbers. And then the output computation is a given by water flowing through this series of valves. So if you've never seen it before, check that one out or just in general, the fascination around using physical objects to do--
[00:14:01] Jason Gauci: Yeah. So when you go to take your SATs and they tell you, you can't bring your calculator, you just be like, it's all right. I got this, guys. (laughter)
[00:14:09] Patrick Wheeler: Can you please define calculator for me?
[00:14:11] Jason Gauci: Yeah, I just, it's okay. I don't need a calculator, but can you bring me like 10 jugs of water? (laughter)
[00:14:17] Patrick Wheeler: Yeah. Can I have 55 gallon barrel of water?
[00:14:21] Oh, wait guys. It's still dripping. It's still dripping! Hang on! I need 10 more seconds! (laughter)
[00:14:28] Jason Gauci: Oh my gosh. Yeah. That brings back so many memories of, I think the SAT, you could have a regular calculator, but the ACT you couldn't have anything. I'm sure it's changed now.
[00:14:38] Patrick Wheeler: I haven't kept up with it, but I remember, yeah, there being like a specific list of approved calculators that you could use.
[00:14:43] Jason Gauci: Yeah, that's right. Cool. Yeah, that sounds awesome. I'll get, I'll give this a shot. Actually, you know, I was thinking about it the other day. A lot of the videos that are in my YouTube history are videos that you've, you've posted.
[00:14:56] Patrick Wheeler: Really?
[00:14:56] Jason Gauci: Yeah. This is yet another creator, I guess another way of putting it. Yeah, maybe not the history, cause that's, that's too shallow, but like the, like too recent, but a lot of the people I subscribed to are people that you found for me. Ah, so cool. I'll have to return the favor. Maybe next show, I'll do a... have you heard of Colin Furze?
[00:15:15] Patrick Wheeler: Yeah.
[00:15:16] Jason Gauci: Okay. Okay. And that's, that's one person that I found on my own.
[00:15:18] So just for people who haven't heard of him on the show, he does he just makes crazy things. He's like a mechanical engineer extraordinaire, and just builds all sorts of wild things. The most, maybe not the most recent episode, but a recent episode, he made a car waterproof, and the interior of the car is totally waterproof. And so then he filled it up full of water, and he had like a car that was also a hot tub.
[00:15:42] Patrick Wheeler: What? Okay, I've not seen this one yet.
[00:15:44] Jason Gauci: Yeah. You got to check that out. It's a convertible and it's a hot tub. And while you're driving, there's like water splashing out of the car and everything.
[00:15:52] All right. My news, one of my news is this really cool guide to something called attention mechanisms and so many to kind of explain and build up to this.
[00:16:02] But, as a lot of people know, if you want to do what people call supervised learning, which means someone gives you a label. So they say like, okay, here's a picture. This is a dog. Here's a picture, this a cat. Here's a picture of the dog. They give you a bunch of those. You can do supervised learning and you could train a machine learning model to predict the output. So you could give that model a new image and it will tell you whether it's a dog or a cat. Actually, it'll give you the probability that it's a dog, the probability that's a cat.
[00:16:33] Right, and so, you might have, let's say a table full of data in Excel or something. And then you have this one column and that one column in the table, and maybe sometimes it's missing. And so you want to try and predict the value of that column when you don't have it. And so you train a machine learning model to do that, and you would take all the other columns, you'd pass them in as inputs. That's relatively straightforward, and you'd have some kind of neural ne2rk or some kind of model to do that.
[00:17:01] Now, when you have images, that's more tricky because an image, every single pixel and really every channel of every pixel is a separate input. And so if you just use a regular neural ne2rk or some other model, it's just going to have to it's too expressive, too open-ended and they can't really learn effectively, they won't be able to handle different-sized images and stuff like that.
[00:17:26] And so for images, they've made this thing called a convolutional ne2rk. And so the convolutional ne2rk's kind of nice. Think of it as like a roving eye. So it's like a little eye that can look at a piece of the image and that eye kind of moves across the image, doing the same thing at all the different spots in the image.
[00:17:46] And so if you know how to detect a dog in one spot of an image, that same dog detector will run everywhere else. And so you could end up with the output could be, let's say another image, that's some black and white image and the more intense the image, the more likely there's a dog there or not.
[00:18:04] And that image is created by this eye that kind of like roves over the image, doing the same thing over and over again. And because you only have to learn that eye once, then it makes it practical to build these models.
[00:18:19] And so it's cool for images, but then you, you have this, you have like a lot of other modalities that you want to maybe learn from, right? And one might be language. So you might say, well, here I have a sentence. And I want to predict whether this sentence is a, I don't know, it makes people happy or sad. And so you have a bunch of sentences and then you have a bunch of, you have another column that tells you whether the person was happy or sad and you want to train the same type of model.
[00:18:45] But how do you do that with a sentence, right? It's not really clear, like how to, what sort of model or what sort of topology you need to do that. And the thing that's especially challenging about sentences is that they can go on forever. And other than the image, if there's a dog, the dog will probably just be in one spot of the image, right? Or if the dog is going to be over the whole image, you could shrink the image down, there's tricks you can do where you only have to look at, let's say a certain spot in the image.
[00:19:17] But for sentences, you can't do that, especially for certain languages and things like that, but even probably across all the languages, like the first part of the sentence might be really important when you're reading the end of the sentence.
[00:19:28] And so the sort of structure that they've designed to do things based off of language is called a LSTM, which stands for long short-term memory, and it's similar to this roving eye, I mean, it's, it's a different technology, but it's similar the sense that it uses a certain mechanism to kind of go over the whole sentence and be able to sort of build up knowledge as it goes through the sentence.
[00:19:56] So with all that groundwork, LSTMs are similar to convolutional nets that they didn't work for a really long time. And recently people have been able to get LSTMs to actually work and be really accurate and train models that are really accurate. It's through something called an attention mechanism.
[00:20:16] So the problem with LSTMs are, so this LSTM, think of it like a, like a accumulator, like a battery, right. And things are coming into this battery and knowledge is coming into this battery. And this battery is sort of storing this knowledge. And then when you finished storing the knowledge and the battery now has kind of everything in it, Now you can start draining the battery, draining that knowledge.
[00:20:41] So you can imagine if you're, let's say, translating English to French, you would store all this English knowledge and then you'd have another model that kind of drains it in French. And if in French you would have reversed the order of the words in the sentence, that's okay, because the battery kind of has everything, right?
[00:20:58] The problem that they were having was, as you store new things, they were kind of erasing the older things. And it's, it's very hard actually to kind of keep storing more and more and more words without destroying, like the very first word you put in. And so the attention mechanism is a way to, to prevent that from happening.
[00:21:22] And so the article gets a lot more detailed than that, but this, this guide is really cool because it goes through the whole history and explains it kind of with pictures. And, and so if you ever wondered how machine learning works on words, check out this guide. It's got a lot of good info.
[00:21:39] Patrick Wheeler: You make my news articles seem so shallow. (laughter)
[00:21:43] Jason Gauci: Well that's probably the deepest one I've ever linked. So a sample size of one.
[00:21:47] Patrick Wheeler: Yeah, that's super fascinating. Thanks. That was a great overview. I knew all of those words.
[00:21:53] My news article is another abrupt, 180 degree different direction, which is the successful launch and landing of Starship SN15.
[00:22:03] So we have like a little bit, I guess Jason was mentioning about being a late adopter to smartphones. We have a history as well of covering some amounts of space articles and making space predictions. This happened a couple of weeks ago while we're recording this. So it's, it's sort of old news, but I just want to celebrate, I think it's super cool that this is a Space X Prototype and they were able to launch it out of Texas up to, I don't remember where they go up to about 10 kilometers or something and then land it back just next to where it took off from.
[00:22:34] But just the whole thing flying in launching it, it, it looks so ridiculous, and to actually see them do it successfully, they've managed to sort of land before, but there was always problems of one kind or another. And so this is the first time they sort of took off, landed, no major issues, and in fact are looking to potentially do it again. I guess they're doing inspections and stuff.
[00:22:56] Jason Gauci: Oh, I was going to ask about that. 'Cause I thought they had done it before, but yeah, maybe there were problems. It might've looked like it landed, but something broke or something.
[00:23:04] Patrick Wheeler: Yeah. And there was also the, they did tests earlier, which were not as high and weren't, didn't have aerodynamic surfaces. So they did some like tests of the engines and stuff. This is the first time they sort of sent up what would end up being the final stage, like the orbital part of the thing. And now it looks like they're based on filings, they are thinking about launching up to orbit from Texas and splashing down somewhere around Hawaii sometime soon, which would require, I think them to stack 2 stages together.
[00:23:34] So this thing's huge. If you've not, go look at the like, diagrams for how big the final thing is going to be compared to like a person or existing launch vehicles and it's, it's absolutely insane.
[00:23:46] Jason Gauci: Wow. It's amazing. Yeah. The so, so how far are we away from being able to get a ride on one of those as a passenger? (laughter)
[00:23:54] Patrick Wheeler: You and me, or an arbitrary rich person?
[00:23:57] Jason Gauci: Let's say, let's say an arbitrary rich person.
[00:24:00] Patrick Wheeler: I mean, I think they're going to fly several civilian launches this year. I think Space X is doing one on the Falcon, their current ones that go up to the space station. They're going to launch some civilians, which will sort of kick off another set of civilian space travel, which hasn't happened for a few years.
[00:24:17] But I think on Starship, I would say probably still sort of 2, 3 years away. I mean, there's the Japanese person who bought a trip around the moon on one of these star ships and they haven't set an exact date yet, but I think they're saying it's going to be closer than you think.
[00:24:34] Jason Gauci: Wow. It's amazing.
[00:24:36] Patrick Wheeler: Now, you and I affording it? Um...
[00:24:38] Jason Gauci: Yeah, that's a, that's a whole another question. (laughter) Yeah. Maybe if I, yeah, we stopped buying so much junk food, maybe we can get there.
[00:24:49] Patrick Wheeler: How much dogecoin did you buy?
[00:24:50] Jason Gauci: Oh gosh, dogecoin. (laughter) Yeah. Last show. or a couple of shows ago, few features ago, we were talking about like a fractional-reserve banking and those things, the last duo episode. And since then people have been trading their dollars for dog money. So I think, I think that just kinda tells you everything you need to know. (laughter)
[00:25:18] So, okay. My, my second news article is on Mailsync. So I actually use Mailspring. If you've ever gotten an email from me recently, you'll actually see this little footer because I've never bothered to change the default, which would be great advertising for them.
[00:25:35] You'll see, there's a little footer saying Mailspring, the best free email app or something like that. But I actually love Mailspring. We should definitely do this tool to show some time, if haven't already, but it's a, it's a nice desktop email app. It has a lot of like really nice features. It collects all your inboxes. I just feel like it has a great interface. I'm a big fan.
[00:25:54] And for awhile, the Mailspring kind of front end, which is all in Electron, was open source. But the backend which actually handled ingesting mail from a trillion different sources, they handle like exchange, they handle Gmail, they do all of that, that was all, was all closed source. And so I don't remember if Mailspring is on Linux. I know it's on, it's on Mac and Windows. I'm sure it's on Linux too.
[00:26:26] But in general, like you always, when something's closed source, you always kind of wonder how long it's going to survive. And and now they recently decided to open source the whole backend, which is really cool. Not only does it mean mail spring will be around for forever, but also if you ever needed to do that, like if you needed to ingest mail and, basically Mailsync basically converts email into think of it as like a SQL Lite table.
[00:26:57] And so if you ever needed sort of a database representation of your email, and one that sort of is kept, kept in sync and kept alive, Mailsync does exactly that. And so it's, I think it's a really nice thing to have for the community. So, so I'm really excited about all of that.
[00:27:13] I think it's, I think email is still an area where there's a lot of low-hanging fruit. I mean, I spend a ton of time on email. Something that kind of organized email, I do think there is something to that, and this will sort of open up that more, where people can write new front ends and new mail experiences without having to learn 10 different protocols.
[00:27:36] Patrick Wheeler: Oh man. That's cool.
[00:27:38] Jason Gauci: Yeah. There are some email clients that are premium, like there's Superhuman and there's Hey, but I haven't really tried them yet, but yeah, there's gotta be, I think, a better way to do email, like something that's suggested... like, I don't want something that automatically puts things in different folders cause you never really know what it's going to do, but something that somehow like just suggested ways for me to do email better.
[00:28:04] I think that could be cool.
[00:28:05] Patrick Wheeler: I mean, I was, I much preferred Inbox, Google's Gmail client that they experimented with a while. I still feel I'm worse at email using the Gmail app, than I was using Inbox. I too still have a hole in my soul.
[00:28:22] Jason Gauci: Yeah, I actually switched. I found Mailspring when Inbox was shut down.
[00:28:27] Patrick Wheeler: Okay. Well, we're not going to get on that. It's gonna make me sad. It's time for...
[00:28:31] Jason Gauci: Yeah, it's true. Inbox is so good, but it is what it is.
[00:28:34] Patrick Wheeler: It's time for Book of the Show.
[00:28:37] Jason Gauci: All right. My Book of the Show, which actually maybe this might be the first time ever, but I already read this book. I'm not suggesting a book and walking someone off a plank. But yeah, this book is it's self-titled, it's Elon Musk. And so it's all about Elon Musk.
[00:28:54] I think it's awesome, actually. I found the book fascinating. I think that it really, it did a good job of covering the good and bad stuff. Actually. It's really interesting, the foreword is, or maybe the first chapter preface or whatever, is from the author first-person talking about the negotiation you had to go through with Elon to write the book.
[00:29:20] And originally Elon wanted all these footnotes and I guess he's like he has a sort of engineer mindset where he wants to like, just correct the record and wants everything to compile right. But they kind of went back and forth on that a lot. But yo know, as you can imagine, like someone who's like really concerned about their image and everything, right? That's not a surprise.
[00:29:40] But yeah, I felt like the book was extremely well-balanced, had a lot of really good detail. It covered, no pun intended, all different like chapters in Elon Musk's life. So he started off as just like a intern at some software company, and then just doing a bunch of coding.
[00:30:02] One thing I noticed was, and one thing you kind of take for granted, or at least I kind of took for granted is the support that he received from his family. So one thing that, so there's a bit of trivia here, like Elon Musk is, one of his things he's known for is having a, like kind of a rough childhood. Like his parents I guess were really strict. And that, I'm not taking anything away from that. But his parents actually funded the first company. And actually when I've read about other like tech billionaires, actually, you kind of see that pattern over and over again. And so I was thinking, it made me think a lot about like, like that sort of community, like the Elon Musks and the Mark Zuckerbergs and the Bill Gates and what have you.
[00:30:43] And a lot of these people are kind of like in a position where they can take some of those really big risks. So like someone, in all of these cases, someone offered them millions of dollars to their company and they turned them down, and I feel like that would be pretty extraordinary to do that. But then when you see the background, it's like, okay, well actually these people already had millions of dollars, usually like through their family. So that kind of like made more sense.
[00:31:11] But still, I mean, Elon Musk's a total genius. I felt like I have a ton of respect for him after reading the book, and yeah, I thought it was a fantastic read. It actually is very exciting. I was really interested in it the whole time. So it doesn't, you know a lot of these can tend to drag on because they might say, well, here's sort of the middle years where nothing happened. And there really wasn't a lot of that. The book was, was super entertaining all the way through.
[00:31:35] Patrick Wheeler: Nice. I've actually not read any Elon Musk biography. I not too much studied him as a person.
[00:31:41] Jason Gauci: Yeah. You know, I know very little also about a lot of, I don't really study a lot of people, but for some reason, like I was really just drawn to reading this biography. The only other biography I've read is the Steve Jobs biography. And yeah, there's, there's just a lot of really interesting parallels, parallels there and yeah, it's a solid book.
[00:32:04] Patrick Wheeler: Nice. My Book of the Show is Ready Player 2. This is the follow-up to Ready Player One by Ernest Cline, and set in the same universe as Ready Player One. I guess it's a low-hanging, low-hanging fruit, not a super deep book, but enjoyable.
[00:32:19] I enjoyed it as much as Ready Player One, but definitely pretty solid. I am not getting through as many books nowadays as I used to.
[00:32:26] Jason Gauci: Yup, same here.
[00:32:27] Patrick Wheeler: But this one was nice, easy. I actually listened to it, but a nice, easy listen, and I kind of did it in the background. I thought it was good. You know, the same kind of thing if you'd like the first one, you'd probably like the second one. If you didn't like the first one, I wouldn't bother with the second one. Yeah. I mean, it's the same kind of like pop culture-laden exploration of the 80's and a little bit more music-bent maybe than the first one, but yeah, a good read.
[00:32:53] I always struggle. I don't, unlike a biography, I don't want to spoil the fiction too much. So I don't really want to say too much about what the book is, but yeah. Ready Player 2 by Ernest Cline.
[00:33:06] Jason Gauci: Cool. Yeah, you recommended the first one a while back and I gave it a read and I thought it was fabulous. So yeah, I'll definitely, I think I already bought this book as soon as I saw it, but I haven't read it yet.
[00:33:17] Patrick Wheeler: Yeah. My backlog is getting pretty big, so I got to start spending some dedicated time getting through all these books I want to read.
[00:33:25] Jason Gauci: Yeah. I noticed that I've kind of caught myself switching to podcasts because I don't have the long commute or anything. And so, but I think I'm going to try to get back into doing full books, but just in bite size pieces and see how that works out.
[00:33:40] Patrick Wheeler: Is it time for t-t-t-Tool of the Show?
[00:33:43] Jason Gauci: Tool of the Show. My Tool of the Show is Datadog, which is an amazing tool. It is super, super fun. So the way Datadog works is so you're developing, let's say some kind of app, it could be an Android app, it could be a website with some web backend, or even on the front end on the browser. Anything you develop, you're going to give it to somebody, and that person is going to run it when you're not around. And so a bunch of logs are going to be generated, right?
[00:34:11] Now if it crashes, that's a whole another thing. And I think I already mentioned Century in a previous episode. So if it, if it crashes, you have Brakepad, you have Crashpad, you have Century, you have a million different tools, Rollbar, et cetera.
[00:34:24] But then like there's also just, is this healthy? Like in general, you don't want to crash websites or apps, right? But you still want to know. For example, how many people are opening your app, clicking on, let's say the settings, and then leaving the whole app. Well, that would probably be good to know. Or just anytime you have any type of log, like sometimes you have a log that should never get here, right? And it, and it fires, right?
[00:34:53] So most apps, sites, et cetera, are generating logs and it's great to have those sort of aggregated. And so what Datadog does is, they have a bunch of clients for a whole bunch of different languages. I actually ended up writing my own client because they didn't have a C++ one. But they have a very straightforward json API. So you could, you could write a client with just about anything, even if they don't have one for that.
[00:35:19] And you basically, just through this API, you make a post request with your application ID and a bunch of logs. And then what Datadog does is it collects all of that. And then it presents you with this really nice web UI where you can, you can actually, there's something called live tail and you can actually see logs as they come in, and then you can write rules and they actually have this sort of way to automatically find patterns in your logs.
[00:35:48] So if you have a bunch of logs that says, you know, received 26 packets, received 31 packets, received a hundred packets, right? All of those lines are going to be different, but they all have the same pattern, it's just received in packets. Datadog will automatically pull out that pattern and then group all of those logs together.
[00:36:09] To be honest, I haven't ever used any of the premium features. I've only ever used the free tier, which makes me think that they probably should maybe charge more or something. But nothing I've talked about costs any money, they do all of this for free, and then you can write, you can then write rules on top of that. So you can say, for example, all of the logs that are, that have the severity level of error, I want all of those to get written to an S3 bucket. So S3 is the Amazon AWS file system. And so like I have, that's one of the rules I have set up.
[00:36:47] So whenever an error log happens on Eternal Terminal, I've actually, I've gone through and like looked at all the error logs to make sure that there's nothing like personal, like someone's name or IP address or anything like that. And so I know all these error logs are safe and then whenever anyone knows error logs are written, it will get sent to Datadog and then it gets dumped through an S3 bucket.
[00:37:09] So all of that is super nice. The other part of it is, is yeah, Datadog, as I was saying, kind of alluding to earlier, they're using your own kind of infrastructure.
[00:37:19] So like Datadog is not actually storing anything they have a way where you give them permission to a folder on your AWS. And so you're effectively, you're paying for all the hosting costs and stuff. And because of that, there's no limits. Like they can just take tons and tons of data. And if they have to write a lot of data, you're going to pay for it through, through Amazon, not through them. So it's an amazing service for free. I have to lookup what the, I think for a premium, they'll let you do alerts and more fancy things. but yeah, definitely check that out. If you're building anything that's going to be looked at by other people.
[00:37:58] Patrick Wheeler: Super cool. Yeah. They, they're like a publicly traded company and stuff now, right?
[00:38:03] Jason Gauci: Uh, that's a good question. I don't know.
[00:38:06] Patrick Wheeler: It's just super cool to hear you say, kind of like their niche on top of other people's services. I, I guess it's kind of cool.
[00:38:13] Jason Gauci: Yeah, actually that was, I felt like a really good sort of business decision because, in my case, it's a, it's an open source project. I'm not too worried about if they get hacked, these sort of error logs, just as I said, I have no personal real data. But you know, imagine you're like a company that's logging customer information or whatever, the fact that it's passing through their system briefly, and then it's gone is kind of nice from a security standpoint. And it means for them, they don't really have a big, a big footprint.
[00:38:44] Patrick Wheeler: My Tool of the Show is, I already teased it, not a tool, but a game. I Expect You to Die, for, I played it on the Quest 2, but I believe it's actually on a lot of VR platforms.
[00:38:55] Jason Gauci: So real quick, Datadog is publicly traded. And actually that company is, has a market cap of $28 billion. (laughter) So that is a much, I kind of, I thought that it was just a small company, I guess I was totally off the mark there, but I think if you're going to buy a dogecoin, instead buy a share of Datadog. That's my advice.
[00:39:18] Patrick Wheeler: Shares are a million dollars apiece. (laughter)
[00:39:23] So my game is I Expect You to Die, I guess it's a little bit like an escape room? Kind of feel you, unlike the ones I was saying, where you move around the room here, you're actually supposed to be sort of seated, but all of the scenarios make sense with you being stationary. So either you're like sitting or standing in kind of one spot and moving your body, sort of like rotating around the room to do various items and there's various puzzles. And it's themed as a you're sort of a spy going on missions. But there's not a ton of explanation, and various things that you do and trying to solve the puzzles, cause you to die. (laughter)
[00:40:02] So you may be like in a car and there's poison gas outside. And if you break the window, somehow the poison gas is going to come in and kill you. But that's not the only way, there might be a grenade, and if you drop the handle off the grenade, the grenade will explode. And so you aren't expected, at least, I guess, based on the title, I assume you're not expected to not to die, so you die and then you try it again. and you, you sort of figure out the puzzles.
[00:40:30] Puzzle rooms is an interest-- I've never done an actual puzzle room, like in real life. Like I've never gone to one.
[00:40:36] Jason Gauci: Oh, my family loves them. They're so fun. Like escape rooms, you mean?
[00:40:40] Patrick Wheeler: Oh that's what I mean. Yeah sorry. I don't even know the name. I don't, this is not my--
[00:40:43] Jason Gauci: Oh, they're so good.
[00:40:45] Patrick Wheeler: But I sense it's probably something similar where there's a kind of language to them that you have to figure out, like if you just went to one, never having heard of it before, you're not going to necessarily do so well, because you don't, the premise can be, you may need to know a certain trope or a certain way of thinking for it to start to make sense, or the same is true of a lot of things.
[00:41:09] So I feel this one does a good balance of understanding where the puzzle is. So sometimes when I've played games that are claimed to be sort of a escape room themed, I don't even, I can't figure out where the puzzle is. I understand I'm supposed to do something, but I don't, I don't get that you're supposed to just tap around the screen or you're supposed to bash open things to look what's inside. Those aren't intuitive to me, but this game, I feel like strikes a really nice balance of like, it's clear what things I can affect, and then by like working through the interactions between them, I can see where the puzzle is and I can see how to solve the puzzle.
[00:41:46] I don't always get it right the first time, but each time I can continue to make progress. And so I find it for my naive level of sophistication. I don't have to like go consult a walkthrough repeatedly to like keep making progress. Like I'm able to work it through on my own, and it's at just the right difficulty level where I don't get bored of it and I don't just cruise right through it either.
[00:42:08] So I very much enjoyed, and the theming is kind of cool too, a little bit, James Bondy a little bit this spy thing. And it has a little bit of story without being too much. I find it to be a really good experience. And each level is sort of bite-sized enough that for me, like sitting and doing one VR session, I can get through kind of like one level and that's about the right for me.
[00:42:27] Jason Gauci: Cool. That makes sense. And so, yeah, I guess one of the challenges with that, I mean, puzzle games in general, it's like you never really know as a, as a puzzle game designer, like how you're supposed to deal with someone not figuring it out. That seems like the hardest part, right? Like with these escape rooms, what they've done is, like for example, we did one where, and we've, we've done a bunch, but we did one where we just, there was just a trick that we didn't get.
[00:42:54] I think it was some kind of language trick where there was like 4 different cat names. And based on the cat names, that was the clue. Like you just had to know some factoid that none of us had ever really heard of. And so we were really stuck. And so they have this thing where you can press a button and basically tell, kind of give you, I think there's 2 levels of hints, but yeah, kind of getting that right. I think, especially for VR, it's really important because you don't want to take the headset off and then go look on Wikipedia back, you know? And so actually, how do you die in, in these games? Like does a screen just go black?
[00:43:31] Patrick Wheeler: Yeah, I think it, like, I don't remember exactly. I believe it like fades to red, you know? And then it's just like... but whatever when it happens, like an explosion of a grenade or like gas coming in and you starting to like black out and you have a few seconds to react, cause sometimes you can stop it, but sometimes like a grenade exploiting you can't. And then it fades, right, and you go to like a sort of death screen where it makes some funny quip about you sucking and not figuring it out or so much wasted training on you or some, some such something, and then you can replay the level. But there are achievements as well, and the achievements sometimes require you to die in certain ways or whatever.
[00:44:05] Jason Gauci: Oh yeah, that makes sense. Yeah. I used to play a game when I was really little called Lost Vikings, which was a puzzle game. And one thing that, that was pretty interesting was if you lost, if like you died 3 times trying to finish the level, then on the 4th time they would offer you to skip it. And yeah, I feel like that, that actually is probably the hardest part of all of these games is figuring out what to do there.
[00:44:30] Patrick Wheeler: They do that in some super Mario games. Like the newer ones. If you died too many times, they give you like a special power up or whatever, that, that makes it a little easier.
[00:44:39] Jason Gauci: Yup. That makes sense.
[00:44:41] Patrick Wheeler: But yeah, so I really enjoyed it. You know, I think I, I want to get more into these puzzle games, but like you pointed out, they need to be kind of like well-designed or else it just becomes frustrating.
[00:44:51] I still find myself frustrated at some of the puzzle games where you need to like, do an action repeatedly, even though it's not resulting in any thing happening, like this character and the character burps, and you have to kick the character and they just keep burping in exactly the same way.
[00:45:06] But on the 13th burp, they belch out the potion in their belch bubble or something. And it's like, what? Like why would I sit here and click the same thing, getting the same result. If you don't give me some indication that progression is being made, I'm going to go do something else.
[00:45:20] Jason Gauci: Yeah. Yeah, exactly. Like there should be a way to just using logic and observations, to not have to guess. I think that's really important. Actually they, I used to do the those puzzle championships, like the world puzzle championships and all of that.
[00:45:37] Patrick Wheeler: Oh, like the MIT one and stuff?
[00:45:39] Jason Gauci: Yeah, yeah. People would think that you would, you'd guess, but it would just take forever. Or like Sudoku. Sudoku's a much more common example, like you shouldn't be doing a lot of guessing in Sudoku, like if you're doing a lot of guessing--
[00:45:52] Patrick Wheeler: No, no guessing. A properly-designed Sudoku puzzle requires zero guessing.
[00:45:56] Jason Gauci: I thought there were some where like, there would be maybe one guesss?
[00:46:00] Patrick Wheeler: It may need a very advanced technique or whatever, right? Like if it's a hard puzzle, like you may not know the deduction method that's needed, and therefore you would have to guess, if it's like above your level. But you can make whatever Sudoku puzzle you want, right? As long as it has a unique solution, and it could require guessing though, but they're like when the algorithms design them, or the designers design them, they're supposed to know here's the only kind of techniques you know, and it should be forward solvable using only deductoin.
[00:46:33] Jason Gauci: Oh, interesting. Yeah, that makes a lot of sense. I think I think, yeah, there was this other puzzle game you just really enjoy. And I watched actually a podcast. There's a podcast called Game Design... I can't remember the exact name of it, it was game design podcast. And one of the inventors, the person who wrote that game was on that podcast.
[00:46:53] And it's actually really clever. They just kind of... and so it could probably work somewhat similar, is they kind of worked backwards from the answer. But you have to work backwards in a way where it's not ambiguous, but yeah, I think, I think that sounds awesome. Actually, I'll give that a shot. I'm looking for another quest game to pick up.
[00:47:11] Patrick Wheeler: Yeah, This is one where like, I pretty much, like the kids will play a level I've already played and once they learn like kind of thing, but it's over their head, but they enjoy watching me like, oh daddy. Sometimes I do it on purpose sometimes. Like I realize there's a novel way to die and I'll let myself die just for kicks, but they always think it's hilarious. Aw, daddy, you suck. Like you can, like, you didn't realize that grenade was going to kill you. Like how dumb ar-- I mean, I'm, I'm being hyperbolic, but it's an enjoyable family experience that way.
[00:47:38] Jason Gauci: Yeah. My son does the same thing with me and Zelda. So I made it to this one temple and Zelda where I, it really was too advanced. It's one of these ones you have to battle a robot. People have played the Breath of the Wild. There's some of these temples where you fight battles, but the game is nonlinear, so you could get there at any point.
[00:47:56] And this one robot, I just ended up basically, if he hit me at all, I just died instantly. And so it made, it made this battle way harder than it really needed to be by doing it so early.
[00:48:06] And yeah, my son actually was filming it on his iPad for his school share day, and he's like, dad's going to die again. He's gonna die watch he's going to die. And then I died. He's like, yeah, dad died again. There's a game over screen. I was like, honestly, like it's the first time in probably 25 years where I wanted to throw the controller (laughter) for 35 years or something, but I was like having the audience support actually didn't help that much.
[00:48:35] Patrick Wheeler: Uh, all right. Hash maps.
[00:48:39] Jason Gauci: Hash maps!
[00:48:41] Patrick Wheeler: Yeah. I'm going to kick it out. So before, when I was in college and learning about data search and algorithms, of course I learned abou hash maps. And, you know, oh, cool, like we're going to talk about what makes them interesting in a minute, but it never really clicked to me, but, and I was mentioning this to Jason in the pre-show, I don't think we'll get to it at the end 'cause we already already going along. But when I went to start looking at sort of what you call Silicon Valley companies, big companies, like interviewing there.
[00:49:11] I kept reading, oh, hash maps, you gotta know hash maps, you gotta know, I hadn't really used hash maps. And I thought they were overplaying it, but sure enough, all my interviews, like some solutions needed familiarity with hash maps and intricacies of hash maps, and then showing up to my first jobs there, sure enough, hash maps were super common when you get to kind of really big data. I was really surprised. I underestimated the importance of hash maps early in my computer science career.
[00:49:40] Jason Gauci: Yep. Same here.
[00:49:42] Patrick Wheeler: So the promise of a hash map is insertion and lookup being constant time. So the idea is, if you think of like a binary tree, we talked about trees, I think in our last duo episode, then you get the insertion, right? The insertion, if you just insert into a sorted array, for instance, and you don't use binary search, you'd have to sort of like go through each element in the array to find where you need to insert it. So that would be like linear time.
[00:50:12] If your stuff is a well-formed binary tree, then you need like visits to nodes before you find where to insert. But hash maps promise to blow that away by saying, I'll give you a constant time insertion and a constant time lookup. So if you, if you're trying to see if this key is in your hash map, you can do that in sort of constant time.
[00:50:38] That doesn't mean zero time. It just means that it doesn't really matter how many items are in your map. That no matter how big your map grows, within bounds, you would still get sort of this algorithmic constant time. and so that's a super, super powerful thing. And we're going to talk a little bit about the various ways that can be accomplished.
[00:50:59] Jason Gauci: Cool. Yeah, totally makes sense. Yeah, I think I think it actually will surprise a lot of people like how ubiquitous hash functions are. They're really just everywhere. But yeah, at a high level what's going on here is, so we talked about trees, right? And trees are super nice. The challenges with trees are that you have to keep them balanced. And so that could be really expensive. And even if you don't have to keep them balanced, putting something or finding something in a tree takes logarithmic time. And so that doesn't sound like a big deal, but when you start putting lots and lots of things into the database, it can actually become like a really big problem, right?
[00:51:41] So imagine I'm like log base 2 of a million is, oh, geez. I kind of kind of put myself on the spot here is why like maybe 6 or something or the log base, no, log base 2 have a million is what like 14 or something? Anyways, it could easily end up taking like 14 times as long... oh, 20. Well, there you go. (laughter) All right. It shows how well I could do a log in my head, but yeah.
[00:52:09] And so, so imagine like, you have an app, and now it takes 20 times as long to run. It could be, it could end up being a really big deal. And so the challenge with hashing is figuring out sort of the hash function. And, you know, what that means is figuring out sort of some type of way to represent something really concisely, or really like ultimately represented with just a number.
[00:52:34] And so there's been a bunch of work in this field. I mean, it would take a really long time for us to really dive deep into it. But at a high level, there's really 2 types of hash functions that are important. The first type is like cryptographically secure and the second is not, is not secure, right? So for example, if you're trying to hash like credit card numbers, well maybe, let's say someone has a person's name, they're trying to find their credit card number and maybe they have, somehow they have some access to the database.
[00:53:11] And so they have all these hashed credit card numbers. If you do something that's not cryptographically secure, then you know, they can use various statistical tricks to try to reverse engineer that, or maybe they can narrow down. If they have a list of credit card numbers, they can narrow down which one it is.
[00:53:27] And so if you're storing something really sensitive, using a cryptographically secure one has some nice guarantees, right? Or if speed is not super, super critical. And so that's why something like, Git for example, Git will use SHA-256, which is cryptographically secure. Now you might not be storing passwords in Git, in fact you probably should never be storing passwords in Git, but they do that because it's on almost all the computers and you know, that's not the thing that's going to slow you down if you're using Git, like, you're not blocked by it.
[00:54:02] Patrick Wheeler: Jason, there's one more point there too, that I was kind of unaware of because I don't build these kinds of services. But another reason to make sure to use cryptographic hashes is if you have sort of a public service in some way or an API that isn't only trusted users, I guess, that is backed by a hash map. And the keys in some way are like basically user defined. So I'm creating a username or I'm creating something that I can kind of control. And if someone can deduce what hash function you're using and insert entries, which all collide, all go to the same hash, then they can basically cause a denial of service.
[00:54:44] We'll talk about why in sort of a minute, but that's another reason why, if you have a public API of some nature where you're going to back it by a hash, you want to be really careful that someone can't cause, like force hash collisions to occur and thereby like slower denial of service, your, your service.
[00:55:03] Jason Gauci: Yeah. Yeah. That makes sense. Yeah, I think, yeah. So, so just, yeah, using Git as a good example, like, so let's say someone creates like 10 million Git commits that all have the same content in them, right? Well, and you're running like a Git website, like you're hosting Git for other people as your job, right?
[00:55:22] So you're probably going to write some rule that says, look, if this person makes the same commit all these times that all have the same content, then I'm going to block them, or I'm going to, yeah, block them and send an email saying your account has been compromised or what have you, or something bad has happened, right?
[00:55:39] On the flip side, maybe someone is creating a whole bunch of different commits with different content because they're writing some script that's automatically converting all the tabs to spaces and all their files or something. And for whatever reason, it's creating a lot of commits. And so that might be normal, right?
[00:55:55] And so the getting to Patrick's point, like someone could actually, if it's not a cryptographically secure hash function, they could actually figure out a way to be making a lot of different changes, but they're all hashing to the same hash and, and creating all sorts of problems there. and so it'd be very hard to fix that.
[00:56:17] And but you know, there are times, I mean, a good example of a time to use hash that's not crypto secure is when you're doing these Big Data ETL things. So when you're doing like a MapReduce or Spark or one of these things, so just to like not sidetrack too much, but the way MapReduce works is you have this enormous volume of data somewhere, and let's say, you just want to build a histogram of it.
[00:56:45] So you want to count the number of times you see Patrick, you want to count number of times, you see the word Jason, you want to count the number of times you see the word Programming Throwdown. So the way MapReduce will do this is it will have a ton of different machines that are going to just take a small piece of this massive data set and count the Patricks and the Jasons and the Programming Throwdowns.
[00:57:07] And so then it's going to hash all of those names, right? So Patrick is going to hash to some kind of number. Jason's going to hash some kind number, Programming Throwdown's going to hash to a number. And then it's going to, based on that number, send it to what's called a reducer. And so if you have, let's say 10 reducers, you might say, well, all those numbers that end in a zero go to the first reducer, all those numbers in and a one go to the second reducer. I guess you'd need 11 reducers.
[00:57:36] But, but anyway, so right, so then you, you use that to, the reducers' going to collect all that information. And so a single reducer knows it's going to have all the Jasons because Jason is always going to hash to the same thing. And so that's how, how MapReduce works. And but for MapReduce, as I said, you're processing these massive data sets and you know it's not really a cryptography thing. Like there's not really any concern about that.
[00:58:07] And so MapReduce, at least back when I was working on it a long time ago, use Murmur, now it might be using xxHash, which is, which has been proven to be a lot faster, but that's, that's something where actually saving that bit of time by using Murmur instead of SHA will actually really matter because you know, the, the computation there...
[00:58:29] Patrick Wheeler: And so the hash and hash maps is needed because ultimately what you're trying to do is take some key, we should mention, I guess, hash maps and hash sets are kind of interchangeable, like it just kind of, the underlying fundamentals are the same, it's just what data type you use, whether it's just the key only, or the key plus some data.
[00:58:52] And so what you're trying to do with the hash is take arbitrary data, mix it up, combine it, do something in some way to go from an arbitrary, any input size of bytes, imagine a string and you don't know the length of the string, right? It could be Bob and it could be server integrate. I don't know, like something really big. And you want to combine it all down and let's say you want only one byte to represent all of that data. How do you mix it up so that your keys are as evenly distributed as possible across your key space?
[00:59:25] Now, the other thing you have is you have the hash function, which goes from an arbitrary number of bytes, usually, like for the ones Jason was mentioning to 4 bytes or 8 bytes for 64 bits or 256 bytes would be what, 16 bytes.
[00:59:40] Jason Gauci: Yeah, that's true. (laughter)
[00:59:42] Patrick Wheeler: So you end up having say, let's say 4 bytes, right? But that's still 2 to the 32. That's a lot of like keys, and the getting constant time, you have something called a bucket and a bucket is taking a key and having, you can think of it as an array of a side.
[01:00:02] So if you,
[01:00:03] Jason Gauci: So 256 bit is 32 bytes.
[01:00:05] Patrick Wheeler: There we go. Thank you. That makes sense. And so you have these buckets and the buckets are like slots in your array. So say I have, I want to make my array small to begin with. So I have 7 buckets, 7 elements in my array. And I have a 4-byte hash result, right?
[01:00:24] So I need to get from 4 bytes down to 7. and the way you do that, typically, I mean, there's other variation is, is a modulus. And if you do the modulus of 2 to the 32, whatever the value is, modulus 7, then it'll just keep wrapping around and you'll find that which swat to put your item in.
[01:00:43] Now there is interplay as well that we won't get into about how to choose, that when you want more slots than 7, like, what do you choose as your next? Do you choose 8? Do you choose another prime number? Because 7's prime. And there's some trade-offs there, but ultimately you're taking this hash function and fitting it into your buckets of some size.
[01:01:05] Now the size is determined in part by one of the design factors that can end up sort of breaking the constant time assumption by something called the load factor. So the load factors, how many items you have stored in your hash map, divided by your current number of buckets. So if you have buckets that are size 7 and you have 14 items you try to store in 7 buckets, then clearly you're going to have at least some collisions, and your load factor would be 2.
[01:01:36] Now, when you have collisions 2 things mapping to the same slot, there's a variety of ways of handling it. We'll talk about that in sort of a minute, but in all of the cases, the bigger your load factor gets in some hash maps, they try to keep, or must keep the load factor under one. In other cases you could allow it to grow above one, but in both times, once the load factor gets too high, you start breaking your constant time assumption. You end up going to linear time, which is worse than logarithmic time. And so you have to be really careful to play this load factor.
[01:02:15] Now, where the load factor game comes into play, is that you end up using more memory because the smaller your load factor is you're effectively wasting memory. So if I say, oh, I'm just going to create a billion entry array, and I'm just going to hash everything to it, and I'm not going to worry about hash collisions. Okay, cool. Except that billion element array is really big in memory and it's mostly sitting empty. So there's a trade-off between memory usage and speed.
[01:02:45] Jason Gauci: Yeah. So I mean, what makes the hash table or hash map or hash set, what makes these things so fast is that hashing is turning some content into a number. And that number is then used as an index. And so if you have an array, getting an item from that array from its index is super fast.
[01:03:06] And so you're kind of turning like you're doing sort of an association, right? That's turning strings or whatever you want into, into indices. And then, and then just putting it into that spot. But yeah, as Patrick said, the problem is that only makes sense if the array has already been totally allocated in memory and you can go to spot 5 by doing some arithmetic.
[01:03:31] And so for that whole thing to work, and there has to be something, even if it's just empty in spot 4, in spot 3, in spot 2, in spot one, like that space has to be occupied.
[01:03:43] Patrick Wheeler: So we were kind of getting out of this load factor, hashing about changing your data into a sort of number, modulusing that number by the number of buckets you have, and then looking in that slot. Now what happens if 2 things mapped to the same slot? So it turns out there's roughly 2 strategies for dealing with it.
[01:04:03] The first one's called closed addressing. So an example of this is the C++ STL library, for their unordered map and unordered set, use closed addressing, and there's some variation and complexity about what they do, but roughly what ends up happening is if you have a collision, because you could have 7 buckets and you insert your first item in index one, and then your second thing, hashes, and you also insert it index one. So you have only 2 things, but they collided. And what would happen in the STL library and these closed addressing, is you would just make a linked list.
[01:04:38] So you would take, put them in the same bucket, but there would be a list. And so when you go to do a lookup, you get the bucket and you say, does my key match the key in the first, at the head of the linked list? If not, go traverse a linked list until you either find the match, or you find out that it's not in your set. So that's closed addressing.
[01:05:02] Jason Gauci: Yeah. Actually one thing I wanted to just mention there is, yeah, you have to keep the original key. 'Cause when you hash something, you've kind of destroy the original value. Like if I, yeah, if I hash "Jason", it's going to give me like 13 or something like that. And so now "Jason" is gone, right? And so, and so in the hash table, when you go to spot 13, you actually still need to store the key again. And that's what kind of confuses a lot of new people when they see hash tables.
[01:05:31] And so, yeah, so it's, that's to Patrick's point, like if you're looking for Jason, just cause you got the 13, it doesn't mean that you found Jason because it could have been something else hashed to 13. So that's where you would actually look through that list and say, okay, is there an actual Jason here? And if that list gets really long, that that that's going to take a long time.
[01:05:49] Patrick Wheeler: Hot tip here too, is I think every one of these things we're calling out, I've heard personally in an interview I've been in, done or listened to. So if you're going to do a interview for a job in Silicon Valley, like every one of these caveats, I think we've mentioned so far has come up, at least in, in my experience. So the nuances here, if for no other reason than interviewing ability are, are important. But they do actually impact when you get into the subtleties of how your hash map is performing. Because I do know a lot of people who just slap in a hash map and assume it works and then move on and they don't actually measure to see if that's an improvement or if they should tune it.
[01:06:30] Jason Gauci: Yeah. One of the things that I was really fascinated by, again working, so, so MapReduce, like working on MapReduce was definitely the time where I was really exposed to a lot of these things. 'Cause that scale is when you're, you'll really start to see all of the, the challenges get exposed. And MapReduce switched from using using QuickSort to Merge Sort. And the reason is because in QuickSort and Merge Sort have the same run time on average, but if you pick bad pivots, which could happen because the data is just structured a certain way and you're expecting totally random or arbitrarily distributed data, there's actually times where QuickSort would consistently for this one MapReduce, Quicksort would take almost linear time.
[01:07:21] And you know, this is over, like billions and billions of items. So it just, it basically never finish. And so we ended up switching everything to Merge Sort. And so I think, and so that's one of these challenges with hash tables too, is, is, especially if you're doing something at scale is you want to monitor the number of collisions, and also something called the smearing, which is just like a, another word for the entropy really, but you want to measure the entropy of your hash table.
[01:07:58] So in other words, you want to see that a lot of the items in the hash table, all the cells in the hash table are like, these are either empty or filled with one or 2 or 3 things. And and I just keep a constant, constant eye on it and you're right. It does come up a lot in interviews trying to understand, that whole, like understand all of those dynamics.
[01:08:18] Patrick Wheeler: So if we gave the name closed addressing, it doesn't take much, I guess, to figure out the alternative is open addressing. So if you ended up with a collision and you're using an open addressing hash map, then you use some mechanism for finding another bucket, another slot in your array to go look at, to see if your key is there.
[01:08:42] So the easiest one to kind of think about is when inserting, if I insert and there's already something there, then I just do my bucket plus one. And then I look, is there something there? And if so, go bucket plus 2, right? And I just keep walking down and you could do linear probe, quadratic probe, right? You just basically looking and kind of iterating through all of the buckets, looking for one that's open.
[01:09:07] And then at accessing, you just do kind of like the opposite. You would go to your array. If the item, if the bucket is empty, you're done. But if the bucket has an item and it doesn't match your key, then you have to follow sort of the same probing as the insertion algorithm use.
[01:09:26] And there's a variety of these and these actually, I'll talk about one in a second, but there's, there's Cuckoo hashing, RobinHood hashing. So like in RobinHood hashing, you measure how far away from your original slot each element is, and you use that to decide whether to keep looking or to displace the elements of their, put yours in and then move that one further down. So if you're trying to get to your keys as close to their original bucket as possible.
[01:09:53] And the reason why these end up mattering and why this versus like the linked list approach from close addressing matters is because again, algorithmically, these are constant time insertion and lookup on average, but in practice you have problems with cache efficiency. In a linked list, if I don't pay attention, I go lookup one key and then it's not correct. And so I have to traverse the linked list, but that item in memory could be on an entirely different place in RAM, just based on wherever I got it from my allocator off the heat. And I have to wait for the processor to go through all of its caches out to RAM, pulling that value and access it, right?
[01:10:34] So the cache efficiency there is very different than if I have 2 slots and an array, which will be consecutive in memory, and I just need to look at the first one and then the second one, because of the way the processor works is it doesn't just fetch your value from RAM and into cache. It fetches it's called sort of a cache line, it fetches a bunch of memory all at once into the cache. And then, so it, that it expects that your next lookup is close to your first lookup when you're accessing memory. And so these open addressing ones try to leverage that realization to make much more cache efficient ways of accessing the hash buckets.
[01:11:16] Jason Gauci: So yeah, one question. When you're doing the open address thing, how do you know when to stop? Like let's say the hash table's full, right?
[01:11:23] Patrick Wheeler: Yep. So you stop when either you get to an empty bucket, or when you have, if it's completely full, you've gone all the way around, but these are much more sensitive-- so closed addressing, and I might, someone will tell me I'm wrong, that's fine. (laughter) Closed addressing, you can have a load factor over 1. Open addressing, you could never have a load factor over 1.
[01:11:47] Jason Gauci: Right. It'll crash or something, right? Cause it just won't be able to insert.
[01:11:52] Patrick Wheeler: Like, yeah, you have no, you can't insert the last thing, as soon as you go over 1. And if you have 1, then you're right. Any, checking for any value not in the set will require you to iterate all buckets.
[01:12:05] Jason Gauci: Yeah. The only alternative would be you'd have to also store the, the hash. So like in every bucket you'd have to store what the hash would have been if there were no collisions.
[01:12:18] Patrick Wheeler: Yes, there are, so there are also hybrid, hybrid hash maps, I think, that kind of start to use rehashing methods. So rather than using a single hash, you use a different seed and you hash again, or multiple bucket lists. Um, but yeah, so...
[01:12:34] Jason Gauci: Oh no, I was saying something a little different. I was saying like, so let's say you're doing linear probing, right. And so, but, but let's say in the hash table, along with the key, you're also storing the hash of the key, or I guess you could just do it on the fly. So like, if you're doing the linear probing and you know, you're looking for "apple", and "apple" hashes to 2, and then you go to the second index and it's "dog", right? And so you go, you do linear probing. So you go to the next one, and that item, whatever that is, if it doesn't hash to 2, then you know to stop, even if the hash table's full. But then like you're either like doing a lot of hashes or you'd have to somehow like store the hash or something. Does that make sense?
[01:13:19] Patrick Wheeler: Um, maybe. (laughter) I don't want to get into a design review of your algorithm. I feel something else could be hashed to 3. So you could have some "dog" was hashed at 3.
[01:13:30] Jason Gauci: Yeah. But I'm saying like, okay, so let's say the thing you're searching for your query hashes to 2, right? And you go to that spot, and there's something that hashes to 2, but it's not your item, right? So then you go to the next spot, and if that thing in that next spot doesn't hash to 2, then you're done, right?
[01:13:50] Patrick Wheeler: But I'm saying-- something else could have already been there. Yeah.
[01:13:54] Jason Gauci: Oh, and then you have to jump over it?
[01:13:56] Patrick Wheeler: Yes.
[01:13:56] Jason Gauci: Oh. And actually, actually now I think about it. So this is, this is a, yeah. To bring back a lot of memories. So actually when you... if something, if there's something that doesn't hash to 2 in that spot that you're in right now, it could be that "apple" is still there. It just got there too late and like someone else-- oh yeah. Okay. So yeah, you're right. You have to go through the entire hash table, worst case.
[01:14:20] Yeah. See, I would totally fail. Actually, just, just to circle back something you said earlier, I've actually never had a hash table kind of question, but you could tell I would totally fail that. (laughter) But I think it is super important. I think, don't panic if like your background is in something else. I don't know if you'll get maybe these kinds of questions, but it's still really good to know.
[01:14:42] Patrick Wheeler: Okay. I've seen good interviews and bad interviews. I won't speak to bad interviews, but in good interviews, I think they're actually hoping you don't know the answer because they don't care. They want you to be able to like, Jason just did, like, I have this idea, oh wait this idea, it doesn't work. And then, you know, be able to make the progression to refining your idea or discovering something that's already in literature that maybe your interview already knows about, but they're letting you sort of do it from first principles and seeing how your brain unpacks the problem.
[01:15:08] Jason Gauci: Yeah, it's a good call.
[01:15:10] Patrick Wheeler: So if you really, really want the cheat code, then there's something called the minimal perfect hash. So the idea here is if you know upfront all the items you're going to hash, say 397 of them, then you make exactly 397 buckets, and you find the hash function, which one to one maps, all of your input, keys to your bucket list, with zero collisions.
[01:15:41] These are pretty cool because you have a perfectly space-efficient answer, and the only problem is, how do you search the space of hash functions to find such a perfect hash? And there's a class of ones that you can narrow in on, and how to structure this problem. It's a very interesting, I guess, sort of research topic.
[01:16:01] I've never seen it used in practice, but it is a really cool idea that you could have the best of all worlds, a constant time lookup, no wasted space, a load factor of one, and guarantee that you would instantly know if a key is in your set or not. Like it just, it's really awesome. The problem is it becomes very, very expensive to find this perfect hash.
[01:16:25] Jason Gauci: Yeah, that makes sense. Yeah. I mean the only way I could think to do it as a person who has no experience would be to just try adding salts, and just keep doing that until, but that's going to take forever. I mean, the odds that--
[01:16:37] Patrick Wheeler: You've just discovered Bitcoin mining. (laughter)
[01:16:40] Jason Gauci: I guess so. Yeah, I think, I mean, the odds that you, that you eventually hit that right hash or just take, oh, for people who don't know, a salt is... A salt is very simple. It's something that you add to all of your keys that just make the hashing work differently. Uh, oh yeah, yeah. Okay. And the reason why, correct me if I'm wrong on this, or feel free to fill in the holes here, but I think the reason why they use salts is not to try and find better hash functions, but it's to, the idea's the salt is something that you, as the programmer know, but someone on the outside doesn't know. And so if someone let's say gets a hold of your database, but they don't know your salt because they don't have the source code, then there are certain tricks that they can't use.
[01:17:27] Like, for example, let's say someone has a database full of hashed passwords. Well, they know that a lot of people out there make their password the word "password". So they'll just run like the hash function. They'll like, look at all the popular hash functions, and run them all on the word "password" and see the numbers that come out of that, or the hash let's say, that comes out of that. And just see like, okay, which of these hashes is in your database a lot? Okay. It's this one, therefore you're using this type of hash function. And then knowing that allows them to do a bunch of other things.
[01:18:02] But if you've put a salt in, now, you don't have a whole bunch of hashed words of "password". Now you have a bunch of hashes of "mycoolsaltpassword", and "mycoolsaltabba" and "mycoolsaltPatrick" and whatever. And just like, even though you added the same thing to every key it's going to cause the hashes to be totally different.
[01:18:27] Patrick Wheeler: Yep. That's right.
[01:18:28] Jason Gauci: And so, yeah, I think, so getting back to Patrick saying like, you could just try, like adding, trying different salts until you get one, hashes these 300 things in a 300 unique, like integers after you've modded them. But that will take forever.
[01:18:43] Patrick Wheeler: Not forever, just a long time.
[01:18:47] Jason Gauci: Not literally forever, but yeah, really, really, really long time. Yeah, I mean, definitely. I mean intractably long time, like not long as in, come back in a week. (laughter) Long as in like, your whole lifetime, like more than that.
[01:19:00] Another thing that, that I've used hashing a lot for is, which is like very different from the traditional hashing, but still really useful is locality sensitive hashing. And this is kind of the opposite of everything we just talked about for, like goals for hashing.
[01:19:18] In locality sensitive hashing, you actually want things that are similar to hash to the same number, right? So imagine, like imagine a map or like a map of the United States, right? And you have a bunch of points on the map. And so, maybe all the points represent different businesses or something, right? And so in this case, like you want to hash the points that are close together to the same number, because that number is going to represent, let's say a file on disk that has all the names of the businesses. And so when you're looking on like Google Maps, you're probably looking at a whole area. And so you're probably looking at a bunch of businesses, but they're all close together, spatially. You want to hash all of those to the same numbers that they go to that, so that all the names of the businesses come from the same file on disk, right? And so that's, that's locality sensitive hashing.
[01:20:16] And so in that case, you're not worried about collisions, it's kind of a different, a different use case. But you still get a lot of benefit. You get a lot of the order of one type benefit. So imagine if you now have a bunch of these points from your map, or maybe even you have a region of points, right. And you can feed this through this locality sensitive hash, maybe feed the boundaries or Monte Carlo it, or what have you.
[01:20:43] And it'll say, okay, yeah, the hashes are 3, 4 and 5, and then you can load those files, which might have businesses that are outside of the range, but they're all going to be kind of in that area because of the way you've done the hashing. And that's how a lot of these kinds of spatial databases work.
[01:21:02] Patrick Wheeler: Awesome. Yeah. This is really good, because it was a fairly fun.
[01:21:06] Jason Gauci: Yeah. I mean, hash tables and trees, you know, it's sort of like a, there's that saying that, with a, with a linear saw and a hammer like you could do just about anything, like you build a whole house just with a linear saw and a hammer.
[01:21:20] I feel like trees and hash tables are kind of like that for us. Like with trees and hash tables you can build just about... and I mean, things like arrays and algebra, I mean, that's all primitives, right. But with those 2 data structures you can build just about everything.
[01:21:37]Patrick Wheeler: And a 55 gallon drum of water. (laughter)
[01:21:39] Jason Gauci: Yeah, that's right. You could pass the the computer science GRE with a 55 gallon drum of water, a water computer, and trees and hashing.
[01:22:07] Patrick Wheeler: Music by Eric Barndollar.
[01:22:10] Jason Gauci: Programming Throwdown is distributed under Creative Commons, Attribution ShareAlike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix and adapt the work, but you must provide attribution to Patrick and I, and sharealike in kind.