Screaming in the Cloud

Jonathan (Koz) Kozolchyk, General Manager for Certificate Services at AWS, joins Corey on Screaming in the Cloud to discuss the best practices he recommends around certificates. Jonathan walks through when and why he recommends private certs, and the use cases where he’d recommend longer or unusual expirations. Jonathan also highlights the importance of knowing who’s using what cert and why he believes in separating expiration from rotation. Corey and Jonathan also discuss their love of smart home devices as well as their security concerns around them and how they hope these concerns are addressed moving forward. 


About Jonathan

Jonathan is General Manager of Certificate Services for AWS, leading the engineering, operations, and product management of AWS certificate offerings including AWS Certificate Manager (ACM), AWS Private CA, Code Signing, and Encryption in transit. Jonathan is an experienced leader of software organizations, with a focus on high availability distributed systems and PKI. Starting as an intern, he has built his career at Amazon, and has led development teams within our Consumer and AWS businesses, spanning from Fulfillment Center Software, Identity Services, Customer Protection Systems and Cryptography. Jonathan is passionate about building high performing teams, and working together to create solutions for our customers. He holds a BS in Computer Science from University of Illinois, and multiple patents for his work inventing for customers. When not at work you’ll find him with his wife and two kids or playing with hobbies that are hard to do well with limited upside, like roasting coffee.




What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: In the cloud, ideas turn into innovation at virtually limitless speed and scale. To secure innovation in the cloud, you need Runtime Insights to prioritize critical risks and stay ahead of unknown threats. What's Runtime Insights, you ask? Visit sysdig.com/screaming to learn more. That's S-Y-S-D-I-G.com/screaming.

My thanks as well to Sysdig for sponsoring this ridiculous podcast.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. As I record this, we are about a week and a half from re:Inforce in Anaheim, California. I am not attending, not out of any moral reason not to because I don’t believe in cloud security or conferences that Amazon has that are named after subject lines, but rather because I am going to be officiating a wedding on the other side of the world because I am an ordained minister of the Church of There Is A Problem With This Website’s Security Certificate. So today, my guest is going to be someone who’s a contributor, in many ways, to that religion, Jonathan Kozolchyk—but, you know, we all call him Koz—is the general manager for Certificate Services at AWS. Koz, thank you for joining me.

Koz: Happy to be here, Corey.

Corey: So, one of the nice things about ACM historically—the managed service that handles certificates from AWS—is that for anything public-facing, it’s free—which is always nice, you should not be doing upcharges for security—but you also don’t let people have the private portion of the cert. You control all of the endpoints that terminate SSL. Whereas when I terminate SSL myself, it terminates on the floor because I’ve dropped things here and there, which means that suddenly the world of people exposing things they shouldn’t or expiry concerns just largely seemed to melt away. What was the reason that Amazon looked around at the landscape and said, “Ah, we’re going to launch our own certificate service, but bear with me here, we’re not going to charge people money for it.” It seems a little bit out of character.

Koz: Well, Amazon itself has been battling with certificates for years, long before even AWS was a thing, and we learned that you have to automate. And even that’s not enough; you have to inspect and you have to audit, you need a controlled loop. And we learned that you need a closed loop to truly manage it and make sure that you don’t have outages. And so, when we built ACM, we built it saying, we need to provide that same functionality to our customers, that certificates should not be the thing that makes them go out. Is that we need to keep them available and we need to minimize the sharp edges customers have to deal with.

Corey: I somewhat recently caught some flack on one of the Twitter replacement social media sites for complaining about the user experience of expired SSL certs. Because on the one hand, if I go to my bank’s website, and the response is that instead, the server is sneakyhackerman.com, it has the exact same alert and failure mode as, holy crap, this certificate reached its expiry period 20 minutes ago. And from my perspective, one of those is a lot more serious than the other. What also I wind up encountering is not just when I’m doing banking, but when I’m trying to read some random blog on how to solve a technical problem. I’m not exactly putting personal information into the thing. It feels like that was a missed opportunity, agree or disagree?

Koz: Well, I wouldn’t categorize it as a missed opportunity. I think one of the things you have to think about with security is you have to keep it simple so that everyone, whether they’re a technologist or not, can abide by the rules and be safe. And so, it’s much easier to say to somebody, “There’s something wrong. Period. Stop.” versus saying there are degrees of wrongness. Now, that said, boy, do I wish we had originally built PKI and TLS such that you could submit multiple certificates to somebody, in a connection for example, so that you could always say, you know, my certificates can expire, but I’ve got two, and they’re off by six months, for example. Or do something so that you don’t have to fail closed because the certificate expired.

Corey: It feels like people don’t tend to think about what failure modes are going to look like. Because, pfhh, an expired certificate? What kind of irresponsible buffoon would do such a thing? But I’ve worked in enough companies where you have, historically, the wildcard cert because individual certs cost money, once upon a time. So, you wound up getting the one certificate that could work on all of the stuff that ends in the same domain.

And that was great, but then whenever it expired, you had to go through and find all the places that you put it and you always miss some, so things would break for a while and the corporate response was, “Ugh, that was awful. Instead of a one-year certificate, let’s get a five-year or a ten-year certificate this time.” And that doesn’t make the problem better; it makes it absolutely worse because now it proliferates forever. Everyone who knows where that thing lives is now long gone by the time it hits again. Counterintuitively, it seems the industry has largely been moving toward short-lived certs. Let’s Encrypt, for example, winds up rotating every 90 days, by my estimation. ACM is a year, if memory serves.

Koz: So, ACM certs are 13 months, and we start rotating them around the 11th month. And Let’s Encrypt offers you 90-day certs, but they don’t necessarily require you to rotate every 90 days; they expire in 90 days. My tip for everybody is divorce expiration from rotation. So, if your cert is a 90-day cert, rotate it at 45 days. If your cert is a year cert, give yourself a couple of months before expiration to start the rotation. And then you can alarm on it on your own timeline when something fails, and you still have time to fix it.
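[Editor's note: Koz's tip to divorce expiration from rotation can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `rotation_due` and `cert_status` and the `rotate_fraction` parameter are hypothetical and do not come from ACM, Let's Encrypt, or any real tool. The idea is that the alarm fires at the rotation point, while the certificate is still valid.]

```python
from datetime import datetime, timedelta, timezone


def rotation_due(not_before, not_after, rotate_fraction=0.5):
    """Return the moment rotation should start, as a fraction of the cert lifetime."""
    lifetime = not_after - not_before
    return not_before + lifetime * rotate_fraction


def cert_status(now, not_before, not_after, rotate_fraction=0.5):
    """Classify a cert as 'ok', 'rotate' (alarm while still valid), or 'expired'."""
    if now >= not_after:
        return "expired"
    if now >= rotation_due(not_before, not_after, rotate_fraction):
        return "rotate"
    return "ok"


# Hypothetical 90-day cert: rotation is due at day 45, expiry at day 90.
issued = datetime(2023, 6, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(rotation_due(issued, expires))
print(cert_status(issued + timedelta(days=60), issued, expires))  # rotate
```

[For a one-year cert, a `rotate_fraction` of about 0.83 would leave roughly two months between the rotation alarm and expiry, which is the kind of margin described above.]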

Corey: This makes a lot of sense in—you know, the second time because then you start remembering, okay, everywhere I use this cert, I need to start having alarms and alerts. And people are bad at these things. What ACM has done super well is that it removes that entire human from the loop because you control all of the endpoints. You folks have the ability to rotate it however often you’d like. You could have picked arbitrary timelines of huge amounts of time or small amounts of time and it would have been just fine.

I mean, you log into an EC2 instance role and I believe the credentials get passed out of either a 6 or a 12-hour validity window, and they’re consistently rotating on the back end and it’s completely invisible to the customer. Was there ever thought given to what that timeline should be, what that experience should be? Or did you just, like, throw a dart at a wall? Like, “Yeah, 13 months feels about right. We’re going to go with that.” And never revisited it. I have a guess which—

Koz: [laugh].

Corey: Side of that it was. Did you think at all about what you were doing at the time, or—yeah.

Koz: So, I will admit, this happened just before I got there. I got to ACM after—

Corey: Ah, blame the predecessor. Always a good call.

Koz: —the launch. It’s a God-given right to blame your predecessor.

Corey: Oh, absolutely. It’s their entire job.

Koz: I think they did a smart job here. What they did was they took the longest lifetime cert that was then allowed, at 13 months, knowing that we were going to automate the rotation and basically giving us as much time as possible to do it, right, without having to worry about scaling issues or having to rotate overly frequently. You know, there are customers who while I don’t—I strongly disagree with pinning, for example, but there are customers out there who don’t like certs to change very often. I don’t recommend pinning at all, but I understand these cases are out there, and changing it once every year can be easier on customers than changing it every 20 minutes, for example. If I were to pick an ideal rotation time, it’d probably be under ten days because an OCSP response is good for ten days and if you rotate before, then I never have to update an OCSP response, for example. But changing that often would play havoc with many systems because of just the sheer frequency you’re rotating what is otherwise a perfectly valid certificate.

Corey: It is computationally expensive to generate certificates at scale, I would imagine.

Koz: It starts to be a problem. You’re definitely putting a lot of load on the HSMs at that point, [laugh] when you’re generating. You know, when you have millions of certs out in deployment, you’re generating quite a few at a time.

Corey: There is an aspect of your service that used to be part of ACM and now it’s its own service—which I think is probably the right move because it was confusing for a lot of customers—Amazon looks around and sees who can we compete with next, it feels like sometimes. And it seemed like you were squarely focused on competing against your most desperate of all enemies, my crappy USB key where I used to keep the private CA I used at any given job—at the time; I did not keep it after I left, to be very clear—for whatever I’m signing things for certificates for internal use. You’re, like, “Ah, we can have your crappy USB key as a service.” And sure enough, you wound up rolling that out. It seems like adoption has been relatively brisk on that, just because I see it in almost every client account I work with.

Koz: Yeah. So, you’re talking about the private CA offering which is—

Corey: I—that’s right. Private CA was the new service name. Yes, it used to be a private certificate authority was an aspect of ACM, and now you’re—mmm, we’re just going to move that off.

Koz: And we split it out because like you said customers got confused. They thought they had to only use it with ACM. They didn’t understand it was a full standalone service. And it was built as a standalone service; it was not built as part of ACM. You know, before we built it, we talked to customers, and I remember meeting with people running fairly large startups, saying, “Yes, please run this for me. I don’t know why, but I’ve got this piece of paper in my sock drawer that one of my security engineers gave me and said, ‘if something goes wrong with our CA, you and two other people have to give me this piece of paper.’” And others were like, “Oh, you have a piece of paper? I have a USB stick in my sock drawer.” And like, this is what, you know, the startup world was running their CAs from sock drawers as far as I can tell.

Corey: Yeah. A piece of paper? Someone wrote out the key by hand? That sounds like hell on earth.

Koz: [sigh]. It was a sharding technique where you needed, you know, three of five or something like that to—

Corey: Oh, they, uh, Shamir’s Secret Sharing Service.

Koz: Yes.

Corey: The SSSS. Yeah.

Koz: Yes. You know, and we looked at it. And the other alternative was people would use open-source or free certificate authorities, but without any of the security you’d want, like HSM backing, for example, because that gets really expensive. And so yeah, we did what our customers wanted: we built this service. We’ve been very happy with the growth it’s taken and, like you said, we love the places we’ve seen it. It’s gone into all kinds of different things, from the traditional enterprise use cases to IoT use cases. At one point, there’s a company that tracks sheep and every collar has one of our certs in it. And so, I am active in the sheep-tracking industry.
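[Editor's note: the "three of five" sharding mentioned a few turns earlier is the threshold idea behind Shamir's Secret Sharing. The sketch below is a toy illustration only, assuming the secret fits in an integer smaller than a fixed prime; the names `split_secret` and `recover_secret` are hypothetical, and this is not how AWS Private CA or any HSM protects keys. A random polynomial of degree threshold minus one hides the secret as its constant term, and any `threshold` points recover it by interpolation at zero.]

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split_secret(secret, threshold, shares):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Constant term is the secret; the remaining coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


pieces = split_secret(123456789, threshold=3, shares=5)
print(recover_secret(pieces[:3]) == 123456789)  # True
```

[With fewer than three pieces the interpolation yields an unrelated value, which is why losing a sock drawer or two is survivable but losing three is not.]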

Corey: I am certain that some wit is going to comment on this. “Oh, there’s a company out there that tracks sheep. Yeah, it’s called Apple,” or Facebook, or whatever crappy… whatever axe someone has to grind against any particular big company. But you’re talking actual sheep as in baa, smell bad, count them when going to sleep?

Koz: Yes. Actual sheep.

Corey: Excellent, excellent.

Koz: The certs are in drones, they’re in smart homes, so they’re everywhere now.

Corey: That is something I want to ask you about because I found that as a competition going on between your service, ACM because you won’t give me the private keys for reasons that we already talked about, and Let’s Encrypt. It feels like you two are both competing to not take my money, which is, you know, an odd sort of competition. You’re not actually competing, you’re both working for a secure internet in different ways, but I wind up getting certificates made automatically for me for all of my internal stuff using Let’s Encrypt, and with publicly resolvable domain names. Why would someone want a private CA instead of an option that, okay, yeah, we’re only using it internally, but there is public validity to the certificate?

Koz: Sure. And just because I have to nitpick, I wouldn’t say we’re competing with them. I personally love Let’s Encrypt; I use them at home, too. Amazon supports them financially; we give them resources. I think they’re great. I think—you know, as long as you’re getting certs I’m happy. The world is encrypted and I—people use private CA because fundamentally, before you get to the encryption, you need secure identity. And a certificate provides identity. And so, Let’s Encrypt is great if you have a publicly accessible DNS endpoint that you can prove you own and get a certificate for and you’re willing to update it within their 90-day windows. Let’s use the sheep example. The sheep don’t have publicly valid DNS endpoints and so—

Corey: Or to be very direct with you, they also tend to not have terrific operational practices around updating their own certificates.

Koz: Right. Same with drones, same with internal corporate. You may not want your DNS exposed to the internet, your internal sites. And so, you use a private certificate where you own both sides of the connection, right, where you can say—because you can put the CA in the trust store and then that gets you out of having to be compliant with the CA/Browser Forum and the WebTrust rules. A lot of the CA/Browser Forum dictates what a public certificate can and can’t do and the rules around that, and those are built very much around the idea of a browser connecting to a client and protecting that user.

Corey: And most people are not banking on a sheep.

Koz: Most people are not banking on a sheep, yes. But if you have, for example, a database that requires a restart to pick up a new cert, you’re not going to want to redo that every 90 days. You’re probably going to be fine with a five-year certificate on that because you want to minimize your downtime. Same goes with a lot of these IoT devices, right? You may want a thousand-year cert or a hundred-year cert or cert that doesn’t expire because this is a cert that happens at—that is generated at creation for the device. And it’s at birth, the machine is manufactured and it gets a certificate and you want it to live for the life of that device.

Or you have super-secret-project.internal.mycompany.com and you don’t want a publicly visible cert for that because you’re not ready to launch it, and so you’ll start with a private cert. Really, my advice to customers is, if you own both pieces of the connection, you know, if you have an API that gets called by a client you own, you’re almost always better off with a private certificate and managing that trust store yourself because then you are subject not to other people’s rules, but the rules that fit the security model and the threat assessment you’ve done.

Corey: For the publication system for my newsletter, when I was building it out, I wanted to use client certificates as a way of authenticating that it was me. Because I only have a small number of devices that need to talk to this thing; other people don’t, so how do I submit things into my queue and manage it? And back in those ancient days, the API Gateways didn’t support TLS authentication. Now, they do. I would redo it a bunch of different ways. They did support API key as an authentication mechanism, but the documentation back then was so terrible, or I was so new to this stuff, I didn’t realize what it was and introduced it myself from first principles where there’s a hard-coded UUID, and as long as there’s the right header with that UUID, I accept it, otherwise drop it on the floor. Which… there are probably better ways to do that.

Koz: Sure. Certificates are, you know, a very popular way to handle that situation because they provide that secure identity, right? You can be assured that the thing connecting to you can prove it is who they say they are. And that’s a great use of a private CA.
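[Editor's note: as a rough illustration of client certificates as identity, here is a minimal sketch using Python's standard `ssl` module. It is not Corey's publication system or an API Gateway configuration. The function name `build_mtls_server_context` and all file paths are hypothetical, and the paths are optional only so the context can be inspected without real files. The server trusts only the private CA bundle it is given and refuses clients that do not present a certificate chaining to it.]

```python
import ssl


def build_mtls_server_context(server_cert=None, server_key=None, private_ca_bundle=None):
    """Build a TLS server context that requires a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that does not present a certificate we can verify.
    context.verify_mode = ssl.CERT_REQUIRED
    if server_cert is not None:
        # The server's own identity (hypothetical file paths).
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if private_ca_bundle is not None:
        # Trust only the private CA, not the public web trust store.
        context.load_verify_locations(cafile=private_ca_bundle)
    return context


context = build_mtls_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```

[In real use, the three paths would point at the server certificate, its private key, and the private CA's root or chain file, and the context would be passed to whatever server wraps the listening socket.]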

Corey: Changing gears slightly. As we record this, we are about two weeks before re:Inforce, but I will be off doing my own thing on that day. Anything interesting and exciting coming out of your group that’s going to be announced, with the proviso, of course, that this will not air until after re:Inforce.

Koz: Yes. So, we are going to be pre-announcing the launch of a connector for Active Directory. So, you will be able to tie your private CA instance to your Active Directory tree and use private CA to issue certificates for use by Active Directory for all of your Windows hosts for the users in that Active Directory tree.

Corey: It has been many years since I touched Windows in anger, but in 2003 or so, I was a mediocre Small Business Windows Server Admin. Doesn’t Active Directory have a private CA built into it by default for whenever you’re creating a new directory?

Koz: It does.

Corey: Is that one of the FSMO roles? I’m trying to remember offhand.

Koz: What’s a Fimal?

Corey: FSMO. F-S-M-O. There are—I forget, it’s some trivia question that people love to haze each other with in Microsoft interviews. “What are the seven FSMO roles?” At least back then. And have to be moved before you decommission a domain controller or you’re going to have tears before bedtime.

Koz: Ah. Yeah, so Microsoft provides a certificate authority for use with Active Directory. They’ve had it for years and they had to provide it because back then nobody had a certificate authority, but AD needed one. The difference here is we manage it for you. And it’s backed by HSMs. We ensure that the keys are kept secure. It’s a serverless connection to your Active Directory tree, you don’t have to run any software of ours on your hosts. We take care of all of it.

And it’s been the top requests from customers for years now. It’s been quite [laugh] a bit of effort to build it, but we think customers are going to love it because they’re going to get all the security and best practices from private CA that they’re used to and they can decommission their on-prem certificate authority and not have to go through the hassle of running it.

Corey: A big area where I see a lot of private CA work has been in the realm of desktops for corporate environments because when you can pass out your custom trusted root or trusted CA to all of the various nodes you have and can control them, it becomes a lot easier. I always tended to shy away from it, just because in small businesses like the one that I own, I don’t want to play corporate IT guy more than I absolutely have to.

Koz: Yeah. Trust store management is always a painful part of PKI. As if there weren’t enough painful things in PKI. Trust store management is yet another one. Thankfully, in the large enterprises, there is good tooling out there to help you manage it for the corporate desktops and things like that.

And with private CA, you can also, if you already have an offline root that is in all of your trust stores in your enterprise, you can cross-sign the root that we give you from private CA into that hierarchy. And so, then you don’t have to distribute a new trust store out if you don’t want to.

Corey: This is a tricky release and I’m very glad I’m taking the week off when it’s getting announced because there are two reactions that are going to happen to any snarking I can do about this. The first is no one knows what the hell this is and doesn’t have any context for the rest, and the other folks are going to be, “Yes, shut up clown. This is going to change my workflow in amazing ways. I’ll deal with your nonsense later. I want to do this.” And I feel like one of those constituencies is very much your target market and the other isn’t. Which is fine. No service that AWS offers—except the bill—is for every customer, but every service is for someone.

Koz: That’s right. We’ve heard from a lot of our customers, especially as they—you know, the large international ones, right, they find themselves running separate Active Directory CAs in different countries because they have different regulatory requirements and separations that they want to do. They are chomping at the bit to get this functionality because we make it so easy to run a private CA in these different regions. There’s certainly going to be that segment at re:Inforce, that’s just happy certificates happen in the background and they don’t think anything about where they come from and this won’t resonate with them, but I assure you, for every one of them, they have a colleague somewhere else in the building that is going to do a happy dance when this launches because there’s a great deal of customer heavy-lifting and just sharp edges that we’re taking away from them. And we’ll manage it for them, and they’re going to love it.

[midroll 0:21:08]

Corey: One thing that I have seen the industry shift to that I love is the Let’s Encrypt model, where the certificate expires after 90 days. And I love that window because it is a quarter, which means yes, you can do the crappy thing and have a calendar reminder to renew the thing. It’s not something you have to do every week, so you will still do it, but you’re also not going to love it. It’s just enough friction to inspire people to automate these things. And that I think is the real win.

There’s a bunch of things like Certbot, I believe the protocol is called ACME A-C-M-E, always in caps, which usually means an acronym or someone has their caps lock key pressed—which is of course cruise control for cool. But that entire idea of being able to have a back-and-forth authentication pass and renew certificates on a schedule, it’s transformative.

Koz: I agree. ACM, even Amazon before ACM, we’ve always believed that automation is the way out of a lot of this pain. As you said earlier, moving from a one-year cert to a five-year cert doesn’t buy you anything other than you lose even more institutional knowledge when your cert expires. You know, I think that the move to further automation is great. I think ACME is a great first step.

One of the things we’ve learned is that we really do need a closed loop of monitoring to go with certificate issuance. So, at Amazon, for example, every cert that we issue, we also track and the endpoints emit metrics that tell us what cert they’re using. And it’s not what’s on disk, it’s what’s actually in the endpoint and what they’re serving from memory. And we know because we control every cert issued within the company, every cert that’s in use, and if we see a cert in use that, for example, isn’t the latest one we issued, we can send an alert to the team that’s running it. Or if we’ve issued a cert and we don’t see it in use, we see the old ones still in use, we can send them an alert, they can alarm and they can see that, oh, we need to do something because our automation failed in this case.
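[Editor's note: the closed loop described here, comparing what was issued with what endpoints report they are serving, can be sketched as a simple reconciliation. This is an illustration under assumptions, not Amazon's internal system: the function name `reconcile`, the fingerprint strings, and the endpoint names are all made up, and the "no metric reported" and "never issued" checks are extra cases added for completeness.]

```python
def reconcile(issued, observed):
    """Compare the latest issued cert per endpoint with what the endpoint serves.

    issued:   dict of endpoint -> fingerprint of the most recently issued cert
    observed: dict of endpoint -> fingerprint the endpoint reports serving from memory
    Returns a list of (endpoint, reason) alerts.
    """
    alerts = []
    for endpoint, latest in sorted(issued.items()):
        serving = observed.get(endpoint)
        if serving is None:
            # Assumption: a silent endpoint is treated as a problem too.
            alerts.append((endpoint, "no metric reported"))
        elif serving != latest:
            # A new cert was issued but the old one is still in use.
            alerts.append((endpoint, "not serving latest issued cert"))
    for endpoint in sorted(set(observed) - set(issued)):
        # Assumption: certs in use that the issuer has no record of are flagged.
        alerts.append((endpoint, "serving a cert that was never issued"))
    return alerts


issued = {"api.example.internal": "fp-new", "db.example.internal": "fp-2"}
observed = {"api.example.internal": "fp-old", "db.example.internal": "fp-2"}
print(reconcile(issued, observed))
```

[The point of the sketch is the one made above: the comparison uses what the endpoint actually serves, not what sits on disk, so a failed reload shows up as an alert well before expiry.]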

And so, I think ACME is great. I think the push Let’s Encrypt did to say, “We’re going to give you a free certificate, but it’s going to be short-lived so you have to automate,” that’s a powerful carrot and stick combination they have going, and I think for many customers Certbot’s enough. But you’ll see even with ACM where we manage it for our customers, we have that closed loop internally as well to make sure that the cert when we issue a new cert to our client, you know, to the partner team, that it does get picked up and it does get loaded. Because issuing you a cert isn’t enough; we have to make sure that you’re actually using the new certificate.

Corey: I also have learned as a result of this, for example, that AWS certificate manager—Amazon Certificate Manager, the ACM, the certificate thingy that you run, that so many names, so many acronyms. It’s great—but it has a limit—by default—of 2500 certificates. And I know this because I smacked into it. Why? I wasn’t sitting there clicking and adding that many certificates, but I had a delightful step function pattern called ‘The Lambda invokes itself.’ And you can exhaust an awful lot of resources that way because I am bad at programming. That is why for safety, I always recommend that you iterate development-wise in an account that is not production, and preferably one that belongs to someone else.

Koz: [laugh]. We do have limits on cert issuance.

Corey: You have limits on everything in AWS. As it should because it turns out that wherever there’s not a limit, A, free database just dropped, and B, things get hammered to death. You have to harden these things. And it’s one of those things that’s obvious once you’ve operated at a certain point of scale, but until you do, it just feels arbitrary and capricious. It’s one of those things where I think Amazon is still—and all the cloud companies who do this—are misunderstood.

Koz: Yeah. So, in the case of the ACM limits, we look at them fairly regularly. Right now, they’re high enough that most of our customers, vast majority, never come close to hitting it. And the ones that do tend to go way over.

Corey: And it’s been a mistake, as in my case as well. This was not a complaint, incidentally. It was like, well, I want to wind up having more waste and more ridiculous nonsense. It was not my concern.

Koz: No no no, but we do, for those customers who have not mistake use cases but actual use cases where they need more, we’re happy to work with their account teams and with the customer and we can up those limits.

Corey: I’ve always found that limit increases, with remarkably few exceptions, the process is, “Explain to you what your use case is here.” And I feel like that is a screen for, first, are you doing something horrifying for which there’s a better solution? And two, it almost feels like it’s a bit of a customer research approach where this is fine for most customers. What are you folks doing over there and is there a use case we haven’t accounted for in how we use the service?

Koz: I always find we learn something when we look at the P100 accounts that use the most certificates, and how they’re operating.

Corey: Every time I think I’ve seen it all on AWS, I just talk to one more customer, and it’s back to school I go.

Koz: Yep. And I thank them for that education.

Corey: Oh, yeah. That is the best part of working with customers and honestly being privileged enough to work with some of these things and talk to the people who are building really neat stuff. I’m just kibitzing from the sideline most of the time.

Koz: Yeah.

Corey: So, one last topic I want to get into before we call it a show. You and I have been talking a fair bit, out of school, for lack of a better term, around a couple of shared interests. The one more germane to this is home automation, which is always great because especially in a married situation, at least as I am and I know you are as well, there’s one partner who is really into home automation and the other partner finds himself living in a haunted house.

Koz: [laugh]. I knew I had won that battle when my wife was on a work trip and she was in a hotel and she was talking to me on the phone and she realized she had to get out of bed to turn the lights off because she didn’t have our Alexa Good Night routine available to her to turn all the lights off and let her go to bed. And so, she is my core customer when I do the home automation stuff. And definitely make sure my use cases and my automations work for her. But yeah, I’m… I love that space.

Coincidentally, it overlaps with my work life quite a bit because identity in smart home is a challenge. We’re really excited about the Matter standard. For those listening who aren’t sure what that is, it’s a new end-all be-all smart home standard for defining devices in a protocol-independent way that lets your hubs talk to devices without needing drivers from each company to interact with them. And one of the things I love about it is every device needs a certificate to identify it. And so, private CA has been a great partner with Matter, you know, it goes well with it.

In fact, we’re one of the leading certificate authorities for Matter devices. Customers love the pricing and the way they can get started without talking to anybody. So yeah, I’m excited to see, you know, as a smart home junkie and as a PKI guy, I’m excited to see Matter take off. Right now I have a huge amalgamation of smart home devices at home and seeing them all go to Matter will be wonderful.

Corey: Oh, it’s fantastic. I am a little worried about aspects of this, though, where you have things that get access to the internet and then act as a bridge. So suddenly, like, I have an IoT subnet with some controls on it for obvious reasons and honestly, one of the things I despise the most in this world has been the rise of smart TVs because I just want you to be a big dumb screen. “Well, how are you going to watch your movies?” “With the Apple TV I’ve plugged into the thing. I just want you to be a screen. That’s it.” So, I live a bit in fear of the day where these things find alternate ways to talk to the internet and, you know, report on what I’m watching.

Koz: Yeah, I think Matter is going to help a lot with this because it’s focused on local control. And so, you’ll have to trust your hub, whether that’s your TV or your Echo device or what have you, but they all communicate securely amongst themselves. They use certificates for identification, and they’re building into Matter a robust revocation mechanism. You know, in my case at home, my TV’s not connected to the internet because I use my Fire TV to talk to it, similar to your Apple TV situation. I want a device I control, not my TV, doing it. I’m happy with the big dumb screen.

And I think, you know, what you’re going to end up doing is saying there’s a device out there you’ll trust maybe more than others and say, “That’s what I’m going to use as my hub for my Matter devices and that’s what will speak to the internet,” and otherwise my Matter devices will talk directly to my hub.

Corey: Yeah, there’s very much a spectrum of trust. There’s the, this is a Linux distribution on a computer that I installed myself and vetted and wound up contributing to at one point on the one end of the spectrum, and the other end of the spectrum of things you trust the absolute least in this world, which are, of course, printers. And most things fall somewhere in between.

Koz: Yes, right now, it is a Wild West of rebranded white-label applications, right? You have all kinds of companies spitting out reference designs as products and white labeling the control app for it. And so, your phone starts collecting these smart home applications to control each one of these things because you buy different switches from different people. I’m looking forward to Matter collapsing that all down to having one application and one control model for all of the smart home devices.

Corey: Wemo explicitly stated that they’re not going to be pursuing this because it doesn’t let them differentiate the experience. Read as, cash grab. I also found out that Wemo—which is, of course, a Belkin subsidiary—had a critical vulnerability in some of the light switches it offered, including the one built into the wall in this room—until a week ago—where they’re not going to be releasing a patch for it because those are end-of-life. Really? Because I log into the Wemo app and the only way I would have known this has been the fact that it’s been a suspiciously long time since there was a firmware update available for it. But that’s it. Like, the only way I found this out was via a security advisory, at which point that got ripped out of the wall and replaced with something that isn’t, you know, horrifying. But man did that bother me.

Koz: Yeah. I think this is still an open issue for the smart home world.

Corey: Every company wants a moat of some sort, but I don’t want 15 different apps to manage this stuff. You turned me on to Home Assistant, which is an open-source, home control automation system and, on some level, the interface is very clearly built by a bunch of open-source people—good for them; they could benefit from a graphic designer or three to—or user experience person to tie it all together, but once you wrap your head around it, it works really well, where I have automations that let me do different things. They even have an Apple Watch app with its complications on it. So, I can tap the thing and turn on the lights in my office to different levels if I don’t want to talk to the robot that runs my house. And because my daughter has started getting very deeply absorbed into some YouTube videos from time to time, after the third time I asked her what—I call her name, I tap a different one and the internet dies to her iPad specifically, and I wait about 30 to 45 seconds, and she’ll find me immediately.

Koz: That’s an amazing automation. I love Home Assistant. It’s certainly more technical than I could give to my parents, for example, right now. I think things like Matter are going to bring a lot of that functionality to the easier-to-use hubs. And I think Home Assistant will get better over time as well.

I think the only way to deal with these devices that are going to end-of-life and stop getting support is to have them be local control only, and so then it’s your hub that keeps getting support and that’s what talks to the internet. And so, you don’t—you know, if there’s a vulnerability in the TCP stack, for example, in your light switch, but your light switch only talks to the hub and isn’t allowed to talk to anything else, how severe is that? I don’t think it’s so bad. Certainly, I wall off all of my IoT devices so that they don’t talk to the rest of my network, but now you’re getting a fairly complicated networking… mojo that listeners to your podcast I’m sure are capable of, but many people aren’t.

Corey: I had something that did something very similar and then I had to remove a lot of those restrictions to try to diagnose a phantom issue that it appears was an unreported bug in the wireless AP when you use its second ethernet port as a bridge, where things would intermittently not be able to cross VLANs when passing through that. As in, the initial host key exchange for SSH would work and then it would stall and reset on both sides and it was a disaster. It was, what is going on here? And the answer was it was haunted. So, a small architecture change later, and the problem has not recurred. I need to reapply those restrictions.

Koz: I mean, these are the kinds of things that just make me want to live in a shack in the woods, right? Like, I don’t know how you manage something like that. Like, these are just pain points all over. I think over time, they’ll get better, but until then, that shack in the woods with not even running water sounds pretty appealing.

Corey: Yeah, at some level, having smart lights, for example, one of the best approaches that all the manufacturers I’ve seen have taken is that it still works exactly as you would expect when you hit the light switch on the wall, because that’s something that you really need to make work or, it turns out, for those of us who don’t live alone, we will not be allowed to smart home things anymore.

Koz: Exactly. I don’t have any smart bulbs in my house. They’re all smart switches because I don’t want to have to put tape over something and say, “Don’t hit that switch.” And then watch one of my family members pull the tape off and hit the switch anyways.

Corey: I have floor lamps with smart bulbs in them, but I wind up treating them all as one device. And I mean, I’ve taken the switch out from the root because it’s, like, too many things to wind up slicing and dicing. But yeah, there’s a scaling problem because right now a lot of this stuff, because Matter is not quite there, all winds up using either Zigbee—which is fine; I have no problem with that, and it feels like it’s becoming Matter quickly—or WiFi. And there is an upper bound to how many devices you want or can have on some fairly limited frequency.

Koz: Yeah. I think this is still something that needs to be resolved. You know, I’ve got hundreds of devices in my house. Thankfully, most of them are not WiFi or Zigbee. But I think we’re going to see this evolve over time and I’m excited for it.

Corey: I was talking to someone where I was explaining that, well, how this stuff works. Like, “Well, how many devices could you possibly have on your home network?” And at the time it was about 70 or 80. And they just stared at me for the longest time. I mean, it used to be that I could name all the computers in my house. I can no longer do that.

Koz: Sure. Well, I mean, every light switch ends up being a computer.

Corey: And that’s the weirdest thing is that it’s, I’m used to computers being a thing that requires maintenance and care and feeding and security patches and—yes, relevant to your work—an SSL certificate. It’s like, so what does all of that fancy wizardry do? Well, when it receives a signal, it completes a circuit. The end. And it’s, are we really better off for some of these things? There are days we wonder.

Koz: Well, my light bill, my electric bill, is definitely better off having these smart switches because nobody in my house seems to know how to turn a light switch off. And so, having the house do it itself helps quite a bit.

Corey: To be very clear, I would skewer you if you worked on an AWS service that actually charged money for anything for what you just said about the complaining about light bills and optimizing light bills and the rest—

Koz: [laugh].

Corey: —but I’ve never had to optimize your service’s certificate bill beca—after you’ve spun off the one thing that charges—because you can’t cost optimize free, as it turns out, and I’ve yet to find a way to do the one optimization possible, where now you start paying customers money. I’m sure there’s a way to do that somewhere but damned if I can find it.

Koz: Well, if you find a way to optimize free, please let me know and I’ll share it with all of our customers.

Corey: [laugh]. Isn’t that the truth? I really want to thank you for taking the time to speak with me today. If people want to learn more, where’s the best place for them to find you?

Koz: I can give you the standard AWS answer.

Corey: Yeah, www.aws.com. Yeah.

Koz: Well, I would have said koz@amazon.com. I’m always happy to talk about certs and PKI. I find myself less active on social media lately. You can find me, I guess, on Twitter as @seakoz and on Bluesky as kozolchyk.com.

Corey: And we will put links to all of that in the show notes. Thank you so much for being so generous with your time. I appreciate it.

Koz: Always happy, Corey.

Corey: Jonathan Kozolchyk, or Koz as we all call him, general manager for Certificate Services at AWS. I’m Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that then will fail to post because your podcast platform of choice has an expired security certificate.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.