Screaming in the Cloud

In this episode of Screaming in the Cloud, Corey Quinn is joined by AWS container hero and security engineer at the Python Software Foundation, Mike Fiedler. They delve into the intricacies of Python's ecosystem, discussing the evolution of PyPI, its significance, and the ongoing battles against security threats like account takeover attacks and typo-squatting. Mike sheds light on his role in maintaining the security and reliability of the Python Package Index, the importance of 2FA, and the collaborative efforts with security researchers. Corey and Mike also explore the challenges and philosophies surrounding legacy systems versus greenfield development, with insights on maintaining critical infrastructure and the often-overlooked aspects of social engineering.


Show Highlights
(0:00) Introduction
(0:47) The Duckbill Group sponsor read
(1:21) Breaking down the Python nomenclature and its usability
(5:49) Figuring out how Boto3 is one of the most downloaded packages
(6:43) Why Mike is the only full-time security and safety engineer at the Python Software Foundation
(9:53) How the Python Software Foundation affords to operate
(14:17) Mike's stack security work
(16:14) The Duckbill Group sponsor read
(16:57) Having the "impossible job" of stopping supply chain attacks
(21:00) The dangers of social engineering attacks
(24:44) Why Mike prefers to work on legacy systems
(33:30) Where you can find more from Mike


About Mike Fiedler
Mike Fiedler is a highly analytical, forward-thinking Information Technology professional. His broad-based background includes systems administration and engineering in global environments. Mike is technically astute and versatile with ability to quickly learn, master, and leverage new technologies to meet business needs and has a track record of success in improving performance, stability, and security for all infrastructure and product initiatives.
Mike is also bilingual, speaks English and Hebrew, and he loves solving puzzling problems.


Sponsor
The Duckbill Group: duckbillgroup.com 

What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Mike Fiedler: Social engineering or social acumen, I think is very important to exercise because at the end of every security pipeline or security process are humans.

Corey Quinn: Welcome to Screaming in the Cloud. I'm Corey Quinn. I am joined today by longtime friend and first time guest somehow, Mike Fiedler. Who is, among other things, an AWS container hero, which is far from the most interesting thing about him. His day job is the PyPI safety and security engineer at the Python Software Foundation.

Mike, thank you for joining me. I'm surprised you could find the time. I thought people were all busy fixing dependency problems.

Mike Fiedler: Thanks for having me, Corey. It's great to be on. I also cannot believe that I've never been on before, but it's great to be here.

Sponsor: This episode is sponsored in part by my day job, the Duckbill Group. Do you have a horrifying AWS bill? That can mean a lot of things.

Predicting what it's going to be. Determining what it should be. Negotiating your next long term contract with AWS. Or just figuring out why it increasingly resembles a phone number, but nobody seems to quite know why that is. To learn more, visit duckbillgroup.com. Remember, you can't duck the duck bill, Bill.

And my CEO informs me that is absolutely not our slogan.

Corey Quinn: I have to start here. It's a little bit of a confusing ecosystem, to put it gently. I use Python when I want to get work done. It's very much not something that I've ever really peeked behind the curtains on.

It's just there. I type pip install or poetry install or one of the other five ways of installing dependencies, all of which hate each other. And that sort of gets the job done, and I move on with my life. What is PyPI versus Python, the software language, versus the Python Software Foundation, versus a big snake?

Mike Fiedler: Fair enough. Let's try and disambiguate by starting kind of from the beginning. Python, the language, is a standards driven, open source language invented in 1991, I want to say, and it's been around for, for about 30 years, and it's been in constant evolution. There's a bunch of really committed core developer contributors that have been volunteering their time and effort to develop the Python language ever since.

That is the tool that most people use to get their work done.

Corey Quinn: Most people who've been around for a little while are familiar with the mass migration of Python 2 to 3, which for almost all of us consisted almost entirely of replacing print followed by a space and quotation marks with print open parentheses without a space and then the quotation marks.
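
As a rough illustration of the change Corey describes, the old statement form no longer even parses on Python 3. This is a minimal sketch; the helper name `compiles_on_python3` is made up for the example.

```python
def compiles_on_python3(source: str) -> bool:
    """Return True if `source` parses as valid code on the running Python 3."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


old_style = 'print "Hello, world"'   # Python 2: print is a statement
new_style = 'print("Hello, world")'  # Python 3: print is a function call

print(compiles_on_python3(old_style))  # False on Python 3
print(compiles_on_python3(new_style))  # True
```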

Mike Fiedler: Yeah, the 2 to 3 migration was a challenging one for everybody to the point where, uh, amongst the core developers, I think there's a tacit agreement of there will be no Python 4 because it hurt the world so much to make the 2 to 3 migration. That said, that doesn't mean that the language doesn't evolve and things don't change.

It just changes at a slower pace and kind of has a long tail of support. I believe five years for every Python major revision is supported now.

Corey Quinn: It's kind of impressive on some level. Well, who uses Python? Everyone. Freaking everyone uses Python for something, somewhere. It's probably one of the most approachable user friendly languages out there.

You can read the code and it does almost exactly what you would expect. Most of the time. Yes, yes, yes. You can do obfuscated Python if you want to be particularly obnoxious, or, you know, edit someone else's code until it turns out that way if you want, but it's very approachable. And on some level, it's like, is this actually how it works?

It feels like it's too easy.

Mike Fiedler: Yeah, I think that's one of the things that has led to Python's major adoption across the world. At GitHub Universe earlier this year, they announced that according to their kind of measurements, Python is the- has overtaken JavaScript as the most popular language on GitHub.

Corey Quinn: With the caveat of they broke out TypeScript separately from JavaScript, as I recall the internet drama of the hour.

Mike Fiedler: Oh, I do not recall that, but I'll take the win.

Corey Quinn: Hey, you do super well when they kneecap the competitor. It goes great.

Mike Fiedler: Works out for me. And stats, right? But the TIOBE index, which has their own measurement of language popularity, has been tracking Python for a while, and it has been number one for a number of years.

Uh, we'll drop a link to that in some show notes. But the thing that remains, that, like, we've put Python on the moon through the Mars, sorry, on Mars, on the Mars helicopter. So like that is powered by Python. Things all over the planet, whether that be, you know, industrial machinery or tools that build industrial machinery or, you know, your doctor's office or games you want to play with your kid. So all of these things are built in Python.

So it's quite pervasive largely because it is that approachable, but also because there's a multitude of user extensions or projects, aka dependencies, libraries, packages, there's lots of names for the same thing, that exist so you can kind of pick up a well formed Lego block that does most of what you're doing, plug it in, and keep on going and doing what you're doing.

Corey Quinn: Yeah, for those who have heard of Boto3, that is the AWS SDK, or extension, for Python. There's a reason you don't have to wrap every request with its own SigV4 signing, or have to wind up constructing your own HTTP response parser. You just tell it to do the thing and it does the thing and it's glorious.
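
To give a feel for what an SDK hides here, the sketch below shows only the signing-key derivation step of AWS Signature Version 4, using the standard library. It is not Boto3's code, and it leaves out the canonical request and string-to-sign steps a real request also needs. The secret, date, region, and service values are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """One HMAC-SHA256 round, returning the raw digest bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Chain four HMACs: date, then region, then service, then a fixed terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder inputs, not real credentials.
key = derive_signing_key("EXAMPLE-SECRET-NOT-REAL", "20240101", "us-east-1", "s3")
print(key.hex())
```

The derived key is scoped to one day, one region, and one service, which is part of why doing this by hand on every request gets tedious quickly.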

Mike Fiedler: Yeah. So Boto3 is one of the most widely downloaded packages in the Python ecosystem because it's so widely used to interact with AWS. That could be on servers, it could be on CLI tools, it could be inside Lambda functions which run Python, that want to re-interact back with AWS APIs.

Corey Quinn: So I have to ask, this probably ties back to what you do for your day job.

How do you know that it's one of the most widely downloaded packages?

Mike Fiedler: So I work on PyPI, which is an acronym for the Python Package Index at pypi.org. You mentioned that you would pip install, or poetry install, or use one of the many other tools to download and install Python.

Corey Quinn: I wave my hands in the air and fret until someone comes and fixes it for me, like a toddler.

Mike Fiedler: Hey, whatever gets the job done, right? And, so I asked this at a data science networking event of like, "okay, when you run pip install inside your Jupyter notebook, where do you think that comes from?" And they were like, "I don't know. It just works."

Corey Quinn: Jupiter. Duh. Yeah. You already said it was on Mars.

What, what's the problem here?

Mike Fiedler: Could be on Jupiter. But I was like, "okay, I work on the thing that gives you those packages reliably and kind of securely every single time." I'm not the only one who works on it, but I'm the only one who works on it full-time.

Corey Quinn: You are the safety and security engineer.

Feels like, well, we only need one problem solved, which is that always feels like a weird role to have existing in isolation. But it's interesting as well that they have someone devoted full time to caring about this, because on some level, it doesn't sound that hard. Basically a big web, static website that gets recompiled whenever someone updates something, which is probably frequently, okay, that adds a little bit of complexity.

And, but, but where's the hard part? Does that require a full-time security person? Oh wait, it links to arbitrary third party things that wind up, you know, effectively turning into a remote code execution. You can trick people into including a line that says, "import some magic string." Okay, this starts to be a little more interesting.

Even as I'm thinking about this like, "Oh, that's why you're there. How is there only one of you?"

Mike Fiedler: Yeah, so the index itself, PyPI, has been in existence in one form or another for about 20 years. It's gone through a bunch of different hands, rewrites, and kind of reimaginations. It used, the reason it's called an index is because it used to only be an index, just a page, an HTML page with links to other people's storage of software.

And over time, because folks, you know, don't have hosting capacity or didn't want to kind of keep those packages up in perpetuity, we moved over to a hosting mode where anyone across the internet can upload a package to pypi.org. And other folks can pip install and download it without having to worry about where is this from.
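
The "just an HTML page with links" idea Mike describes can be sketched in a few lines. This is an illustration only, not PyPI's historical or current code; the function name, project name, and example.org URLs are hypothetical.

```python
from html import escape


def render_index(project: str, files: dict[str, str]) -> str:
    """Build a bare HTML page that links each file name to wherever it is hosted."""
    links = "\n".join(
        f'    <a href="{escape(url, quote=True)}">{escape(name)}</a><br>'
        for name, url in sorted(files.items())
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"    <h1>Links for {escape(project)}</h1>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )


# Hypothetical project whose author hosts the files on their own server.
page = render_index(
    "examplepkg",
    {
        "examplepkg-1.0.tar.gz": "https://example.org/dist/examplepkg-1.0.tar.gz",
        "examplepkg-1.1.tar.gz": "https://example.org/dist/examplepkg-1.1.tar.gz",
    },
)
print(page)
```

The weakness of the links-only model is visible in the sketch: the index is only as available as every external host it points at.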

Corey Quinn: It feels like the original version could have just been like a script that runs on cron every five minutes. It grabs all the packages and creates an index file, and I'm sure it was originally, and to your lasting shame, that script was no doubt written in Perl, so it needed to be fixed immediately.

And here we are.

Mike Fiedler: I'm not going to go down that path because I don't, I'm not as familiar with the super legacy code, but most of it that I have seen, it has been written in Python.

Corey Quinn: You do eat your own dog food.

Mike Fiedler: Yes, you got to. But the permutations of this have kind of evolved, and it's been largely volunteer driven, volunteer contributors, volunteer admins for this massive service that basically underpins a non-small part of the internet and the operations. Now, there are plenty of mirrors out there and mirroring softwares that you can set up for your own corporation and your enterprise to say, we would like to, you know, keep our own copy of the index locally so that way if there's any problem upstream we still have all those packages we have, and most folks should do that, because then you are owning your own availability.

But for the vast majority of open source consumers out there, which are most people, there's not a need to build your own index, so we make sure that it is up and running. We can't do that on our own without a lot of generosity, because the Python Software Foundation is a nonprofit, and my role in particular is funded through generous donations from folks like Amazon Web Services, and then next year through the Alpha Omega Foundation, but the ability to invest in keeping critical infrastructure up and running is not an easy task that you can just hand wave and say some company will handle this.

Corey Quinn: No, even the raw cost of this would bankrupt most folks. I don't know where you actually host these things. It never occurred to me to look at, okay, when you're downloading from the local mirror, where is that mirror resolving to, exactly? Because if it's one of the major cloud providers, egress fees are expensive.

Yeah, if I download a gig, it'll be nine cents, and I'm not much of a user, but you know, I have a dumb provisioning process that redownloads every single time the container runs. Oops. And there's a lot of idiots, like me, out there doing the exact same thing. This can become a massive denial of wallet attack if you're not, I guess, conscious about how these things are supposed to work.
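
A back-of-the-envelope version of that "denial of wallet" math, as a sketch. It assumes a flat nine cents per gigabyte with no tiering or discounts, and the workload numbers are invented for illustration.

```python
EGRESS_USD_PER_GB = 0.09  # the "nine cents" per gig from the conversation, assumed flat


def monthly_egress_cost(gb_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Egress cost to whoever hosts the files, if every run re-downloads everything."""
    return gb_per_run * runs_per_day * days * EGRESS_USD_PER_GB


# Hypothetical: 1 GB of dependencies pulled on each of 1,000 container starts per day.
print(f"${monthly_egress_cost(1.0, 1000):,.2f} per month")
```

That is one careless user; multiply by everyone doing the same thing and the bill for the host grows accordingly.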

Mike Fiedler: So we are conscious about it and we're, again, we've gotten some great donations. So our infrastructure is donated largely by Amazon Web Services and Google Cloud through their Open Source Contribution Program, but the actual egress that you are kind of consuming when you are re-downloading the same package again and again, is a donation from Fastly through their Fast Forward Program for open source and non-profit users.

So we are achieving close to a 99.95 percent cache hit ratio through the Fastly network. So most of the calls of the, I don't know, 60,000 requests per second that are happening against PyPI's APIs are going to hit a Fastly cache and never phone home all the way back to us.
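
Taking the two figures Mike quotes at face value, the arithmetic for what actually reaches the origin looks like this. It is a sketch that reads "99.95" as a percentage and treats both numbers as approximate.

```python
def origin_requests_per_second(total_rps: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the CDN cache and reach the origin servers."""
    return total_rps * (1.0 - cache_hit_ratio)


# Roughly 60,000 requests per second at a 99.95 percent hit ratio.
print(round(origin_requests_per_second(60_000, 0.9995), 2))
```

Under those assumptions only about 30 of every 60,000 requests per second travel all the way back to PyPI's own infrastructure.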

Corey Quinn: Which is phenomenal. It's, if you can resolve, if you can service a request as close to the customers or user as possible, go ahead and do it.

Sorry, they're not customers, they're users. If they're not paying you, they are not customers. That is going to be a horrible revelation for a whole bunch of freemium model startups someday.

Mike Fiedler: Yes, they do not, we do not charge money for usage. There is potentially some paid thing for corporations and organization features that is kind of in the works, but that is not live yet.

So yeah, we exist currently 100 percent on kind of inbound donations. The ability to do so, again, is made possible by our generous donors, but it's also made possible by folks like myself and other contributors talking about what we're doing and publicizing the ability, and at the end of each year, for our impact report at the Python Software Foundation, I try and give them some stats about how much we've grown in our usage over the years.

And it's kind of doubling every year, which is crazy to think about.

Corey Quinn: Yeah, that's right. It was Daniel Stenberg who first really drew my attention to this problem years ago. He is the original author of curl. You know, that library that's used by freaking everything, every time you need to make a web request.

And it turns out he was doing most of this as a labor of love and trying to figure out, like, what do you do for a day job? It's like, that is absurd. Like the old XKCD of the entire tower of blocks built on one thing maintained by someone in Nebraska is very much that type of approach. And, oh right, you have to make a little bit of noise, especially when you have something like this, in order to get the resources and attention it needs.

Because otherwise, look at all the browser extensions that are abandonware and then get purchased for some paltry sum of money by malware authors, and they just go ahead and inject whatever they want. Everyone's browser still has that configured. That's a terrifying threat vector.

Mike Fiedler: It really is. And the notion of that exists in the universe of PyPI, but it's less so about folks being abandoned.

And the things that we've tried to protect against are account takeover attacks. So we referenced Boto, right? Boto is managed by the release team who kind of uploads new versions to pypi.org on schedule, and every single one of the folks who has access to that project could become a phishing target.

And if they can get phished, they can get their accounts kind of compromised, and then someone else could be uploading Boto, and that's a problem. So, in order to kind of prevent that, we enacted a couple years ago, and finally forcefully put our foot down in 2024, to require 2FA for all user accounts.

So, a second factor for authentication prevents the casual account takeover attacks. Again, no security is perfect, but we move forward on our progress to getting more secure solutions out there so that the universe will be a little more secure every single iteration we go at it.
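
To make "second factor" concrete, here is a minimal sketch of time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226), which is one common kind of second factor. The conversation does not say which factors PyPI accepts, so treat this as a generic illustration rather than a description of PyPI's implementation.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the number of elapsed steps."""
    return hotp(secret, unix_time // step, digits)


# Shared secret from the RFC test vectors, not a real credential.
print(totp(b"12345678901234567890", unix_time=59, digits=8))
```

A phished password alone is not enough when the server also expects a code like this, which changes every 30 seconds and depends on a secret the attacker does not have.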

Corey Quinn: I guess a question I probably should have asked a little bit earlier in this conversation is when you talk about security, especially when you're talking about something that I perceive to be relatively low level infrastructure, my naive assumption at first was, "Oh, you're just the person that makes sure that the web server stays patched and, you know, doesn't have the SSH port just hanging out there, flapping in the breeze for everyone to connect to."

Sounds like what you're doing is a lot more up the stack.

Mike Fiedler: It is. I do kind of work throughout the different layers of the stack.

Corey Quinn: You do that too, though.

Mike Fiedler: Well, we do have our director of infrastructure and an infrastructure engineer who work on not just the PyPI universe, they work on everything Python Software Foundation.

So, you know, they can dedicate a fraction of their time to PyPI. So I'll collaborate with them on certain parts of the stack, but in the kind of application side, and all the way up to the client side, that's kind of where I live. And I try to find new ways to prevent the problems from happening. So, like, problems happen. You know, folks do register for an account. It's a free service. They'll upload some malware. They'll abuse the service, and we also partner with a variety of volunteer security researchers. And these are folks who may have their own security research companies, but a lot of them are just an email that have kind of expressed some interest, and can report back to us, "Hey, this new thing that somebody uploaded 10 minutes ago looks really smelly, and here's why." And then that, we've developed the kind of messaging and pingback and workflows to get that information into a PyPI admin's hands as soon as possible to be able to get the context, react quickly, and take that down, put it in a quarantine state, or take it down completely to prevent anybody from accidentally falling for a, you know, Discord message that says, "Here, you should run this pip install command, and that'll solve all your problems," because that's a huge vector.

Sponsor: Here at the Duckbill Group, one of the things we do with, you know, my day job, is we help negotiate AWS contracts. We just recently crossed five billion dollars of contract value negotiated. It solves for fun problems such as how do you know that your contract that you have with AWS is the best deal you can get?

How do you know you're not leaving money on the table? How do you know that you're not doing what I do on this podcast and on Twitter constantly and sticking your foot in your mouth? To learn more, come chat at duckbillgroup.com. Optionally, I will also do podcast voice when we talk about it. Again, that's duckbillgroup.com.

Corey Quinn: It seems like there's, very much, a taking for granted of all of these things. Like the idea of a supply chain attack feels like it's this esoteric, very remote, very obscure thing until it happens to you and you realize just how easy it is for something like that to happen. You basically have an impossible job.

Mike Fiedler: Yeah, I mean, every, every job if you frame it like that is impossible, but I think the hope that we have is that by raising the visibility of potential supply chain attacks, we get people interested in, "Okay, well, what do I do about that?"

So we can go into that in a bit, but there's also the like, it won't happen to me, it will happen to somebody, and it might not happen to you, it'll happen to somebody near you, and then they'll move laterally and get you, right? So this is something that attackers love to do, is find some sort of injection point, and then move laterally.

So it's like, even if I can kind of get you to install a Discord thing, guess what? I'm now Corey Quinn on Discord and I can ask other people to do more serious things. So just because I got a Discord token doesn't mean that that's the end of the attack.

Corey Quinn: Let me ask you this and feel free to tell me that it is not something you can discuss and that will be fine.

But let's, I, let's say I have an evil nefarious idea that I want to trade in the latest version of cryptocurrency, AWS Credits. So what I'm going to do is I'm going to publish a package Boyo3, just change the T to a Y, one key off typo. Given the scale of it, I'm sure people do it all the time. And then I'm going to upload a direct clone of Boto3 with one minor change in that it just takes any environment credentials or keys it finds and sends them to me.

Where does that break down, as I start down that, or does it?

Mike Fiedler: Conceptually, it does not break down. The mechanisms that you would use in order to get that Boyo3, that is what we would call a "typo squatting attack." So you are squatting on a name and hoping that somebody typos that. For any company, or any individual who is installing packages, we often recommend using a hash based lockfile from a tool like Poetry, or an extension called pip-compile, or pip-tools, or any number of the other tools out there, to generate a, "these are known hashes to this file."

GitHub's Dependabot and other kind of dependency update tools recognize these lock files and they will go out and check out and do updates for you. So as long as you get that first package in correctly, spelled Boto3, you are unlikely to ever get Boyo3, because you are not typing pip install boyo3. But for the offhand, one off people who are doing so, or if you're, you know, using that in a Dockerfile command that does not have hashes, and you did a typo, yeah, that's gonna be a problem.
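
The idea behind a hash based lockfile can be sketched as follows: record the SHA-256 of the exact file you vetted, then refuse anything that does not match. pip does this for real in its hash-checking mode (`--require-hashes`); the code below is only a conceptual illustration, and the function name and byte strings are made up.

```python
import hashlib


def matches_pinned_hash(artifact: bytes, pinned_sha256: str) -> bool:
    """Return True only if the downloaded bytes hash to the pinned value."""
    return hashlib.sha256(artifact).hexdigest() == pinned_sha256.lower()


# Pretend these bytes are the package file that was reviewed and locked.
vetted = b"contents of the package file we reviewed"
pinned = hashlib.sha256(vetted).hexdigest()

# Later downloads are compared against the pin before anything is installed.
print(matches_pinned_hash(vetted, pinned))                         # True
print(matches_pinned_hash(b"a different, swapped-in file", pinned))  # False
```

Because the pin is tied to file contents rather than to a name typed at a prompt, a typo at install time has nowhere to sneak in once the lockfile exists.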

So, what we try to do in that situation, is once it's reported to us, we get it off the index. And we, we often prohibit that name from further use because it is now known to be malware. So even if you got it, the next time you run your pip install command, even with the typo, it should fail. And at that point, you should hopefully notice.

We also take these takedowns and, and report some of them out to a, a malicious database or an advisor, advisory database that folks can use, uh, with a tool like pip-audit to see, am I using any of these, these files with known problems? The vast majority of the typosquats don't make it in there because it's just a lot of noise.

At some point they may, but they don't today. The final kind of piece is that we have these wonderful volunteer researchers who will try and profile the attackers, and kind of, build up a, "this is what this looks like," and update their scanning rules to kind of listen to the PyPI feed and see, "is this a new one that looks like that?"

"Yeah." "Okay, let's report it." And that makes our lives as admins a little easier because we've seen this kind of attack before. We know this reporter. That we can kind of match things a lot faster.
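
The conversation does not describe the researchers' actual scanning rules. One plausible heuristic, sketched here under that caveat, is to flag any newly published name that sits within one edit of a popular package. The watch list below is a hypothetical stand-in.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


POPULAR = {"boto3", "requests", "numpy"}  # hypothetical watch list


def possible_typosquats(new_name: str, popular=POPULAR, max_distance: int = 1) -> list[str]:
    """Popular names that a newly uploaded name is suspiciously close to."""
    name = new_name.lower()
    return sorted(p for p in popular
                  if p != name and edit_distance(name, p) <= max_distance)


print(possible_typosquats("boyo3"))  # ['boto3']
```

A match is only a signal for a human to look closer, since plenty of legitimate packages have similar names.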

Corey Quinn: There's a lot to focus on when you're chasing these things down. You have that aspect of it. Social engineering attack that you alluded to a few minutes ago about impersonating someone on Discord.

That's, that was wild. Back when I was Freenode network staff many years ago, we had someone pretending to be Matthew Prince, CEO of Cloudflare, try and just tell us- let us mention in passing what the origin was of something that Cloudflare was protecting. Yeah. That doesn't seem like the sort of thing that he would actually ask because he's not, you know, an idiot.

Imagine that. So it was, that took a fair hit of thought there because I'm used to running stuff that no one really cares about. When you're talking about running something that everyone uses, on some level, everyone has to care about it, but most people don't even know that it exists.

Mike Fiedler: Yeah, that's, that's, I think, one of these kind of interesting problems, right?

I know you had done an episode on remote development environments, right? So, using remote development environments is a way to sandbox your development environment, your code, to say, "you know what? There's nothing in this sandbox that shouldn't be there already. So if I am installing untrusted code, it can only do what it- it can only affect whatever's in there."

Now, again, that might be more than you want. But at least it's sandboxed. There's other methods that are, you know, around social engineering that are fascinating to think about. I do remember one company I was at. At a company, All Hands, the CFO stood up in front of the entire company and said, "I will never text anyone to wire me money. So if you ever see a text from me for a wire, that is not me. Do not trust it. Report it immediately to our IT team." And it's like that kind of social engineering or social acumen, I think is very important to exercise because at the end of every security pipeline or security process are humans. And we're just trying to do the best we can.

Corey Quinn: We've seen that here at The Duckbill Group, a targeted spear phishing attack, where a bunch of, some pretending to be Mike, you know, my business partner and the CEO, was sent to three or four people here. Not me, suspiciously, okay? Asking to have, I think it was iTunes gift cards sent, and we'd already spoken to the team. It's like, great, you know how normally, like, if he wants to do something weird, we'll talk about it in Slack?

Yeah, if you can get access to our Slack as him, congratulations, you, you probably earned it at that point, but just some, this is Mike, it's my new phone number, my personal phone. Yeah, sure it is, buddy. Sure.

Mike Fiedler: I think that that's a far overlooked part of all security is, you know, does this make sense, right?

Did what I just get make sense to me? Should I verify? Should I blindly accept it? And I think that that's the same thing that we should apply to software packages and things we install is, do I know where this came from? You know, a little bit farm to table as it were. Let's look at the ingredients that I'm about to consume.

If I'm doing a one off BS script in a cloud environment, that's just trying something out. Maybe it's not that critical, but if I'm building my team, my project, my company's infrastructure, let's take a more critical eye to what it is we're doing and, you know, secure our environment. Maybe we shouldn't make- maybe we should not be able to make network calls externally to untrusted sources.

That's an old school, you know, outbound firewall rule that is pretty easy to kind of conceptualize. And, you know, maybe we should apply things like that to secure our teams. So, there's a variety of different tactics in play, all to kind of make security a little bit easier for folks. I want to make sure that whatever is on the index is relatively trustworthy.

So, you know, there are well over half a million packages and, you know, 12 million releases of those packages out there today, I can't monitor all of them. But we have a great network of folks who are doing that.

Corey Quinn: One other topic that I wanted to get into with you. It's something that you said that I thought was just shockingly objectionable enough to be worth talking to you about here.

And how did you phrase it? Specifically, that you were talking about legacy systems and that you found it to be more satisfying to work in legacy environments than do greenfield development. Now, before we dive in, yes, of course, legacy is that condescending engineering term for it makes money somehow, but where do you come from on this? So given that you work for a nonprofit, you definitionally do not make money with it.

Mike Fiedler: Yeah, I've worked at a variety of different companies throughout my 30 years in this ridiculous industry that we all call work, and something has been true. The word "legacy" is often used, as you said, kind of as disparaging as this is bad.

But legacy is the thing that is left behind, right? Software that is not useful gets replaced. Software that is useful is still around. Sometimes people don't like it, and that's a different problem. But it's still useful, otherwise it wouldn't be there. And working on systems that are useful, I think, I find the most satisfaction, because somebody is deriving value out of this. Whether that be, you know, some other team consuming an API, whether that be the direct channel to a B2B integration, or a website that is, you know, a website feature that is only used by like 3 percent of our clientele, it's still useful. If it wasn't useful, we should get rid of it. And with a legacy system, you have a lot of the pieces in play that you already need to make that money. Right? Or to get that goal. Whereas greenfield development is far more aspirational and unknown. You're going to hit new and interesting bugs.

I'd like to hit well debugged systems and kind of evolve them over time, as opposed to saying, "let's jettison this thing and, and start a brand new stack." Because you're going to spend more time debugging things you had already debugged.

Corey Quinn: One thing that I've always had a keen appreciation for that I think transfers is as a consultant, you have to have respect for what came before.

There's a reason things are the way that they are. There are constraints that may have been non-obvious. And whenever I'm dealing with existing code bases, "okay, why was it written like this?" There's probably a reason that I'm not immediately seeing. Before I go in and start ripping everything out, maybe read through it once or twice, just to get a vague idea of what it is you're about to do here.

Because what you're doing now works, sort of, and you want to make it do something different, maybe understand what it's doing now so you don't inadvertently leave it in pieces on the floor.

Mike Fiedler: Yeah, I think Martin Fowler goes a lot into that in his book Refactoring, and I think where a lot of people fall on this whole, like, "legacy systems are bad," are, "I don't understand it enough."

Or, "it doesn't feel good in my hand," right? Or, like, "the modules aren't constructed the way I would like them to be." Guess what? You can refactor it. You can change it. You can modify things to make them nicer. Sandi Metz, the author of Practical Object-Oriented Design in Ruby, had a phrase that I really liked. I don't know if that was the origin, but it was, "Make the change easy, and then make the easy change."

Sometimes making the change easy is actually very hard, but if you make the thing that you want to do, like the second part, easy to change, then next time you need to change it, it will likely be easier to change. So, constant refactoring is something that a lot of teams that I've worked with have kind of bemoaned: that the product managers are never going to let us have time to refactor.

It's like, that's part of the work. It's not like a ticket to refactor; it's, you're touching this module, you need to bake in an extra X amount of time to refactor to make the easy change.

Corey Quinn: And part of that too is, there's the idea that when I talk to customers all the time, or clients, we call them when we're, you know, actually a professional services company, we've never really seen, except in extreme situations, someone rewriting an application for cost purposes. Someone did that once when they were on, once upon a time, Google App Engine, realizing Bigtable charged per write. They were hemorrhaging 800 grand a month for what became a quarter million dollar AWS bill. Great, so get off of that and move, sure. Most of the time it doesn't happen that way.

So, they'll care about cost and do a refactor in the next version of the architecture. So you can bake that in, but almost never are they going to do a rewrite just to save money, nor should they, frankly.

Mike Fiedler: Well, it's one of those things that I've had to make this decision a few times throughout my career of like, what- when is the right time, right?

If it's for cost, well, what else could we be doing? And is this costing us more? How many engineers are we going to have work on this? How much downtime? How many outages are we willing to take?

Corey Quinn: All of them.

Mike Fiedler: Until it's done. And how much will that cost us? Oh, wait, that number is much higher than just living with the cost of, you know, an inefficient system for another few years.

Yeah, okay, let's not do it yet, right? But let's set ourselves a marker to say, in two years, or whatever their time frame is, let's re-evaluate these pieces and see if they are still true, and if they exceed the threshold now. And are we willing to invest this amount of money on renovating a house, basically, right?

You have to do a gut reno, and pull pieces out, and move walls? That's pricey. And where are you going to live in the meantime? So, all of those things have to be true in order for you to, like, actually make a fiscal argument for a refactor. But, if you are constantly doing a little bit of refactoring every pull request to say, "ah, this module is sharing enough code with three other modules, let me pull out a helper function and reuse it," oh great. Now we have the right abstraction, we have not prematurely abstracted, we have the right abstraction, and it costs us less to maintain it because we've written the test around the helper function instead of the top level function, so writing the test is even easier.
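To make the "pull out a helper function" idea concrete, here is a minimal sketch in Python. Every name in it (`normalize_label`, `register_user`, `find_user`, `greet_user`) is hypothetical and invented for illustration; none of it comes from PyPI or any project discussed in the episode. It assumes three callers that had each been repeating the same cleanup logic inline.

```python
def normalize_label(raw: str) -> str:
    """Shared helper (hypothetical): trim, collapse whitespace runs, lowercase."""
    return " ".join(raw.split()).lower()


# Three callers that previously each carried their own copy of the cleanup
# logic. After the small refactor they all delegate to the helper.
def register_user(directory: dict, raw_name: str) -> None:
    directory[normalize_label(raw_name)] = {"display_name": raw_name.strip()}


def find_user(directory: dict, raw_name: str):
    return directory.get(normalize_label(raw_name))


def greet_user(raw_name: str) -> str:
    return f"Hello, {normalize_label(raw_name)}!"


# The tests target the helper directly, so edge cases are covered once
# instead of being re-tested through every top-level function.
def test_normalize_label() -> None:
    assert normalize_label("  Ada   Lovelace ") == "ada lovelace"
    assert normalize_label("ADA") == "ada"
    assert normalize_label("") == ""
```

The point of the sketch is the shape rather than the specific logic: the abstraction shows up only after the duplication is visible, and the tests get simpler because they exercise one small function.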

Corey Quinn: The challenge, of course, on the other side, is that this does act as a form of technical debt. Like the CDK is great when you're trying to build something out in a reasonable recurring way. But where it falls down, at least for me, is, "okay, I built and deployed a thing. Now I don't have to touch it again for two years."

Great, now I do it, and it is screaming and whining at me every step of the way when all I'm trying to do is change a single line of code, maybe even a text string. Oh, that version of Node or Python is deprecated. We won't support that in Lambda. Oh, your versions of everything are ancient, update them. Oh, now you have critical breaking dependencies.

It's like, I just want to change a single line of code. I don't want to refactor everything to modern. But I'm faced with no choice when that happens. It's very frustrating.

Mike Fiedler: That is a frustration I have had myself. And one way I've found to combat that is to take kind of this approach of if it hurts, do it more often.

So, you know, in this case of like, you do it more often so that way it doesn't hurt as much. So for me, I've got Dependabot running and pushing updates and have a CI/CD pipeline for some of my projects that I trust enough that I can just merge and deploy. And if it passed CI, I am happy enough with it.

So, then you can automate pushing that button as well, and then you can just have stuff flow through your system, and then when you come back to it two years later, it's relatively fresh.

Corey Quinn: That's a good way of doing it. The challenge, of course, I'm doing this across a bunch of different languages simultaneously.

Everything requires its own approach. I've been using asdf as a version manager, which is universal. But then that conflicts with a bunch of things in some particular Python and other language workflows. All of it's a disaster. Nobody's happy. I'm starting to love ephemeral dev environments for this explicit purpose, but then nothing I need is local there, and I need to install the universe again.

And, uh, computers are hard. That's why we have jobs. People are harder. That's why you have jobs.

Mike Fiedler: People are very hard. People can be great and the worst, right? Like, they fall on the spectrum of everything. Software is the same. I think one of the reasons we have so many package management tools is because we're standards driven as opposed to, you know, one tool opinion driven.

The standard is, you know, the PyPI index will have an HTML page that has all the files, versions, and all the hashes kind of contained in them. That's a standard. You can build your own tool if you don't like it based on those standards. And that's how we get tools that it's like, well, it does what I want it to do.

And it's like, great, it will work. And we commit to kind of having that contract with our consumers. So that way, if you want to use Poetry, if you want to use pip, if you want to use something else, that's your prerogative, but like, that's also your problem. There are no guarantees. So anybody saying, "I'm going to guarantee that this won't conflict," they're probably trying to sell you something.
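As a rough illustration of building a tool on top of that standard, the sketch below parses an index page in the style of the "simple" repository format (PEP 503), where each file is an anchor tag whose URL carries a hash in its fragment. The sample page, the `files.example.invalid` host, the file names, and the shortened digests are all made up for the example, and the class and function names are hypothetical. It uses only the Python standard library and makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Hypothetical project page; real digests are full-length hex strings.
SAMPLE_PAGE = """
<!DOCTYPE html>
<html>
  <body>
    <a href="https://files.example.invalid/demo-1.0.tar.gz#sha256=aaaa1111">demo-1.0.tar.gz</a>
    <a href="https://files.example.invalid/demo-1.0-py3-none-any.whl#sha256=bbbb2222">demo-1.0-py3-none-any.whl</a>
  </body>
</html>
"""


class SimpleIndexParser(HTMLParser):
    """Collects one record per anchor tag: file name, URL, and hash."""

    def __init__(self) -> None:
        super().__init__()
        self.files = []
        self._href = None  # href of the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        text = data.strip()
        if self._href is None or not text:
            return
        # Split "https://...#sha256=<digest>" into the URL and its fragment.
        url, fragment = urldefrag(self._href)
        algorithm, _, digest = fragment.partition("=")
        self.files.append(
            {
                "filename": text,
                "url": url,
                "hash_algorithm": algorithm or None,
                "digest": digest or None,
            }
        )
        self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def list_files(page_html: str) -> list:
    parser = SimpleIndexParser()
    parser.feed(page_html)
    return parser.files


if __name__ == "__main__":
    for record in list_files(SAMPLE_PAGE):
        print(record["filename"], record["hash_algorithm"], record["digest"])
```

A real installer does much more than this (name normalization, version parsing, hash verification of the downloaded file), but the sketch shows why independent tools can exist at all: the page format is the contract, not any one client.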

Corey Quinn: I really want to thank you for taking the time to speak with me about all this. If people want to learn more, where's the best place for them to find you?

Mike Fiedler: I am on, uh, Mastodon on the Hachyderm site, I'm at MikeTheMan on Hachyderm. I'm at MikeTheMan.com on Bluesky, and you can also read all the blog stuff I write on PyPI at blog.pypi.org.

Corey Quinn: And we will include links to all of that in the show notes. Thank you so much for finally agreeing to sit down with me and suffer the slings and arrows. I appreciate it.

Mike Fiedler: Absolutely.

Corey Quinn: Mike Fiedler, THE security engineer at the Python Software Foundation. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.

If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment, including a screed on why we should be using Rust instead. Please be sure to link to your PowerPoint deck.