Matt and Justin talk about staging servers and try to answer the question "do we need them?"
You Ain't Gonna Need It, the podcast where we look at software practices and tools and ask: "do we need it?"
Justin: I liked our, our full-on left turn into ranting about the state of dev content today. That was fun. That felt very, uh, personally gratifying.
Matt: That's good. You know, at least, we'll, at least we'll both be happy with it then. At least two listeners.
Justin: Audience of two. Perfect.
Matt: We just have to sell our course on becoming a hot take Twitter account.
Justin: How to be a nostalgia merchant?
Matt: Yes.
Justin: You can buy it in 10 easy payments of $9.99 each.
Matt: I'm not sure that's gonna keep the, the lights on, but, uh, it's a start.
Today I'm joined by Justin Duke. He is the founder and sole operator of Buttondown, a SaaS application for sending and growing newsletters. Justin got his start in software working as an individual contributor at Amazon and Stripe before moving into engineering management. On today's episode of YAGNI, we talk about whether or not we really need staging servers, if software culture is stuck in 2010, putting up intentional roadblocks, and why I am black-pilled on programming content creation.
Welcome to YAGNI.
I tweeted out something a little bit ago that said, like, what software practices are considered virtues but are actually sort of pointless? And, uh, you had an interesting response, which was, um, staging servers. So, um, yeah, I don't know if you have something queued up that you're ready to get off your chest about, uh, you know, staging servers.
Um, but maybe, yeah. Like, why, why did that strike you? As, as, you know, the, the thing you replied with?
Justin: My response really was born out of the fact that I've never heard of, or seen or used any like quote unquote actually good staging environment. I think the, the dream of a staging environment is this like kind of ephemeral thing that looks exactly like production or a pre-prod server, except there's no danger and you can just deploy to it and like get all of the benefits of production without any of the actual, uh, drama or risk or friction.
And that's just like never true in my experience. There's always some sort of thing that's missing. There's some sort of inherent jankiness, and the vast majority of workflows that I've run into or heard people kind of rant about is, oh, if the goal of deploying to some sort of staging environment is to gain some level of confidence before actually deploying to production,
the amount of variables and the amount of things that can break in a staging environment, to the extent that it doesn't actually mirror production, actually decreases your level of confidence to the point where you're spending time trying to actually maintain a staging environment or a bootstrap that works well, even though it doesn't actually give you any of the, the net results that you're looking for.
Matt: Yeah, I think that, um, rings true to me. I guess are there times when you have like used a staging server, some kind of pre-production server where like you did find it useful or like, I know for, I know for me it's like sometimes when we've had like dedicated QA teams or like, we're trying to do, like some product owner needs to like, do, you know, capital A, like acceptance of a feature.
Um, it's been kinda nice to do it on that kind of server, but I agree that it, it, it's, it, it always feels like you're trying to thread like a sweet spot where the env, like if the environment is not complicated enough in production, that you can actually mirror it one to one, then it's like, Who cares if, if it's like so simple, then why do we need the staging?
But then if it gets to be so complicated that you would benefit from having it, then uh, it's hard to actually reproduce. So, I don't know, like, do you have any good experiences with it?
Justin: Uh, that's a tough one because I, I
Matt: Maybe the answer is no.
Justin: I, I think there, there have been experiences where the thing that you're trying to get out of the staging environment is like somehow endemic to the deployment infrastructure, or like literally just the ability to kind of see it on a staging server. Maybe you're trying to test like one very specific part of say, uh, cross-service communication or something where just like, hey, pushing this onto a new box that isn't your dev box, or isn't something completely local and getting something there, like that's mission accomplished.
I, I've seen staging environments be useful for that purpose. If you have a sufficiently complicated setup such that like your service mesh on prod is identical to that on QA, but like you can't set up a service mesh locally or something along those lines. Otherwise, I think you run into what you're talking about here with the uncanny valley.
If your setup is so simple, in the ideal case, that you can reproduce it entirely locally, you should like push all of those investment efforts to just having a really, really strong dev box. But if you don't, then it's probably so complicated, right? That it's really hard to avoid actually hitting prod.
Matt: Yeah, I, I think it's like on one end of the, like barbell, you have like, okay, we have like a Rails server and a database and like maybe we have like background jobs. And it's like, well, you can run all that locally, so what are you actually buying? You know, in staging versus production. And then, yeah, like you said, on the other end of things, if you have to set up like 20 different services and it's hard to do in dev, it seems like, it's like you have a crazy complex system that then you have to also sync up in, in staging. So when I was looking, I was looking online to just make sure that I have, uh, a somewhat accurate, uh, picture of like what, at least, like the top Google results were, like, you need a staging server, and like, why do you need a staging server, what the benefits are. And there is a, there is a recent thread on Hacker News that was for an article that was like, let me find the title here.
So there is a blog post called We Don't use a Staging Server, um, by Squeaky.ai. And one of the comments on Hacker News that was like surprising to me was, uh, somebody said like, this is actually pretty common. And like at Facebook, there's no staging environment. Engineers have a dev environment, and then the pull request is reviewed and then it just like goes into production.
Uh, and then they use, you know, feature flags and like staged rollout, uh, to monitor things. And I guess that was surprising to me because, um, and maybe you can share some of your experience working at these bigger tech companies, like that to me feels like the scale of company that would want to invest or like would have the resources for people to maintain these servers.
So that was kind of surprising to me. I was thinking like, yeah, staging, uh, environments maybe not so good if you're small, maybe good if you're big. But then it's like, well, even if you are big, sometimes they, they still don't have them.
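The "feature flags and staged rollout" approach Matt mentions can be sketched roughly as below. This is a minimal illustration, not how Facebook or any particular flag service does it: the function names, the flag name, and the choice of SHA-256 bucketing are all assumptions made for the example. The idea is that each user lands in a stable bucket from 0 to 99, and a flag is on for a user when their bucket is below the current rollout percentage, so raising the percentage only ever adds users.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hashing the flag name together with the user id means a given user
    is not always in the "early" group for every flag.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if the flag is on for this user at the given rollout percent."""
    return rollout_bucket(flag_name, user_id) < percent


if __name__ == "__main__":
    # Hypothetical flag and user ids, purely for illustration.
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, is_enabled("new-checkout", uid, percent=10))
```

Because the bucket is deterministic, a user who sees the new code path at 10% keeps seeing it at 50%, which keeps monitoring during the rollout easier to reason about.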
Justin: For sure, and I think it goes back a bit to the barbell thing that you talked about, which is if you are a big tech company that has the developer productivity investment resourcing that you can justify, like, hey, let's staff a team with five to seven engineers and just have them attack this problem,
like do you want them to work on staging or do you want them to work on, uh, the, the dev box, like the local development equivalent? I, if you assume that like both of those things are roughly equivalent or commoditized investment opportunities, like the amount of time you want to be able to get from 0% confidence to say 75 or 85% confidence
locally, that's so much better than optimizing for the, going from, you know, 75 to 85% confidence to say 90 or 95% confidence, uh, in that pre-prod step. I think
Matt: And I guess, I guess like in production, in production, like you're gonna want to have like robust monitoring and rollout stuff anyways, so maybe it's like, if we're already gonna have to do this, like what, like what are we actually getting from the staging, uh, the staging site itself.
Justin: Right. I found that when I'm really trying to get the last sort of like inklings of confidence, uh, that theoretically something like staging would be good for, a lot of it is for things that like staging can't truly replicate unless you have like a perfect simulacra of traffic and data. Things like what are the load patterns?
How does this handle, uh, thundering herd scenarios? How does this handle like our peak traffic every Thursday morning? Things along those lines where often what you're gonna end up having to do anyway is have, you know, some sort of dark read pattern of like, we'll, we'll read from both the old code path and the new code path and not do anything with the new code path, just to see what the performance characteristics are. If you're gonna end up doing that before formally launching it anyway,
often it's just more convenient and more ergonomic to skip that pre-step of like, okay, theoretically it's kind of nicer if we do this in staging. So long as your company or your organization has the best practices of like, okay, if you're pushing to prod pretty aggressively, like Facebook and a lot of other large companies do,
do so in a way that minimizes the blast radius. I think, uh, often this can introduce some friction in terms of like, okay, you want to change, uh, how you're reading an API from the back end, so you're gonna have to like be able to read both the V1 and the V2 of those formats. And like that introduces a step in the deployment that you otherwise wouldn't have.
But you're gonna have to do all of those steps anyway in aggregate once you reach a stage where it's, you can't, uh, you can't deterministically deploy to the front end and the back end simultaneously. And, uh, that's kind of one of the, the growing up things you have to do once you hit a certain org size.
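The dark read pattern Justin describes might look something like the sketch below. It is only an illustration under assumptions: `dark_read`, `old_path`, and `new_path` are hypothetical names, and a real system would likely sample a fraction of requests and report to a metrics system rather than a logger. The caller always gets the old code path's result; the new path runs in the shadow, and its timing, its agreement with the old result, and any failure are only logged.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("dark_read")


def dark_read(old_path: Callable[[], T], new_path: Callable[[], T]) -> T:
    """Serve the old code path; run the new one only to observe it."""
    start = time.perf_counter()
    result = old_path()
    old_seconds = time.perf_counter() - start

    try:
        start = time.perf_counter()
        shadow = new_path()
        new_seconds = time.perf_counter() - start
        log.info(
            "dark read: old=%.4fs new=%.4fs match=%s",
            old_seconds,
            new_seconds,
            shadow == result,
        )
    except Exception:
        # A failure in the new path must never affect the caller.
        log.exception("dark read: new code path failed")

    # The shadow result is discarded; only the old path is user-visible.
    return result
```

Note that this sketch runs the new path inline, so it adds latency to the request; running it asynchronously is a common variation.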
Matt: Yeah. So, like, as we're recording this, I think it was, uh, somewhat recently that, uh, at least on Twitter, there was a bunch of tweets about, I think it was like an Airbnb push notification that got sent out. Do you know what I'm talking about? Somebody sent out a, like a test, a test environment, uh, notification that, you know, was misconfigured or whatever.
And like everybody that had the Airbnb app got this like test notification, and I guess that in my experience has been something that I think gets like swept under the rug, is that it's actually like pretty hard to isolate like a staging environment while making it, uh, somewhat close to production. And like, yeah, obviously traffic is gonna be a huge one, but even, uh, even data, um, or just interfacing with external systems, like you have to take extra care to turn off a lot of things in staging that kind of goes against the, like, this should be as close to production as possible. Um, so yeah, like notifications I think is one good example. Like even in our, um, even in, in my, my product, uh, work at Arrows, like we have, we have special code in there that says like, if you are on, if we're on our demo environment, like, and we send emails, like change who the email gets sent to, to actually get sent to like Matt at arrows.to.
Um, because it's really easy to say like, oh yeah, this is a staging site. Like you can do whatever you want on here. This is where we're like testing and QAing. But it's like, well, if it's hooked up to the mail server, it's sending out real emails. And, uh, if you do things like, oh, well, like, let's copy like a slice of production data into the staging environment.
Like, oh, now you have like, potentially like customer emails, customer data, uh, notifications that could be going out that are not supposed to go out. So I think stuff like that is just like, it's, it's really easy to say like, Yeah, you should, you should have a staging server. Um, or even like, yeah, you can periodically like, you know, copy down data.
So it's representative, but you have to be really careful on these like edges of the system, I think. Um, like another example is like file uploads. Um, so I worked in a system once where, uh, we had file uploads and like we stored those in S3 and the application had a reference to the, like the blob in S3.
Um, but when we copied things to staging, if you were to delete, uh, like the, the record, uh, in the staging site, it was going out to S3 and like deleting the blob 'cause it's like, well, yeah, well, like that's how it should work, but since we copied the data from production, it's like that actually deleted the reference in the production app.
And so that was a big mess. And certainly there are like a set of tools that try to like anonymize and, and do all this, but uh, at that point, uh, you're almost like, I don't, I don't know if it's better to work backwards and like try to like, make a safe copy of production versus like starting from scratch and making like seed data.
So I don't know, that, that has always struck me as like the kind of hidden, the hidden cost of, uh, of maintaining a staging site regardless of your size.
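The "change who the email gets sent to" guard Matt describes for the demo environment can be sketched as follows. This is not the actual Arrows code: the `Email` type, the `route_for_environment` function, the environment names, and the example addresses are all assumptions for illustration. The point is that the redirect happens at the last step before sending, so nothing upstream has to remember which environment it is in.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Email:
    to: str
    subject: str
    body: str


def route_for_environment(email: Email, env: str, safe_inbox: str) -> Email:
    """Return the email unchanged in production; otherwise redirect it.

    Anything that is not explicitly "production" is treated as unsafe,
    so a new or misnamed environment fails closed.
    """
    if env == "production":
        return email
    return replace(
        email,
        to=safe_inbox,
        subject=f"[{env}] (was to: {email.to}) {email.subject}",
    )


if __name__ == "__main__":
    msg = Email(to="customer@example.com", subject="Welcome", body="Hi!")
    print(route_for_environment(msg, env="demo", safe_inbox="team@example.com"))
```

The same fail-closed check could plausibly wrap other side effects at the edge of the system, such as the S3 blob deletion from the file upload story.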
Justin: Yeah, I think the, you, you brought up like acceptance testing being sort of the classic example of where you want a staging site of like, hey, we have someone non-technical, like a product owner who wants to run through this. And again, you just have that question of, okay, should transactional emails be on or off? I had a horror story of a, a company that I, I once freelanced at back in the past where they couldn't really decide.
So they basically had a global flag for the staging environment, which is basically like, uh, in all caps, disabled transactional, uh, emails going out, and they would flip it on or off depending on what they were trying to test, which of course just introduces another like confounding variable of what if person A flicks it on because they're like, we need to see what the new onboarding flow is like, and doesn't turn it back off.
And person B just assumes like, okay, no transactional emails can be going out. I'm gonna put in a bunch of random Gmail addresses 'cause I'm trying to test, you know, user imports or something like that. Um, it's a little bit terrifying, I think. On the, uh, this data population or fuzzing note, I actually think I saw a YC company launch maybe this past season
that was trying to just do data fuzzing as a service. And I've, I've actually seen a number of teams within larger companies try and tackle this, which is just like, hey, rather than rebuild a data set from first principles, let's have the equivalent of the world's gnarliest cron that takes like a data dump,
uh, you know, cuts it to 1% of its size, fuzzes it to the point where nothing user identifying, nothing PII-ish can be discovered, and pop that in QA so you have a better simulacrum of what prod data looks like. I'm not sure how feasible it is to genericize that out such that you could have it as like a SaaS that any arbitrary company could plug into.
But something like that I think is probably more realistic unless you have an organization that like really, really glorifies and really, really reifies dummy data generation from day one. 'Cause that's such a hard thing to build up from first principles after you've already made a lot of progress and paid a lot of technical debt.
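The "take a data dump, cut it to 1%, fuzz the PII" job Justin describes could be sketched like this. It is a toy under stated assumptions: rows are plain dicts with an `id` key, the only sensitive fields are `name` and `email`, and replacing them with a token derived from the id is good enough. Real anonymization is much harder than this (free-text fields, re-identification from combinations of columns), which is part of why the conversation calls it a deceptively hard problem.

```python
import hashlib
import random


def scrub(row: dict) -> dict:
    """Return a copy of the row with assumed PII fields replaced by tokens."""
    clean = dict(row)
    # Derive a stable token from the id so related rows stay consistent.
    token = hashlib.sha256(str(row["id"]).encode("utf-8")).hexdigest()[:12]
    if "email" in clean:
        clean["email"] = f"user-{token}@example.invalid"
    if "name" in clean:
        clean["name"] = f"User {token}"
    return clean


def sample_and_scrub(rows: list, fraction: float = 0.01, seed: int = 0) -> list:
    """Keep roughly `fraction` of the rows, scrubbing each one that is kept."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [scrub(row) for row in rows if rng.random() < fraction]


if __name__ == "__main__":
    dump = [
        {"id": i, "name": f"Person {i}", "email": f"p{i}@example.com", "plan": "pro"}
        for i in range(1000)
    ]
    print(len(sample_and_scrub(dump)), "rows kept")
```

The `.invalid` top-level domain is reserved and never resolves, so scrubbed addresses cannot reach a real inbox even if the staging mail server is live.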
Matt: Yeah, it seems like like a very deceptively hard problem. And then, So like to take that as an assumption that you can get a good representative data set from production into a staging environment. Like that just seems to be in a lot of these articles in, in blog posts. Like, uh, treated as like, oh, it's like a precondition that has already been met, that like, we can do this.
So like, therefore like best practice says you should deploy to staging first. So I think that's where a lot of this stuff kind of falls down. In practice. At least that's been my experience is, is like, yeah, like if, if you could do that, sure. But we currently can't do that. So why are, like, why is this like treated as, uh, you know, like a given that you must, you must have a staging environment.
Justin: Yeah. And it really comes down to like, how feasible is it really? And if the answer is feasible with, uh, x hundred hours paid every year worth of engineering investment, can you figure out a better, more useful way to spend that time and energy? I feel like the answer is yes.
Matt: Do you think that staging environments made more sense 10 years ago than they do now?
Justin: Absolutely. I think a lot of practices probably made much more sense when the rate of software change was much lower and the number of moving variables in any given software environment was much lower. Right? Like if you could control the shape and reification of any given like SaaS or any given code base and say, okay, we have absolute confidence that like the superstructure of this code base is going to remain the exact same for the next three to five years, then I think the calculus of investing in a staging environment makes way more sense because all the things that you think are going to, no matter what, break in the fullness of time are suddenly much more resilient.
But the reality is, at least in, in my experience, there are too many moving parts, like it is, uh, such a full-time job to keep those things in the right buckets, in the right condition. 10 years ago, I think, and maybe this is a, a rant or a, uh, old man shakes hand at cloud unto itself of like, uh, back in 2010, like the median software environment was just much more static and didn't change that much.
I think in that world, it totally makes more sense.
Matt: Yeah. I, I find myself like in the same kind of, uh, camp where, uh, I'm not sure if it's like, you know, rose-colored glasses or just like, that was the time that I like was the most formative, was like when I was starting my career. But it does seem like a lot of the common advice and like practices are like sort of stuck from that time and it's, you know, uh, when I think about like, like books or conference talks or technologies, it's like there's a lot of stuff that's hanging around from like 2010 that some of it is still good.
Some of it has maybe like we, we've lost the context in which it was created, but we're still applying it. So, I don't know, maybe, uh, maybe I should find someone who is, uh, much older or much younger,
Justin: I feel like we are gonna look back at like the corpus of technical blogging from say, gosh, 2015 to 2020, where I feel like Kubernetes, containerization, microservices were like very, very du jour because we were at the point where most of the big tech companies were adopting them. And then when folks from those companies splintered out to start their own thing, uh, they kind of like proselytized those practices.
And now I think we're, maybe this is optimistic, I think we have all now collectively realized, oh, that probably wasn't a great idea. Like you can start pretty, pretty simple with a lot of these things and just graduate to Kubernetes and that world when you really need to. I wonder if we're gonna look back on all of the blog posts of like, here's how you deploy a new Rails app to Kubernetes within 15 minutes so you can scale infinitely, even though you don't have a single paying customer.
It's like, Ah, we kind of just have to throw this in the Dark Ages bin.
Matt: Yeah. Yeah, it is interesting, like, and, and I think about, I don't know, it just, it feels like something happened in that time and like everything just, uh, there's a, a switch flipped or something, and like, definitely there's more people like posting on Twitter and online and on different social media than before.
So I, I guess I, maybe it's part of it being more decentralized. There isn't these kind of central, uh, publishing channels that then create like sort of a monoculture of like, this is what the practices are. 'Cause I think back to like a book, like The Pragmatic Programmer, right? Like that's like, was published I think in like 2006 or something, uh, if not earlier.
And it's like there hasn't really been a, at least that has crossed my path, there hasn't been another book like that, that you would say is like, sort of like universally like recommended regardless of your, you know, domain or tech stack or whatever. And it just feels weird. And, and maybe it's because like things are becoming more niche or like it's easier to find a community specific to, like, you can find 25 eBooks on like the best practices for like building a React app with Redux versus like before you have to actually have a physical book that needs to get published. So it has to be about general principles and it like, you can't, you can't update the blog posts or the screencast or whatever with, with an addendum.
So you have to sort of try to hedge some of, of the, of the nuance into it. I don't know. Do you think there's anything to that?
Justin: Yeah, I feel like, at least from, from my perspective, there's so much more, and I, I mean this in a positive way, there's so much more like 101 content, which is like very, very proper-noun-based, as you're saying. Like, here's how you set up, you know, React plus Redux. Here's how you set up, uh, a Next.js deployment.
Here's how you do like this specific job to be done, and less focus on sort of like the craftsmanship or the metier of it all. Like, uh, you know, Ruby Koans type stuff, practical programming. Um, I wonder, this is just kind of like anecdotal, but I wonder how much of that is the supposition becoming that the way you get much better at programming
is mentorship and sort of like being placed in an organization where you can learn synchronously from other folks. And I kind of hope that isn't a permanent trend because when I was first kind of going from like literally learning how to program in Python to trying to be an accomplished programmer and someone who like took a lot of care and respect of, of what I did,
I was, I didn't have a network. I wasn't working with folks who I felt like were bringing me to the next level. I had to consult like bloggers that I, I really admired and respected. I had to read through programming books and go through SICP and, and all of those things. And if anything, I feel like the opportunity for those in the, you know, uh, to use a phrase I hate, the creator economy,
like, I feel like there should be more opportunity to do that, not less. I'm kind of curious as to why the, the market or desire has kind of shifted away from it.
Matt: Yeah. And maybe I am just like ultra, uh, as the kids would say, black-pilled on, uh, on this. But it, it does seem now that like instead of, um, getting, like getting better at the craft of software, it's like how can you get a bigger audience to like monetize in, in, like you said, in like this, this creator, creator sense.
It's like, you know, I think, I think some of that has to do with the fact that like more people are entering the industry so there is more demand for that. But it seems like there's an entire swath that somewhat dominates the conversation of people that are like, I want to go to a boot camp and I wanna like buy this course on like FAANG interview prep so that I can get a job at, you know, one of these giant companies.
And then from there you transition into your career as like a tech influencer vlogger, where you make YouTube videos, uh, that consist of 80% of you going to the cafeteria and ping pong table, and then, you know, the, the other 20% are like, oh yeah, and then I like coded on this API for a little bit.
Justin: I think, um, Brian Lovin, who's, uh, a designer at GitHub, I believe, uh, has this metaphor that I'm probably gonna butcher, but he talks about sort of like the, the creator spiral, where it's, if you're doing, you know, one of the, the hashtag learn in public things, or you are somehow monetizing your influence, there's a really, really strong temptation to shift from talking about, say, uh, Rails programming or talking about icon design or insert thing here to talking about
talking about that thing, and it can become very, very recursive and fractal to the point where you're selling like an info product about how to sell info products to people, uh, who want to learn how to sell info products about info products. Like, it's very, very easy to kind of go down into, uh, the narcissistic spiral of people will buy this thing and I just spent 18 months talking about this topic, so I'm just going to reorient everything that way.
That might be one of the downsides of like having such a very, very transparent and monetizable audience relationship.
Matt: Yeah, it is interesting just to think about, think about all that context and, and how, just like, it's almost like societal, like, you know, let's go big picture here. It's like societal changes in the, in the past, like 10 or 15 years have like led to software being, uh, like seen as a more desirable field, which leads to more people, which, you know, in some ways is, is good because I think this is like a good career to have and can provide value to the world. But yeah, it's like at some point you're sort of diluting the, the, the craft a little bit. And, and as more people come in, like you said, there's more, more demand for like 101 content. And then there's also like more opportunity for people that want to like, uh, you know, sell shovels during the, the gold rush type thing.
Justin: It's true, and I like, I always feel a little bit ornery when I see the, I don't even know if there's a term for this, but like the, uh, very, very basic tweet where it's like, here's how you add two strings in Python type thing of like, why is this person farming out like, uh, intro to Python stuff and having that be like the, the pathway to mastery.
But I, I have to like remind myself too that I'm not the target audience for that kind of stuff and like there really is a large and booming one. I feel like one of the things that I'm grateful for is the first like 10 years I was paying attention to programming and sort of SaaS development and all of these things, like there were a huge swath of folks all the way between like, uh, here's how you kind of get a basic SaaS up and running,
here's how you do marketing, here's how you do scaling. Like, I felt like I could learn each and every step from folks who had actually done the thing. They weren't influencers or sort of like, uh, content creators first and foremost. They were doing that job first and foremost, and then they were also trying to talk about it in public.
I think maybe that's where some of the like lost innocence to use a truly tortured phrase comes from, is that like now the temptation is to have that be the main thing. Like there are so many people I see on Twitter through their very aggressive like promoted tweet stuff. You're, you're trying to talk about how to build a SaaS or how to scale like a Python application or something like this.
And like, you don't do that. That's not your full-time job. Your full-time job is some sort of educational or some sort of, uh, auxiliary role. And there's nothing wrong with that, but like, I'm going to put much less stock in what you say in terms of like pricing strategies if it's clear that you've never actually like priced and sold a SaaS before. Like I think there has to be some level of
true learned experience or true skin in the game.
Matt: Yeah, it's, it's like the, the outcome in the past of like, I need to learn this skill to like, you know, ship this thing at work or to hit some, like, business result has been changed with like, I need to get hired for this job so that like, I can then write a course about how I got hired, you know, for the job.
Justin: Yeah.
Matt: I wonder too, like, it, it seems like a lot of what happened as like tech sort of boomed in like the, the, you know, the 2010s or whatever. At some point there was like, uh, there was more people working at these big companies than like, there is sort of, I'll say like, work to do or like, uh, there like the, the winners by like, you know, the power law, like sort of
took everything. And like companies like Google, it's like it's so profitable that it doesn't matter that they have 500 engineers that are, uh, like sort of, and I'll say this like disparagingly, but I don't really mean it, like not doing anything. Right. Um, and it feels like you need, like, it would be natural that like eventually, like new companies would start up that
were not that way. And like that would be a place where new learning and new thought leadership type stuff would come from. It seems like that is like, sort of like the kind of crypto land right now. Uh, but maybe that environment is just also so poisoned by like, uh, the, the money and the sort of snake oil,
uh, type, type stuff that, that it's hard to, uh, see it in the same light as like, oh yeah. Like, I don't know, like I look fondly back at like, oh yeah, when GitHub was like, getting big and like you could, like, hear talks from the people that were like the first hundred employees. And it doesn't, doesn't seem like there's been that next wave of, of that yet.
Maybe I've just missed it, but.
Justin: I, I definitely agree and I think part of it is sort of the like, uh, the, what is it? Hard times create hard men, like that, that sort of meme of like, there is a cyclical nature to all of this. And I think we, we will come back to a world where more people are doing, uh, sort of hunting and figuring out what the new, exciting things are and building them and talking about it as opposed to sort of like the, uh, metaphorically sedentary lifestyle of I'm gonna work at a FAANG for four years and like probably not be pushing the world too, too forward.
And again, I, I say that with no judgment whatsoever. That being said, I think I have more local optimism that like people are building a lot of really, really cool stuff at any given point in time. Like even right now, I think looking at, uh, to take an example like the swath of sort of like Heroku, except for 2022,
infrastructure providers that are coming out, they're all like really cool. You look at Render, you look at Fly.io, you look at Railway, like there is actual technological innovations happening, whether it's stuff like using Nixpacks, so you have these really huge build times that are now like seconds instead of minutes, like things along those lines.
People are building really, really aggressively. I think what you are seeing is the profile or like the, the median arc of an engineer has shifted such that, like, now I think it's almost a, a rite of passage or an expectation that like, hey, you're gonna try and build a new company, or be one of these people who is a really, really strong architect and a presenter.
You have to do a bit more of your time than you otherwise would by spending, you know, four to six years at a FAANG, uh, working in some less glamorous capacity for a little bit first, but it's, it's hard to tell also because 10 years ago when I was digesting all of that information, any person, like any individual who was on a stage and talking about a concept was so much more impressive to me.
I was 22 years old. I had no idea what half of the concepts I was digesting were, and someone being in that position regardless of where they, uh, came from or who they were, what their background was, I think I was just predisposed to be much more impressed because I was a more novice engineer, as opposed to now when I'm all cynical and jaded about these things.
Matt: Yeah, I think, I think there is just something about like the internet has made everything, um, like more widely distributed too. 'Cause I feel like when I was younger too, it would be like, like you said, you're, you're sort of seeing things from like a really local sense of like, yeah, this, this person is like the best programmer I know, but it's like, yeah, 'cause like there's only like four people that know how to program like at your local high school or something.
Versus now it's like, oh, like this guy on Twitter has, uh, you know, a hundred thousand followers and he's, you know, a 13 year old that you know, is building his own startup or whatever. And these people were probably always there. And you like, didn't have exposure to them. And now it's like, oh, if you wanna, if you wanna learn to code, it's like one of the first things you need to do is like, you know, get on Twitter, because that's where all the learning to code people are and yeah, it's, it's interesting.
Justin: Which I, I think goes back to your previous point of, for better or for worse, software development is like a prestige position now compared to, say 2007, 2005. Like, uh, a lot of people who would otherwise like go into iBanking or go to, uh, you know, law school or something along those lines, like they're entering tech, whether it's through development specifically, or PM or something along those lines.
Like, that's still a much more common path than it used to be. And it has a level of, uh, I don't even know if prestige is the right word, but like normalcy that, that it didn't
Matt: Yeah. It's, it's definitely like more mainstream, uh, normie as they would say. Uh, so I think that if we try to bring it back, it's basically like Mark Zuckerberg's fault that people still advocate for staging servers because his company started the rocket ship growth that really caused all of software engineering culture to get stuck in 2012.
Justin: Perfect. I think, uh, the evolution of the hot take, which is, uh, not just "staging environments are a bad idea," but "staging environments are a bad idea and it's Facebook's fault specifically," is perfect. It's always useful to have a, uh, an enemy that you can throw your slings and arrows at.
Matt: Someone to blame. Yeah.
Do you think the existence of staging servers makes people like lazier? Because it's like a place you can dump code without taking accountability for it. Like, well, we put it on the staging server, so if it breaks in production, like, we couldn't have known.
Justin: I don't know if lazier is the word I would use, because again, I think dealing with staging environments just requires so much effort. I, I, I think it gives people false confidence and like the, uh, the metaphor I always love is like you are trying to navigate a curve of confidence level in your code change.
And often people will think that bumping something to staging is gonna get you to like 95 or 99%, uh, which is pretty close to a hundred percent. So then you push it out to, to prod and you get like your asymptotic 99.99% or whatever. And I think in reality, often, it's a no-op compared to, uh, your local deployment that like has purely local data but is otherwise kind of the same.
I think the best case scenario where you have someone who, let's say you have an organization that is pretty well invested in staging and like, it's not broken all the time. It, it's a little janky, but it works, but it takes a lot of effort. Like there can be laziness if then you say, okay, it's working fine in staging, therefore it's good to go.
Like that's, that's a big leap of faith to say like, this environment is so close to the real one that, that we're done here. 'Cause I think, again, regardless of what folks might blog about, I think in practice it's very rare that you're not doing some level of production testing, which is not to say like "only test in prod, LOLOL."
It's more like that has to be one of the things you do once you hit a certain level of fragility or velocity because you need to be exercising that code path in all scenarios. And if you skip the production exercise, like that can be really, really dangerous.
Matt: What do you think about staging servers, uh, as good because they are a roadblock and the average developer is not good, uh, and should have speed bumps and roadblocks put in front of them to prevent bad things from happening?
Justin: I, I like intentional speed bumps and roadblocks in most scenarios. Um, not because like any average developer is bad, so much as like, we are all bad once we hit, uh, a certain threshold of complexity. Right. I think what is scariest to me is like some sort of Looney Tunes cartoon speed bump that is like there 80% of the time, like when it comes to trying to get production, uh, confidence.
The biggest thing you need is a level of reliability that, like, it's like a Checklist Manifesto thing, right? Like you perform this thing a hundred times, the outcome should be the same a hundred times, even if that outcome is super annoying and largely exists as a deterrent or as something that intentionally slows down velocity.
If you have a staging environment where you're still trying to invoke some sort of judgment call of like, 80% of the time this is good enough to go, but you know that like, oh, we rotate out the database every Thursday, so you might run into some weird stuff if you're hitting staging on Thursdays or things of that nature.
That's where I get a bit of pause. How about you?
Matt: Yeah, I think it's tough because, yeah, you'd have something that's like a checklist, but if, if you know that like, oh yeah, but don't actually care about this, this step, if it, if it fails. Like that's, that's kind of like the road to, um, like losing, like you lose confidence in the process or like the process gets watered down so much that it loses the, the original meaning.
Justin: Imagine a test suite that had tests that were decorated with a, hey, 20% of the time, like roll a d5 and if you get a five, if this test fails, don't worry about it. Like it's fine. Just pass the test runner anyway. That's like the thing that terrifies me because that, that's kind of like a bit of organizational rot that, if that starts out with just one test, it's going to expand in time to be everything.
And then at that point you don't have any confidence in the suite.
Matt: Right, Right, right. You'd almost like, I think the naive, the naive thing would be like that any test is better than no test. But really it's like, well, if this test is unreliable, then it's maybe worse than not having a test, because then maybe you'll think, Oh yeah, there's no test for this. I better like be extra careful.
Justin: exactly.
Matt: We've been running into something similar at work where we have, uh, we have error monitoring, but there's a whole bunch of errors that we say like, yep, we know that these are happening and like, it's not a problem. Um, but we haven't figured out a good way of like turning them off or ignoring them.
So what it does is it just makes it so that like what used to be, if there's ever anything in this like alerts channel, then like immediately like stop and resolve it so that we get this band back down to zero has turned into like, well, except for this error, except for this error, except for this error.
And then suddenly our strategy of like zero, like zero alert, uh, exception monitoring. Uh, we, we've lost, we've lost that. Like, uh, you know, we we're like holding, holding the line and like it's been breached and now we're saying that like, Oh yeah, we're okay with less test coverage cuz we have zero like exceptions.
It's like actually not true anymore. But then we didn't like go back and fix the test coverage.
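One possible way to handle the "known errors" problem Matt describes is to make every exception to the zero-alert rule explicit and time-limited. The sketch below is an assumption, not what Matt's team does and not the API of any real error-monitoring product: `KnownError`, `should_alert`, and the `fingerprint` field are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KnownError:
    fingerprint: str  # hypothetical stable identifier for a group of errors
    reason: str       # why the team decided this one is acceptable for now
    expires: date     # after this date the error starts alerting again


def should_alert(fingerprint, known_errors, today):
    """Alert on everything except known errors whose exemption has not expired."""
    for known in known_errors:
        if known.fingerprint == fingerprint and today <= known.expires:
            return False
    return True


if __name__ == "__main__":
    known = [KnownError("timeout:thumbnailer", "retries succeed", date(2022, 12, 1))]
    print(should_alert("timeout:thumbnailer", known, date(2022, 11, 1)))
    print(should_alert("timeout:thumbnailer", known, date(2023, 1, 1)))
```

The design choice here is that the channel can get back to zero without "except for this error" becoming a permanent, undocumented habit: each exemption carries a reason and lapses on its own.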
Justin: Huh. They've done studies to that exact effect in hospital environments where you imagine a hospital room, right? And you've got 20 data points, like you've got, uh, EKGs, you've got, you know, visual data, you've got audio data, and just the presence of all of those creates, uh, alert fatigue, right? Where you're hearing and processing so many things that your body is literally incapable of deciphering what is high bandwidth, useful signal from low bandwidth, just noise.
And even if something can be useful in certain circumstances, like if even a small portion of the time it's not, it's probably a net negative for your overall operational.
Matt: Yeah, well, I think I've reached the point where, uh, I feel like staging servers fit squarely in the, the, the midwit, uh, you know, meme format of, uh, you know, the bottom of the bell curve is like, you know, just, just send it to production. And, uh, the middle is, uh, you know, staging servers with all these rollouts and anonymized data and, uh, you know, gated releases and things like that.
And then we've got, uh, you know, the top of the bell curve that is like, you know, just push it to production.
Justin: Yeah, I, I think the, um, the, the bell curve meme is more accurate than it is inaccurate a lot of the time. Like as long as you have really, really robust feature flagging, operational detection, perf detection, all of those things, which you generally want to have anyway, even if you're not doing quote unquote staging testing within prod.
Uh, it's just really the amount of stuff you get out of a true production environment, traffic, data, all the things we've chatted about, is so hard to replicate. And I find myself firmly on the like low IQ side of like, I'm going to do the theoretically dumb thing because it is often so expedient both short term and long term.
Matt: Yeah, yeah. And maybe it doesn't exactly fit because I don't think, I don't think either of us are saying like, replace a staging server with nothing. I think we're saying like, you're probably better off with things like using, um, like an ngrok tunnel to let people preview stuff on your local environment if you need someone to do like a quick spot check and then like in production, having feature flags and other like safe rollout features, so that deploying, like doing a big deploy, is a common occurrence.
So then you wouldn't need to necessarily use staging as like your place to find the, like, uh, you know, unknown unknowns.
Justin: Exactly.
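The feature flags and "safe rollout features" Matt mentions are often built around a percentage rollout. Here is a minimal sketch in Python of one common way to do that, by hashing the flag name and user ID into a stable bucket. The names `rollout_bucket`, `is_enabled`, and the `new-editor` flag are hypothetical, and this is not a description of how Buttondown or any particular feature-flag service works.

```python
import hashlib


def rollout_bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_name, user_id, rollout_percent):
    """True if this user falls inside the first `rollout_percent` buckets."""
    return rollout_bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    # Ship the code dark at 0%, then widen the rollout in production.
    for percent in (0, 5, 50, 100):
        enabled = sum(is_enabled("new-editor", f"user-{i}", percent) for i in range(1000))
        print(percent, enabled)
```

Because the bucket depends only on the flag name and the user ID, a given user gets the same answer on every request, and raising the percentage only ever adds users. That is what makes it reasonable to deploy the code first and expose it gradually, watching production metrics as the rollout widens.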
Matt: Cool. Well then I think we can, uh, answer the question. Uh, staging servers. Do we need them? Justin, yes or no?
Justin: You're not gonna need it.
Matt: You're not gonna need it. And it's a no from me for this one. Uh, so yeah, thanks for joining us as we continue to spew hot takes and hopefully make you rethink some things that you, uh, haven't thought about in a while.
Show notes, links, and a transcript can be found at yagni.fm. Today's guest was Justin Duke, founder of Buttondown. You can find Justin on Twitter @jmduke. And I'm your host Matt Swanson. And you can find me on Twitter @_swanson.
Until next time, just remember you ain't gonna need it.