Real World DevOps

Ryn Daniels joins me in this episode to talk about building resilient cultures, particularly in Engineering teams. Ryn’s knowledge comes to us through two different incidents they experienced at two different companies, how the response differed at each, and what both teach us about building safe, learning environments and a resilient culture.

Show Notes

About Ryn Daniels

Ryn Daniels is a staff infrastructure software engineer who got their start in programming with TI-80 calculators back when GeoCities was still cool. Their work has focused on infrastructure operability, sustainable on-call practices, and the design of effective and empathetic engineering cultures. They are the co-author of O’Reilly’s Effective DevOps and have spoken at numerous industry conferences on devops engineering and culture topics. Ryn lives in Berlin, Germany with a perfectly reasonable number of cats and in their spare time can often be found powerlifting, playing cello, or handcrafting knitted server koozies for the data center.



Mike Julian: This is the Real World DevOps Podcast and I'm your host Mike Julian. I'm setting out to meet the most interesting people doing awesome work in the world of DevOps. From the creators of your favorite tools to the organizers of amazing conferences, or the authors of great books to fantastic public speakers, I want to introduce you to the most interesting people I can find. 

This episode is sponsored by the lovely folks at InfluxData. If you're listening to this podcast, you're probably also interested in better monitoring tools and that's where Influx comes in. Personally, I'm a huge fan of their products and I often recommend them to my own clients. You're probably familiar with their time series database InfluxDB, but you may not be as familiar with their other tools: Telegraf for metrics collection from systems, Chronograf for visualization, and Kapacitor for real-time streaming. All of this is available as open source, and they have a hosted commercial version too. You can check all of this out

Mike Julian: Hi folks. I'm Mike Julian, your host with Real World DevOps. My guest this week is Ryn Daniels, co-author of O'Reilly's Effective DevOps and a public speaker, who previously worked in engineering for both Etsy and Travis CI. Ryn, I hear you're working at everyone's favorite infrastructure automation company now, HashiCorp is it?

Ryn Daniels: Yes, it is. I'm working on the Terraform ecosystem team. I'm going to be working on the AWS provider.

Mike Julian: You've been writing and talking a lot about this idea of resilient culture and you wrote an article for InfoQ, which we'll link in the show notes, about crafting resilient culture, which talked about the Apache snafu. You and I were just talking before the show about an earlier story about Postfix and Puppet and well, things exploding in your face.

Ryn Daniels: Yes, so it's a fun story with a little less of a happy ending than the Apache snafu. My first ops job I inherited two data centers that didn't even have a lonely bash script for company. I was doing everything by hand. There were a lot of dragons and nobody was really sure where all the dragons were lurking. One of the things that I was kind of put in charge of was the idea of, "What if we didn't do literally everything manually? What if we had some sort of automation?" So I got to do fun stuff like set up automated Linux installs instead of me going around carrying a USB DVD player and yeah.

Mike Julian: Definitely been there.

Ryn Daniels: Yeah, that was ... Those were sad times. So I was starting to put together Puppet and it was mostly going pretty well. I was starting out with what seemed like the safe stuff. And I asked the engineering team, I'm like, "So it seems like Postfix is configured a bit on these servers, but it's not running. Should it be running?" And people talked amongst themselves a little bit and they were like, "Yeah, it should definitely be running because the servers are set up to email us when something goes wrong." Okay.

Mike Julian: So clearly everything was fine because no emails were going out.

Ryn Daniels: Exactly. Exactly. So I clear this with everyone. I tell them, I'm like, "Okay, I'm going to roll out this change." And I turn on Postfix everywhere. And this was my very first ops job, so we didn't have anything like a testing or a staging environment. I was really kind of playing everything by ear at that point and learning as I went. So I turn on Postfix and then a few minutes later somebody says the site's down. Like how did turning on Postfix take the site down?

Mike Julian: That's weird.

Ryn Daniels: And we kind of poke a little bit on one of the servers that I was logged into and like the web server was still running. Everything looked like it should have been fine. What happened was there were eight years of emails queued up on every single server, and when Puppet turned on Postfix, those eight years of queued emails started sending all at once. And the way that networking was or wasn't configured back then, I think I just like saturated every single network link in our two data centers with all of these emails, and everyone's like, "Ryn, help, make it stop, get everything back online." I'm like, "I don't know how to un-send eight years' worth of email, folks. Like, we're just going to have to wait this out." Which is kind of what happened. And eventually, eventually all of the emails sent and shockingly, there were a lot of error emails as it turns out in this sort of environment.

Mike Julian: Surprise, surprise.

Ryn Daniels: Yeah. And after that everyone was a little twitchy anytime I mentioned making a Puppet change. So yeah, it was definitely an exciting afternoon slash couple of days trying to figure out what went wrong with automation and try and keep it from going that sideways in the future.

Mike Julian: How did your teammates react to all this? Like aside from like, "Ryn, what have you done?"

Ryn Daniels: It was mostly just that kind of panic and then everyone trying to figure out what to do. People had differing amounts of visibility into what was going on. There was kind of a homegrown monitoring system that was set up that also lived in the data center, which may or may not have been very accessible during this time. Oh, I remember, I was stuck in the data center physically because nothing was configured to have remote, out-of-band access. So most of my days were spent alone in the data center with this ancient MacBook. I think it was still running PowerPC, so I didn't even have Chrome running. And it had so little memory that it could really only run one application at a time.

Ryn Daniels: So I would like get the terminal up and do a thing and then if I had to look something up, I would have to quit terminal and open up Safari. And then if I wanted to talk to people in the office I would have to quit Safari and open up I think we used AIM. And it was a lot of back and forth and chaos trying to just get a baseline feel for what was going on and there was definitely a lot of yelling going on in my general direction.

Mike Julian: Yeah. What was the aftermath like?

Ryn Daniels: Eventually all of the emails sent and everything went back to normal and people said, "Okay, Ryn, please don't do that again." I'm like, "Well I'm not going to, Postfix is already turned on. I'm moving onto the next thing on my list."

Mike Julian: Did you find that people were more likely to blame you or Puppet for issues in the future?

Ryn Daniels: I think the blaming of me was mostly good-natured. I feel like this was, while not the most robust environment I ever worked in, actually not the most blameful. There was ... Like I'm pretty sure that once I quit, people blamed me a lot 'cause it was kind of the culture that whoever the previous person was, everything was their fault. But when I was there, it was mostly ... Yeah.

Mike Julian: That's how it works in most ... This reminds me of I used to work for a national lab and due to a missing keyword on a Cisco configuration, I took down two entire research buildings.

Ryn Daniels: Ooh.

Mike Julian: Yeah, that was fun.

Ryn Daniels: Yeah, sounds exciting.

Mike Julian: Yeah, sounds exciting. Basically if you are configuring a trunk line and you're trying to add a VLAN to a port, if you don't add the “add” keyword, then it replaces all the VLANs with the one you specify.

Ryn Daniels: Oh.

Mike Julian: So there were a few hundred VLANs configured, and I replaced it with one.

Ryn Daniels: I bet some of those were probably important.

Mike Julian: Some of them were important. The weirdest thing about that whole situation is that on my first day, the network manager during my onboarding says, "Hey, you're going to make this mistake. Just come tell me when you do it." I'm like, in hindsight, you know, maybe we should just make that mistake not a thing that you can do.

Ryn Daniels: That would make sense.

Mike Julian: Right. That would make sense. But they never did that. After I did it, of course I never did it again, but then I had to train my replacement and say, "Hey, you're going to make this mistake too." Like this just sounds awful in every way. So it's interesting contrasting that with the well known story of the Apache snafu. Why don't you tell us a bit about that, like how it differed and basically how the experience went?

Ryn Daniels: Yeah, so the rather quick version of the Apache snafu: this was when I was working at Etsy. I think this was 2015 or 2016. I was working on the tooling that was provisioning servers in the data center. So at the time, Etsy was, for the most part, servers running in our own data centers, and something had to get the servers from the state of "the data center team just unboxed them and racked them and wired everything up" into "This is a useful server that does something nice like serve the site to people who want to buy yak cozies."

Mike Julian: Sounds useful.

Ryn Daniels: Yeah, whatever it is that you're buying that's delightful and handcrafted.

Mike Julian: Definitely yak cozies.

Ryn Daniels: Yeah, so I was working on this provisioning software, which was a collection of mostly Ruby scripts at that point. And I was getting to the point where, "Okay, I need to run some end to end tests to make sure that, okay, so all the pieces seem to work individually, but can I actually provision a server? Or more importantly, when the data center team gets a whole bunch of new servers, can they actually use this to provision them in a timely manner?"

So I have my test server and I'm trying to provision it, and one of the later provisioning steps was bootstrapping it into Chef. I was I think running a test web server since that's one of the more common use cases. And Chef failed on the Apache install step. It said, "I can't do this because the version that is pinned in Chef is older than the version that is installed by the Anaconda installer." Now this had happened a nonzero number of times before, not just to me but to other people, because the way that the Yum mirror was configured was that it would automatically pull down new versions of packages and get rid of the old ones.

So pretty much anytime that happened, if a new server was being provisioned at that point, this sort of mismatch between the installed version and the pinned version in Chef would happen. And the way that we were pretty much used to dealing with this was, "Okay, kind of manually test to make sure the new version does what you expect it to do and update the version in Chef." So on my little test server, I manually install the new version of Apache. It was a point release. I remember I even checked the release notes and there was nothing super interesting in them. The way that Chef was configured, it was only supposed to impact newly provisioned servers when you bumped the version, so that all of the existing servers would keep the same version; they wouldn't update.

Ryn Daniels: This occasionally led to a little bit of config drift across the fleet, but that was the decision that was made. It had been working okay so far. Nobody had complained enough to change this process. So I test it by hand and I roll it out and I'm like, "Okay, that was good. Nothing's going to happen. This is going to be a no op." And then I said, "I should just log into one of the web servers and make sure that Chef does nothing." You can see where this is going, can't you?

Mike Julian: Right. I love the ... I have a feeling.

Ryn Daniels: My spidey sense, my spidey op sense was tingling a little bit. And so I log into one of the web servers and I run Chef and it upgrades Apache and Apache does not upgrade cleanly. It fails to start. And I'm like, "Oh, oh no, I've done a bad thing." So I'm realizing that it's, I don't know, sometime in the middle of the day, let's say like 2:00 or 3:00 in the afternoon, and I realized that I have just rolled out a busted Apache upgrade to the entire production and staging environment all at once.

Mike Julian: Cue panic.

Ryn Daniels: It was kind of one of those slow motion moments where you're like, "Oh God, I can see my whole life flashing before my eyes." And I happened to know that my coworker, Pat, who was sitting next to me was the one on call and I turned to him and I'm like, "Hey, so you're about to get paged for a whole bunch of stuff. Sorry about that." And then I head into Slack and I jump into the main sysops, webops channel where everyone tended to congregate, especially when there were production issues. I'm like, "So, everyone, I've got good news and bad news. The bad news is I've broken everything. The good news is I'm aware that I've broken everything." And everyone jumped in immediately. They're like, "What can we do? How can we help?" Like people who were in the office with us who overheard me talking about this and kind of muttering to myself came over and were like, "What's going on? Is there anything we can do?"

And so people jumped in and it was really nice to see based on different people's areas of expertise, like people who were really familiar with Apache and the Apache config started poking at that. People started trying to look at like, "Oh, is the site impacted?" A fun part of this story is that many, many of the internal monitoring tools that were used at the time used Apache as a web server. So all of a sudden, not only is everything mostly on fire, but we can't even really look at what the fire is doing because the fire observation tools are also on fire.

Mike Julian: That's incredible.

Ryn Daniels: Yeah. Yeah. And people are looking at the config and trying to figure out, like, the config didn't change. Nothing in this looks like it should have changed. Eventually somebody figured out that if you just ran Chef a second time, everything fixed itself. But at that point nobody was really digging into why, we were just like, "It's the middle of the day. People are trying to buy their cozies. We got to get this back up." Somebody went to the site in a web browser 'cause a lot of the tools were down, and somehow the site was still up.

Mike Julian: Interesting.

Ryn Daniels: It was really, really, really, really slow.

Mike Julian: But it wasn't down.

Ryn Daniels: But it wasn't down. Like I technically did not take the site down. So we coordinate the work of running Chef everywhere on all of the various servers. Like, "Let's do this one group at a time and not DoS the Chef server with everything trying to run at the same time, verifying that stuff comes back up." Everything starts to come back up, everything goes back to normal. And this all took place over the longest 20 minutes of my life.
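
The "one group at a time" recovery Ryn describes can be sketched roughly as follows. This is only an illustration in Python, not Etsy's actual tooling: the host names, the batch size, and the `run_chef` and `is_healthy` callables are all hypothetical stand-ins for whatever triggers a Chef run and checks that a server came back.

```python
from typing import Callable, Iterable, Iterator, List


def batches(hosts: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield hosts in groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    group: List[str] = []
    for host in hosts:
        group.append(host)
        if len(group) == size:
            yield group
            group = []
    if group:
        yield group


def rolling_run(
    hosts: Iterable[str],
    size: int,
    run_chef: Callable[[str], None],    # hypothetical: triggers one Chef run
    is_healthy: Callable[[str], bool],  # hypothetical: checks the host recovered
) -> List[str]:
    """Run Chef on one group at a time and verify each group before moving on.

    Limiting the group size keeps the whole fleet from hitting the
    Chef server at once. Returns the hosts that were verified healthy.
    """
    verified: List[str] = []
    for group in batches(hosts, size):
        for host in group:
            run_chef(host)
        failed = [host for host in group if not is_healthy(host)]
        if failed:
            # Stop here so a bad group does not spread to the rest of the fleet.
            raise RuntimeError(f"hosts did not recover: {failed}")
        verified.extend(group)
    return verified
```

Stopping at the first group that fails its health check is a design choice for this sketch; the episode does not say how the real rollout handled failures.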

Mike Julian: Yeah. As you're telling the story, I'm like, "All right, this sounds like several hours." But no, it's actually not that long of a time.

Ryn Daniels: Yeah. Yeah. Given that Chef ran, I think, automatically every 10 minutes, and given that the second Chef run fixed it, if I had just ignored it or not noticed and gone to lunch or something, it would have fixed itself pretty quickly.

Mike Julian: That's both awesome and a little terrifying.

Ryn Daniels: Yeah. Yeah.

Mike Julian: So Etsy was pretty well known for their retrospectives and the lessons learned. And they even had the funny-shaped sweater. What was the aftermath of that compared to your ... the aftermath of the previous story we just heard about?

Ryn Daniels: There were a lot more high fives. So I think, I can't remember what time of year that happened. I think the ... So the three-armed sweater is a physical sweater that was given out once a year, usually in like December, to the engineer who not necessarily broke something in the most spectacular way, but kind of contributed to an incident that we all learned a lot from. And there were a lot of weird little "how did that happen?" moments in the Apache snafu. So at the, whatever, December all-hands meeting, John Allspaw was handing out the three-armed sweater, and it was awarded to me for this delightful, delightful incident.

Mike Julian: Bravo.

Ryn Daniels: And it was the kind of thing where that story would spread throughout engineering and people would come up to me afterwards and they're like, "Oh my God, you won the sweater. That's so cool. Congratulations." It was ... There definitely was not any incentive to go break the site, and I'm pretty sure that there was fine print in there that trying to get the sweater would disqualify you from getting the sweater. But it was definitely something where people wanted to hear the story because they wanted to hear what happened because it was interesting. And so it was actually a lot of like warm, fuzzy feelings that I didn't have to worry that people were secretly mad at me. I didn't have to worry that like the next time I tried to make a change that people would be like, "No, actually, I don't trust you to do that anymore because you broke something that one time." It was a much more supportive environment, which was really nice.

Mike Julian: I believe Etsy had a post on this a while back, this idea of a just culture. I think you've been talking about it in terms of resilient culture, and psychological safety plays into this. For those that aren't really aware of what all that means, could you talk more about it?

Ryn Daniels: Yeah, so I like to ... The thinking about resilience, resilience engineering is a field that has been around for a while. This is not something new that I came up with. A lot of my thinking on the subject comes from conversations with John back when I was at Etsy and afterwards. And one thing that he likes to say that I really appreciate is the idea that computers and systems can be robust, but only people can be really resilient. So thinking about these sorts of failures that happen. "Okay, like automation caused this thing. The other thing." You were talking about the VLAN incident, and wouldn't it be nice if there was a way to make it so that everyone didn't make the same mistake?

You can make a system that is kind of robust to these known failures, so that one command that everyone entered wrong, you could write some tooling around that specific command or put it in a little web interface so nobody was entering raw commands on the devices by hand, that sort of thing. But you can only do that for a known set of failures. The problem is there's always going to be the unknown unknowns, the things that you haven't thought of yet because they haven't failed yet. Or you add in some new piece to your infrastructure and all of a sudden because complex systems are complex, you have these new interactions that just didn't exist before that people haven't thought of. And it's kind of how you respond to the unknowns that kind of defines resilience I think.
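
As a rough illustration of "tooling around that specific command", here is a minimal Python sketch of a pre-check for the trunk VLAN mistake Mike described earlier. It assumes the Cisco IOS command form `switchport trunk allowed vlan ...`; the function name and the set of keywords treated as safe are choices made for this sketch, not part of any real tool mentioned in the episode.

```python
import re

# Matches the trunk command and captures whatever follows "vlan".
_TRUNK_ALLOWED = re.compile(
    r"^\s*switchport\s+trunk\s+allowed\s+vlan\s+(?P<rest>\S.*)$",
    re.IGNORECASE,
)

# Keywords that modify the existing allowed list instead of replacing it.
# Treating only these two as safe is an assumption of this sketch.
_MODIFYING_KEYWORDS = {"add", "remove"}


def replaces_vlan_list(command: str) -> bool:
    """Return True if the command would overwrite the trunk's allowed VLAN list."""
    match = _TRUNK_ALLOWED.match(command)
    if match is None:
        return False  # not a trunk allowed-VLAN command at all
    first_word = match.group("rest").split()[0].lower()
    return first_word not in _MODIFYING_KEYWORDS
```

A wrapper or web form could call `replaces_vlan_list` and ask for explicit confirmation before sending a flagged command. As Ryn says, this only covers the one failure that is already known.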

Mike Julian: Okay.

Ryn Daniels: So I like to think about resilience with kind of the opposite of that being fragility. Whereas the story with Puppet and that data center and nobody knew how to respond and everything just caught on fire, like that was really fragile. I mean that whole environment was very fragile because people responded to the unknowns and to failures with fear. Like another story from that job is the one database server. There was just the one. There was no mirroring, there was no sharding, there was just the one that had the data, and of course it was running Mongo. We all love to hate on everyone's favorite NoSQL data store.

And the RAID array in this one server was degraded. And I went to the engineering managers. I'm like, "So, I need to get some new hard drives to replace the busted ones. Like let's plan this work. I want to do this." And they're like, "No, no, you can't do that." "Why not?" "Well, because something might go wrong during the repair process. Can you guarantee that repairing the RAID array will not break it?" I'm like, "No, I can't. That's not a guarantee you can make." And they said, "Well, you can't do it then."

I'm like, "Okay, let me tell you what I can guarantee is that if you let this RAID array sit with 50% of its disks busted, at some point the remaining two disks are going to die. That I can guarantee, and then you will have no data because you have this one database server and it has no backups. Like that is the guarantee that I can make. Given those risks, what if we order some new hard drives and I rebuild this array?" And they said no, and I did it anyway. Which, I mean, you got to do what you got to do sometimes, but it was that culture of fear and having to do things in secret that was really like the opposite of resilience there.

Mike Julian: Yeah. That's a really interesting point. What I love about this concept of psychological safety and resilient culture is people really are at the center of it. And most people ... A lot of environments seem to kind of divorce the idea of the technology we run and the people operating it, when in fact they're symbiotic. You have to have both in order to have a well running environment. If you react with fear to someone like, "Hey, I need to work on this system and make this change." And you're like, "Oh God, we can't do that." Well you're actually breaking the technology too. And also breaking the people.

Ryn Daniels: Yeah. I like to say that as engineers and as an engineering organization as a whole, you're not just shipping code, but you are also shipping the entire environment that allows you to ship code and that's culture. That's the people, that's the processes. And if you ship broken processes and if your culture ends up shipping broken people, then you're going to have a bad time.

Mike Julian: Right. So for a company or a team or a person, who identifies more with this broken culture than the culture that you've been working in and have been building yourself, what can they do to start to shift their own culture? To change what's going on? How can we get a more resilient culture if we don't have one?

Ryn Daniels: Yeah, I think that's a really interesting question, and a big part of culture change obviously is getting buy-in. People have to want to change the culture specifically and they have to want to change it enough to actually overcome the inertia. And inertia is such a big factor in cultures and how we work. So getting buy-in is important. And I think there were some really interesting stories at some of the DevOps Days events. A few years ago, I remember Target did some really interesting talks about, "Okay, if you have these kind of individual teams throughout the organization who are trying to make these changes, how can they then spread those changes throughout the rest of the organization?" Some really interesting stuff there.

But there's different things within a culture that you can look at. I think a big part of it is looking at what behaviors people are rewarded for. Like what sort of incentive structures are there? So one of the things that I like to see in a culture is looking at any sort of skills matrix or career matrix. It should be required that as a senior engineer, staff engineer, what have you at those higher levels, that you be helping to create this kind of culture of psychological safety. You should be responding to people asking questions with actual help. I've definitely ... There's the stereotype of the BOFH, the grumpy sysadmin who wants to hoard all of the information to themselves and is never going to help anyone out because that's less job safety, and who is going to yell at people and make fun of them for not knowing the answer. That's creating a psychologically unsafe culture. That's creating a place where people aren't going to ask questions and they're not going to tell you what they did wrong or even what they did.

Mike Julian: Yeah. I used to find those stories hilarious. Like The Register has the massive collection of them. I always thought they're hilarious. And then I started actually being a professional and then realized, "Wow, that'd be a terrible place to work and that's a terrible person."

Ryn Daniels: Yeah. I've definitely worked at places where ... I remember one time, a long time ago, somebody added a new alert to the monitoring system and it kept flapping. And I just wanted ... There was no context around it and I wanted to know, "Is this important? Should I be worried that this alert keeps firing?" And I, for the life ... I never found out who added that alert because nobody would tell me because everyone was so afraid of ... And it wasn't an "I'm mad" situation. I wasn't some executive, I was on their team just trying to figure out what was going on. And I couldn't because they had been yelled at so many times for making normal mistakes. The kinds of mistakes that literally every single person has made if they've interacted with a computer.

Mike Julian: Oh, that's rough.

Ryn Daniels: Yeah. And it's the kind of thing where if you have that sort of environment, you're never going to be resilient because people are going to keep more and more information to themselves, and a big part of resilience is learning. And you're not going to be able to learn effectively if you don't actually know what happened.

Mike Julian: Yeah. So working on getting buy-in is great. I can see how that's super valuable, but that takes a long time and you may not have an executive that actually cares that much about it. They may not see the value themselves. Are there any more close-to-home things that someone could do? Like within just their team?

Ryn Daniels: Yeah, people can look at kind of how they behave within their own team and I think it can really help to try and set up some social scripts to have, especially if you have leaders within your team who maybe have been around the organization for a while so people listen to them a little more, to try and get those people to model the behavior that you want to see. One thing that I've found pretty helpful when thinking about how do people get information and how do people talk about things is if I have a question about how something works, instead of like private messaging someone in Slack one to one, I will drop that question in a public channel, find the channel where that's most appropriate to do. And I will just ask publicly. I'm like, "Yo, I don't know this thing."

And this is something that like, okay, I've been working for over a decade now. I've written a book, I've given conference talks. I feel like I have enough cred that people aren't going to question too much whether or not I actually know what I'm doing or belong there. So it's a lot safer for me than for somebody who's more junior to sort of model this behavior. So that's something that you can try and deliberately do is model. Like, "Here's what it looks like to admit you don't know something or to ask questions." And to do that publicly and to have it be okay.

Mike Julian: I love that advice. One of the ... As a consultant, I go into a lot of different companies all the time and one of my big red flags is when I look at a team Slack, or HipChat or whatever they're using, and the team channel has no activity.

Ryn Daniels: Tumbleweeds. It's so scary.

Mike Julian: It's so weird. And I immediately know that there's a lot of back channel going on. And this should terrify every team manager too because if there's no discussion happening in the team channel, well, it's happening without their knowledge. It's not that it isn't happening.

Ryn Daniels: Yeah.

Mike Julian: Yeah. Like it's always weird.

Ryn Daniels: Yeah. One thing that I really liked that Etsy did was they had pluses, or imaginary internet points, that were in IRC and then in Slack, where people would give each other pluses for answering a question or for asking a good question or for making a really good pun. Etsy really loved puns. I appreciate that. But that was the sort of thing where like, okay, it takes time to redo your career matrix to make sure that people who get promoted are the sort of people who are building this sort of environment.

It's a lot easier to make a little chat robot hand out imaginary internet points. And that can be ... Some people don't like the gamification and of course there's problems that like, oh, if you have some sort of like insider clique of the cool kids within your company, that other people are going to feel left out. But in the right environment you can have something that's a lot smaller, a lot lower friction like that. So if somebody asks a good question or gives a really helpful answer, you give them some internet points, and that kind of gives literal incentivizers for those sorts of behaviors that you want to see.
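
To give a sense of how small such a chat robot can be, here is a minimal Python sketch. It is not Etsy's bot: the `name++` message syntax, the `PlusBot` class, and the rule that you cannot give yourself points are all assumptions made for illustration, and the wiring to IRC or Slack is left out.

```python
import re
from collections import Counter
from typing import List

# Hypothetical syntax: "name++" or "@name++" as a standalone word in a message.
_PLUS = re.compile(r"(?<!\S)@?(?P<name>[A-Za-z0-9_.-]+)\+\+(?!\S)")


class PlusBot:
    """Keeps a running tally of imaginary internet points per person."""

    def __init__(self) -> None:
        self.scores: Counter = Counter()

    def handle(self, sender: str, text: str) -> List[str]:
        """Award one point per "name++" in the message; return who got points."""
        awarded: List[str] = []
        for match in _PLUS.finditer(text):
            name = match.group("name").lower()
            if name == sender.lower():
                continue  # no awarding points to yourself
            self.scores[name] += 1
            awarded.append(name)
        return awarded
```

For example, `PlusBot().handle("mike", "ryn++ for the great answer")` would record one point for `ryn`.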

Mike Julian: Yeah. That small little change can actually have a huge impact. I like that idea a lot. Shifting gears a little bit, when we've been talking about the impact of resilient culture and people, I want to talk about the people side of this. I've been following your blog for a while, and one of the things you started talking about when you were at Etsy was taking care of yourself more. And I saw that you took up playing cello and you started doing powerlifting and like all that's awesome. One of the most interesting things that I saw you start doing was this cupcake ritual. What is all that about? Like what led to that?

Ryn Daniels: So my cupcakes were my own variation on something Lara Hogan does with donuts, where she wanted to be more deliberate about celebrating the successes and the wins in her career. So every time she did something that she felt was donut-worthy, she would go get a nice donut and take a picture with it and just talk a little bit about, "Hey, here's this cool thing I did." She wanted to kind of, I think in a way, normalize it for people from underrepresented groups especially to talk about like, "Hey, we're doing these cool things and it's not a bad thing to celebrate them." So I started doing cupcakes as kind of a little play on that.

Mike Julian: I love that idea. I think to me one of the hardest problems of celebrating wins is having to decide what constitutes a win. Like for me, we both wrote a book. If you celebrate a win of I just shipped a book, then what's the next one after that? It feels like it almost has to be bigger than writing a book. You're like, "Wait a minute. This is obviously going to mean I'll never get cupcakes again."

Ryn Daniels: I think that's something that I struggled with in recent years. And you mentioned that you'd read my most recent blog post on kind of retiring the cupcake ritual and I think part of it was in a way related to that where I'd done these things that were on my five year career goals. I wrote this book, I keynoted Velocity, I'd gotten this job at Etsy that I really loved and I found myself struggling with kind of where was I trying to go next? And then I had a lot of personal change in my life, moving countries for example. That was a long and involved process, which maybe probably not surprisingly took up a lot of time and brain power and just ability to focus. Bunch of other changes happening as well, some on and off chronic health issues and it didn't feel like I was accomplishing anything anymore. But nothing was living up to the previous cupcakes.

Mike Julian: Right. That has got to be super hard because the fact that you do something huge doesn't mean that anything that comes after that is now not worthy. To me, it's like ... How I view it, because my day to day work as a consultant is I have to look at the tiny wins and then occasionally I'll get a huge win and this is great but most of my life is not, "I wrote a book and then I closed a huge deal and so on and so on." So yeah, I think that's the hardest part about that whole ritual is just understanding what actually is a win. And have you found that having some external support on that made it better? Like having someone basically call you out and say like, "Cut your shit, you just did something awesome."

Ryn Daniels: That has definitely helped. I definitely have a tendency to be a little hard on myself and to downplay my own accomplishments. And one thing that's been really great is having my partner who will talk to me and be like, "Yo, you're full of shit. You have done all of these things." And she really helped me talk ... We were talking through all of this and I kind of realized that just because I'm not doing big, concrete, publicly visible things doesn't mean that I'm not still making progress. So for me, I think kind of stopping with the cupcakes was a way to help me kind of reframe how I think about success and how I think about progress.

I think, and I mentioned this in the blog post a little bit, one thing is that kind of mid-career progress is going to look different than early career stuff where, okay, once you've done kind of the big things that you wanted to do, which obviously doesn't have to be early career by any means, but once you've done kind of the big things, where do you go from there? Or once you've gotten ... It's usually a pretty clear progression, assuming you're working for a place that thinks about career progress, to get to a senior engineering level, but where you want to go after that, it can branch off. It's not as clearly defined. So I wanted to kind of stop focusing on doing things that looked good when celebrated with a cupcake and kind of just take a step back and think about where I want to go with the next stage of my career.

Mike Julian: I love that starting that ritual and ending that ritual were both really for the same reason of helping you think better, think differently about your wins.

Ryn Daniels: Yeah, there's definitely some upsides to celebrating things publicly because you get support from people. I get to feel like, "Oh, if I'm helping other people feel better about their own progress and helping other people celebrate their own wins, that's awesome." But there's definitely then this pressure of, "I've done all these things publicly. Oh no, did I peak when I was 30? What am I doing with my life now?" And it was honestly really scary to publish that blog post because it felt like admitting to everyone that I was a failure and now I'm not celebrating cupcakes anymore cause I don't have anything worth celebrating. But I think we need to also normalize that not everything is this big moment. Not everything turns into a cupcake or a big story that you can give a conference talk about, sometimes it's just the little moments that mean the most.

Mike Julian: Well, on that note, it's been absolutely fantastic having you. Where can more people find out more about you and your work?

Ryn Daniels: I am on Twitter, @RynChantress, and I blog occasionally at

Mike Julian: Awesome. Well, thank you so much for joining.

Ryn Daniels: Thank you.

Mike Julian: And to everyone listening in, thank you for listening to the Real World DevOps podcast. If you want to stay up to date on the latest episodes, you can find us on iTunes, Google Play or wherever it is you get your podcasts. I'll see you in the next episode.
