Screaming in the Cloud

In the early days, the angrier nerd corners of the Internet dismissed Slack and some of its predecessors with, “Oh, it’s just IRC. Now, you pay someone for it.” Many fell into the trap of wondering what value such systems offered. The big differentiator? Slack is built as a collaborative business tool.
Today, we’re talking to Holly Allen, who helped make government software better while serving as the director of engineering at 18F. Now, she’s a senior engineering manager at Slack, a collaborative chat program where you can do most of your work through a rich platform of integrations. Holly enjoys taking a weird set of skills that make a computer do things and convincing people who know how to make computers do things to do things.
Some of the highlights of the show include:

Safety engineering brings together chaos engineering, resilience engineering, incident management, and post-mortem processes for resiliency and reliability
Slack strives to move really fast while being in complete control
Slack is primarily on AWS, but is working on a multi-Cloud strategy because if AWS is down, Slack still needs to work
Slack has a close relationship with AWS and is a collaborative company; it has immediate access to AWS staff anytime there’s a problem
Slack uses Terraform and Chef and is working to determine whether running its production workflows in Kubernetes would be worthwhile
Disasterpiece Theater: Take a real scenario that might happen and surmise what will happen; the goal is not to cause production issues, but to teach Slack employees
Slack hires collaborative, empathetic people to create a collaborative environment where everyone works together toward a goal
Slack was firmly in a centralized operations model, but is shifting responsibility and service ownership toward development teams
Slack doesn’t encourage remote work because it’s not in a position to put in that investment; day-to-day work happens in hallways and between desks
Slack sees itself as an enterprise software company; an enterprise software company must have enterprise software reliability, stability, and processes
Slack has thousands of servers, so events and disruptions happen more often; the system needs to respond, react, and repair itself without human intervention

Links:

Holly Allen on Twitter
18F
Slack
Freenode IRC
HipChat
AWS
Kubernetes
Terraform
Chef
QCon
Datadog

What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.
