Screaming in the Cloud

Today, data service is becoming more like a utility and that affects the expectations and practical uses of the cloud in almost every form.

Today we are talking to Richard Hartmann about the logistics of serverless infrastructure from how data centers are built to how the cloud is kind of just more of the same in the technology world.

Show Notes

About Richard Hartmann

Richard "RichiH" Hartmann is the Swiss Army Chainsaw at SpaceNet, leading both a greenfield datacenter build and monitoring. By night, he is involved in several FLOSS projects, a Prometheus team member, founder of OpenMetrics, and organizing various related conferences, including but not limited to FOSDEM, DENOG, and Chaos Communication Congress.


What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Transcript
Speaker: Hello, and welcome to Screaming in the Cloud with your host, cloud economist, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode of Screaming in the Cloud is sponsored by O'Reilly's Velocity 2019 conference. To get ahead today, your organization needs to be cloud native. The 2019 Velocity program in San Jose from June 10th to 13th, is going to cover a lot of topics we've already covered on previous episodes of the show, ranging from Kubernetes and site reliability engineering over to observability and performance.

The idea here is to help you stay on top of the rapidly changing landscape of this zany world called the cloud. It's a great place to learn new skills, approaches, and, of course, technologies, but what's also great about almost any conference is going to be the hallway track. Catch up with people who are solving interesting problems, trade stories, learn from them, and ideally, learn a little bit more than you knew going into it. There are going to be some great guests, including at least a few people who've been previously on this podcast, including Liz Fong-Jones and several more. Listeners to this podcast can get 20% off of most passes with the code cloud20, that's C-L-O-U-D-2-0, during registration. To sign up, go to velocityconf.com/cloud. That's velocityconf.com/cloud. Thank you to Velocity for sponsoring this podcast.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined this week by Richard Hartmann, who has decades in open source. We met originally back when we were freenode staff and since then he has done a lot of other things too. You were a Debian developer, you organize a bunch of conferences, including but certainly not limited to PromCon, FOSDEM and others that I don't care to think about.

And you come from mainframes, now you're into networking, then you started building out redundant data centers as turnkey solutions, and apparently you're currently building a data center, that I choose to believe is located in the middle of a swamp.

Richard: It's actually a greenfield project, and we couldn't build it in the middle of a swamp because we are going for the highest certification within EN 50600, which is security and availability Class 4.

Corey: Gotcha. So among many other things, you're in town here in San Francisco and terrifyingly close to me, for Google Next, which, as of the time of this recording, just finished.

You are a member of the Prometheus Core Team, but that wound up driving you out here to sit through, effectively, three full days of talking about Google's Cloud. What do you think?

Richard: It was nice, it was interesting. Many of the talks were a little bit sales pitchy, like a little bit too sales pitchy for my liking. They usually followed the same mold: initially, like the first third or so maybe, they had some higher level technical details, like not really into depth, then they segued their way into why you should be buying from them.

Which obviously makes sense from that perspective. On the other hand, it's not the type of conference which I'm used to, let's say.

Corey: It feels like all of the major public Cloud vendors have this problem. Once they hit a certain point of scale, they have one big Cloud conference every year. You have Microsoft Build, you have AWS re:Invent and you have Google Next, where the conference is trying to do so many things that it almost loses a sense of itself, where you're trying to sell things to people and there's that sales piece of it, there's trying to articulate a vision for the next year, there's product announcements, you're talking to engineers, you're talking to corporate buyers, there are press in attendance, they have analysts that come through and start to ideally say nice things about them.

And when you get all of that together, it's very hard to build any kind of cohesive narrative that addresses all of those constituencies. So, at some level when you're at one of these it feels like you're always in the wrong place, listening to the wrong story from the wrong people. And I've never found a good way to solve that.

Richard: I don't think there is a good way to solve this, because inherently you have all those different priorities and all those different goals and to juggle all of them just doesn't work. At least not at that huge scale which they put together. So I'm not actually complaining, it's just an observation which I made that they seem to be this way.

There were other things, also minor, but one other thing which I noticed, the analysts' lounge, which is sitting right smack in the middle of everything, has full catering and everything, whereas the speaker lounge is basically a coffee maker and some granola bars. So that gives you a little bit of insight into the relative value which is assigned to this. But again, I'm not complaining, it's just, I couldn't help but observe that this is happening.

Corey: Credit where due. The press lounge was also super nice.

Richard: See? That's my point.

Corey: To some extent, this seems like a bit of a departure from Google's historic positioning as engineers, first, last and always. And I think that you sort of have to, once you grow beyond a certain user profile.

It's interesting to see how that's going to be maintained going forward. I mean, there have been enough jokes made about it, but historically sticking to things that are not core to what they've always done, namely search and ads, has always been something that Google has seemed to struggle with.

So while they're saying the right things, I think people are mostly going to adopt a wait and see approach, at least for a time.

Richard: That is probably correct. I mean, from my perspective, Google has absolute top-notch engineering, and this is an engineering driven company by and large, so it just stands to reason that a lot of the internal culture is also engineering driven. Which tends to disregard a lot of other needs of other people and teams and organizational units.

So I fully agree, this messaging needs to change for more traditional businesses to actually be able and willing to adopt their product. On the other hand I do hope that they don't lose this striving for technological excellence.

Corey: I would be very surprised if they lost the pursuit of technological excellence, I would be less surprised if they lost their willingness to engage with large enterprises. It comes down to fundamentally I can see them reverting back to what their company was built on, their corporate DNA as it were. I can't see them completely pivoting and abandoning where they've spent the last 20 years.

I'm not saying it won't happen, but I have a hard time imagining it.

Richard: As of right now, I would tend to agree, to be honest. On the other hand, if you look at most companies, like the large ones, they had these huge growth phases and they were very very engineering driven and then at some point, the question becomes what you will be promoted for. And at some point this becomes more like enterprise stuff, maybe marketing, maybe economics, so people with that kind of thinking tend to be promoted more and more the older the company gets.

So this will, over time, change things. Like, I'm not an Apple user, but looking at Apple from the outside, this kind of seems to happen, where this focus on engineering and on excellence just gets a little bit lost and their edge also gets lost.

Corey: It's an interesting problem. Changing gears slightly, let's talk a little bit about something you said back when we were preparing for these shows. Specifically that the Cloud is nothing new, it's old again and it's always been this way except for the fact that it's somehow completely different. What do you mean by that?

Richard: What I meant by that is that fundamentally IT stays the same while it completely changes every few years. If you look at any old monolithic application which is huge and horrible and everyone will tell you this thing cannot be maintained, blah blah blah, all these things, still you have functions in there. And functions on a very basic level are not different at all from a microservice.

You change how the APIs, how the interfaces, how the service delineations are exposed, you change a little bit of the mix of how you do it and what you do, and obviously you're always trying to raise the bar for tech as a whole.

And it also comes a little into this thing where I like to say IT breathes, where things go in and out, like you go from one extreme to the other, you internalize and you outsource. You have your monoliths, you have your totally fine grained things. And it just goes back and forth, back and forth. And every time you go towards this other extreme, you're trying to solve one or more problems. And once they've been solved, you will then have other problems.

So you go back to the middle and you overshoot a little, and then rinse, repeat. This seems to be happening a lot. If you do it with too much fervor, you might be overdoing it, on the other hand following this natural life cycle of IT is pretty nice, because you're just raising the bar again and again.

And when you look at Cloud, like all those issues which infrastructure providers have like, how to run a data center, I can tell you, running is even the small part, building it is insanely complex. Like, all these things just go away because you have a different service delineation, and you just build on top of that.

Corey: You're in town to give a talk. Now tell me a little bit first, what that talk was.

Richard: It was titled "Prometheus - What the hype is about", it was a mixture of the usual Prometheus 101, along with why people who are calling themselves Cloud developers, should care about this.

Corey: And what is Prometheus, for those who have not yet attended a Prometheus 101 talk?

Richard: Prometheus is a monitoring framework. It ingests time series data as in numeric data which changes over time, you might think service latency, you might think user count, how many errors you have, temperature, whatever. Just changes over time. It's not geared for events, so you can't put log lines or anything in there, it's purely for numeric data changing over time.

And what you can do with it is you can ingest a lot, a lot, a lot, a lot of data with relatively few resources, like you can easily do on normal hardware, or normal VM, you can easily do a million samples per second and more. It comes out at roughly 200k samples per core. Like, if you want more, just put more cores in, and you're done basically.
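The sizing arithmetic in that answer can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the 200k samples per second per core figure is the rough number quoted in the conversation, and the function name `cores_needed` is made up for this example.

```python
import math

# Rough figure quoted in the conversation, not a benchmark result:
# about 200k samples per second per core.
SAMPLES_PER_CORE_PER_SECOND = 200_000


def cores_needed(target_samples_per_second: int) -> int:
    """Return the number of cores for a target ingestion rate, rounded up."""
    return math.ceil(target_samples_per_second / SAMPLES_PER_CORE_PER_SECOND)


# One million samples per second works out to five cores at that rate.
print(cores_needed(1_000_000))  # 5
```

Real capacity depends on the workload, so treat the result as a starting point for planning rather than a guarantee.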

So it's super efficient in ingesting the data and also exposing that data back to the user. As you have these immense amounts of data, you obviously need a way to accurately get this data out again. So we have something called labeling, which is basically key value pairs. And you are allowed to assign arbitrary key value pairs to your data to then be able to select and slice and dice your data through this n-dimensional matrix which you are building up, so you could do by region, you could do by customer, you could do by prod or dev, and all these things which normally are stuck in a hierarchical data model, are all of a sudden available to you as direct first class things.
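To make the label idea concrete, here is a minimal Python sketch. It is not Prometheus code: the sample data and the helper names `select` and `sum_by` are invented for illustration. It only shows how arbitrary key value pairs let you filter and group without a fixed hierarchy.

```python
from typing import Dict, List, Tuple

# Each series is a set of labels plus its latest value (hypothetical data).
Series = Tuple[Dict[str, str], float]

SERIES: List[Series] = [
    ({"region": "eu", "customer": "acme", "env": "prod"}, 120.0),
    ({"region": "eu", "customer": "acme", "env": "dev"}, 3.0),
    ({"region": "us", "customer": "acme", "env": "prod"}, 80.0),
    ({"region": "us", "customer": "globex", "env": "prod"}, 45.0),
]


def select(series: List[Series], **matchers: str) -> List[Series]:
    """Keep only the series whose labels equal every given matcher."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


def sum_by(series: List[Series], label: str) -> Dict[str, float]:
    """Add up values, grouped by the value of a single label."""
    totals: Dict[str, float] = {}
    for labels, value in series:
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals


# Slice by environment first, then group by region.
prod = select(SERIES, env="prod")
print(sum_by(prod, "region"))  # {'eu': 120.0, 'us': 125.0}
```

Any label can serve as the filter or the grouping key, which is the point of the "n-dimensional matrix" description: no dimension is privileged the way the top level of a hierarchy would be.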

But having those labels is only half the story. You obviously need some way to actually work with that data, and that's another one of the really nice things about Prometheus. You have this one single functional language which you have to learn, it's called PromQL, and it's basically doing vector math on your monitoring data.

So instead of just having this one graph which never changes and you can't really do anything with it, because you encode stuff into an image file, you can actually take this data and do data science on it. And it's a Turing-complete language, it's super powerful. It kind of takes some getting used to but it's really nice once you learn it.

And the next thing is you use this for alerting, you use this for analysis, you use this for graphing, you use this for dashboarding. You can use it to get your data out in JSON format, you have this one single way to access all the data, and it's always the same as opposed to a lot of other systems where you have to think differently about accessing the data, depending on if you want to do alerting, or a report.
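The "vector math" and "one single way to access all the data" points can be illustrated with a small Python sketch. This is not PromQL and not how Prometheus is implemented: the counters and the helpers `divide` and `above` are assumptions made up for the example. It shows one expression over labeled vectors feeding both a dashboard value and an alert condition.

```python
from typing import Dict

# Hypothetical per-region counters over the same time window.
requests: Dict[str, float] = {"eu": 1000.0, "us": 4000.0}
errors: Dict[str, float] = {"eu": 5.0, "us": 400.0}


def divide(numerator: Dict[str, float], denominator: Dict[str, float]) -> Dict[str, float]:
    """Element-wise division, matching elements on their shared label value."""
    return {
        key: numerator[key] / denominator[key]
        for key in numerator
        if key in denominator and denominator[key] != 0
    }


def above(vector: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only the elements above the threshold, as an alert rule would."""
    return {key: value for key, value in vector.items() if value > threshold}


# The same expression is computed once ...
error_ratio = divide(errors, requests)

# ... and reused for graphing and for alerting.
print("dashboard:", error_ratio)
print("alerting on:", above(error_ratio, 0.05))
```

The design point is that the alert is just the dashboard expression with a comparison added, so there is no second query language or data path to keep in sync.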

Corey: This might be something of a controversial question, or rather the question is not, the answer is probably going to be hotly debated. But at what point does it make sense to do something like that, or to implement something like that, versus deploying one of the many, many, many, monitoring vendors that purport to do not only what you've described but everything else as well?

When does deploying or building your own monitoring system make sense for an organization?

Richard: Fundamentally it's always the same make or buy question. This is no different. Obviously I'm biased, so I would tend to run things myself, which works, and for small teams and such it's super easy to just spin up a new instance and do some monitoring on whatever you want to do. Maybe you just want to do some poking or whatever you want to do and you're super flexible in what you do. But that's only part of the story.

The other thing which Prometheus enables was, it shifted the whole of IT monitoring. And again, I'm biased but from my perspective it actively, it actually shifted or uplifted a whole segment of IT as in monitoring, to a new level.

So there's a lot of vendors which now support similar things, I mean, I do have personal opinions about a few of them, but fundamentally unless they do something completely wrong, it's not a bad thing to use them.

Corey: This ties in, to some extent, to I guess, a past life and something you still dabble in from time to time of network engineering. Once upon a time, if a company wanted to do anything that even touched on IT, they needed to have someone with network engineering expertise in-house. Today, it's debatable whether that's still the case. What do you think?

Richard: You still need people who know how to do these things, but their daily workload will change, massively change. So you might not need someone who, or not a lot of people who are aware of the intricacies of Ethernet, or whatever. Like, VRRP setups tend to be somewhat icky, and if you can avoid them, by all means avoid them. But avoiding them usually means having an overlay network, or having dynamic routing. Which I think is a perfect solution, but it's quite complicated. But again, Cloud shifts the service delineation. And all of a sudden you don't have to do all those nitty gritty details yourself. You can buy this as a service. Still you will need someone who is aware of how those fundamentals work.

So you might still need your VPN gateways, you might need someone to connect the VPC from on-prem to your Cloud, or to your multi-Cloud, or whatever. So you still need the knowledge about how things work, but the actual day to day job will change. And by extension obviously the actual skill set needed also changes along with it. But you still need the domain experts. Same as in anything else. Like, even if you have a hosted database, it still makes sense to have people who actually are aware of how things work in the background so they can make good decisions about how to set this thing up.

Corey: To some extent people have been saying for generations now, it seems like, that in the future, you'll never have to worry about the undifferentiated heavy lifting, or the toil, you can only focus on writing business logic and doing things that move your business directly forward. I mean, in my own career, once upon a time I started off as a large scale email admin. And that was something every company needed. Today, almost no company needs that. It's click, click, done, with a hosted provider or very occasionally a small central group that runs Exchange internally or something like that.

I can't shake the feeling that to some extent the level of expertise required for most companies who are not themselves deep into the, I guess, IT space as what they do, need to have a strong grounding in network engineering, network theory, being able to handle complex routing situations, et cetera.

It feels like that has been abstracted away by and large in a lot of, I guess, typical companies. Is that a naïve approach? I recognize you are sitting in San Francisco, where everything here is a web app. There exists an entire ecosystem out there of companies that that does not apply to. I understand that.

Richard: I wasn't aware you had a career.

Corey: My parents still believe I don't, it's fine.

Richard: Okay. Yes, like, again, the subset of skills needed changes dramatically. And a lot of those details are just abstracted away behind a new service delineation. So a lot of the things you don't really need in your day to day anymore, like, it still makes sense to maybe have one person, it might not even need to be in the same company, who just knows that stuff, because otherwise you're bound to make mistakes from the past again and again. Of course, you will always need at least some knowledge of how things work.

But I fully agree that this depth of knowledge fully moves to infrastructure providers. And it's probably a good thing, because most people like most enterprises, at least from my networking perspective, have a really hard time even getting networking people because they just don't care about this type of network.

So, hiding this behind a proper service which is managed by experts absolutely makes sense. At least for those who can do actual Cloud, like you still have tons and tons and tons of legacy implementations, and you have fields and industries where IT is currently nice, but it's not essential. And those have completely different needs. Completely different needs from anyone who's in Cloud web app API world and just living a quite nice life to be honest.

Corey: I refuse to accept that here at Twitter for Pets headquarters. So in a similar vein, serverless has been sort of taking over the world with similar promises, that the only thing you'll ever have to worry about in the future is application code, that it's going to be a magic coming of almost paradise where only pure development matters, everything else is handled for you by one of several Cloud providers. And everyone's touting this as a new thing. Is it?

Richard: Yes, I think CGI-bin is pretty new. So, on the one hand, again, it's old and it's new at the same time. And it also ties a little bit into this toil thing, because to some extent toil is good, because it lets you learn about how the underlying things work, so you have a better understanding of why something might be happening in a certain way. But jumping back to serverless, the concept of putting a piece of code in some place and having this executed when an external event comes along is not new.

CGI-bin is fundamentally the same. You have a web browser usually, and this makes a call to a thing, and this thing gets executed and it returns with some data and then it dies. So exactly the same thing happens in serverless. Like, you have a lot more emphasis on different APIs, you have a lot more emphasis on events. You have this awareness that these events will usually not be generated by a human or by a web browser but by something else. So a lot of those things are evolving in a good and in a nice and more efficient and effective way but fundamentally it's the same as before.
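The shared shape Richard describes (an event arrives, a piece of code runs, it returns data, and nothing survives the call) can be sketched in Python. This is an illustrative assumption, not any provider's API: the `handler` signature, the event fields, and the `from_cgi_query` adapter are all made up for this example.

```python
import json
from urllib.parse import parse_qs


def handler(event: dict) -> dict:
    """Handle one event and return a response; no state is kept between calls."""
    name = event.get("name", "world")
    return {"status": 200, "body": json.dumps({"greeting": f"Hello, {name}"})}


def from_cgi_query(query_string: str) -> dict:
    """Turn a CGI-style query string into an event for the same handler."""
    params = parse_qs(query_string)
    return {key: values[0] for key, values in params.items()}


# Same code, two triggers: a browser request in the CGI style ...
print(handler(from_cgi_query("name=Corey")))

# ... and an event produced by another system rather than a human.
print(handler({"name": "queue-message-42"}))
```

What changes between the two eras is mostly the adapter in front of the function and who produces the event; the run-and-exit core stays the same.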

Corey: One of the misnomers that I tend to see from time to time when talking to people about serverless, is that there is a belief that, well, I have some code, and now it's going to take that code and it's going to run it for me. I have a wristwatch that can do that, there's not a lot of value in being able to say that yes, I have a computer. What is more interesting to some extent is, yes, what you said, the event model being able to impact when that code runs, what it takes in, and what it returns. There are economic factors that feel different this time, and maybe that's a bit of a red herring. But the idea of not having to worry in any traditional sense about scaling, that was always a concern with CGI-bin. Not having to worry about paying for things to sit idle, when they weren't being addressed. Instant on, consumption based, economic models start to be transformative for some use cases.

What I think is also very interesting and differentiates this somewhat from CGI-bin, is that there are a thousand different ways to write serverless functions. Most of them are absolutely terrible. Especially things with custom runtimes, write it in whatever language you want. To my recollection, CGI-bin was mostly a Perl requirement, wasn't it?

Richard: It started as Perl. There were other languages which were shoehorned onto it.

Corey: The entire challenge that I see in, I guess trying to view this as sort of the second coming of CGI-bin, is that everything old is new again. What I'm wondering is, was there anything in between CGI-bin and serverless? Because we haven't talked about CGI-bin for 15 years in most shops, and serverless as a thing is four or five years old. What happened in between?

Richard: Well, serverless might be the third coming of CGI-bin, because you have App Engine in between. And this ties back to this engineering driven excellence thing where Google was kind of trying to tell people, hey this thing exists and maybe you want to use it and maybe we also use it for our own services and have quite some good experience with it, but people didn't really care. It probably just wasn't the right time, like, in the global market, the time of this shift going in this direction again.

Corey: So as far as CGI-bin versus modern serverless, one of the big benefits of modern serverless is elasticity. The ability to only have things on demand, when you need them, and you don't pay for them when they're not hanging around. Back in the days of CGI-bin I still had to provision servers, care an awful lot about capacity planning, screw up an awful lot of capacity planning, and then resign in disgrace.

How does that look today?

Richard: I would argue that in the days of CGI-bin, there was also this promise of someone else is taking care of scaling or of running servers for you, which is not very different from what we hear today. Because fundamentally it's more or less the same. But the thing which in both cases made people go back towards the other extreme, or which will happen with serverless at some point, at least in my opinion, is that you still need to keep state. Like that's the dirty secret. You have superfluous complexity where people just add features because they can and it's just new and cool and they just do whatever, and you have the system inherent complexity and you cannot reduce this. You can put it behind different services, you can have different APIs, you can have different service delineations, but this complexity needs to live somewhere. And as a networking person, one of the main complex things is keeping state for long term. And to persist it in a way that you can still access all those pictures of cats or whatever, long term.

None of these questions are answered by serverless, it's just, okay, it's someone else's problem. But then when you scale up more and more and it's all the time someone else's problem, which is super nice, at some point you will probably hit that wall of I need this to be faster. I need this to be more performant. And so you might be tempted to just bring your code and your data and your state more together again. And this is probably something which we'll be seeing in, I don't know, five years? Ten years? But it will happen. That's always the case. Mark my words. People who listen to this in 2030, I've been right.
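The "complexity needs to live somewhere" point can be shown with a small Python sketch. Everything here is hypothetical: `ExternalStore` is an in-memory stand-in for a database or object store, and `count_view` is an invented handler. The function itself remembers nothing, so all state moves into a service that someone still has to operate.

```python
from typing import Dict


class ExternalStore:
    """In-memory stand-in for a database or object store that someone must run."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        """Add one to the counter for key and return the new value."""
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def count_view(event: dict, store: ExternalStore) -> int:
    """Stateless handler: all memory lives in the store, none in the function."""
    return store.increment(event["picture_id"])


store = ExternalStore()
count_view({"picture_id": "cat-1"}, store)
print(count_view({"picture_id": "cat-1"}, store))  # 2
```

In a real deployment every call to the store is a network round trip, which is where the temptation to bring code, data, and state closer together again comes from.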

Corey: And we'll probably be having the same debates in 2030 when that happens.

Richard: Of course.

Corey: And it's going to be different terminology, different buzzwords, my Twitter for Pets reference will seem incredibly dated and oh, Google, the same way we talk about IBM today. Because nothing is new again. It always seems that history rhymes.

A question I have for you as someone who is building data centers in swamps in 2019, what is the story for data center economics in a world where for most use cases, a Cloud provider is going to have economies of scale that no traditional data center provider will have, they will be able to offer greater elasticity, they will be able to offer armies of people to fix relatively routine issues, that a typical provider would have to be concerned with, what is the case for a data center in these days?

Richard: On a very fundamental level, the case is where do you think the Cloud is running. Look at all the numbers which are being pushed out about capacity. I mean, you can play a bullshit numbers game and talk about how you use two Eiffel Towers of steel or 20, which is a pretty arbitrary measure, because you can build in concrete or in steel, so you can even change that.

Corey: That was one of the things that surprised me, in the Google Next keynote. I didn't realize when you ordered steel from a supplier, that the units you used were numbers of Eiffel Towers. That was strange to me.

Richard: Yes, it's a totally, like it's best business practice to order steel in just fractions of Eiffel Towers. That's super common.

Corey: I'll take two Eiffel Towers and two thirds of a Titanic please.

Richard: Yep. But you'll have to dive for the latter one.

So anyway. Tons and tons and tons of energy are being poured into building data centers, into making data centers more efficient, so this Cloud is running somewhere. So all those big providers also need to have data centers. That's one part of the answer. So while people might forget that data centers exist, and this is totally fine, because you also don't daily think about power plants. Yet if you plug into the wall outlet, you have power. It's just something which exists and it's there and just works. And you have this clearly defined service delineation, because power hasn't changed in a few decades, or maybe 100 years. But still you have this thing and you rely on it. This is the definition of infrastructure. People don't think about it and it just works. And if it stops working, they're really really upset and for good reason.

So for smaller providers, building a data center still makes tons of sense. Because there are tons and tons and tons of industry and of customers who are not able or willing to go into the Cloud just as of right now. It might be that they have certain legal requirements, especially in Germany, a lot of them are a lot harsher than anywhere else in the world, so a lot of external people who need to okay how a company is run, especially when it comes to financial data, or when it comes to health data or something like that, you can't really put this in the Cloud, unless you run that Cloud yourself. Which is also called Hybrid Cloud. Obviously you can squeeze maybe two or three bucks out more if you go all in on the public Cloud but this gives you less control.

So a large part of building data centers and running data centers if you are not one of those huge players these days, it would be those customers who need co-location, who need really top notch service in those data centers, and need them to be up and running 24/7 guaranteed.

So this is the market we are chasing. And to be honest, we see quite some interest. Like, there is huge interest. It might be part of the filter bubble, that you're just not as aware of this especially in the Bay Area for obvious reasons. But there is a huge market still.

Corey: The challenge that I see is that, when I do leave the Bay Area, as happens from time to time, it turns out planes do fly everywhere, I find myself talking to a lot of quote unquote traditional companies who are in heavily regulated industries that are making at least partial shifts to Cloud. They're still investing in data centers of course, but that investment is now being made with an eye towards tapering off further and further over the next ten years.

Richard: Ten years is usually, like if you have a medium to large size contract, ten years would probably be a good measure for default contract time. So it makes sense that this is also the length of time people would be talking. I'm not fully convinced this means they will move fully away within the time, it might just be that's their planning horizon so that's how far ahead they can plan and do plan. We'll see what happens. For the foreseeable future, there's definitely no shortage in people who need this, who really, really, rely on this.

Corey: And I think that's part of the challenge that everyone struggles with. One of the things I love about these large Cloud conferences is that we're able to talk to people who have very different use cases from our own. It's always nice to envision a use case we hadn't personally considered, or talk to someone who is building a thing that you didn't realize existed. That's fun. It's always neat to step outside of my Twitter for Pets bubble.

Richard: Yes.

Corey: If people enjoy what you have to say for some unforeseeable reason, where can they hear more of it?

Richard: The best places are probably either my Twitter account @twitchih or any random conference I happen to walk through and give a talk at.

Corey: Perfect. And we will put up a picture as well, so that people know what you look like so they can stop you at random and share their opinions with you.

Richard: Great.

Corey: Richard "RichiH" Hartmann. Former freenode staff member, current Debian developer, conference organizer, Prometheus core team member, and friend.

I'm Corey Quinn, and this is Screaming in the Cloud.

Speaker: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com. Or wherever fine snark is sold.

Speaker: This has been a HumblePod production. Stay humble.