Serverless Chats

In this special episode, we look back at the first year of Serverless Chats.

Show Notes

Watch this episode on YouTube:



Episode #1: Alex DeBrie
Asking about best practices and the reality of implementing them...
Alex: I think it's pretty fascinating to see. Like you say, if you're on Twitter and you're following a lot of the big time people doing serverless architectures in this space, they have a lot of great tips around best practices, and this is what you should be doing, all that stuff. But I find, as I'm building serverless applications or as I'm talking to customers and users that are building serverless applications, there are times when there's tension between what the best practices are and what their circumstances are. This could be because maybe they're not coming in with a green field application, or maybe they have a data model that doesn't fit DynamoDB or something like that. It's difficult on how you sort of square that with recommending something that you know isn't the best practice or the most pure serverless application, but you also gotta help people ship products, right? I think balancing that tension can be tough at times.

Episodes #18 & #19: Michael Hart
@30:25 (Ep. 19)
Michael: There's nothing special about Lambda in this respect. It's like, this is just sort of best practices if you were calling any API or if you're writing any API that if you're waiting for many, many, many, many seconds, then you might want to deal with that. And those are the sorts of use cases where I think, okay, fine, that's perhaps not a good practice. You actually, you asked me. We use this at Bustle. So we have a Lambda that renders our frontend HTML code. It's a preact app. It does service side rendering of the HTML, but it delivers to the, you know, to the browser via API Gateway and a CDN and things like that. But it calls our other Lambda directly, which is a GraphQL backend. It calls that to pull in the data that it needs to render the HTML page. Now in the browser, it also will call that GraphQL backend , but it will do it via API Gateway. Because it's coming from the browser, so it needs to make some an authenticated HTTP request into the function. But when you're in the Lambda world, well, that Lambda can just call that Lambda directly, and call the GraphQL Lambda, and that goes to Redis and Elasticsearch wherever it needs to pull the data and send it back. And we just make sure we have the timeouts tuned such that, you know, I mean it responds within milliseconds. It’s not even a thing we would really run into.


Episode #2: Nitzan Shapira
Asking about the need to automate instrumentation...
Nitzan: Yeah, by the way, it's not just worrying. It's not just the fact that you can forget. It's also just going to take you a certain amount of time - always - that you're going to basically waste instead of writing your own business software. Even if you do remember to do it every time, it's still going to take you some time. Some ways that can work [are] in embedded in your standard libraries that you work with. If you have a library that is commonly used to communicate between services, you want to embed that tracing information or extra information there, so it will always be there. This will kind of automate a lot of the work for you. That's just a matter of what type of tool do you use. If you use X-Ray you're still going have to do some kind of manual work. And it's fine, at first. The problem is when you suddenly grow from 100 functions to 1000 functions — that's where you're going to be probably a little bit annoyed or even lost, because it's going to be just a lot of work and doesn't seem like something that really scales. Anything manual doesn't really scale. This is why you use serverless, because you don't want to scale service manually.


Episode #3: Marcia Villalba
Asking about what service owns the data when using AppSync...
Marcia: Well, then, it's the question on who owns the data, and that's something, at least with AppSync, I'm still trying to figure out how to really architect my application, my graph qualifications, because I've been using GraphQL with microservices, and usually I do the filtering in the microservice because the microservice knows the data, knows who can see it, and I don't want to leak that information out. But with AppSync, at least applications and have been building, they are mostly contained into Dynamo tables and Lambdas. So I think when I'm coding this that AppSync is the owner of this data and, then I do the filtering in the resolvers. So I think it's always a question of who owns the data and who is able —where is the level that you want to leak the information out? I don't know if it's clear.


Episode #4: Chase Douglas
Asking about the complexity of the development workflow...
Chase: Yeah, for all the benefits you get from serverless, with its auto scaling and its capabilities of scaling down to zero, which reduces developer cost, you do have some things that you have to manage that are a little different than before. One of the key things is, if I've got, like, a compute resource like a Lambda function in the cloud that has a set of permissions that it's granted and it has some mechanism for locating, the external service is like SQS queue or an SNS topic or an S3 bucket. So it has these two things that it needs to be able to function the permissions and locations. So the challenge that people often hit very early on in serverless development is if I'm writing software on my laptop and I want to test it without having to go through a full deployment cycle, which may take a few minutes to ah to deploy the latest code change. Even if it's, ah one character change up to the cloud service provider. How can I actually test with proper permissions and proper service discovery location mechanisms from my laptop? What mechanisms are there to do that? That's something that we are always evolving.


Episode #5: Mike Deck
When asking about event-driven architecture...
Mike: I think that it's probably easiest to understand it when contrasted against kind of a command-driven architecture, which I think is what we're mostly sort of used to. So this idea that I've got some set of APIs that I go out and call and I kind of issue commands there, right? So I maybe have an order service. I'm calling create order or I've got downstream from that. There's some invoicing service now, and so the order service goes out and calls that  and says, "Create the invoice, please." So that's kind of the standard command-oriented model that you typically see with API-driven architectures. An event-driven architecture is kind of, instead of creating specific, directed commands, you're simply publishing these events that talk about facts that have happened, you know these are signals that state has changed within the application. So the order service may publish an event that says, "hey, an order was created." And now it's up to the other downstream services to, they can observe that event and then do the piece of the process that they're responsible for at that point. So it's kind of a subtle difference, but it's really powerful once you really start kind of taking this further down the road in terms of the ability to decouple your services from one another, right? So when you've got a lot of services that need to interact with a number of other ones, you end up kind of with a lot of knowledge about all of those downstream services getting consolidated into each one of your other kind of microservices, and that leads to more coupling; it makes it more brittle. There's more friction as you're trying to change those things, so that's a huge kind of benefit that you get from moving to this event-driven kind of architecture. And then in terms of kind of the relationship to serverless, obviously with services like AWS Lambda, you know, that is a fundamentally event-driven service. It's about being able to run code in response to events. So when you move to more of this model of hey, I'm just going to kind of publish information about what happened, then it's super easy to now add on additional kind of custom business logic with Lambda functions that can subscribe to those various different events and kind of provide you with this ability to build serverless applications really easily.

Episode #30: James Beswick
Asking about thoughts on embracing asynchronous patterns...
James: I think a lot of what we're building makes distributing computing just easier for developers, and when you think about the scale of lots of developers now have to face with their applications, even things like mobile apps, these are complicated problems to solve when you get spikey workloads and just huge numbers of transactions coming through. So a lot of these tools just make it that much easier.

But the mental hurdle is going from this synchronous model to this asynchronous model. And so if you're used to building synchronous APIs, initially it can seem a bit alien trying to figure out the different patterns that are being involved. But it seems like the natural evolution given the fact that you've got all these services in the middle that have to handle all of this traffic, and the timing issues involved, you know, start to evolve from where you are in the synchronous space, but I think what's been put in place is not too difficult to understand.

Once developers start using this, they find actually for many cases, it's the right way to go, but it's interesting to watch this because I know that just even 12 months ago people were talking about the API Gateway, this 29-second, 30-second limit problem, do all this stuff throughout your infrastructure. Or you heard about the Lambda limits of five minutes, then fifteen minutes because people were trying to work this way.

I think now we're going back to thinking about how do we break up these tasks. So it's shorter-lived tasks that run between services in an asynchronous fashion. So the whole model is really evolving.

Episode #40: Eric Johnson and Alan Tan
Asking about storage first...
Eric: Yeah, and I do call it storage first, or sometimes I do a presentation called Thinking Asynchronously, but the idea ... Let me go back to my earlier statement. The most dangerous part of an application that I'll ever build is my code. Right?

So, when I build an application, I want to get that data stored first. That's the thing. I tend to go DynamoDB because that's what I like, that's what I use, but there's different purposes.

I know Jeremy, you and I have had this conversation before, and you're an SQS guy, so that's where you tend to go, and we do this because we look at okay what's the pattern for the retry or the DLQ or different things like that.

For me it's because I'm going to continually write back to Dynamo. On the app, I'm specifically thinking about it. But the idea is if API Gateway can directly integrate with the storage, be it S3, be it DynamoDB, something like that, then I've stored the data and I don't have to go back to the customer if my logic fails, right?

So, in an application I've stored the data, let's say I'm using DynamoDB, I do a stream, it triggers a Lambda, I start processing that data. If somewhere in there, something breaks, and again, it's going to be my code, but let's say something breaks, then I don't have to go back to the customer and say hey guess what, I blew it.

Can you give me your data again? Can you resubmit that? And continue to trust me, because I'm sure I won't lose it again. Instead, I've got that data stored, and I can write in some retry or take advantage of the retry from an SQS or an SNS or something like that.

So, I think it's a really cool pattern for building resilience into our application. Serverless comes with a lot of resilience anyway, that's how AWS has approached this on look as much as we'd like to say nothing ever breaks, let's write as if it does, right?

So, let's degrade gracefully. I think this adds even another layer of that, where I can degrade in my code and know hey I've still got the data. I can write some retry logic. I can use existing retry logic. I think it's a safer pattern.

It does require ... The storage first is the pattern I call it, but it requires thinking asynchronously. What can I do after I've responded to the client and how do I work with them?

Episode #41: Paul Swail
Asking about the things developers need to think about with asynchronous applications...
Paul: There are a few things. Number one, I would say distributed tracing. If you have a multi-step use case where there's a lot of data processing going on in the background, you probably now have multiple log files to search through. There are... If you were doing that synchronously, you could just look in the one place, more often than not. There are strategies around using correlation IDs within each message so that the same, say in CloudWatch you can query on for that correlation ID and get an aggregate of any log entries across your different log groups, which have that correlation ID within it. You need to build that in to your application, your Lambda code, that doesn't come out of the box. Another consideration is testing, writing automated tests. It's just harder for asynchronous workflows, so if you're writing asynchronous... Say I write in Node.js and Jest test frame work. If I have a synchronous Lambda, it's generally pretty easy to write an integration test for that. You just hit the end point, invoke the Lambda function, and just verify the response. If you have a multi-step asynchronous data processing workflow, then you need to test each one of those individually, during an actual...Writing an end-to-end test is difficult. But I just like having a wait step, that just waits until background processes have, you hope, have completed and then you can do whatever verification steps you need. Just generally understandability, it's not a thing in itself but it's just for if you've got a new developer on your team... A lot of teams that I've worked with are more full stack web developers which are used monolithic synchronous workflows. Got a new guy on your team and it's just explaining to them how each piece of the pie fits together. That's just going to take time and documentation really is the only solution to that. It's just... Good documentation is important when you've got these asynchronous workflows.


Episode #8: Ran Ribezaft
On the importance of distributed tracing…
Ran: Up until recently, like recently, like two or three years, I would say that distributed tracing is not a mandatory thing that each R&D team needs to have as part of its arsenal of tools. Today, I think it's almost like a crucial or vital thing that you need to have. The main reason is that we already know that applications are becoming more and more distributed. So, for example, once a user is buying something at your store, you want to make sure that it gets the email to him with the receipt and the invoice and so on, as soon as possible because otherwise he's hanging there, waiting for confirmation or waiting for something to get to him. And in a monolithic way, it's been pretty easy because you had something specific, a single thing that will take care of everything. But now we've got, like between 3-300 services that might take care of this operation: one that will get the API request from the user, from the Web server, the other one that will parse the user request. The third might be something regarding billing that will charge through Stripe or through another service. The fourth one could be something that is mailing users, and all of them are connected to each other, with some messages that are running from one to another. It could be like a star, or it can be like 1-to-1, all the way up until it gets to the email service. And without distributed tracing, you wouldn't be able to ask yourself this question: how long does it take for a user once you buy something until the moment he gets his confirmation. Because if it takes, let's say, for example, a ridiculous number. Let's say one minute. It's not good. I don't want my user to wait one minute in my website for confirmation. I want it to be, let's say, sub-second or let's say sub-five seconds. Other than that, it doesn't meet my SLA. And only with distributed tracing can I really measure end-to-end traces and not just a single trace every time.

Episode #12: Emrah Şamdan
Asking about when to take action when there is an error...
Emrah: You know, most of the tools that, both with CloudWatch and with the other monitoring tools, did the alerts are just for a single error. So you're just having a one Lambda invocation, then an error happens, and most of the time they're paging an alert. But we thought is this something that is actually wanted by people. Is this something that prevents people from alert fatigue? You know, we are coming from OpsGenie, and that's why we are very, very careful about not putting people into alert fatigue. So we ask people, “What is the definition of failure for you?” Like we asked tens of people,”What do you think? When do you think that this serverless architecture has failed?” And the response is that not a single error, like most of the time, it’s not an error. So when I call something an incident, when it causes something cascading failures. So I have a problem with Lambda function, and this Lambda function should should have triggered another Lambda function through SNS, and this triggers another Lambda function to, let's say, SQS. I'm just throwing out a scenario here. So this first Lambda function fails and the others couldn't even start. So in this case, we can understand that we are in very big trouble, that we lose some transaction there. That’s a failure for most of our people that we talk with and the other stuff is that for, at least, especially for upper management, the invocation duration, invocation count, any kind of abnormality about these metrics, are not very important. And they are seeing cost as a signal of failures. So let's say when they want to allocate $10 per day in to the serverless architecture and let's say, one dollar per a function. In such cases, they want to get alerted when the cost exceeds this threshold. They are not interested in if the function is running more than expected because of a third-party API. They're not interested in if the problem happens because of an input error. They just wanted to see if the cost is exceeding something, some threshold. Because all off their motivation was, when joining to Lambda, to save cost. And they don't want to read that with a problematic situation.


Episode #6: Erik Peterson
Asking about cost...
Erik: There is that tension there. I mean, sometimes it's a healthy tension, but there is that tension there, and it kind of, I mean it goes like this: imagine you needed to explain, let's say, a very complicated system that you constructed, and now you're trying to explain it in French to the Germans, right? You're speaking a different language. And that's the hard part, right? You know, you can ask the question, "Well, why did we spend $20,000 this month on EC2 more than we spent last month," for example. And well, it's because the product team had a new initiative. We had to do a migration. We had to do this. We had to move data from over here. We had security requirements, so we needed to encrypt the data. So we're calling the KMS API a lot. And then that resulted in a whole bunch of new storage and processing. And you're talking, talking, talking, and then you look up and there's just a glazed-over look on the finance guy's eyes and they're going, "Yeah, no, no why did we spend $20,000 more this month? And how much are we going to spend next month?" And they go, "What? I can't talk to you. Get out here." Right? And ultimately you want to tie it back to well, look, this product initiative cost this much money, and we forecast it to be X. And we have an idea, before we actually go down that path, how much it's going to cost and cost has been a part of it. Because there, I mean, for a long time in engineering has been a notion of non-functional requirements, right? What kind of performance requirements do you have? What kind of uptime requirements do you have? And the hard question that I think organizations need to ask themselves is, "Well, what kind of cost or budget requirements do you have?" And at what point are you going compromise the the budget for the user's experience or vice versa? You know, you are you gonna go, "You know what. User experience matters at all costs. Even if it's $1,000,000 in extra spend this month, our users must be absolutely happy." Okay. Make that decision consciously. Today, I don't think anybody's consciously making that decision.


Episode #9: Gunnar Grosch
Asking about common things missing from resilient applications...
Gunnar: One common thing that I see when we're performing these types of experiments is that, like we said before, we don't have graceful degradation, so that the systems they show ever messages to the end users or, parts just don't work but are still there. So we don't have UIs that are non-blocking. And that's a perfect use case for a chaos engineering to be able to find those on and, well, then fix them.

Episode #51: Adrian Hornsby
Asking about acceptable levels of staleness in the data...
Adrian:  Well, even if you claim your application is very dynamic, and you claim that, no, I need to, I can't cache because for example, it's a top 10 list of real time trends on Twitter. Let's say Twitter trends, right? People expect that it's real time. So, I would say by default, if you think about real time, people wouldn't think, "Okay, I need to cache that." But, if you have millions of clients around the world requesting that data, absolutely you're going to fake it real time. It's, you might query your downstream server or service that tells you the trend, but maybe if you have thousands of clients connecting at the same time, you don't want each of those clients to query your service, you will just serve it from cache, or make sure that the requests are packed into one single request and then that's the downstream service and then serve back the content.

So it's just this idea of like any application out there, even if you think it should be a, must be real time, it's very important to think about the staleness. And staleness is how real time my data needs to be, even if it's maybe three seconds old, is it really that old or is not usable? Because it's also something you can fall back. So if your database is not accessible, it's like, maybe you can serve back the trend of Twitter, that was maybe an hour ago, and just instead of... and you can say to your customers, you can say, "Oh, we're experiencing issue, this is a trend one hour ago." And that's fine, it's a good UI. It's good use of stale data. And why would customer say, "Oh, you're cheating on us?" No, it's like... I think it's a good example of that.


Episode #10: Slobodan Stojanović
Asking about local testing…
Slobodan: So in the early days, when I started working in serverless, I tried to just do the things that I did with non-serverless applications. So my first try was like to install DynamoDB locally and use it as a local database and test against it and things I got. But then there were so many different services such as like Cognito or SNS or many other things that they can, cannot just install locally. So there are some things that can simulate them. But these are simulations. You don't want to run your tests against simulations because they will not give you the right results. And then the second, my second try was basically to mock these things. So I tried to find some complex mocking libraries and things like that that will mock everything and return realistic results and things like that. And even that is like leading to so many errors. And some things are not mocked. You have some special things in your code or we're using the old library, so you need to send some pull requests and who knows what. So these things become more and more complex, and in the end we just decided not to do that. Instead, we want to run our tests locally. Unit test mostly. And whenever we want to test, to run integration tests, I want to test my code against real DynamoDB, which is on AWS or real SNS or real services that are on AWS and that they're working in the cloud.

Episode #47: Mike Roberts
Asking about integration tests...
Mike: Yeah. And what integration tests are about are testing your assumptions effectively. So when I talked before about functional tests, I said, "So we're going to mock or stub the response that comes back from DynamoDB and make sure that we're doing the right thing with that." That makes an assumption that we've correctly defined what comes back from DynamoDB. And so what integration tests do is validate those assumptions, they validate how you expect your code to run within the larger environment and the larger platform.

And we absolutely advocate for doing that, but remembering that running and maintaining integration tests is a costly exercise. They take a long time to run and they also take a long time to maintain because things change over time. And so, we put a lot of work into the integration test section of the book. And John did this extraordinary thing with Maven, and those of you that are Java developers understand this, but where we run Maven test, which is one command line, and what that does is it brings up an entirely new stack of all of the components in our application, runs all the integration tests against it, and then if the tests work, then it immediately tears that stack down.

We wouldn't have gone into that amount of effort to get that stuff working if we didn't think integration tests were valuable, but we also understand that because they're expensive based on in terms of our time and computer time, that we want to minimize the number of those that we write. And so we're looking normally at just a few, but capture hopefully a number of cases, but again, we're thinking, we're not testing the code when we're writing integration tests. We're testing our assumptions about the larger environment. If we want to test the code, that's when you write a unit test or a functional test.


Episode #11: Hillel Solow
Asking about tools that can help protect developers from runtime attacks…
Hillel: My number one cliche would be: let's focus less on mitigation and more on prevention. I know that's super cliche in security, but I think here, one of things that we see a lot is that you get a lot more mileage out of trying to make sure that the things you're deploying are deployed with least risk than you do at trying to chase after attacks. And again, that's not to say we don't need to do both. We will forever need to do both. No amount of proper configuration hygiene is going to prevent every type of injection attack or cross-site scripting attack, or whatever is on our infrastructure, right? We need to mitigate all of those things. But in cloud applications and particularly in serverless cloud applications, the value of hygiene and posture is much greater than it was in the past. You know, whether it's the things we talked about earlier, like just leaving around old stuff that could put you at risk but you don't need, or it's getting IAM configured properly, or it's things like setting timeouts to their minimum threshold, if you can. All those things are going to give you a tremendous amount of value in making it hard for attackers to do what they want to do on your system, before you even started looking for a SQL Injection, right? You still need to look for a SQL injection. We still need to run the tools that we’re going to run. But before we get there, before you start worrying about blocking and mitigating kind of runtime attacks, spend a significant amount of energy on: What do I have? Where is it? Do I need it? Is it configured in a way that gives me the least risk? Have I isolated the things I can isolate? Am I doing all that continually? That would be my number one focus.

Episodes #23 & #24: Ory Segal
Asking about the legitimacy of serverless security threats…
@32:02 (Ep. 23)
Ory: Exactly. But you have to remember that as a security practitioners and specifically as researchers, we are trying to flag potential future risks. If we were to only look at what's being used and exploited today, we will always be in a dog chase with attackers. So, I think it's very good that security experts and security researchers look for the next attacks in a new technology and finding it before it's being exploited. And so you can then teach developers how to avoid these and hopefully, reduce the attack surface.

So I think it's not necessarily bad that we're pointing out things that haven't been exploited yet. And as you mentioned, regarding the evidence, usually attackers there aren't web forums where attackers share war stories of how they hacked into a system. So obviously, attackers don't publish anything about their techniques. And especially if you talk to the application owners and companies, most of them also don't like to share information. In fact, usually when they give the server a security conference talk, at the end, there's that five minutes that you save for questions. Usually, I know that nobody's going to actually ask a serious technical question because they are embarrassed. It's like something that you don't want to talk about around other people from maybe competing organizations.

So there's no resource to go and look at and see how people are exploiting and what are the vulnerabilities. We collected information from customers and prospects, I've reviewed dozens if not hundreds of serverless Apps at this point, and we collected this information to see what are the most repeated risks that people do.


Episode #17: Brian Leroux
Asking about why Architect chose DynamoDB…
Brian: Yeah, I mean, it's a decision making process. And it's one that a lot of people are aren't comfortable with. It’s a managed database, which is a nice way of saying that it's a proprietary database. It's owned and run privately by Amazon. And, you know, after, our history has a, or our industry has a long history of being gun-shy of these databases because of Oracle, frankly. And I don't blame anyone for painting Amazon with that brush. “Oh, my database. That's my data. I don't want them to have that. I want to control it.” The only people that say that, by the way, are people that have never sharded a database, you've sharded a database once, you are happy to let someone else manage that for you. You are more than happy. How much does it cost? Fine. Less than a DBA. So that's going to be a good deal for me. So once you get over that initial concern, which isn't a real concern, by the way, that free tier is extremely generous. You could run a local instance of this thing yourself headlessly if you want for testing and building out locally, so you don't have this requirement of the cloud. And the free tier’s insane. I think you get something like 20GB in the free tiers. So, like you could build a lot of app with 20GB. A lot of app. You could put images in there, you don't want to, but you could. Yeah, it's a great DB. I guess the other thing that people get a little tripped up on is the syntaxes. It’s a bit strange. It’s coming out from a different world. I don't think it actually is that strange, for what it's worth. I'm pretty sure if you'd never seen SQL before and I showed it to you, you’d be like, Well, that's strange. I think just what you're used to. It's a sadly verbose query language. It takes a lot of directives in JSON form to make it do pretty trivial things. We've written a few higher level wrappers for it to make it a bit nicer to work with, but it's all about the semantics. Single digit millisecond latencies for up to a MB at a time querying, no matter how many rows I have? That's unreal. But we've never had a database that can do that. And I'm happy to pay for that capability.

Episodes #34 & #35: Rick Houlihan
Asking when NOT to use NoSQL...
Rick: So NoSQL is really suited and as we talked about, we have to denormalize the data, right? Which does that means I have to structure it and tune it to the access pattern. So if I don't really understand those access patterns, if they're not really well-defined, then maybe what we're looking at is a different type of application that's not necessarily so well-suited for NoSQL, right?

And that's really what it comes down to. There's two types of applications out there. There's no OLTP or online transaction processing application which is really built using well-defined access patterns. You're going to have a limited number of queries that are going to execute against the data, they're going to execute very frequently and we're not going to expect to see any change or we're going to see limited change in this collection of queries over time.

And that's a really good application for NoSQL because as I said, we have to kind of tune the data to the access pattern. So if I only have a small number of access patterns, then it makes sense, but if the customer comes in and tells me, "I don't know what questions are going to be asked. This is maybe my trading analytics platform and who knows what the brokers are going to be interested in today or tomorrow and I look at the query logs of the server and there's a thousand different queries and some of them execute once or twice and never to be seen again and others execute dozens of times."

These are things that are indicative of an application workload that may be, is not so good for NoSQL because what we're going to want is a data model that's kind of agnostic to all those access patterns, right? And it has that ad hoc query engine that lets us reproduce those results. So lucky for us in the NoSQL world, that's actually a small subset of the applications, right? 90% of the applications we build have a very limited number of access patterns. They execute those queries regularly and repeatedly throughout day. So that's the area that we're going to focus on when we talk about NoSQL.

Episode #36: Suphatra Rufo
Asking about the adoption of NoSQL databases...
Suphatra: Yeah, yeah. I think the way that consumers behave, the retail industry is a good example. You probably didn't know that Sears, Kmart, Barneys New York, Party City, I can name a dozen more retailers that just last year, either completely closed down or had to significantly reduce their number of stores, just last year. It's because retail isn't done the same way anymore. Those spikes are now a common part of life and people are having a hard time figuring out how to handle it. Tesco, which is the largest grocery chain store, I'm not sure in America or in the world, I'll have to check that... But they, in 2014, crashed on Black Friday because they couldn't handle the spike in the demands they were getting online. So they lost an entire day of business on Black Friday because they couldn't handle that workload. And then the year later, they went on a NoSQL database and now they can handle that load.

I think what people are seeing is that normal day to day business operations are fundamentally different. For example, the fashion industry used to have only four clothing seasons. Your mother probably remembers buying a new outfit every season... So winter, spring, summer, and fall. And so women's clothiers would go and create new clothes four times a year. Now the fashion industry has 52 seasons, so every week is a different season of women's clothing, which means there's a spike every week for every launch of every new clothing line. So that's another big database problem that's now just becoming a regular part of life. A decade after NoSQL databases are invented, it's really not a new invention anymore. Now this is just the way of business.

Episode #39: Lynn Langit
Asking about the tools enterprises were using for big data...
Lynn: Well, change comes slowly and change is usually induced by some sort of pain. And so the pain in my case and my customer's case was through IoT data because IoT data increased the amount of data exponentially because the event based data. So I had some customers, some of the big, big like the biggest appliance manufacturer in the United States. Customers, I can't name, but you can guess who they are. And this was maybe eight years ago, so it was still a while ago. They wanted to IoT enable their devices.

And again, to be very clear, the majority of the enterprise applications that I would work with would be SQL plus NoSQL because they would have a need for transaction. And again, that's really important because I saw those startups go just directly to NoSQL and then they would call me and they would try to tune their transactional consistency of their Mongo and it would be clustered and it would be a mess. And then we just pull that out and put it in MySQL. Just the whole space was super interesting. So meanwhile the cloud vendors are evolving and Amazon of course comes with DynamoDB. And I have to tell you that initially I was super resistant. I was like, how do you even query that? I actually did some time tests and blogged about ... this is like seven, eight years ago.

You write SQL query, everybody knows how to do that. You write a Dynamo query, it takes 15 minutes because you have to research the query and how much is that in your dev time and dah, dah, dah, dah, dah, dah, dah, dah. So there was resistance including me. The service on the cloud that really stunned me and still does is BigQuery because BigQuery offered SQL querying, which I think it's extremely important when you're evaluating different kinds of database solutions to look at what is the ramp up time to understand how to get data in, how to take data out. And the more different the query languages are, the more errors you're going to have too, and this is your data. So I've seen a lot of bad things where developers overestimated their abilities. And because the query languages were really idiosyncratic or esoteric for the NoSQL databases, it was all kinds of problems.

But BigQuery's idea of, okay, you get around the scaling problem by using files and you then just use SQL and you just pay for the time. I mean, I literally, I got goosebumps. I was stunned when BigQuery came out. I was stunned. I really got it from the beginning. And I've written about it, I've used it. I would often add it for customers as sort of an incremental rather than NoSQL.

Episode #44: Alex DeBrie
Asking about migrating data and changing access patterns…
Alex: Yeah, absolutely. And this was actually a late addition to the book, but I just got so many questions about, I don't want to use Dynamo because what if my access pattern's change, or how do I migrate data? Things like that. So I actually went through it, I think it's not as bad as you think. And I split migrations into two categories, basically. First off, they're just additive migrations, where if you're just adding a new application attribute to existing items, you just change that in your application code, you don't need to change anything in DynamoDB. Or if you're adding a new type of entity that doesn't have any relational access patterns with an existing entity, or if you can put it into an existing item collection of an existing entity, you don't need to do anything. It's just a purely application code change there.

The second type of migration is, I need to do something to existing items, either because I'm changing an access pattern for an existing items or I'm joining two existing items that weren't joined together, or I'm adding a new entity type that I need a relation and there's no existing item collections to use there.

So now you don't only need to change your application code, but you need to do something with your existing data. And that's harder. It seems scary, but it's actually not that bad. And like, once you've gone through one of these processes, these ETL migration processes, they're pretty easy. It's basically a three step process. You're going to have some giant background job that's going to scan your table. You're going to look for the particular items you need to change. So if it's an order, and you need to add GSI2PK and GSI2SK for it, you find your orders in that scan for each order that you find, then you add these new attributes on them.

And then if there are more pages in your scan, you loop around and do that again. So it's just this giant wild loop that operates on your whole table. Depending on how big your table is, it might take a few hours, but it's pretty straightforward, that three step process: scan your table, identify the items you want to change and change them.


Episode #20: Sheen Brisals
Asking about how LEGO developed confidence to go serverless...
Sheen: Yes, that's a very, very good and important point because when we often look at a monolith, we often get confused. Where do we make a start? Because it looks everything big. But the thing is, we need to start looking more closely part-by-part. So then we will be able to identify some small entry point into the system that will give us the comfort to, you know, try out something new. And also, when you have, when you work in the organizations, you need to prove or showcase these things to stay stakeholders, you know, to get their buy-in. So for that, it's important that we identify a part of the system that is not complicated, small enough that we can experiment with the new ideas and show them, show them the proof that it's working and that’s feasible for us to go forward to everyone around — not just the engineering team. Bring the business stakeholders, everyone together and yeah, so that's very crucial when, especially when we start this monolith to microservices serverless journey.

Episode #22: Bret McGowen
Asking about the transition of existing applications to serverless...
Bret: I think one of the messages I want to kind of get across is that maybe moving to something like containers and Cloud Run, it feels like you're taking on way more than just getting started with functions as a service, right? Your Lambda, your Azure functions, or Cloud functions. And there is. There is a little bit more to manage. I don't think it's a huge amount of overhead, but I think what this really does is it enables a lot of your existing workloads to start to be serverless. Because we all have apps that have we started before serverless was a thing. Even if it's not perfect, we would love to get them to be serverless.

Episode #15: Mark McCann and Gillian Armstrong
Asking about advice for people thinking about adopting serverless...
Gillian: So I tell people, especially in big enterprises, the same for both serverless and AI, which is: start now. You’re already behind. If you haven't started, you need to start now. It takes a while to learn all the things that Mark’s just said — a lot of things. It takes a while to move your mindset from highly architected things before to serverless. Serverless is very different, even the microservices. So even if you're familiar with microservices. This is still a different paradigm. So it just takes a little while to learn. It takes a little while to move all your existing practices and thinking about how you build your systems. So you need to start. You need to find places that are sort of safe-to-fail places where you can try things out and then gradually scale up. I think the big thing is, if you run the company, do you create time for people to learn. Do let them have that space and, you know, find your people who are really, really passionate about it and then let them loose.

Episode #32: Ken Collins
Asking about advice to other companies adopting serverless...
Ken: Yeah, I think it's always to look at what your business is doing right now. And, where you need it to be sort of performing at first of right. So, always drives my success. I'm a very huge believer in DHH the sort of creator of Rails. That you do the majestic monolith first. You build an application out, and then you sort of look at where it needs to either be performing or broken apart. For Custom Ink. If, your story is anything like ours, it would basically be starting with the monolith. Looking to where sort of business units lie in that monolith and then breaking out into what we sort of call key domain services. So, we would extract the design lab from the monolith. We would extract the product catalog from the monolith, we would extract, a group order form and quoting systems and things like that from the monolith.

So, that to me is a really good way to sort of adopt the cloud. If you know you're going to be breaking up to this monolith into smaller parts. Some of them could be Lambdaliths, some of them could be say Fargate or EC2 instances, whatever. But, I think when you look at what's happening with your current application, let your success drive your architecture. And, I think Lambda is a good place for either moving apps, but it's also a really good place for, if you're in AWS. To question if that's an opportunity for you to look at doing data, and events and units of work in a different way.

Episode #33: Yan Cui
Asking about the biggest roadblocks that companies adopting serverless are running into…
Yan: Well, the biggest one I feel is by far is just education. Like I said, Lambda itself is getting more and more complicated because of all the different things you can do with it. Other roadblocks includes for example some organizations are still holding onto the way they are used to operating. With centralized operation teams, cloud teams. The feature teams don't necessarily have the autonomy they need to take full advantage of all these different tools that you get and all these power and agility you get with serverless, your team can build a new feature in a week, but it's going to take them three weeks to get anything they need done, provisions and to get assets to resources they need. Then again, you're not going to get the full benefit of serverless.

So a lot of that legacy thinking at the organization is still there and is still a prominent problem and roadblock for people to take full advantage of serverless. But in terms of actual adoptions, a lot of it is ... In terms of technical roadblocks, there's some, I think the last question you had was around some use cases that just doesn't fit so well. When you've got a really high throughput system, the cost of serverless can become pretty high. So imagine you've got something that's relatively simple, but how to scale it massively like your Dropbox, not a super complex system, but have to scale to massive extent. So for them it makes perfect sense to move off of S3 and start to build their own given hardware so that they can start to optimize for that cost.

For a lot of companies, they do have that concern as well. They may not have a very complicated system that requires a hundred different functions on this massive event driven architecture, maybe they just have five end points. But those five end points are running at 10,000 or 50,000 requests per second. So in those cases, the cost for using Lambda and API gateway would be excruciating and you'd be much better off paying a team to look after your community's cluster or your containers cluster than have them running them on Lambda.

But that's always a tricky balance. Because, oftentimes you can always get the reverse argument whereby, "Well, Lambda is expensive, so I'm going to just do this myself." But then you're hiring someone for $10,000 a month to look after your infrastructure, and your Lambda bill is going to be, I don't know, $100.

Episode #37: Peter Sbarski
Asking about the challenges of training people how to build using serverless…
Peter: Look, I hate to use the word paradigm, but it does feel, it is really a paradigm shift. Because serverless, it feels like, this is what cloud was supposed to be all along, right? You're not dealing with low level infrastructure concerns. You're not provisioning your servers and thinking about memory capacity, but you're thinking at a high level of abstraction, you're thinking in terms of code, you're thinking in terms of functions and services and event driven architectures. That's interesting. It's different and it requires people to really think in new ways.

Look, I think, honestly, the adoption of serverless will hang on education. If it can educate people, serverless as a concept as an idea will be successful. I think that's what we're all working towards. This is what you do nearly every day, right? You educate people on serverless. You blog, you talk, because this is the way we get people to understand.


Episode #25: Farrah Campbell and Danielle Heberling
Asking about the serverless community being more welcoming…
Farrah: I definitely do. Serverless is a new approach. It's inventing a whole new way to build software applications and it's sticking. And this community, I feel like everybody has a lot of work to do, a lot of big things to accomplish and everybody's at a starting point. Everybody's willing to have open conversations without putting others down or explain to you why you're wrong about something. Everybody's at the same starting point and just trying to learn from one another.


Episode #28: Nader Dabit
Asking about the idea of the full-stack serverless developer...
Nader: Right, right. We think that what we're doing is a little different than anything that's kind of been out there before, I think. And we don't really have something to compare it to, but we talk about it in a couple of different ways. One of the things that we talk about is this idea of a full stack serverless development, where you're a developer, or you're a team, or you're a startup, or you're a company and you want to be able to enable a developer or a team of developers to build the front-end and the back end, versus having the traditional maybe engineering team where you have a backend developer and then you have a front-end developer.

We're looking at it like, what if a developer could just be looked at as a full stack developer like we've seen forever. But instead of the traditional full stack developer where the backend developer might be in charge of creating servers and creating a database and patching and dealing with all of the different backend resources, we could take the serverless philosophy, use that and then apply the front-end developers and merge that together and enable a single developer to build out these full stack apps, or a team.

Episode #50: Guillermo Rauch
Asking about the shift from compute on demand to pre-rendering...
Guillermo: Yeah. I think a lot of people in the industry have over focused their attention on computing on demand, which is what Lambda enables, right? You're literally firing up a VM. It's amazing how easy AWS made it. It's almost like a miracle that you deploy your function so quickly, and it executes so quickly, and it's secure and in a VM sandbox, and their underlying technology is absolutely incredible with FireCracker, but the question that you have to take a step back and ask yourself is that do I really want to be computing so much? Do I want to be burning electricity and competing cycles so much?

This is where when we really sat down to analyze this problem, we realized the vast majority of pages that you visit every day on the internet can be computed once and then globally shared and distributed. So it's like the technique of memoization and functional programming where you compute once and then of course, you want to read it from that intrinsic automatic cache that you get, is different from caching because caching requires a lot of developer effort and thinking. Memoization gets closer to what I envisioned to be the foundation of serverless front end, which is basically static generation where the computation happens once probably as a result of some data pipeline, something that changes, computation happens. HTML is spit out.

That is all, and even in the case Vercel, it's powered by functions too by the way, but the funny thing is that the developer never even thinks about functions. They just think about building pages that then get pushed to the edge and then consumed by visitors. Now, that's not to say that the on-demand use case doesn't have any merit. Not everything can be computed statically. There's lots of pages where you sign in to a dashboard, and you have to query data that could absolutely not be cached. A great example is you log into your bank, and imagine that you were trying to statically generate your dashboard with your bank account balance, but you just want to check that your payment went through for utility.

You're not sure if what you're reading is up to date or not. You would go crazy, right? The movement of front end has also led us to where that dashboard is a single-page application, most likely, that is also served statically from the edge. Then there's JS code that runs on the client's side that then queries that back end. What we found is that front end is really powered by this set of statically computed pages that get downloaded very, very quickly to the device, some of which have data in line with them. This is where the leap of performance and availability just becomes really massive, because you're not going to a server every time you go to your news, your ecommerce, your whatever.

You're just downloading it from your very own city, but even in the case of like, "I may have to make a strong read, not a read that could be stale," you're basically also downloading static content that then runs JavaScript on the client, and then that goes to a server. Then the question becomes, "Who's writing that server, and how much of that server are you writing?" This is like the other big question that is, I think, coming up. We're confronting that in the serverless world is like, "Okay, I have all these amazing primitives to build everything in the world that I could imagine from scratch, but does it make sense to build everything from scratch?"

Does it make sense for you to build your own authentication function with Lambda if you could be reusing a standalone authentication service? That's why this interesting world is coming up where there's a rise of the front end, but then there is a rise of the API economy. What I mean by the API economy is that we have services like Stripe and Twilio and AWS Cognito and Auth0, and MagicLink, and all the services where you're just making some quick API calls sometimes directly from the client side, right?

That is a serverless world that seems so much more in my mind attuned with the actual ideal and the actual, original promise of serverless. I think we are too much. You're giving the example of SQS and Dynamo. We erred too much on always rebuilding from scratch a little bit, so focusing on the front end allows you to reprogram your product strategy in a way, where like, "Okay, I'm going to think about the customer first. I'm going to think about building my back end very, very low in my priority list, right?"


Episode #31: Aleksandar Simovic
Asking about voice automation
Aleksandar: Naturally, of course, you can order things from Amazon or whatever, but you're now slowly starting to see, for example, print, give me some report, or send this or send a message to someone else. And hidden and seen in the appearance of these Echo Shows in the past two years, where you can even see when something is happening in front of you.

Jeremy: Right, it's like a visual interface that's on top of your voice commands?

Aleksandar: Exactly, and what my kind of prediction, and things that I'm working on, I'll talk about it later, is that we're going to come to a point where things are going to be automated using Alexa on many manual things that we're already doing right now, I don't know, office, office manager thing, office manager tasks, like I don't know, send somebody a reminder or schedule a meeting.

Actually we already can see that using Alexa for business, but all these small pieces are starting ... Like people are starting to discover how can they easily use it, but as serverless evolved, and now people are actually building huge applications on like enterprise scale applications on serverless, and we saw that on Reinvent, at this last year.

This is how things are going to evolve with voice as well, so we're going to see an explosion of higher order kind of software, like more complex software, that's ... You might have an intelligent agent or an Alexa skill that's going to be able to do some financial or maybe do your taxes, you don't know, you know?

So, things are slowing building. These building blocks are appearing. We see AWS is building this whole serverless ecosystem around itself, where it's going to be a piece of cake actually combining these components and creating something out of the box.


Episode #42: Susanne Kaiser
Asking about building better software with domain-driven design...
Susanne: So your domain driven design comes with a core statement that in order to build better software we have to align its software design with the business domain, with the business needs, and the business strategy. So domain driven design helps you with aligning your software design with the business domain needs and the strategy and it's very crucial for building your software solution, because otherwise, you are building something that, for example, are matching the requirements of your users. Instead you have to collaborate intensively with your domain experts to gain domain knowledge and to understand the problem first before you're solving it. We are tending to jump directly into solving a problem technically and, yeah, yeah, we can just let's deploy it on a Kubernetes cluster, but we have not understood the problem first. That's really crucial to the build better software.

Episode #49: Jared Short
Asking about building code for transparency...
Jared: Sure. Yeah. And I mean, a lot of companies are building towards, I'd say short-term right now value versus long-term stability. And I get that. It makes a lot of sense, especially financially in certain cases. If I need something right now versus, what's this going to look like in six months, when we circle back to it.

But ultimately building as if you're going to open source at any time, I think forces you to at least think if somebody was looking over my shoulder right now, right? If I was building something and say, "If Jeremy's looking over my shoulder right now, am I really going to put like my GitHub magic key, or whatever into this line of code and just hard code and deploy. And be like, 'I'll fix that later?' I feel like Jeremy's going to be back there and be like, 'Really man? I respected you and now no, like there's nothing.'"

So I think it helps you justify the few extra minutes or in some cases, hours or days, to make the right technical decision. And I get tech that's a thing. Look, go look at open source code. There's tons of stuff out there that's like to do actually make this work appropriately or optimize this thing. That's fine. Nobody's going to judge you. We all get it. People write software and they understand software is hard. But they are going to judge you pretty hard if you make poor security decisions or poor architectural decisions where it just doesn't make sense. And I think having that fictitious open source gazer over your shoulder, it just helps you make those decisions and kind of think to yourself.


Episode #48: Linda Nichols
Asking about trusting cloud services...
Linda: Yeah, absolutely. And I mean, I think in the cloud in general, because when we were talking about standing on the shoulders of giants before, we were talking about using NPM packages or Ruby gems or whatever. And like you kind of just don't even know who's written those. And there's some security kind of considerations there. And also, like, you just don't know how much they're tested, you don't know if that maintainer is just going to go find another job somewhere. When you're talking about cloud services. This is true for all the major clouds. I mean, they are tested by millions of people, they are used billions of times. So I mean, if you... No one's going to say like, oh, I just don't think that... Fill in the blank servers, like Cosmos DB. Like oh, I just don't think it's really tested that much. Or like, what if the person who works on it leaves. Well, there isn't one person, right? It's a whole team of people, and there's a company that supports it.

So I mean, I think it's somewhat it's kind of ridiculous when people don't trust cloud services. If you don't want lock in that's a whole other discussion, right? But you if you're already in a cloud, I mean, a lot of people are multi cloud also, and they kind of spread things around to try to minimize that sort of lock in feeling. But really you get locked into libraries too right? If you're writing... If I'm writing a Node.js app, and I'm using some NPM package, yeah that thing's going to stick around forever. How often do you go back and switch out your NPM package?


Episode #7: Taylor Otwell
Asking about the future of serverless...
Taylor: Yeah, I think the next five years will be huge for serverless, I really do. I think it is the future, because what's the alternative, really? Like more complexity, more configuration files, more weird container orchestration stuff? I don't really think that's the future, you know, that people are gonna naturally gravitate towards. I think people want simpler things. And I think at the end of the day, serverless is simpler. It's going to only get more simple as the tooling gets better, as the platforms get better. And to me, it's the real endgame, you know, of the whole server thing, just deploy your code and you focus on your code and let the provider focus on the infrastructure.

Episode #38: Ben Ellerby
Asking about what serverless looks like in five years…
Ben: I think it's going to be more abstraction and more consolidation around how to do things. So in five years it's going to be more obstruction around that configuration so you're not having to manually configure retry policies out of the box, you're sort of being able to sort of, well maybe not pointing click, but in a very short amount of yaml we'll be able to have an event driven architecture and maybe that becomes formalized, an event sourcing service rather than just an event bus. Maybe other areas of event driven become more formalized, but it's always going to be an increase in abstraction. We went from virtualization to containerization to function as a service and other things as a service. Now we're sort of building more event driven. We can have obstructions at different levels, so there might be obstructions in particular services or abstraction of the whole architecture.

Things like the serverless framework have tried serverless components, AWS has tried the serverless application repository. And those things have varying degrees of success, but built into all of these services, although we're going to give it more, we seem to have had a spike of complexity recently as so much has been announced and so much has been released. I think we're getting more abstraction. If we take the amazing work you did about integrating RDS with Lambda and you built a whole sort of open source project that really helped people with that. Recently, although there's still a need, AWS has abstracted a lot of that with the RDS proxy. They've seen a need from the community and that abstracted that. If we take EventBridge, people were doing CloudWatch custom events kind of before and hacking it and then they formalize that and provided a level of abstraction. Now is that abstraction going to be driven by the cloud provider or by the community? Well I think it's going to be a bit of both. AWS and other cloud providers are going to add more services but also increase abstraction as it goes on and the community is going to build amazing open source projects that increase abstraction. So I think the move to more abstraction, which means less configuration and configuration is just code. So it means less code to achieve the same business value. So for me it's more abstraction, but right now it feels like less abstraction.

Episode #52: Tim Wagner
Asking about if not addressing state will prevent serverless adoption…
Tim: I have these two strong reactions to that statement, right? One of them is I would say in some ways the most successful thing Lambda has done is to challenge thinking, right? To get people to say, do you really need a server stood up, turned on taking 20 minutes to fire up with a bazillion libraries on it and then you have to keep that thing alive and in perfect condition for its entire life cycle in order to get something done in terms of a practical enterprise application? And challenging that assumption is one of the most exciting, important and successful things that I think Lambda and other serverless offerings have accomplished in our industry. The flip side to this is to be useful, sometimes you have to be practical. And it's equally true that you can't walk up to an enterprise and say, "All right, step one, let's throw all your stuff away and then step two, you're not going to get past step one."

It's funny, we talk about greenfields, brownfields, it's all brown in the enterprise. Even if you write a net new Lambda function, it's running against existing storage, existing data, existing APIs, whatever that is. Nothing is ever completely de novo. And so I think to be successful and be as adopted as possible in the long run, serverless offerings are going to also have to be, they're going to have to be flexible. And I think you see this with things like provision capacity. I mean, when I was at Lambda still, we had long painful debates about is this the right thing to do? And for understandable reasons, because it is less stateless. It took the ... it's obviously optional. We don't force anyone to use it. But by doing it, it makes Lambda look more like a conventional, well, server container, conventional application approach because there is this piece that is a little bit stateful now.

And I think the arc here is for the serverless offerings to not lose their way, to find this kind of middle ground that is useful enough to the enterprises that still challenges assumptions that gets people to write stuff in a way that is better than what came before and doesn't pander completely to just make it feel like a server. But is also practical and helps enterprises get their job done instead of just telling them that ... because just sermonizing to them is also not the right way to do it.

What is Serverless Chats?

Serverless Chats is a podcast that geeks out on everything serverless. Join Jeremy Daly and Rebecca Marshburn as they chat with a special guest each week.