Jeremy chats with Alex DeBrie from Serverless, Inc. about the choices facing developers when building serverless applications, and when a practical approach sometimes trumps best practices.
About Alex DeBrie:
- Blog: alexdebrie.com
- Twitter: @alexbdebrie
- DynamoDB Guide: dynamodbguide.com
- Serverless, Inc.: serverless.com
Alex: Hey Jeremy. Thanks for having me on.
Jeremy: So you are an engineering manager at Serverless Inc. — that's Serverless with a capital "S," not to get confused. They're out of San Francisco, but you actually work out of Omaha, Nebraska. So why don't you tell the listeners a little bit more about yourself and what Serverless Inc. is up to.
Alex: Yeah, sure. I've been at Serverless, Inc. for two years now. I started originally on the growth team, and now I'm working on the engineering team. But, you know, at Serverless, Inc., we're the creators of the Serverless framework, which is a tool for developing and managing serverless applications. It really reduces the tedium around setting up API Gateway and IAM and all that stuff, and really helps you write your business logic and use AWS Lambda and serverless technologies effectively and quickly. There's a huge community of advocates, plug-ins, and best practices around the Serverless framework. I think we just crossed 30,000 stars on GitHub. So I'm really loving what we're doing here.
Jeremy: That's awesome. Yeah, I think if somebody doesn't know what the Serverless framework is yet, then they haven't been paying attention for the last couple of years. So you also write a blog, and you have a really, really good resource for people who are interested in learning DynamoDB, and people who are using DynamoDB and want to learn how to use it better. That's DynamoDBguide.com. That and your blog — what's going on with that stuff?
Alex: Let's start with DynamoDB Guide first. This was when I was still on the growth team at Serverless. I was doing a fair bit of content writing, and we were using DynamoDB a lot. I watched the 2017 re:Invent talk from Rick Houlihan, who's this wizard that works on DynamoDB at AWS. He did a talk on some best practices and I just loved it. I think I watched it four times in two or three weeks. This was Christmas break 2017, and I'm like, "I've just got to get some of this stuff out here." I wrote the resource that I wish I had when I started with Dynamo, because I thought I knew it well, and then I saw Rick teach it, and I did not. So DynamoDBGuide.com — it has a walk-through of all the different API stuff around DynamoDB, secondary indexes, all that stuff, as well as some data modeling examples too.
Jeremy: And your blog is mostly serverless and S3 batch stuff, all kinds of stuff like that, right?
Alex: Yeah, my blog I would say is a lot of, again, sort of like DynamoDB Guide, just the guides I wish I had when I started. I think both with DynamoDB Guide - and then a lot of the content on my blog - it's stuff I was familiar with, and then I want to teach it to people. And then when I teach it, I find I learned a lot of stuff that I actually didn't know. So it helps me, and I hope it helps other people as well.
Jeremy: Great blog, and the DynamoDB Guide is awesome. And yes, Rick Houlihan is a wizard and I don't know how he does some of the things he does. But his 2018 re:Invent talk has a podcast version that I think I listened to maybe 50 times at like .75 speed, so that you can maybe understand it.
So anyways, I wanted to have you on because I want to talk about this idea of serverless purity versus practicality, right? I think that we see a lot of debate on Twitter - and forget about what serverless is and what serverless isn't - but more so, what's the right way? How should we build a serverless app versus how we can practically build a serverless app? I think there's a lot of things around that, whether it's developer experience and that sort of stuff, but what are your thoughts on this sort of debate?
Alex: I think it's pretty fascinating to see. Like you say, if you're on Twitter and you're following a lot of the big time people doing serverless architectures in this space, they have a lot of great tips around best practices, and this is what you should be doing, all that stuff. But I find, as I'm building serverless applications or as I'm talking to customers and users that are building serverless applications, there are times when there's tension between what the best practices are and what their circumstances are. This could be because maybe they're not coming in with a green field application, or maybe they have a data model that doesn't fit DynamoDB or something like that. It's difficult on how you sort of square that with recommending something that you know isn't the best practice or the most pure serverless application, but you also gotta help people ship products, right? I think balancing that tension can be tough at times.
Jeremy: Yeah. I want to dive into a couple of these discussions that we've been seeing on Twitter, and I think, like you said, there are a few champions who sort of lead the effort for each one of these. But let's talk about the API Gateway service integrations. So we know that the typical serverless model would be API Gateway to a Lambda function, and then access something else. But it's possible to do that without using Lambda, right?
Alex: Correct. Like you're saying, you can do what's called an API Gateway service integration, where maybe you take that incoming HTTP event, maybe you validate, authenticate it, maybe twist up the shape a little bit, and then you can put it directly into a different AWS service, like SNS, SQS, Kinesis, something like that, rather than going through a Lambda function first.
Jeremy: If you're just sending the data straight in, and you've got maybe a Lambda authorizer, that's one thing, but what about if you're transforming the data? That seems like a different beast than writing some Node or some Python.
Alex: Yeah, absolutely. API Gateway allows you to write what are called VTL templates, so it's in Velocity Template Language, which I believe is an Apache project. It's a semi-declarative templating language where you can take some input, like a JSON payload body, the headers, all that stuff, and create a different shape that you want that satisfies the API format of whatever service you're integrating with. It's doable, but I would say not a lot of people have experience with VTL, so it's definitely a learning curve there. It has some quirks and unexpected stuff for people that are new to VTL.
Jeremy: You mentioned the quirks. I'm thinking to myself, here I am writing an application. I've got all my tooling in place. I've got my testing frameworks and I can test all this stuff. Then I say, okay, well, I'm gonna go pure - API Gateway to DynamoDB - and I'm going to write some transformations in VTL. I do that, and then how do I test that?
Alex: That's the tricky question, right? You probably have to deploy your application up to API Gateway, then send in an HTTP request, and then check the DynamoDB table to make sure it got there all right. You're probably gonna have more of a cloud native integration test suite than a local unit test suite that you could go through to validate some of that logic.
Jeremy: But is that a mental burden on developers? I mean, are there trade-offs? What's the benefit of doing it versus just saying, look, I can write the transformation in Lambda because I know that. But if I move over to using these VTL templates and things, and I have a hard time testing it, I've got to test it in the cloud - should I be nervous about making changes or things like that? What are your thoughts on that?
Alex: Great question. To me, I think it does add a fair bit of burden around the development process. I think you really got to balance what your needs are and what your comfort level is with VTL versus running something in Lambda and having full test coverage. To me, my rule of thumb is if you're not doing any sophisticated transformation or if you're not doing any fine-grain authorization, I think it's fine to use API Gateway service integrations. The example that I go to is maybe you have a front end or an IoT device or something like that that's just sending in data and you're only validating the shape of it, or maybe transforming the keys around a little bit, before sending it into SNS or Kinesis. Then, it's going to get processed by a different system. I think that's a totally valid use of API Gateway service integrations, if you want to use it.
On the other end of the spectrum, you mentioned connecting with DynamoDB, and you can write directly to DynamoDB there. I don't love it for more sophisticated use cases where maybe you need to pull off a key from the incoming JSON body. Maybe that's going to be your partition key for the DynamoDB table that you're querying. Maybe you're checking a sub-property on there to make sure this user has access to this thing and rejecting, if not, or if you're reconstructing a complex JSON object after you retrieve the DynamoDB item. Any of that stuff where it gets more complicated, my rule of thumb generally is if there's an `if` statement or a `for` loop in your VTL, now it's code. It's not config anymore, and you probably should do it in a Lambda rather than in VTL.
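As an illustration of the rule of thumb above, here is a minimal Python sketch of the kind of logic Alex suggests moving out of VTL and into a Lambda function: pulling a key off the incoming JSON body, checking a property for access, and building the DynamoDB key. The field names `orgId` and `allowedOrgs`, the `ORG#` key prefix, and the table name are hypothetical and chosen only for illustration. The function makes no AWS calls, which is what would let it be covered by a local unit test rather than only by a deployed integration test.

```python
import json


def build_get_item_request(event, table_name="ExampleTable"):
    """Turn an API Gateway proxy event into DynamoDB GetItem parameters.

    Returns a (status_code, payload) tuple. Field names are hypothetical.
    """
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return 400, {"message": "Body must be valid JSON"}
    if not isinstance(body, dict):
        return 400, {"message": "Body must be a JSON object"}

    org_id = body.get("orgId")
    if not isinstance(org_id, str) or not org_id:
        return 400, {"message": "orgId is required"}

    # The "if statement" from the rule of thumb: a fine-grained access check.
    # Assumes an authorizer put a comma-separated "allowedOrgs" string
    # into the request context (an illustrative convention, not a standard).
    authorizer = event.get("requestContext", {}).get("authorizer", {})
    allowed = [o for o in authorizer.get("allowedOrgs", "").split(",") if o]
    if org_id not in allowed:
        return 403, {"message": "Not allowed to read this organization"}

    # Parameters in the shape DynamoDB's GetItem operation expects.
    return 200, {
        "TableName": table_name,
        "Key": {"PK": {"S": f"ORG#{org_id}"}},
    }
```

A real handler would pass the returned parameters to a DynamoDB client and map the result to an HTTP response; that part is left out so the branching logic stays testable on its own.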
Jeremy: I totally agree with you on that. I love the idea of service integrations, but I think if you start making developers do that mental shift even more so than just moving to serverless, you start to introduce resistance. I don't know if that's the best word. But anyways, so now, we've got maybe a judge's ruling on this. If somebody says, hey, look, yes, I could use the service integration, but I just feel more comfortable using code to do it, what would you say to those people?
Alex: I don't judge those people at all, because that's mostly where I am on things. I think API Gateway service integrations are interesting and useful in some aspects, but I don't necessarily think it's a best practice, that if you're not doing it, you're doing something wrong. I think it's a choice you can make and it depends on your circumstances. How often is that code going to be changing? How much control do you need over error handling and messages that you're returning to the client that's calling you? Things like that would determine whether you should use it.
Jeremy: Great. Let's move on to this idea of monolithic Lambda functions versus single purpose Lambda functions. Obviously the AWS best practice is single purpose Lambda function - do something very, very simple. Do one thing well. But I think you and I both know from seeing developers move services to serverless, the most common use case probably is to port an Express app, and have 50 routes in there. What are your thoughts on this?
Alex: I agree generally with the single purpose Lambda function best practice. I think that's generally the best way to go, but I think there are two strong exceptions to that. The first one you mentioned is just directly porting over an Express app. It could be porting over an existing Express app, or it could be "I'm very familiar with Express, and that's what I want to use. That's how I'm productive, and this is the easiest way for me to host Express." This is easier and cheaper even than Heroku or something like that. The first sort of bucket of use cases I would say, where it's OK to have a monolithic Lambda function, are those people that want to run Express, or if you're running Python, maybe you want to run Flask, and handle all your routing within a single function, and you get a pretty great local development experience, but you also get the scaling and easy deployments of the serverless experience as well. I think that's a valid use case for some people.
I think it's also a great on-ramp to serverless because usually what happens is you start with that Express or that Flask app where you're serving Web APIs. Then you decide "I need some background processing, so in addition to this Express app, I'm going to have an SNS or an SQS integration," or "now I need some stream processing, I'm going to use Kinesis." As you do that, you become more familiar with the Lambda model. Now you have all your routes in the Express app, but you also have these other functions and events that are single purpose, and you start to learn how that works, and then maybe your next project that needs HTTP endpoints, maybe you reach for a native Lambda single purpose function, rather than going for something like Express or Flask.
Jeremy: You also are going to have the problem too, I think, one of the issues we run into - when you start to build complex serverless apps that deploy multiple functions with multiple endpoints and other services that are interacting with them - is you start running into CloudFormation limits as well.
Alex: Yeah, absolutely, and that's the second use case where I recommend people put multiple endpoints into a single function. CloudFormation only allows you to have 200 resources in a single stack. It may sound like a lot - you're like, how am I ever going to have 200 endpoints? But the reality is you're not going to get 200 endpoints, because if you ever hook up an endpoint using Lambda on API Gateway, it's going to create five resources for each endpoint you create. It's going to create the function; it's going to create the function version; it's going to create the log group; and it's going to create the API Gateway method and the path, or resource. For every single function you get, you're going to get five resources. Now you're talking max 40-or-so endpoints, but also you got to think about additional things that happen in your service, like DynamoDB tables, SQS queues, SNS topics, other infrastructure as well, so you're probably not going to get even 30 or 35 endpoints.
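To make the "all your routing within a single function" idea concrete, here is a minimal Python sketch of a monolithic Lambda handler. It uses a tiny hand-rolled router from the standard library as a stand-in for what Express or Flask would do; it is not how either framework is implemented. The `/users` routes and their payloads are hypothetical, and the event is assumed to carry `httpMethod`, `path`, and `body` fields in the style of an API Gateway proxy event.

```python
import json

# Maps (HTTP method, path) to the function that handles it.
ROUTES = {}


def route(method, path):
    """Register a handler for one method/path pair inside the single function."""
    def register(func):
        ROUTES[(method, path)] = func
        return func
    return register


@route("GET", "/users")
def list_users(event):
    return 200, {"users": []}


@route("POST", "/users")
def create_user(event):
    body = json.loads(event.get("body") or "{}")
    return 201, {"created": body.get("name")}


def handler(event, context=None):
    """Single Lambda entry point: every route is dispatched from here."""
    func = ROUTES.get((event.get("httpMethod"), event.get("path")))
    if func is None:
        status, payload = 404, {"message": "Not found"}
    else:
        status, payload = func(event)
    return {"statusCode": status, "body": json.dumps(payload)}
```

With single purpose functions, each of these routes would instead be its own deployed function with its own entry point, and the routing table would live in the API Gateway configuration.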
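The arithmetic here can be written out as a small Python sketch. The 200-resource limit and the five resources per endpoint are the figures quoted in the conversation; the count of 30 other resources is a made-up example value.

```python
def max_endpoints(stack_limit=200, resources_per_endpoint=5, other_resources=0):
    """Estimate how many function-per-endpoint routes fit in one stack."""
    remaining = stack_limit - other_resources
    return max(remaining // resources_per_endpoint, 0)


print(max_endpoints())                    # 40
print(max_endpoints(other_resources=30))  # 34
```

Every table, queue, or topic added to the same stack comes out of the same budget, which is why the practical ceiling lands below 40.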
Jeremy: How would multiple single purpose functions fit into a concept of a microservice? We often hear the term nanoservice, which I don't really like that term at all, but you have multiple functions working together to form some sort of microservice. What are your thoughts on how you would set that up with serverless?
Alex: I think a lot of the same general microservice principles apply. Find out where your bounded contexts are, and that's where you can split things up. It's probably not one function. It's not the one function nanoservice that you're talking about, but it's probably a whole set of CRUD functionality around a particular resource, and maybe even two or three resources that live together and interact pretty closely with each other, are related to each other. Anything that touches the same databases or shared infrastructure might be in a service. Anything where the objects really relate to each other pretty closely, I would split those into a service. If you can do that and stay under the 200 resource limit in CloudFormation while having a function for every endpoint, I think that's great. If you can't, that's when you start to look into other options of putting it all together into one function.
Jeremy: I really like the Serverless framework, putting everything together within a single microservice, because then you test it all together. It just makes it a little bit easier to reason about how these services work together, as opposed to just deploying 30 functions from one serverless YAML file that you have no idea or that they don't necessarily work together or work in concert. That's for me anyways.
All right, let's see if we can make Paul Johnston's ears ring and talk about using relational databases with serverless. That's another one of these topics where you've got relational databases, or RDBMS, versus something like DynamoDB. Obviously we've got Aurora Serverless now, which again is very flexible. It grows, but it still has resource limitations, and they've introduced the Data API, so there are a couple of different ways that we can kind of interact with that. If you listen to Rick Houlihan, and you follow the Church of Rick, you believe that DynamoDB is the future, at least for OLTP apps or DSS, and that these are the kind of things where if you can fit your model into it, it seems to make sense. What are your thoughts of relational versus DynamoDB and how it fits into the serverless world?
Alex: As we talked about at the beginning, I made the DynamoDB Guide. I'm very much a believer and lover of DynamoDB. I think it's just a great tool. Specifically, I think it just works so well with AWS Lambda, right? With AWS Lambda, you have this world of what I call hyper-ephemeral compute, where your compute can scale up to 1,000 invocations in a minute or can scale that back down to zero just as quickly. Something like that doesn't work very well with a relational database where you need to set up a persistent TCP connection and maintain that connection, and your database probably has a limit on the maximum number of connections that can be established at any time. If you scale up to 1,000 instances of your Lambda function, all trying to hit your database, you're going to run into limits, and that's going to be tough to debug. Now you have to set up something like PgBouncer, a connection pooler, between your Lambda functions and your RDBMS.
The additional problem, too, with these sort of more server-full databases like Postgres or MySQL, or anything like that, is often you want to have those network partitioned where they're not accessible to the public Internet. Now you need to put them in a VPC, which means your Lambda needs to be in a VPC, which means at least for right now, there's a VPC cold start penalty of multiple seconds. It's something that your users are really going to notice if you get that. I think the connection model, and that model of RDBMS does not work well with Lambda, and DynamoDB does work really well. And that's what I love about DynamoDB: HTTP connection, IAM authorization, very, very high scalability potential if you need it. But the data modeling aspect is the tricky part in there.
Jeremy: Let's talk about data modeling for a minute, because I can take pretty much any entity relationship model and normalize it, go to 3NF or whatever and I can do that. I can visualize it in my head. I can probably do it without even writing it down, and I could do that. I think that most people who are designing and building databases or building data models, they understand that as well. But when you can't do third normal form anymore, and we're starting to de-normalize that data across all this stuff, there's a learning curve. It's like a giant sudoku puzzle sometimes trying to figure these things out. Does that outweigh the benefit of using DynamoDB? Does it outweigh that trade off on that learning curve?
Alex: That's a great question. I think that's really the crux of this whole issue. I think there are two issues there. One is that short-to-medium term learning curve that you're talking about, and it's a steep learning curve, because you really got to figure that out. It's in such a sensitive area: your data. You don't want to lose data, and having to perform a migration down the road is costly. You really want to do it right the first time, but you often don't have enough experience the first time, so that can be pretty tricky. I think there's that first issue of just learning it. To me, I think it's worth putting in the time to learn it, maybe using it in some smaller areas first, really getting a feel for it, and then figure out how you can use it in more areas down the road.
The second issue with DynamoDB - and this is a more persistent one - is DynamoDB is great, if you know your access patterns and they're not going to change. You know all the ways you're going to read from your database. You know all the ways you're gonna write to your database, and that's going to stay persistent over time. That's going to scale up as high as you want it to go. The difficulty is, if you're bringing a new product to market or something like that, your data model is probably shifting as you're adding your features. You're adding new entities. You're changing how you're querying, how you're filtering, all that stuff, and DynamoDB is not well suited for changing your access patterns down the road. It's schema-less, but schema-less doesn't mean free-flowing, "do everything you want," because you really got to think about how am I going to access this data? If that changes, you're in big trouble. I think that's one area that's still pretty tricky. Serverless is so great for rapid experimentation, really shipping quickly, changing stuff, and focusing on business value. But then, you have a database that's locked in once you've set up your core data model, and how do you evolve your data model along with your app?
Jeremy: I think that for most people, if you start talking about overloaded indexes and adjacency lists and things like that, it gets a little bit confusing. I totally agree with you on the changing access patterns. It's just that I look at this now and I say every time I want to do something, a small app - and again we're not talking about a full-blown application with 50 different access patterns - we're talking about maybe a small microservice that does something very specific. There may be 10 access patterns or five access patterns. I look at something like that and I say, do I really want to set up a database for this? It seems like it's possible to store this stuff with all the magic that we've learned from past re:Invents. All that magic is possible. We can do these hierarchies. We can do these one-to-many joins, and many-to-many joins, or not really joins, but we can represent these relationships in DynamoDB.
Is it worth it to take the time to learn these skills so that when you have that next project maybe you do know the access patterns or that they're going to be relatively simple? Does it make sense to do that? Should this be the de facto? Should DynamoDB be your de facto database, if you're developing a serverless application on AWS?
Alex: I'm a strong proponent of yes there. I mean, I wrote the DynamoDB Guide. I love it, so I think yes. I think it fits so many things about serverless that it's just a nice fit with Lambda. Both the pricing model, the connection model, everything I think really works well with the serverless application. You're absolutely right that there is the learning curve. But I think those type of use cases that you mentioned, where you have a microservice and it only has 5 to 10 access patterns, maybe you don't even need a secondary index, or maybe you just need one. That's really the perfect use case for Lambda, where you've got something isolated, just a few types of entities (you can have as many rows of those as you want). I think that's a great fit for Dynamo and for serverless applications.
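For readers new to the idea of representing relationships without joins, here is a minimal Python sketch of one common pattern: storing a customer and that customer's orders in the same partition, so one query by partition key plus a sort key prefix returns the related items. `TinyTable` is an in-memory stand-in written for this example, not a DynamoDB client, and the `CUSTOMER#` and `ORDER#` key formats are hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a partition key (PK) and sort key (SK)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put_item(self, item):
        # Items with the same PK live together; SK identifies one within the group.
        self._partitions[item["PK"]][item["SK"]] = item

    def query(self, pk, sk_prefix=""):
        """Return one partition's items whose SK starts with sk_prefix, in SK order."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = TinyTable()
table.put_item({"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada"})
table.put_item({"PK": "CUSTOMER#42", "SK": "ORDER#2019-03-01", "total": 30})
table.put_item({"PK": "CUSTOMER#42", "SK": "ORDER#2019-04-15", "total": 12})
table.put_item({"PK": "CUSTOMER#7", "SK": "ORDER#2019-02-02", "total": 99})

# Access pattern: "all orders for customer 42", answered from a single partition.
orders = table.query("CUSTOMER#42", sk_prefix="ORDER#")
print([o["SK"] for o in orders])  # ['ORDER#2019-03-01', 'ORDER#2019-04-15']
```

The sketch also shows why access patterns matter so much: this layout answers "orders for a customer" cheaply, but a question like "all orders on a given date across customers" would need a different key design or a secondary index.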
Jeremy: I think the other piece of it too is I've had people say "well, yeah, but I can't do counts. I can't do aggregations. I can't do those sort of things." I've always found it very, very easy - whether you're doing scans of the database - to dump that on a regular basis. Or now that you can use DynamoDB streams, you have the ability to replicate that data and either aggregate it yourself as part of that calculation or just dump it into another database. I've done that before, where I've taken Dynamo and dumped it into SQL, or into MySQL, so that you could actually do some magic on the back end. In terms of my app being able to scale and access those things, my users don't need to run complex queries. They just need to get data, put data, maybe see a list of some data that's associated with this, and that, to me is a very, very good use case for Dynamo. With all the other services around that, it seems crazy to me to start with the assumption that you need a relational database. But again, I don't know. I always was on the other side, and now I think I've become a convert, but only because I've seen what's possible.
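The streams-based aggregation Jeremy describes could look roughly like the following Python sketch, which keeps a running count of items per entity type from a batch of stream records. It assumes every item carries a string attribute named `type` (a hypothetical attribute) and that the stream is configured to include both new and old item images. A real function would persist the counts somewhere; here they are simply returned.

```python
from collections import Counter


def apply_stream_records(event, counts=None):
    """Update per-entity-type item counts from a DynamoDB Streams event batch."""
    counts = Counter() if counts is None else counts
    for record in event.get("Records", []):
        name = record.get("eventName")
        data = record.get("dynamodb", {})
        if name == "INSERT":
            # Attribute values arrive in DynamoDB's typed form, e.g. {"S": "order"}.
            counts[data["NewImage"]["type"]["S"]] += 1
        elif name == "REMOVE":
            counts[data["OldImage"]["type"]["S"]] -= 1
        # MODIFY events do not change how many items exist, so they are skipped.
    return counts
```

The same loop is where records could instead be forwarded to a relational database for ad hoc queries, which is the other option mentioned above.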
Alex: True. One thing I want to mention that I just found out today - I think this is so cool - but you can actually copy data straight from DynamoDB into a Redshift cluster. If you're using Redshift for analytics, which, at my last company, I did a lot of that, you can go into that Redshift database, and run a COPY command. It is fully managed, pulls out all the data from DynamoDB, puts it into a table on Redshift, and now you can query on it, which I think is really cool. You don't need to mess with streams or anything like that. It pulls it all out for you.
Jeremy: That is a very, very cool feature. Okay, let's move on to one more topic here. This has to do with optimizing your Lambda functions, or optimizing your serverless apps in general. Obviously you see a lot of people saying, "oh, my Lambda bill was 18 cents this month." I've talked to other people who said, well, if that's the case, then maybe you're not running an enterprise serverless application, if it's only 18 cents. Is there such a thing as premature optimization? Are we trying to make our package sizes smaller? Do we go through all this extra effort? Ben Kehoe, for example, said a number of times that the code that they used for iRobot, it's not worth it for them to optimize that code, because it runs just fine and it doesn't cost a lot of money. Of course, they are enterprise. What are your thoughts on this idea of optimizations?
Alex: That's a great question. That's interesting to hear from Ben Kehoe. He's a great guy to hear from. I think it depends on your use case. It depends on how your application is being used. Ben Kehoe works for iRobot, and it might not be a lot of user-facing stuff, where maybe the robot vacuum is sending up data. Or maybe they're processing data in a background, offline fashion, and it's not like a user-facing thing where a user's going to notice any latency. But on the other side, someone like Brian LeRoux, who I really like, he's the creator of the Architect framework. He's vigilant on package size, and he was saying the other day that their CI/CD process actually fails the build if their package size is over 5 MB, which I think is really interesting. His point there is that the bigger your package size, the longer your function cold start is going to be, because now it's got to load all that code into memory before it starts executing, and it's going to take it a while. If you're building a user-facing application, now that's something you think about where it's not going to be as snappy for your users. It's something to think about, for sure.
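For reference, the statement Alex is describing looks roughly like the one built by this Python sketch. The table names and IAM role ARN are placeholders, and the helper function itself is hypothetical; only the `COPY ... FROM 'dynamodb://...' ... READRATIO` shape comes from Redshift. The target Redshift table is assumed to exist already, and the values are interpolated without escaping, so this should only ever be fed trusted names.

```python
def build_dynamodb_copy(redshift_table, dynamodb_table, iam_role_arn, read_ratio=50):
    """Build a Redshift COPY statement that loads from a DynamoDB table.

    read_ratio is the percentage of the DynamoDB table's provisioned read
    throughput that the load is allowed to consume.
    """
    if read_ratio < 1:
        raise ValueError("read_ratio must be a positive percentage")
    return (
        f"COPY {redshift_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


print(build_dynamodb_copy(
    "analytics_orders",                             # placeholder Redshift table
    "Orders",                                       # placeholder DynamoDB table
    "arn:aws:iam::123456789012:role/ExampleRole",   # placeholder role ARN
))
```

Because the load reads from the live table, the read ratio is the knob that decides how much of the table's read capacity the copy may take away from the application.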
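A build gate like the one described could be sketched in a few lines of Python. This is not the Architect team's actual check, only an illustration of the idea; the 5 MB figure comes from the conversation, and treating it as 5 * 1024 * 1024 bytes is an assumption.

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # 5 MB threshold; the exact byte count is an assumption


def check_package(path, max_bytes=MAX_BYTES):
    """Return a process exit code: 0 if the artifact fits the budget, 1 if not."""
    size = os.path.getsize(path)
    if size > max_bytes:
        print(f"FAIL: {path} is {size} bytes, limit is {max_bytes}")
        return 1
    print(f"OK: {path} is {size} bytes")
    return 0
```

In a CI step, the return value would be passed to `sys.exit` so that an oversized deployment artifact fails the build.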
In terms of purity versus practicality there, you need to think about your use cases and what matters to you. If you're not gonna have a user-facing application, I wouldn't worry that much about optimizing it, or if your bill's not that high right now, don't worry about optimizing it. Most importantly - I think this is true of serverless or non-serverless, but I think it's been a focus in the serverless community - focus on building a product that brings value. If speed is something that brings value to your customers because they want a quick, responsive app, then maybe focus on speed. Otherwise, focus on building those features in that core experience that your users are really going to care about. Focus on that first rather than some of the optimization techniques.
Jeremy: I don't think a cold-start latency every once in a while is going to be what puts the nail in the coffin of your application.
Alex: I hope not.
Jeremy: It's likely going to be that you don't get to market fast enough or you don't iterate on it enough. I know me personally; I'm the worst when it comes to front-end. I'm not a typical front-end developer, but I do a couple things here and there. I maintain a React app and I do some of these things. I will spend hours just trying to get something to align right sometimes it seems. It's an incredible waste of time, and the optimization there really isn't worth it. And I liken that to the back-end of building serverless applications.
I think it goes back to what we were originally talking about with the service integrations. If you're getting started and you're building an app that you're testing, you're trying to get out there fast, then maybe all of these things we highlighted - build a monolithic Lambda function, if that's the easiest way to do it; use Lambda; don't worry about the service integrations, if it's not going to cost you that much more, especially if the app doesn't have a ton of traffic; use a relational database if you need to, if that's the quickest way for you to build an application and model it and figure out what your access patterns are going to be - do that. And you know what? If you're using a bunch of dependencies in order to make the app run, do it. I just want people to jump into serverless and start using it, and then to figure out these things later.
Maybe we just wrap this up on this topic. You hear a lot of people talking about "we should be using service integrations." I don't know if they're saying that you have to use service integrations, but they're saying this is a good way to do it, and let's use DynamoDB because that's a good way to do it. Lately, I've been sort of guilty of that myself. And oh, [they say] "let's make the package sizes smaller; let's use these single purpose functions." What is your advice to the developer who is getting started with serverless? What is your advice to them when they hear all this noise about all these best practices and stuff?
Alex: I'd say jump in. You can worry about the best practices and all that stuff, but really, you've got to jump in, start building some stuff, and figure it out. Do a little research on best practices, but don't take it as gospel, because you'll build some stuff. You'll figure out what works for you and what doesn't, and you'll get better over time. Ben Kehoe often promotes that there's a serverless spectrum, or even serverless as a ladder, where you start at just pulling off little pieces of your app and putting them in Lambda or using some managed services. Over time, you start to bite off more and more of that serverless mindset and get into the serverless ecosystem. I think that's true here, too. You don't need to go whole hog the very first time you build a serverless application. Get the core of the benefits, which is the easy deployments, the scaling, the pricing model, the time to market really, and the total cost of ownership. Then, you can get more and more pure as you go.
Jeremy: Well, I don't think we can add much to that. So let's wrap it up. Alex, thank you so much for joining me and sharing all of your serverless knowledge with the community. If people want to find out more about you, how do they do that?
Alex: Sure, Jeremy. Thanks for having me. You can find me on Twitter. I'm @alexbdebrie. My blog, as Jeremy mentioned at the beginning, is https://www.alexdebrie.com/. I've also got my email there, if you want to email me, or my Twitter DMs are open. I'm always happy to hear from anyone that has questions in the community.
Jeremy: And you are on GitHub (https://github.com/alexdebrie) and LinkedIn and all of those other social platforms.
Alex: I'm findable.
Jeremy: Great. We'll put all that in the show notes. Thanks again.
Alex: Sounds great. Thanks, Jeremy.
What is Serverless Chats?
Serverless Chats is a podcast that geeks out on everything serverless. Each week, Jeremy Daly chats with another serverless champion to explore and do a deep-dive into specific topics in the serverless space.