Serverless Chats

In this episode, Jeremy chats with Guillermo Rauch about the difference between front-end and backend serverless, how we should think about and build for scale, why latency down to the first contentful paint is so important, and so much more.

Show Notes

About Guillermo Rauch:
Guillermo Rauch is the CEO of Vercel, but before starting the company in 2015, he was CTO and co-founder of LearnBoost and Cloudup, acquired by Automattic in 2013. Guillermo is also the creator of several popular Node.js open source libraries like socket.io, mongoose and slackin. Prior to Node.js, he was a core developer of the MooTools frontend toolkit.
Watch this episode on YouTube: https://youtu.be/iRNxV9vRg6o

Transcript

Jeremy: Hi, everyone. I'm Jeremy Daly, and this is Serverless Chats. Today, I'm speaking with Guillermo Rauch. Hey Guillermo, thanks for joining me.

Guillermo: Hey, thanks for having me.

Jeremy: You are the CEO of Vercel, which was formerly ZEIT, so I'd love it if you could tell the listeners a little bit about yourself, your background, and what Vercel is all about.

Guillermo: I'm the CEO and co-creator of Next.js, which is the React framework for front-end development and JAMstack development. Vercel is the platform for deploying projects like Next.js and many other frameworks. Vercel focuses on making the lives of front-end developers really, really easy, allowing them to push their pages to our edge network, and have a very delightful serverless development experience.

Jeremy: That's what I want to talk to you about today. The last time, I think, we saw each other in person was back in... Was it back in Milan, I think, right? Almost two years ago at this point, maybe it was last year. I don't even remember. Quarantine has lasted so long at this point that I can't keep track of time. The last time I saw you, I was speaking about this idea where I felt like serverless was getting harder and harder and harder.

That was, or seemed to be, the wrong approach, right? We want serverless to become easier. This is something where, I think, this idea of what you maybe call front-end serverless or serverless front end is where you're trying to go with Vercel. I'd love to just get your thoughts on that, just that complexity that we're now pushing towards the back end, and where you're trying to go with the front end.

Guillermo: I think you nailed it. I think the serverless world is big and complicated. I think when we first met, we really connected on this idea of like, "What is even the right definition of it?" We were both presenting at Milan trying to give a definition for it. It's a pretty silly game to play to try to even fight that fight. When I think about serverless, I think about wanting to give people a very good recipe for leveraging that kind of technology.

I think anything that relates to serverless or infrastructure really needs to disappear. It has to be all about letting people focus on their products, focus on their pages, focus on the things that they're publishing to the internet. That's why front end really is the place where, I think, all the serverless action is happening, and the techniques and technologies that we're using in some ways are the original serverless, because much of what we're doing today is this idea of taking pages, generating them statically and putting them at the edge, which means...

To me, the most fundamental serverless technology out there is the CDN. CDNs have been around for a long time; they even predate a lot of the serverless movement. Yet, they had that critical idea that there is no management to do, and that it accelerates you, obviously, because it's putting your content next to your customers. The very thing a CDN accelerates is the front end. I think what we're about to see is that a lot of what we've been advocating for in the serverless world is really starting to become much of a reality with front-end developers.

Jeremy: I think that actually makes a ton of sense, because whenever I was thinking of serverless, I would always think about the actual computations that were happening behind the scenes, so whether that's something where you're running a Lambda function, and it's pushing it into SQS, and you're connecting to DynamoDB, and you're doing all these different things with the data. A lot of that is still necessary, right? There's a lot of complexity that has to happen behind the scenes in order to make a full-fledged serverless application run.

But I think the funny thing is that a vast majority of the applications you see out there are just a collection of static pages, I mean, with a little bit of API happening in the background. But that shift, that thinking of compute versus static pages, isn't that really where we want serverless to go: just this super easy, precomputed system?

Guillermo: Yeah. I think a lot of people in the industry have over-focused their attention on computing on demand, which is what Lambda enables, right? You're literally firing up a VM. It's amazing how easy AWS made it. It's almost like a miracle that you deploy your function so quickly, and it executes so quickly, and it's secure in a VM sandbox, and the underlying technology is absolutely incredible with Firecracker, but the question you have to take a step back and ask yourself is: do I really want to be computing so much? Do I want to be burning electricity and computing cycles so much?

This is where, when we really sat down to analyze this problem, we realized the vast majority of pages that you visit every day on the internet can be computed once and then globally shared and distributed. It's like the technique of memoization in functional programming, where you compute once and then, of course, you read from that intrinsic, automatic cache that you get. It's different from caching, because caching requires a lot of developer effort and thinking. Memoization gets closer to what I envision to be the foundation of serverless front end, which is basically static generation, where the computation happens once, probably as a result of some data pipeline: something changes, computation happens, HTML is spit out.
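
To make that concrete, here is a minimal sketch of that static generation pattern in Next.js terms, since that's the framework under discussion. The CMS endpoints and the product shape are hypothetical; the point is only that the data fetching runs once, ahead of time, rather than on every request:

```js
// pages/products/[id].js — a minimal Next.js static generation sketch.
// The CMS endpoints and the product shape are hypothetical.

export async function getStaticPaths() {
  // Enumerate the pages to precompute ahead of time.
  const res = await fetch('https://cms.example.com/api/products');
  const products = await res.json();
  return {
    paths: products.map((p) => ({ params: { id: String(p.id) } })),
    fallback: false,
  };
}

export async function getStaticProps({ params }) {
  // Runs once at build time, not per request: the "memoized" computation.
  const res = await fetch(`https://cms.example.com/api/products/${params.id}`);
  const product = await res.json();
  return { props: { product } };
}

export default function ProductPage({ product }) {
  // The HTML is generated once and then served statically from the edge.
  return (
    <main>
      <h1>{product.name}</h1>
      <p>{product.description}</p>
    </main>
  );
}
```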

That is all, and even in the case of Vercel, it's powered by functions too, by the way, but the funny thing is that the developer never even thinks about functions. They just think about building pages that then get pushed to the edge and then consumed by visitors. Now, that's not to say that the on-demand use case doesn't have any merit. Not everything can be computed statically. There are lots of pages where you sign in to a dashboard, and you have to query data that could absolutely not be cached. A great example is you log into your bank, and imagine that you were trying to statically generate your dashboard with your bank account balance, but you just want to check that your payment went through for a utility.

You're not sure if what you're reading is up to date or not. You would go crazy, right? The movement of front end has also led us to where that dashboard is a single-page application, most likely, that is also served statically from the edge. Then there's JS code that runs on the client's side that then queries that back end. What we found is that front end is really powered by this set of statically computed pages that get downloaded very, very quickly to the device, some of which have data in line with them. This is where the leap of performance and availability just becomes really massive, because you're not going to a server every time you go to your news, your ecommerce, your whatever.

You're just downloading it from your very own city. But even in the case of, "I may have to make a strong read, not a read that could be stale," you're basically also downloading static content that then runs JavaScript on the client, and then that goes to a server. Then the question becomes, "Who's writing that server, and how much of that server are you writing?" This is the other big question that, I think, is coming up, and that we're confronting in the serverless world: "Okay, I have all these amazing primitives to build everything in the world that I could imagine from scratch, but does it make sense to build everything from scratch?"

Does it make sense for you to build your own authentication function with Lambda if you could be reusing a standalone authentication service? That's why this interesting world is coming up where there's a rise of the front end, but then there is a rise of the API economy. What I mean by the API economy is that we have services like Stripe and Twilio and AWS Cognito and Auth0, and MagicLink, and all the services where you're just making some quick API calls sometimes directly from the client side, right?

That is a serverless world that seems, in my mind, so much more attuned to the actual ideal and the actual, original promise of serverless. I think we err too much. You're giving the example of SQS and Dynamo. We erred too much on always rebuilding from scratch a little bit, so focusing on the front end allows you to reprogram your product strategy in a way where it's like, "Okay, I'm going to think about the customer first. I'm going to put building my back end very, very low on my priority list, right?"

Jeremy: Yeah. No, and I think just this idea of what could be static content versus what needs to be generated dynamically, and I mean, I think of an ecommerce site, for example. Every product page, every category page, every set of recommended products for a particular product or related products, or things like that, all of that stuff could be precomputed and pushed out to the edge. Then the developer never has to think about processing the scale of that, because if you think... I know you used to work at WordPress, right?

That was one of the things you did before. As you know, WordPress loves to query that MySQL database on every single page load.

Guillermo: That is a great example. That is a great example. I think this is the difference, you just nailed it, between ahead-of-time computation or generation of static pages versus just-in-time. With the just-in-time model, which is what WordPress is doing every time you go to index.php or blog.php, you're creating all this load. You're sometimes issuing dozens of queries. Something you and I were talking about before the show is that because we're at peak cloud, with the amazing power and capacity that we have at our fingertips, anything, it seems, could scale today. If you use the new serverless MySQL service, I'm sure Jeff Bezos will sell you enough MySQL on-demand capacity.

With his incredible database engineers, they'll make MySQL scale so much that you might actually make that work, but the question is, "Do you actually want to? Do you actually want to, first of all, pay all those database bills?" Secondly, it seems like we're increasing the entropy of the universe, and we're producing all this heat and carbon emissions for no reason. The point is the writes that happen are here and there: to those product pages, to those blog posts, to those marketing pages.

Somebody at a marketing team might go and say like, "Today, I'm going to edit the headline of this page, or today, we're going to work in a blog post," so you write and the writes are not that frequent. Then that is what creates this asymmetry. If you can take the opportunity to convert that write to the database into HTML when it happens, and then share that super easy HTML stream that gets downloaded from an edge server, you can't compete with that with any other serverless architecture. You can't compete with that from a speed of light perspective, but also, cost wise, you can't compete.
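
One hedged sketch of "converting the write into HTML when it happens," in Next.js terms, is incremental static regeneration, where the `revalidate` return value (available in newer Next.js releases) caps how stale a page can get after a write. The `fetchPostFromCMS` helper is hypothetical:

```js
// pages/blog/[slug].js — regenerate the page in the background after a write,
// instead of querying the database on every request the way blog.php would.
export async function getStaticProps({ params }) {
  const post = await fetchPostFromCMS(params.slug); // hypothetical CMS fetch

  return {
    props: { post },
    // Regenerate at most once per 60 seconds, triggered by traffic: the
    // marketing edit becomes fresh static HTML shortly after the write,
    // while every read in between is served precomputed from the edge.
    revalidate: 60,
  };
}
```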

We've talked a lot over the years about how amazing it is that Lambda gives you 1,000 concurrency and whatever, but at the same time, just imagine 1,000 VMs in a rack firing up to respond to your blog post. It doesn't seem very appealing. Then from a developer experience standpoint, this is really what we're enabling with Next.js at a very large scale: we also don't want people to necessarily have to think or remember to apply caching. This is why we took that idea of the CDN, but now we're really taking it to the next level, because CDNs always require this calibration of components where the front-end layer has to coordinate with several layers of caching.

Over the years, I've talked to so many people that have front ends that combine a Redis cache, and then beyond the Redis cache, there is the CDN cache. Then there are all these brittle purging and invalidation strategies all over the place. Then when you peel away all these layers of complexity, you remind yourself, "Oh, I was just working on this simple page that had this simple content." If you think about the ecommerce example that you just talked about, the underlying JSON data structure for that page that renders the ecommerce item, and recommended products and so on, is very simple.

The idea that you could convert it into HTML, and serve every market in the world with that precomputed HTML is extremely compelling.

Jeremy: Yeah. I mean, if you think about, like you said, these multiple layers of caching... The last big company that I worked at, every time there was a problem, the engineers were always like, "It's a caching issue. It's just a caching issue," because there was the CDN. Then in front of all of the application servers, there was a Varnish cache, and then there was Memcached in the back, and all kinds of these other things that were just layers and layers and layers, and you never knew where it was.

Guillermo: Yes.

Jeremy: It's funny that you mentioned this idea of infrequent writes against massive reads. I think about going back to the WordPress example. There are people who might argue, "But you need to keep your comments up to date. You need to see the freshest comments." Well, I think about any installation of WordPress that's getting comments. Even if you are getting comments at a very rapid clip, which is probably unlikely for a WordPress installation, you could just take the write, once that write happens, and then generate the static content or the comments list or whatever it is, and push that back out to the edge.

We're at a point now where I feel like the edge has become the only cache we might actually need, if we do these things right.

Guillermo: Yes. Yes. Yes. Thanks for using... You also reminded me of the Varnish example, because the cleanest architecture you could think of is one where there are no layers of caching that need to be coordinated, right? That's why I don't think about this new wave of edge as necessarily a cache. For those of you that have done versions of this by hand with S3 and CloudFront: when you put content that gets generated into a bucket, and you know that this bucket is super highly available, and it's super easy to think about how you could invalidate the edge once specific writes happen to that bucket, you don't really think of it as a complicated caching scheme.

You think about it more as just a simpler model for reasoning about your architecture. Let's analyze that comment example for a second. Let's say that you're the New York Times, and you have a very high comment throughput. First of all, there are high-value comments that they inline with the page, because they contribute substantially to the narrative. So they're the highlighted comments. There are not going to be lots of highlighted comments. There are going to be maybe five or six. They're almost like an extension of the article at that point.

Just like you would want to statically inline the paragraphs, you want to statically inline the highlighted comments. Those, again, are not subject to this strongly consistent read system. If the comment that gets promoted to the highlighted ones takes a second to be reflected in the global cache, in the global edge, that's totally okay. It's the right trade-off to make. Then when you paginate, when you read the long tail of trolls or whoever, you can [inaudible 00:17:16] on the client side, right?

Jeremy: Right.

Guillermo: You can actually go against your database that has high read throughput without a cache there either. You could go to Dynamo or whatever and say, "Give me the very latest comments. At this point, I want a strongly consistent read. Give me your very latest comments." At that point, by the way, the New York Times also performs moderation on their comments, so they're throttling the writes anyway to make a better quality product for their pages. The system just fits like bread and butter, I think.
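
For readers who want to see the knob he's pointing at: DynamoDB exposes exactly this choice as the `ConsistentRead` flag. A sketch with the AWS SDK for JavaScript, where the table name and key schema (articleId partition key, createdAt sort key) are hypothetical:

```js
// Fetch the very latest comments with a strongly consistent read.
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

async function latestComments(articleId) {
  const result = await db
    .query({
      TableName: 'Comments',           // hypothetical table
      KeyConditionExpression: 'articleId = :a',
      ExpressionAttributeValues: { ':a': articleId },
      ScanIndexForward: false,         // newest first (createdAt sort key)
      Limit: 20,
      ConsistentRead: true,            // read my own writes; skip the eventual-consistency window
    })
    .promise();
  return result.Items;
}
```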

At the end of the day, we're all in this business of publishing quality content that we want our visitors to consume, ideally in that first TCP packet. What I say is that I'm a really big fan of deleting code. I don't want any code to execute anywhere. If I'm going to newyorktimes.com, and I'm from Argentina, in Buenos Aires, I don't want all this Turing-complete circuitry to be between me and the landing page of an article. I just want to go directly to that HTML stream. Browsers are so good at rendering a stream of HTML already. Think about it: we've deleted all the code.

There's no JS that needs to run on the client side to give me that first paint of the article. Everything has already been precomputed, so there is no function execution in between me and the content. It's this crazy combination of availability, performance, greener for the world, and just overall better.

Jeremy: Right. I want to jump back to your comment on databases, because you mentioned DynamoDB there, and I'm a huge proponent of DynamoDB. I love this idea of just having this super fast, single-digit millisecond latency to retrieve data, but you're still oftentimes querying a database, right? It's still technically a database. You still have to wait for that computation to happen to bring that back. I like things like DAX, being able to put a cache in front of that, but I also find that if you do it right, you can cache whatever that GET request is, or that query to DynamoDB.

You can even cache that at the edge, right, if it's responding back from an API call.

Guillermo: Totally.

Jeremy: The point that I wanted to make was about people using Aurora Serverless or RDS or whatever it is, using these relational databases. There is absolutely a need for relational databases, right? You can't run analytics on DynamoDB.

Guillermo: Totally.

Jeremy: ... these other BI tools and things like that. This is something I talked about actually a while ago. Your front-end customer, the person accessing those comments, for example, they don't need to sort them in 10 different ways. They don't need to join them in a bunch of different ways. They don't need the power of that. What I found is I've been able to eliminate almost all of the clusters of databases that I've had, and use something as simple as Aurora Serverless with two or four ACUs, so really, really small, by simply replicating data out of DynamoDB into it.

Now, I can serve massive load with DynamoDB, but then when I need to write those queries, or even run reports, and I'm not having hundreds of thousands of people hitting my reporting site or my analytics site, my ability to have a smaller footprint there is...

Guillermo: Totally. Totally.

Jeremy: Again, I think this is what we're trying to do with serverless, right? We're just trying to reduce the footprint and the amount of extra computing power that you need.
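
A sketch of the replication Jeremy describes: a function subscribed to a DynamoDB Stream mirrors each write into a small Aurora Serverless cluster through the Data API, so the relational side only ever sees the trickle of writes while Dynamo absorbs the read load. The ARNs, table, and columns are hypothetical:

```js
// Lambda handler on a DynamoDB Stream: mirror writes into Aurora Serverless.
const AWS = require('aws-sdk');
const rds = new AWS.RDSDataService();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName === 'REMOVE') continue; // deletes handled elsewhere
    const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    await rds
      .executeStatement({
        resourceArn: process.env.CLUSTER_ARN, // hypothetical cluster ARN
        secretArn: process.env.SECRET_ARN,    // hypothetical secret ARN
        database: 'reporting',
        sql: `INSERT INTO comments (id, author, body)
              VALUES (:id, :author, :body)
              ON DUPLICATE KEY UPDATE author = :author, body = :body`,
        parameters: [
          { name: 'id', value: { stringValue: item.id } },
          { name: 'author', value: { stringValue: item.author } },
          { name: 'body', value: { stringValue: item.body } },
        ],
      })
      .promise();
  }
};
```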

Guillermo: Yep. Yep. Yeah, I think you nailed it, because you're going at the core of the database problem and data access problem, which is understanding how the data is being accessed. You mentioned something there, which is, "What is my throughput of queries for analytics and complex joins?" Like, "Let's find the top 10 most active commenters on my website," and things like that. It's very rare. It fits very well with systems that can respond more slowly, that can take their time to scale up and scale down. Then you have the other layer that you touched on, which is, again, it doesn't matter how fast my database is, how real-time, how scalable, how serverless, if my customer just wants a bunch of HTML of a certain set of comments, in that case with the example that I gave of maybe the first five or the most voted ones and so on.

I think what's important is for the developer to always think about the access pattern of their data, from a read perspective and from a write perspective. I will say also not just the ratio of volume, but also the consistency. That's, I think, what's really important as well: when I go to a breaking news page on COVID-19, and I work at the New York Times, and I'm going to push an edit to it, I can afford for one second to pass before I can make a strongly consistent read of the typo that I just fixed.

I would have wanted, within that second, all the reads to go uninterrupted globally in the world, because that smooth line of low-latency access and highly available access to my story on COVID-19 is so much more important than for everybody in the world to be able to read my writes in a linearizable fashion. I can fix my typo, and I can say, "It's 99% likely that within a second, everyone will be able to read my write." Now, Dynamo gives you other characteristics. They tell you, "Well, you make a write, and as soon as you query that same API that you're servicing Dynamo from, you can immediately read your write."

But then everybody in the world has to go to that specific Dynamo cluster, right?

Jeremy: Right.

Guillermo: Again, what apps or websites need that? You have to really think hard about that, because it's not going to be ecommerce, or at least not most of the front end of ecommerce. You might have some specific things where you really, really want to read your writes in that very low-latency fashion. That might be the case, for example, when you go to the logged-in section of your website and say, "Give me my latest five recent orders." You don't want that to be a weird stream of static generation where every time an order happens, you're custom-making a static page for the logged-in administrator.

Then you start adding complexity to your permission system, and everything becomes chaos. That's why I said, when you think about pages that need more granular data access, stronger consistency, more complicated permission systems, more complicated queries, that's still better served by a static page, but one that runs JavaScript on the client that can query those APIs. Just to give you that idea of why the front-end economy relates to the API economy: now, if we continue this example of the ecommerce website, maybe that ecommerce API will be a headless ecommerce API.

Shopify and BigCommerce and WooCommerce and many others are now giving you very rich GraphQL APIs and REST APIs for querying this type of data as well. You even have to wonder, "If I'm making this really slick new ecommerce experience..." Maybe you're starting a new microsite. Maybe you're going after VR ecommerce. Maybe you're thinking of reinventing your front-end layer. The question becomes, "Am I going to be writing a serverless API with 10 queues, one million Lambdas, four Dynamo clusters, DAX, and Aurora replication, if I could have bought an API off the shelf?"

That's, I think, a question that a lot of people will be facing in the coming years.

Jeremy: You made the point about the top 10 commenters. This is something, I think, people don't get about serverless. I'm not trying to say I understand it better than anybody else. It's just, for me, the way I feel about serverless is that a lot of it has to do with asynchronous operations. It's not responding immediately to a request, and that is one way in which we can get the latency down. The edge is one piece of that, but with your top 10 commenter thing, that's the kind of thing where, again, "Do I maintain a database cluster that has 50 instances running so that I can calculate on the fly who the top 10 commenters are, or is that something I could delay, and maybe run every minute if it really needed to be that fresh, and just run that off of that small cluster and then push that to a cache somewhere, or to the edge somewhere, so that it is pre-calculated?"
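
A sketch of that delayed precalculation: a function on a one-minute schedule computes the leaderboard from the small reporting database and publishes it as a static JSON object the edge can serve, so no visitor ever triggers the expensive query. The bucket and the `queryTopCommenters` helper are hypothetical:

```js
// Scheduled function: precompute the leaderboard, publish it as a static object.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async () => {
  // Hypothetical: SELECT author, COUNT(*) ... GROUP BY author LIMIT 10
  const top = await queryTopCommenters();

  await s3
    .putObject({
      Bucket: 'my-site-static',        // hypothetical bucket
      Key: 'data/top-commenters.json',
      Body: JSON.stringify(top),
      ContentType: 'application/json',
      CacheControl: 's-maxage=60',     // let the edge cache it for a minute
    })
    .promise();
};
```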

Guillermo: Totally.

Jeremy: Anyways, I love the idea of precalculation. I just think that this idea of being able to access stuff as quickly as possible is just insane. The point that I want to make too, because this is something I noticed: I was on the Vercel site the other day. I was looking at all of your edge locations, and you've got a really great page on the site that shows you all the different edge locations. It shows you where you are, presumably based on IP address or whatever. Then it gives you the ping and the latency to each one of these edge locations. I think the closest one to me...

I'm up in Massachusetts in the US. The one closest to me, I think, was in Montreal, and the latency was something like 25 milliseconds. Alright, here's the problem with calculations or computations: they have to run somewhere. You're not necessarily going to run your computations at the edge, so that's another huge disadvantage. If you're running your DynamoDB cluster and all your Lambda functions in us-east-1, and you're trying to access it from Brazil or from wherever-

Guillermo: Absolutely.

Jeremy: ... then there's going to be a huge delay in latency out there.

Guillermo: A round trip. The only time you can justify that trip is where you need to very strictly read the writes that happened at that origin, right? I can't tell you, "I'm going to cache your latest five stock orders or your bank account balance. I'm going to cache it in Brazil so that Brazil customers have a better time," right? No, I'm just going to give you a static page that then gives you a skeleton placeholder of your balance, and then goes and fetches it from wherever the brain of that bookkeeping database is. That goes to the heart of, again, the vast majority of pages on the internet should already be within 25 milliseconds of you.

You can accomplish this with layers of caching, but things get really tricky, because sometimes that cache gets a miss, and you have to go to the origin at that point. You were talking about how there's been a rise in the usage of functions for background processing. Functions notoriously have a cold start problem, unless you're provisioning, and then things get even more complicated, so you have to think about what happens when you go to the Montreal edge and you miss. Again, we host customers with a cardinality of pages on the order of millions. Then we also see millions of deploys per week as well.

That's why I went to that idea of thinking about static as something that you put in a bucket, and Vercel automates that process. Don't think, "Well, there's always a server there that needs to be hit," because you nailed it. When you're doing this kind of background computation as a result of events, as a result of your data changing, we generate static HTML, and now we can put it in a highly available bucket that we're also able to distribute around the world. From a perspective of performance and availability, there's this idea that now, every time you go to the page, no computation ever happens.

We're just manipulating very basic static objects. The fact that you can run incredibly large websites with just these primitives is very reassuring from the DevOps perspective. Every time I talk to people, and I tell them that whatever they invested in that was a server could have been static, there's always this incredible desire to go toward that kind of place. Even though AWS has done this incredible job where Lambdas have a 99.99% SLA, and every system that AWS monitors is automatic, and they have an incredible track record for reliability, you still want to pick the architecture that has basically the fewest number of moving parts.

You can think of code as a moving part.

Jeremy: And to your point about latency, I mean, you mentioned you're not going to cache somebody's banking records all across the world in case they happen to log in while in Europe and want their banking records immediately. There is a certain amount of latency that obviously is acceptable depending on what it is that you're doing. I don't know if you've given this example before. I think we've talked about it before on the show. For every 100 milliseconds of added latency, Amazon.com loses 1% of their sales, or something like that.

Latency is a much more important number than I think a lot of people give it credit for.

Guillermo: Yes. Let's take that example, that exact figure. There's this famous internal memo from Google about numbers that every developer should know, and one of the key numbers is the round trip from the Netherlands to California. It's somewhere in the ballpark of 150 milliseconds just to do the complete round trip. We've improved our routing networks so much, we've optimized California to the Netherlands so much, that it's just close to the physical limit. When you think about incremental static generation and putting pages next to customers, you already have there...

You mentioned, from Montreal, you said 25 milliseconds. We already have a leg up of 125 milliseconds. That is insurmountable for the traditional serverful or function-plus-CDN case that even sometimes has to go to origin. It's absolutely insurmountable. Then we don't stop there, because we make massive investments also in the Next.js layer, for example, to make sure that content, when received by the web browser, also renders as soon as possible. We have several integrations with Lighthouse. Now, we ship the integration with Chrome Web Vitals. That allows developers to measure the time to first contentful paint.

This is where I want to stress that I want to delete all the code from the world. I don't mean no code in the Webflow sense. I mean no code in that, if you have a stream of HTML coming in from the Netherlands, and it's some product that you want to buy, then when the browser starts interpreting it, if it has to boot up the V8 VM to start executing JS code, then you're going to waste another 100 milliseconds for sure. Actually, V8 loading JS and starting to interpret it in 100 milliseconds, that sounds like Nirvana. You know what I'm talking about?

If everyone goes to their terminals right now and they run "time npm --version", you can see the V8 warmup time in real time. You'll see it. I'm going to run it while we're talking now just to get my own measurements here.

Jeremy: Sure.

Guillermo: "time npm --version". What's happening here? We're executing node, which is booting up V8, which is executing a bunch of JS. That code hit that I just performed in my machine, which is also a V8 because it's executing the stream that we're doing, 812 milliseconds. That just sounds insane, right? 812 milliseconds for V8 to boot up, MPM's code to get parsed and compiled and returned back to extend the route. I run it again. The universe seems hotter, is 174 milliseconds. Still awful, right?

This is why I don't want to run code. I don't want to run it at the edge. I don't want to run it in a worker. I don't want to run it in a function. I want the function to have been executed at some point in the life cycle, but not in between my customer and the page. Then I want to have as little JS as possible also when that page runs in the web browser. This goes back to why Vercel is focusing so much on the front end. I suspect that a lot of your audience, my audience, also sometimes over-indexes on measuring the back end, measuring Dynamo latency, measuring ELB latency, and then they forget that there's this universe of complexity that we're shipping to the web browser, that is adding that 100 milliseconds you just talked about, times 10.

It's not even a couple hundred milliseconds. What we just talked about with that npm version exercise is the best-case scenario of blocking your entire page on JS booting up, and we haven't even talked about downloading the JS. Hopefully it's a hit from a cache. Hopefully it's a hit from a cache on the local computer, which, by the way... A fantastic essay just came out about something we all know, especially all of us that have worked extensively with AWS: disks are pretty slow. IOPS are expensive, and there's also the distinction between a hard drive and an SSD, and what Google Cloud calls the local SSD, the one that's wired right into your instance.

We're talking about lots of milliseconds there, even in your web browser, retrieving a cached CSS asset and JS asset from the local computer. This is why we're so obsessed also about edge precomputation. We don't even trust, and we have the data to back it, that even if you have a stable JS resource and CSS resources that have been cached on the device, we can afford to revive that asset, bring it alive, and execute it very quickly. This is why I'm now endeavoring towards getting rid of computation, because if I can give my customer a stream of HTML, with inline CSS for the critical parts of that page, then what I end up with is something that can actually rival the performance of amazon.com when it comes to their own products, because what do I get?

I get, from the edge, the precise image for the size of the device that I'm serving, for the product that I want to buy. I get the styling for the buy button, which is what I want my customer to press. I get that first paint in 100 milliseconds. We've altogether removed JS from the equation. JS is not being executed at the edge; it's not even being executed by the page on the local device, so we can make that dream happen of your product being in front of your user's eyes in 100 milliseconds.

That's doable even for 2G connections or legacy Android devices. It's totally possible. It's just that we really need to shift our thinking and our obsession from back-end architectures and AWS charts connecting one million pieces into now thinking about what we're serving to our users. That is a big shift that's happening.

Jeremy: I love that idea because I think there's been a movement lately of sites that run completely JS-free. You do not need JavaScript to make your dropdown menu work.

Guillermo: Absolutely.

Jeremy: You don't need it to place an order. You don't need it to do a lot of these things. It's funny, but they invented this thing called HTML and CSS that allow you to do a lot of really interesting interactivity without using JavaScript. Now, obviously, Vue.js and React and all these other things add really cool features, and if you have it enabled and once you've got it loaded on your machine and interacting via the APIs, that's great.

Guillermo: That's why with Next.js, we're giving you that, but we have to find that balance. You're right, for the first paint of most of the pages that you visit, there's very little need for JS, but even when it's needed, it has to be consumed in small amounts. This is why one of the big hits that we had was that we eliminated the idea that you have to configure the bundler and webpack when you use Next.js and when you produce these pages, because that's when things start to go really bad.

We want the bundler to be so ingrained into the system. Let's say that for that buy button, when you press it, you do want to use some JS, because you think that, for example, just like Stripe does with their credit card modal... You're a PM at an ecommerce company, and you think, "If I transition them to another page, and I didn't memorize their credit card details, and I didn't auto-complete their credit card and whatever, we're going to lose sales, right?" That's a valid argument, but that's the point: JS needs to load just in time for that specific interaction that's going to happen.

We have to be smart so that we bundle JS in minimal amounts only for the interaction that's likely to happen, which in this case is buy. Maybe as you start scrolling, it's loading product recommendations or loading the carousel of related products. That's why static generation also always gets combined with loading strategic amounts of code on the client's side with JavaScript so that you can bring interactive experiences. Another example is, I believe, I've seen that Amazon loads some 3D navigations of some of their products. You don't want that bundle or that feature, which is this long tail feature that maybe some customers use for that to be blocking what we call the time to interactive of the buy button.

That's, by the way, what's happening to every visitor of most every website in the world today: the bundler is... We talked about that npm example where my computer took 800 milliseconds. What's happening there, for most of the seasoned JS optimizers in the audience, you probably know this, is that V8 is crunching through vast amounts of code that have nothing to do with rendering the version. npm is not being bundled in such a way that all we did was execute "return version 9.2.3."

Instead, we're probably doing lots and lots of unnecessary stuff before. This is what happens with the web at scale today: in order for that buy button to become interactive, we're still downloading the code for the 3D navigation carousel. Again, this is what, I think, makes it really, really compelling for companies to stop thinking so much about full-stack development, and instead focus most of their attention on this kind of problem, and start measuring what actually happens when their products are being delivered to users.
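
A sketch of that just-in-time loading using Next.js dynamic imports: the heavy 3D viewer is split into its own bundle and fetched only when the visitor asks for it, so it never blocks the buy button's time to interactive. The component path and the `buy()` helper are hypothetical:

```js
// pages/product.js — split the long-tail feature out of the critical bundle.
import dynamic from 'next/dynamic';
import { useState } from 'react';

// Loaded on demand, in its own chunk, only when rendered.
const ProductViewer3D = dynamic(() => import('../components/ProductViewer3D'), {
  ssr: false,
  loading: () => <p>Loading 3D viewer…</p>,
});

export default function Product({ product }) {
  const [show3D, setShow3D] = useState(false);
  return (
    <main>
      <h1>{product.name}</h1>
      {/* Interactive immediately; not gated on the viewer's bundle. */}
      <button onClick={() => buy(product.id)}>Buy now</button>
      <button onClick={() => setShow3D(true)}>View in 3D</button>
      {show3D && <ProductViewer3D productId={product.id} />}
    </main>
  );
}
```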

Jeremy: Alright, so we talked a lot about front end. I think there are so many optimizations that could be made there. There are so many cool things that we can do, like the pre-rendering, pushing stuff to the edge, getting those latencies down, not even trying to load stuff from the local cache. I mean, there are a million things to think about there. In a perfect world, we could do everything we wanted to do that way, but the reality is, like you said, sometimes we have to pre-load credit card data or something like that for a user.

Obviously, sometimes we're going to have to make an API call. We're going to have to access the database or DynamoDB, but as you said, how much of that calculation do we want to be doing ourselves? How much of it can we do directly from the front end? Vercel's got the serverless function capability, and I think you had mentioned this as a use case where maybe you're doing authentication and you need to send some private key to Auth0 or something like that to trigger the workflow.

Guillermo: Correct.

Jeremy: That's something where, I think, you'd advocate building a really simple function that just triggers that thing, but don't do all that calculation yourself.

Guillermo: Correct. Yes. Basically, I think the... You mentioned... Okay, how do I execute my functions? Do I execute them as a result of pipelines like SQS, SNS, whatever, or do I execute them just in time? Both of them are absolutely amazing, right? The executing-just-in-time case throughout the years has had ups and downs, let's call them, because we know that for P99, you have to be smart about the size of your function. You might want to provision your function for a very, very, very amazing P99. In some ways, I feel like the background async computation usage of functions has been the home run, and the just-in-time has been more of an incremental adoption.

It's definitely going to happen, but it's been more of an incremental path for a lot of people. The one that we've found that is awesome in the just-in-time space is giving the developer team, especially the front-end developer team, a way of gluing services together. A great example would be what you mentioned: "I have to create a function that talks to API systems that already exist, because I want to aggregate a bunch of API calls together, because I want to talk to Stripe, for example, in a private manner with an authentication token from the user."

Let's say they want to commit a charge. Stripe gives you products that you can invoke directly from the front end, but you start hitting some limits at some point. You want to customize their UI. You want to do something more fancy. Maybe you have some recurring charge. This is where, I think, the world of serverless infrastructure is getting really neat, because we don't have to reinvent Stripe from scratch. We don't have to reinvent Auth0 from scratch or Cognito or whatever. We can now use functions as a way of mediating between the front end and the services that already exist.
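
A minimal sketch of that glue function as a Next.js/Vercel API route: the front end posts to it, and it talks to Stripe with a secret key that never ships to the client. The route name, body shape, and environment variable are illustrative assumptions:

```js
// pages/api/charge.js — a thin glue function between the front end and Stripe.
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY); // hypothetical env var

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).end();
  }

  const { amount, currency } = req.body;

  // The only on-demand computation: create the charge and hand the
  // client secret back. Stripe does everything else.
  const intent = await stripe.paymentIntents.create({ amount, currency });
  res.status(200).json({ clientSecret: intent.client_secret });
}
```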

That's not to say that you can't use this function to talk to Dynamo directly, for example, right? That's still a use case that'll exist, but I think what's more compelling for a lot of people is to not necessarily have to reinvent the wheel, and to be smart about their investment into these functions, because the difference, broadly, between functions and pre-generated content is that functions have an on-demand cost. You have to be careful about their availability as well.

Now, your uptime for those functions depends very much on the uptime of the services that you depend on. We talked a lot about, okay, how do you even ascertain that those services that you're depending on are functioning correctly, and why it's so much more appealing for you to be interfacing with a high-level API like Auth0's for the users table rather than using Dynamo as your users table, right? Then you start worrying a lot about rate limiting. You start worrying a lot about, as we talked about, Dynamo auto-scaling until the end of time.

But do you actually want to let people invoke this function on demand by themselves, endlessly, so that now you scale with them for both Dynamo and the function? Things can get hairy, so functions I see as this important tool in the tool set that has to be used when it's necessary, necessary from a data consistency perspective, as we talked about. Another need, by the way, that's very strong, no pun intended, is... when we talk about precomputation, we're talking about vast categories of pages that are public in nature.

We gave that New York Times example, that amazon.com example. Of course, you would want to push those pages to the edge. They're all public, and they're all shared by lots of users. Even if they have some, what I call, page variants, which is something that we're incorporating into Next.js, which is like, "This page will be in a certain language for the Netherlands, and this page will have a built-in promotion for Texas." Those are what I call variants of pages. But what they all share in common is that they all address vast numbers of users, and that from a security perspective, they don't contain anything that is sensitive.

Now, let's think about product recommendations that are user-personalized. Now, let's think about your order history. Now, let's think about your credit card, inputting your credit card and whatnot. Those are all things that no longer fit that neat world of precomputation, from many perspectives, one of them being that it becomes prohibitive because of the explosion of combinations of precomputations it would require, but also security.

Jeremy: Sure.

Guillermo: Again, I don't want to cache personally identifiable information at the edge. And from a strong data consistency perspective, again, I don't want to go to a stale cache, even for performance and availability reasons. I want to go directly to the data source. That's where functions come in. Now, some of those functions will be written by your team. Some of those functions will be assisted by other teams, because those functions can call all those other functions, and some calls can even go directly from the front end.

Those three are all amazing, legitimate use cases that get enabled by executing code on the client side.

Jeremy: Right, and I think that when you start talking about Cognito, and you start talking about Lambda functions and DynamoDB, there are a lot of primitives that exist in the cloud right now that you can stitch together. As you said, functions are great for gluing these things together. There are other ways to glue these things together. Even though there are these amazing primitives out there, though, it doesn't mean that building a serverless back end is easy.

Guillermo: Correct. Correct. I think what's amazing about serverless is that it's exposed the essential complexity of the problem. It stopped developers from sweeping hacks under the rug. The best example of this, I think, is that you can no longer do async computation as a result of invoking a function that easily anymore. In the world of Node.js, I would see a lot of customers just put lots of state in a process. When they respond, they continue doing things behind the scenes in that same process.

Functions have altogether made this impossible, but for a great reason, right? They were exposing, "Hey, that side effect that you were computing, you should not have been doing in that same process. You should have used a primitive like a queue to put your side effect, your event, there, and then used other functions that respond to that event." Then it's so smart that it also puts the developer into this state of success, of saying, "Well, it's a side effect that now can no longer be retried by the client," because the client is executing the function.

The function is responding with a 200, and it queued the side effect, so there's no reason for the client to retry. Now, the side effect is loose in the universe of computation. That means that we need a system that can retry it, because we want that side effect to run to fruition. Now, it forces you to put that into a queue, and the queue can retry and then eventually also fail and go into a dead letter queue. So now, just going through all this in my head, I'm going crazy about the amount of complexity.

But here's the thing, and this is why I love serverless: that was the essential complexity that had to be managed to begin with. What we were doing before was chaos. It was side effects that maybe sometimes ran correctly and sometimes didn't. It was unscalable systems, and so on and so forth. But it is a complicated world.
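
A sketch of the queue pattern he walks through, from a Lambda-style handler: respond 200 only once the side effect is durably queued, and let the queue's retry policy and dead letter queue carry it to fruition. The queue URL and message shape are hypothetical:

```js
// Handler: queue the side effect instead of running it in-process.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

exports.handler = async (event) => {
  const order = JSON.parse(event.body);

  // Enqueue the side effect. A separate function consumes the queue;
  // failed messages are retried and, after maxReceiveCount, land in a
  // dead letter queue instead of silently disappearing.
  await sqs
    .sendMessage({
      QueueUrl: process.env.SIDE_EFFECT_QUEUE_URL, // hypothetical queue
      MessageBody: JSON.stringify({ type: 'send-receipt-email', orderId: order.id }),
    })
    .promise();

  // The work is durably queued, so the client never needs to retry.
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```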

Jeremy: Yeah, I mean, I think you mentioned this idea of scalability. That's one of those original promises of serverless: everything can just scale infinitely. You know what I mean?

Guillermo: Yeah.

Jeremy: I think you mentioned that DynamoDB can just keep scaling up, but there are limits. Eventually, that does stop. There are soft limits in place for Lambda functions, and of course, there are wallet limits, I think, for anybody out there who is eventually going to say, "This is more than I want it to be." You mentioned on-demand serving of data versus the static piece of things. You had shared an example with me before about a site that was mostly front-end serverless, had a little bit of back-end serverless, and even though it scaled up to, I think, tens of millions of hits or something like that, it was able to scale gracefully.

I'd love for you to tell that story, because I think that is the perfect example of how we should be thinking about building serverless applications, because even if you think your application will scale infinitely, there are a lot of reasons why it will not.

Guillermo: Yeah. I love to give this example, because on one hand, we deal with customers that have very traditional websites that fit under this incremental static generation umbrella. Our most recent, onboarded a couple weeks ago, is barstoolsports.com. They fit under the category of the New York Times website that we were talking about. Their business is assisted by subscriptions and ads. They want those conversions to happen very quickly. Frankly, they want their customers to get their news as soon as possible. This was a larger team.

They were proficient. They had already chosen Next.js, but I'd like to give this other example that happened that same week, of a meme that went absolutely viral throughout the entire internet, and where the promise of serverless really came to fruition. I mean, again, this meme is weird. It's called billclintonswag.com. What's incredible about this meme is that you would go to Twitter... We noticed a spike on our edge network, because we get alerted when there are abnormalities, and this was a very large abnormality that normally we would even confuse with an attack, because it was like, "Holy moly."

We went from this little thing to something that went completely vertical. This meme is very basic. It's a static page that presents a photo of Bill Clinton holding three albums. Then the visitor can auto-complete, this is where client-side JS comes in, and find their three favorite albums. Billclintonswag.com should obviously still be around, but the meme has faded a little bit, like the serverless spike patterns when computations are happening on demand.

Going back to the glue pattern, the one individual developer that... By the way, again, this is a meme that is quite controversial to begin with, so that's why it got all this traffic. Think about this: you can create, with one person, a thing that dwarfs the traffic of most websites on the internet, for a very short amount of time in this case. We're talking about tens of millions of hits per day. It scaled infinitely, and it was created by one person. But here's the thing: he could afford this because he designed it with this static-first mindset.

There was no function that was getting executed when people were first going to the website. Then, he didn't write a database of records. He was using the Last.fm API. Now, guess what, the Last.fm API, for whatever reason, was not consumable from the front end directly. What he did is he created a serverless function in Python, also hosted on our platform, with 128 megabytes of memory, so very lean, also caching at the edge. You mentioned this earlier, by the way, but what he discovered was that these meme makers all like the same albums.

We're talking about a very large number, trending-topic-on-Twitter large numbers. They were all auto-completing to the same things. You know what I mean, like the Beatles? Well, actually not that one, but, I don't know, who's the famous new... Kendrick Lamar. You know what I mean? K-E-N. The function was lean. The infrastructure was actually Last.fm. He also added his own caching at the edge. The query parameter was K-E-N for Kendrick Lamar, so he was responding from his serverless function with cache control.

We were caching that at the edge as well, meaning that you were getting the suggestions of albums for Kendrick Lamar in milliseconds. It was really amazing to see this combination of what a lot of people call JAMstack. The first paint is static. It affordably went to hundreds of millions of hits that week. Then, when he needed to do a little computation, he was careful to pick the best provider for his data source, one that exposed an API to begin with. Then he also cached the computation of that.
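
A sketch of what that caching likely looked like in API-route form (his function was Python; this is the JavaScript equivalent): the edge caches one response per unique query string, so repeat searches for the same artist never re-invoke the function. The route and the `searchAlbums` wrapper are hypothetical:

```js
// api/autocomplete.js — lean proxy in front of the Last.fm API.
export default async function handler(req, res) {
  const { q } = req.query; // e.g. ?q=ken — the edge caches per unique URL

  const results = await searchAlbums(q); // hypothetical Last.fm API wrapper

  // The edge may serve this for a day, and serve stale while revalidating,
  // so "kendrick" is answered in milliseconds without touching the function.
  res.setHeader('Cache-Control', 's-maxage=86400, stale-while-revalidate');
  res.status(200).json(results);
}
```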

It was honestly amazing, because with these two pages, he handled the entire world. If you think about doing this with the old patterns that we had, it would have collapsed several times over, or it would have been incredibly expensive, like we talked about, because if he had made the landing page a function, it would have been a pretty hefty bill. If he had made it a server, it would not have scaled very quickly, not very easily. Frankly, the ease with which he put this together was just a great validation as well.

He happened to be a Python developer, so he chose Python for his functions. A lot of our customers use Next.js, where index.js is your React page that gets built statically, and then you have your functions on the side. It was like a micro example of how... I think we're going to see this a lot at scale in the future, where very small teams that use primitives in a very smart way for the front end can now go from experiment to world takeover. Again, this is a silly meme, but this is the same thing that can happen to your product.

This is the same thing that can happen to your news story. This is the same thing that can happen to your ecommerce site. In fact, this is also a micro story of that, because he was selling some product as well. Going back to the 100-millisecond thing, he created a delightful experience that was very fast to load. Then he was selling something as well. This is the whole story of the internet. Publish fast. Make it accessible to everybody in the world. Have a great success story, where your back end and your front end, everything, is actually working and not collapsing.

Then find that fitness function that allows you to evolve your business. Like I mentioned, for a lot of the publishers of stories that have chosen the Vercel edge, it's optimizing so that they can render an ad or a paywall or whatever it is that they need to render to make money. What you need to think about is, "Okay, what..." You mentioned that 100-millisecond rule. There's so much greatness that went into the development of amazon.com, but for me, one of the biggest things is the realization of the idea of latency in correlation with business success.

Jeremy: Yes.

Guillermo: When we talk about serverless primitives, you can't just over-index on wanting serverless if it's not really enabling that business success for you. When you think about 100 milliseconds, 100 milliseconds is just what it takes for a function to cold boot in a really, really good case, right? It's probably more than that. I think this is what the big lesson is: you have to really shift your mindset around to that customer finding you, whether it's, in this case, Twitter or LinkedIn or ads that you're buying, and then how quickly and effectively they can get to that page.

Jeremy: Yeah. It's funny, I can verify that the first time one of my articles made it to Hacker News a few years ago, it immediately killed my WordPress installation and knocked it down. It's funny, because I feel there are so many people now saying serverless first, serverless first, serverless first. I love that idea. Yes, serverless first, but static first is a really, really interesting twist on that, because I think... I mean, if you just think about that for a second, it's like, "What can I precompute? What can I put on the edge? How can I reduce the amount of computation that I need to do?"

Then for anything I have to do beyond that, how do I build the infrastructure as cheaply as possible without reinventing the wheel?

Guillermo: Yeah, and also what's more serverless, right?

Jeremy: Right.

Guillermo: That's a big thing.

Jeremy: Very good point. Very good point. There are a bunch of other things that I had wanted to talk to you about, but we've already spent quite a bit of time. So the last thing I want to touch on, simply because we're in this new COVID-19 world, and we've got all these people working from home, and we've had all these big companies tell us in the last few years, "Oh, working from home doesn't work." I know that Vercel is a huge fan of this. I know you came from several companies that were also fans of this, so I'd love to get your take on distributed teams, and how effective they can be.

Guillermo: I think we're very lucky to have been a distributed-friendly, remote-friendly company before COVID happened. We're so well positioned to provide the best tooling possible to people that are now in this same new position. We dogfood our product very extensively. I love what you just said about being on top of Hacker News, because yesterday, Deno, the new JavaScript runtime, was at the very top of Hacker News in a big way. I think it's one of the most upvoted things on Hacker News in a long, long time.

Their website is not a Deno server. Their website is a Next.js website, precomputed at the edge and hosted on Vercel. How did they choose this? They started using Next.js, learned it, created their website, and imported it into Vercel. We never talked to them. It just happened. It's because we're out there designing these distributed systems of collaboration, especially with the open source community. People are using Vercel every day without even noticing it, because they go to GitHub. They push to a repo, and guess what?

That repo is connected to Vercel, and it automatically builds and deploys your website at the edge. Then you get back a URL. We're enabling this workflow for teams all over the world to collaborate, in some cases without even the need to teach each other anything. Somebody comes into a team that is using GitHub, installs the Vercel app. Now all of a sudden, their website is getting built for every push, and you get back your deploy URL. Now, you can share that deploy URL on Slack, on Zoom.

You can give that deploy URL to your end-to-end testing service. We've created this new world of a distributed workflow where the primitive is the URL to the front end that you're building. It's inherently, incredibly shareable. It's obviously fast for everybody on the team, right? This is something that we've always obsessed about a lot: "Hey, we're building our website with Vercel. It better be fast for everybody on the team, right?"

Our own team is in Japan, and it's in China behind the firewall. That is an amazing fitness function, by the way, I have to say, because if you try to use the internet from China, you had better be very, very lean, static, precomputed, and cached right outside of China, because a lot of people escape that firewall through Singapore or Hong Kong or Tokyo. We were so well positioned for this world. Now, I'm not celebrating COVID, but I have to say, the week that COVID was... The stock market was imploding.

We saw the biggest peak in creation... We measure this particular metric, which is builds, builds of these pages. We have build concurrency. That is quite a sophisticated system, and builds obviously take a lot of CPU power, so we're constantly monitoring it and so on, but that week of COVID mayhem was our largest week in build concurrency. That same week, the CEO of Slack shared that he also saw his biggest week in number of concurrent connections to Slack, number of WebSocket connections or whatever, concurrently, to their servers.

So really, what's happening is not a recession like a lot of people think. It's really an acceleration of teams now being exposed to the right primitives to publish pages to the internet faster, to build pages faster, to collaborate faster, to collaborate from their homes. My daughter just interrupted a meeting, but that's okay. Soon she'll learn what I'm talking about. There are all these advantages to this new world. I'm excited about it.

I'm excited about giving our tools to everybody that needs them, that finds themselves in this world where it's harder to collaborate just because you're not used to it. It's not hard to be productive in a distributed manner and a remote manner if you install the right tools.

Jeremy: Totally agree. Awesome. Well, listen, Guillermo, thank you so much for spending the time with me and talking about front-end serverless, because I think this is something that is not on a lot of people's radar. They're not thinking static first, and it should be static first. Serverless second, maybe, is the new way we should think of it. Anyways, if people want to get a hold of you or find out more about what you're doing and what Vercel is doing, how do they do that?

Guillermo: For sure. About me: on Twitter, @RauchG is my handle. I talk a lot about these topics, so it may be entertaining. Second, a lot of people ask, "Okay, I love these abstract ideas that you talked about, static first, whatever. How do I put it into practice?" Next.js is our open source framework. If you go to nextjs.org/learn, we'll walk you through creating your first page in this manner. Then a lot of you are more advanced in this trajectory. You're already using Next.js, already using Gatsby, Vue.

You're already making single-page applications. You're using static site generators, so you can deploy them to Vercel. I think what really sets Vercel apart is this global distribution of pages that is faster for the customer. Builds are much faster than trying to do this yourself. I talk a lot with customers that have created versions of this with complicated pipelines from GitHub to Circle to S3 to CloudFront.

Sometimes they purge. Sometimes they forget a purge. Sometimes they cache at their CDN. Sometimes they use their CDN as a dumb pipe. What I tell them is: try importing your repo into Vercel. It might simplify your life quite significantly. It might speed up your visitors quite significantly. Those are the three things I recommend.

Jeremy: Awesome. Alright. Well, I will put all that into the show notes. Thanks again, Guillermo.

Guillermo: Thank you so much.

THIS EPISODE IS SPONSORED BY: Amazon Web Services (Serverless-First Function May 21 & 28, 2020) and Stackery

What is Serverless Chats?

Serverless Chats is a podcast that geeks out on everything serverless. Join Jeremy Daly and Rebecca Marshburn as they chat with a special guest each week.