Jeremy chats with Hillel Solow about what's different with security in serverless, which attack vectors are actually being targeted, and how we can significantly increase our security posture with good coding practices.
About Hillel Solow
- Twitter: @hsolow
- Blog: protego.io/blog
- Protego: protego.io
- Twitter: @ProtegoLabs
- LinkedIn: https://il.linkedin.com/in/hillelsolow
Hillel: Hi, Jeremy. Thanks so much. It’s a real honor to be here.
Jeremy: So you're the co-founder and CTO at Protego. So why don't you tell all of our listeners a little bit about your background and what Protego is up to?
Hillel: Sure. Thanks. So Protego is a security company focused on serverless security. We've been around for a couple of years. Prior to that, I had spent about 20 years in security at companies like Cisco and various other companies. And we really started Protego because we saw that serverless and cloud native was going to really usher in a wave of changes in how we deploy applications and build applications. And that was really going to upend a lot of what we do in security. And so we really focused on trying to understand, from the ground up, what is it about serverless and cloud native applications that changes? What's the best way to secure them? What do people worry about? And how do we help them solve those problems?
Jeremy: Awesome. So I wanted to talk to you about serverless security in the real world, and by that I mean the things we are actually seeing. Because I think that there's a lot of misinformation that is out there. And I know there's a lot of security companies starting to focus on serverless and cloud native. And every once in a while we hear about these security breaches in the news, so I think this is just a good opportunity for us to talk about what we really have to worry about. I mean, we obviously want to have a good security posture for whatever we do in the cloud. But maybe we could start by discussing a recent, sort of, high profile, or highly publicized, successful attack like Capital One, for example. So I know this wasn't serverless related, but what are your overall thoughts on that attack? Does that scare people when they see something like the Capital One thing?
Hillel: Yeah, it is interesting because I think Capital One has done a really great job of leaning into the cloud and taking advantage not just from a development and deployment perspective, but from a security perspective of everything that cloud can offer. So it's a bit unfortunate now that they're going to get hit on the head here. I don't think it's a result of them moving to the cloud. To a large degree, this kind of attack that we’re looking at, it's kind of similar to the other kinds of Equifax attacks in some ways. You know, it's some misconfiguration and some access to an EC2 machine that then had access to some S3 buckets it shouldn't have had access to. So those kinds of things, you know, obviously they can happen across any kind of infrastructure. The fact that Capital One is leveraging, you know, Amazon to do a lot of the securing of the infrastructure below what they're doing is great. It does highlight the fact that at the end of the day, though, we're all responsible for our own applications. And Amazon says that, you know, day and night. And so for us to focus on the things that we deploy, our business logic, that's really important. It’s important, obviously, for Capital One, and I think, you know, they do a great job of it for the most part here, and obviously they're going to have to improve. But I think for all of us, it's a lesson in how careful we need to be about application security and about how we're using the cloud. Because the fact that Amazon is securing the underlying platform might lead us to believe that we don't have to deal with security, and that's obviously not true.
Jeremy: Yeah, definitely. All right, so let's talk about the first aspect of this, because like I said earlier, I think there’s misinformation out there about what it means to be serverless and what your security posture becomes once you go serverless or even just move to the cloud in general. So there's this concept of FUD, right? This fear, uncertainty and doubt that you tend to see a lot of people and companies using to maybe “exaggerate” the risks. And I know your team is great at sort of shutting down the FUD, right, just giving people real, honest answers. Which is really refreshing. So maybe we can jump into that, and just give me your thoughts on how you feel about — you know, this idea of people scaring people, by spreading misinformation about the security of serverless.
Hillel: Yeah, look, I don't want to discount the value of fear. You know, I think if you're a security company, it's nice to be selling a product that solves a problem people are really worried about, and that's obviously important. But I think this notion of us becoming hysterical about things that aren't really issues is something we need to avoid. And specifically for us, as we’ve looked at serverless and how it changes security, I think one thing is really clear. Serverless is not less secure than other things. I think, you know, in a lot of ways, serverless applications stand to be the most secure applications that organizations deploy, for a bunch of interesting reasons. They do raise some interesting challenges in terms of where do I put the stuff that I used to run on machines, or where do I put things that don't scale the way that I want them to scale in the serverless world, and things like that. And obviously they do create different types of opportunities for attackers. They do change some of the ways attackers are moving, but overall, I mean, my strong belief is if you're making the move to serverless, you're going to get a net win on security. You just need to take advantage of a lot of what's out there. And for us, you know, I'll talk a little bit later about what we do, but in particular, a lot of what we focused on is: hey, what happens when you move to serverless and cloud native? What new opportunities are there? And how do we leverage those for security in a way that maybe in the past was challenging?
Jeremy: Yeah, and I think the other piece of that, too, is that you have developers that are now much closer to the stack. And I've said this a million times, but this always makes me a little bit nervous because there are some new things that a developer might be responsible for when deploying and securing their application code in serverless. And like you said, the infrastructure security provided by the cloud providers already gives you this great foundation. But, if you don't have those skillsets or you're just not used to implementing IAM policies because maybe they were handled by Ops people or by tools like WAFs and things like that, that gets a little bit scary, for me anyways, when I see what some of the younger or junior developers do. And certainly that’s part of their cloud learning experience, but without proper controls in place, it does open up risks. So let’s talk a little bit more about what's different with serverless security versus more traditional security systems. And one of those things would be, speaking of IAM roles, this move to very, very small fully managed compute units, as opposed to the security of an entire machine or maybe a container where you have full access to the execution environment. So what's the difference there?
Hillel: Yeah, sure. So I think first, to your earlier comment, you know, and again, I have nothing against young developers. I would like to think I'm still a young developer in some ways, although I don't think anybody else thinks that. I think a couple things have happened over the years. I think we spent a bit of time, you know, 10 years ago and 15 years ago, focusing on getting developers to be better about security. I think we took our foot off the gas a little bit on that, and over the past 10 years or so, I think a lot of what we've done in security is focus on trying to wrap up developers in an environment where they're kind of sandboxed from evil, so developers can write any stupid things they want. We've got scanners and WAFs and agents and all sorts of things to try to secure them. And I think one of the things that you see in the move to serverless - and again, I don’t think it's a serverless-only thing, but I think it's true in serverless more than anywhere else - is that sort of divide between security people and developers. It's not really tenable and, you know, in serverless, a lot of the security controls that security used to own are now security controls that developers control. Like configuring IAM roles and setting up VPCs and things like that. And so in a lot of ways, we've actually put more responsibility on developers, but we haven't necessarily empowered them in real ways to make security decisions, and at the same time, we haven't given security people a way to meaningfully understand and audit some of those things when they don't necessarily understand what the application does or what the code wants to do. So I think that's been a big change, and I think that's true across a lot of cloud applications. But it's just truer in serverless applications. You're kind of forced to reconcile that.
The other thing about serverless applications specifically that we like to talk about is the fact that developers have gone from an application that comprises 10 containers to one that comprises 150 functions, which, you know, could create all sorts of nightmares in testing and monitoring and, you know, deployments and things like that. But for security it’s an interesting win, where you get to apply security policy, IAM roles, runtime protection, at a very fine-grained level — you know, at kind of a zero trust, small perimeter level. And if you can do it right, if you can do it at scale and automatically, that could potentially be a huge win, really for, you know, mainly least privilege, reducing attack surface and reducing blast radius. You know, something goes wrong; my developer left a back door accidentally in a function. But now that function really can only write to one particular table, as opposed to, you know, the old world, where that gave an attacker a lot more capability. So I think that's an opportunity that is on the table. It is challenging to capitalize on that. Like you said, there's less time. There are fewer gates between developers and running production code. And that means that, you know, how do we automate and capitalize on a lot of that value without trying to slow everybody down? That's the big challenge.
Jeremy: Yeah. You mentioned this idea of developers running their code in sandboxes. And when you put traditional tools like WAFs in front of incoming web traffic, obviously it inspects that, and we hopefully take care of basic things like SQL injection and cross-site scripting attacks and things like that. But what about all these other events? Because that's one of the things that I always like to address, and I don’t bring it up to scare people, but serverless certainly promotes event-driven architectures, and AWS Lambda has something like 90+ event sources, plus custom events, and really only two of those, I think, would even look like traditional web traffic. WAFs aren’t designed to understand these other types of events. So what are the WAF equivalents for these other events, or is this just all about writing good code?
Hillel: Yeah, absolutely. And I think it's interesting to see the evolution because I think if we go back about a year and a half, you know, there were, I think, 16 event types — not as many as there are today. But it was still interesting to see that I would say 90% of what was going on was kind of API Gateway and then CloudWatch timers. But over the past year and a half, I've seen a lot of evolution in terms of how applications are built, and people are really learning how to use some of these triggers that are out there to build more interesting applications — applications that are more efficient, that bypass a lot of the bottlenecks that existed in the past by using, you know, things like AWS IoT or AppSync, or using, you know, Kinesis to directly upload data. So a lot of those triggers are now out there, and you're right. WAFs typically are somewhere between difficult and impossible to put in front of most of those things. Again, that's true in non-serverless applications as well. It's just that the norm now is to build some business logic that's triggered by some data coming in from one of these APIs that I can't necessarily put a WAF in front of. So, yeah, we need to reconcile where we are putting security so that it's more agnostic to where things are coming in — where are we putting security so it's not assuming that all the bad guys are outside this big perimeter that has one big front door and all the good guys are on the inside. And again, I think we should be assuming that for the rest of the cloud as well, but we just don't have a choice when it comes to serverless.
Jeremy: Yeah, and the other thing, too, and I guess this would apply to the entire cloud infrastructure, or infrastructure as code really — is where it's very easy for you to spin up all these dev and staging environments, and you see developers and companies creating lots of functions and lots of versions of functions, API Gateways, and things like that. But those are all out there, and often never get cleaned up. Is that a security risk, in your assessment?
Hillel: It's interesting because the rest of the world tries to decrease friction, right? That's what we're focused on. How do we do things with fewer barriers? Security kind of likes their friction. We really enjoy a little friction. The fact that there's friction means it's a little harder for a developer to do something. And maybe there's more things along the way that might prevent him from doing something silly. So the fact that we can now spin up machines instantly, we can deploy functions instantly, that's obviously great for productivity in a lot of ways, but it does make security a bit of a nightmare. And yeah, I think the move to serverless as a mindset, but also as a technology — I mean, the fact that you could literally go into the console in Lambda, hit create function, write some code, hit save, and that thing's in production? That's incredibly powerful, but it's a lot more scary than it is powerful in the real world. And so what we see a lot is we see sprawl really quickly. You know, in a lot of ways, the last protection from the kind of resources you didn’t need that were hanging out there in the cloud world was your CFO, with somebody at the end of the month going, “Our Amazon bill is what? Oh my God, let's do an audit of that. You know, what's that machine called Hillel Test 1 running in Australia. Do we need that?” And I’ll go, “Yeah, probably not. It's called ‘test.’” So thank God we paid $200 a month for it, because that meant somebody was going to care about it from a security perspective. Now you can throw up functions and stacks and versions, old versions of functions and API gateways, instantly. You pay virtually nothing for them unless they're being called. So from a developer's perspective, okay, I put some test stuff up. What do you want? I put it up in us-west-2, which is where we don't put our production applications. So it's easy to realize it's not production. And I tried something out and I didn't delete it.
But from a security perspective, well, that's all callable. That's accessible. It's probably got very little error-handling because you were just testing something out, and it's open to the outside world, and it's probably not going to go away unless we’re really mindful about it. So yeah, I think, you know, we see people go from zero Lambda functions to hundreds of Lambda functions, where they really only need 13 or 14 in production, you know, within the course of months. And a lot of what you need to focus on — whether it's through tools or honestly, whether it's just through good hygiene — is what's out there, what's deployed. Why is it out there? Do I need it? Is it part of something important? Can I prune? Prune everything. You know, you've really got to live in a Spartan way in the cloud.
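The pruning hygiene Hillel describes can be sketched as a simple inventory check. This is a minimal illustration, not a real tool: the function metadata fields (`name`, `env`, `last_invoked`) are hypothetical, standing in for whatever your deployment tooling or billing export actually records.

```python
from datetime import datetime, timedelta

def prune_candidates(functions, max_idle_days=30, now=None):
    """Return names of non-production functions idle longer than max_idle_days.

    `functions` is a list of dicts with hypothetical keys:
    name, env, and last_invoked (a datetime).
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_idle_days)
    return [
        f["name"]
        for f in functions
        if f.get("env") != "production" and f["last_invoked"] < cutoff
    ]

# Example inventory: the "hillel-test-1" style leftover gets flagged,
# production and recently-used dev functions do not.
inventory = [
    {"name": "hillel-test-1", "env": "dev", "last_invoked": datetime(2019, 1, 1)},
    {"name": "checkout", "env": "production", "last_invoked": datetime(2019, 1, 1)},
    {"name": "fresh-dev", "env": "dev", "last_invoked": datetime(2019, 5, 25)},
]
```

In practice you would feed this from an API that lists deployed functions and their invocation metrics; the point is simply that "what's out there and why" is an auditable question, not a mystery.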
Jeremy: Yeah, and I think, too, that when you start piling up all these old versions of functions and you have all these old API gateways that are just for testing things, as you mentioned, I’ve seen this quite a bit, where people will publish an API gateway to a test function that does something like dump a whole bunch of information that you probably shouldn’t be dumping. And security by obscurity only works for so long, so I can definitely see how all these shadow APIs can certainly open up some security holes. Plus the other thing you mentioned too, about your “Hillel test” server in Australia — I get that the higher bill can be somewhat beneficial from a security perspective, but even small resources can start to add up. So I see this too, where people will create something like a DynamoDB table, but not use the on-demand pricing, and then all of a sudden, they've got a bunch of read and write units that they're getting charged for, and you don't realize it because it’s often spread across lots of services. So I find that to be an interesting side effect, as well, and pruning would certainly help there, too.
Hillel: Yeah. Yeah, for sure. Kinesis is the killer. That's the one that really gets you.
Jeremy: Yes. When you're paying for those shards. Sure. Okay, so let's talk about this idea we mentioned regarding real-world serverless or serverless security in the real world. And so you've obviously seen quite a bit of security stuff. You’re a security guy; you run a security company. So maybe you can push past all of these anecdotes, and discuss some of the things you're actually seeing happening out there.
Hillel: Yeah, it's great. It's really an opportunity, before I even say anything else, to say it's worth remembering that the people out there who are trying to attack our systems, they're the same people who were out there before — whether they're, you know, individuals, state actors, you know, criminal actors. Whatever it is, it's the same people. They may or may not even be aware that we've decided to build our application based on Lambda or Azure Functions or something. So they want the same things. By and large, they're going to use a lot of the same technologies. They are starting to adapt a little bit to the fact that it's a serverless environment. They are starting to understand how serverless environments scale and how they could benefit from that — how they can do more brute-forcing, perhaps in certain cases, than they might have been able to in the past without being detected. But by and large, we're not seeing a huge shift in what attackers are doing. I think from a defense perspective, from a "things I care about" perspective, we're definitely seeing that there are new things you need to focus on — you know, the whole debate between denial of service and denial of wallet: do I leverage the cloud to scale horizontally tremendously for me, so I don't worry so much about being knocked over by a denial of service attack, but then maybe pay a tremendous amount of money at the end of the month? Or do I try to figure out, you know, what's the minimum concurrency I could set up and still service my customers, and try to avoid paying through the nose? Those are interesting discussions that maybe weren't as relevant in the past and are relevant now. But mostly I think from the outside things are quite the same. I think denial of service in general, this is a category of attack that, you know, broadly speaking, we need to focus on a lot with applications.
Denial of service not so much in the “what if I get hit by a million requests at the same time?” sense. You know, that's what Cloudflare is there for, or AWS Shield will help you with things like that. But denial of service more in the sense of what happens when people exploit the business logic of my application to just make me consume resources. And you know, you can hit a login endpoint and consume a lot of Dynamo resources. And as you said, you know, in a lot of these cases where there isn't on-demand billing, we've got to set up a capacity. So, until Dynamo had on-demand, we needed to decide how many read units we set up for Dynamo, and if we were storing our hashed passwords in Dynamo for some reason, and looking them up during login, then we needed to figure out, you know, what's the right capacity so we didn’t get knocked over by that. But we've seen more complicated cases, and a lot of this has to do with architecture. I mean, we recently saw someone who got attacked, but that attack didn't just manifest in, “Hey, my stuff gets really busy.” It manifested in “My stuff got really busy. My Kinesis pipelines and analytics applications got full of all sorts of spurious data that were part of the attack.” That data wasn't getting consumed or drained from those pipelines in the way that maybe it should have been, or with the speed it should have been, because of the way the application was structured. And this customer was down for a couple of days until they could figure out how to properly, you know, drain these attacks out of their system and then figure out how to mitigate those things. So a lot of the ways that people architect around these managed resources, and these more complicated, more distributed architectures, could be really important, not just for performance, but also for security.
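The capacity question Hillel raises about provisioned (non-on-demand) DynamoDB can be put in rough numbers. A back-of-the-envelope sketch, using the rule that one read capacity unit covers roughly one strongly consistent read of up to 4 KB per second (the login workload numbers below are hypothetical):

```python
def max_logins_per_second(provisioned_rcu: int, reads_per_login: int) -> float:
    """Rough ceiling on login attempts per second before the table throttles.

    Assumes one RCU ~= one strongly consistent read/sec of up to 4 KB,
    and that each login attempt performs `reads_per_login` such reads.
    """
    return provisioned_rcu / reads_per_login

# e.g. a table provisioned at 100 RCUs, where each login attempt does
# 2 reads (user record + hashed password lookup): an attacker only needs
# to sustain ~50 requests/sec to exhaust capacity for everyone else.
ceiling = max_logins_per_second(100, 2)
```

This is the "denial of wallet vs. denial of service" trade-off in miniature: raise the provisioned capacity (or go on-demand) and you pay more under attack; leave it low and legitimate logins get throttled.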
Jeremy: Yes. So the point about the design of the architecture, I think that is right-on, because I see this problem quite a bit, especially with things like cascading failures. So what are those trade-offs that we have to make? What do you feel is appropriate? Do we want to continue to scale up to handle all these extra requests, if we do get some sort of a flood? And like you said, Cloudflare or something like AWS Shield will knock down the DDoS type attacks. But what about the more common ones, the ones where somebody's maybe trying to brute force your password form or flood a webhook endpoint or something like that — where do we draw that line between, you know, letting it scale up and shutting people down?
Hillel: Yeah, it's the eternal question in architecture in general, especially in cloud architecture, and when you factor in security, it becomes a little more complicated. I think, obviously, the earlier you can know something is wrong, and something should be ignored or discarded, the better. Right? And so one of the mistakes we often make with distributed architectures is we figure out that the logical place for something to be checked is somewhere down the line. You know, a Lambda function puts something into SQS; it gets pulled out by another Lambda function, which does the processing on it, then spins off three other Lambda functions, and one of those functions that’s all the way down the ranks is the one that's going to go, “Wait a minute, this request isn’t signed properly,” or “This request doesn't match the account ID it came from,” or something like that, because, logically, it has access to the data at that point. And that's great. Except that now, when you think about that from a security perspective, it took a long time and it consumed a lot of resources before you figured out that that request was something you'd actually like to ignore. So in some cases, figuring out what's the earliest place that I can filter out bad data from good data — if I can recognize bad data, how quickly can I do that? How can I avoid looking things up in a database before I know something's a problem? Those are going to be helpful. And obviously on the architecture side, trying to make sure you're using resources in the cloud that scale really well, resources that let you handle edge cases, you know, of crazy overflows or lots of extra data inside your pipeline. How do you deal with an SQS queue that's full of requests you don’t want to handle? How do you drain those? How are you going to handle that?
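The "filter at the earliest place" idea can be sketched roughly like this: verify a shared-secret HMAC signature on the incoming payload before touching any database or downstream queue. The event shape, header name, and signing scheme here are illustrative assumptions, not any particular gateway's contract.

```python
import hashlib
import hmac

# Hypothetical shared secret; in a real system this would come from a
# secrets manager, never be hard-coded.
SECRET = b"replace-with-a-real-secret"

def is_authentic(body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(expected, signature_hex)

def handler(event, context=None):
    """First function in the chain: reject bad requests before any
    downstream Lambda, SQS queue, or database ever sees them."""
    body = event.get("body", "").encode()
    sig = event.get("headers", {}).get("x-signature", "")
    if not is_authentic(body, sig):
        # Cheap rejection: no DB lookup, nothing enqueued downstream
        return {"statusCode": 401, "body": "invalid signature"}
    # ... only now enqueue to SQS / write to DynamoDB ...
    return {"statusCode": 202, "body": "accepted"}
```

The design point is that the check costs a hash computation at the front door, instead of three or four function invocations and queue writes before the last function in the chain notices the request was bogus.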
So some of those are architecture decisions you have to make to make sure you're not scaling out too rapidly; you're not consuming all your resources. But at the same time, you can handle that sort of massive scale that you want to handle. The whole point is to say, “I build this application.” Just as an anecdote, we have a customer who started with us last year. They had about 100 million invocations per month on their Lambda infrastructure. They’re at close to a billion now. They haven't changed their operations team. They haven't changed their code all that much. So they've really capitalized on the fact that you can use serverless to build things that just scale magically without having to worry about it. You want all that, but at the same time, you want to make sure your security concerns are mitigated, you know, in that same system.
Jeremy: Yeah, and that point you mentioned about using a lot of resources because you're just passing data through without inspecting it, that is another very interesting trend that I think can be quite a problem. But it’s also not impossible to protect against, either. If you're streaming something into Kinesis or you're putting something into an SQS queue, there are ways for you to validate or verify, and it might go beyond just validating the signature too — looking more specifically at fields like phone numbers or whatever, and making sure that they meet a certain format. Rather than just pushing that into the Kinesis stream or into SQS or EventBridge or whatever you're using, verify it first — do even some basic checks on it before you start flooding downstream systems. I think that's a really, really smart point. So what else do you see developers struggling with? Because we now have this thing where, like you said, our production systems aren’t running in traditional sandboxes anymore. And a lot of the burden of security is now on us as developers to make sure that our applications are secure. So what are the things that you see developers struggling with?
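Those basic checks might look something like this. The record schema, field names, and phone-number pattern are all hypothetical; the point is simply to reject malformed records before they reach Kinesis, SQS, or EventBridge.

```python
import re

# Loose illustrative pattern: optional "+", then 7-15 digits
PHONE_RE = re.compile(r"^\+?[0-9]{7,15}$")

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means safe to enqueue."""
    problems = []
    user_id = record.get("user_id")
    if not isinstance(user_id, str) or not user_id:
        problems.append("user_id missing or not a string")
    if not PHONE_RE.match(record.get("phone", "")):
        problems.append("phone not in expected format")
    return problems

# A producer would gate on this before the put:
#   if not validate_record(record):
#       kinesis.put_record(...)   # only clean records flow downstream
#   else:
#       reject / dead-letter the record here, cheaply
```

Even a few cheap format checks like this at the producer keep spurious attack data from filling pipelines that are expensive to drain later, which is exactly the failure mode in Hillel's story.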
Hillel: Yes, I mean, I think the number one thing we see developers struggling with is IAM roles and permissions. This is something that developers have recently inherited. I wrote last year about a company we had talked with, and are working with, where the developers all got together and said they were going to quit unless the security team gave them ownership of IAM roles. And the developers made an interesting point, which was: we can't do our jobs if we have to go to security every time we change an IAM role, because we're now talking about IAM that governs 5,000 API calls we could make, you know. In a world where we're expected to move really rapidly, we have to own that. And I think that makes a lot of sense. But developers aren't necessarily equipped to make those decisions. And when you look at how developers work, they say, “Okay, now I'm reading from a database. So I'm going to have to put Dynamo something there. I don’t know. Is that query or scan or both? Let's just throw a wildcard at it because that's the safest thing to do. And then, which resource? I mean, the table name depends on whether I'm in staging or production. So let's just put a wildcard on resource because that's the thing that's going to make sure I don't break,” right? To a large degree, I understand developers. It's the same reason why the second most common configuration for the duration of a Lambda function is the maximum — the first one being the default, right? Because if I'm going to change it, I'm going to change it to the maximum. That way I don't have to worry about a timeout; that's great. So when we see people struggling to deal with security configuration in a way that really, you know, meets least privilege and minimizes risk, I'm not sure I blame them.
I mean, as a developer, I also want to move rapidly, and I want to get things done quickly, and I'm now being, you know, pushed ever harder to write code faster, deploy code faster, less testing, more automation, more deployment. So that's the thing I think people struggle with the most, and it has a huge impact, right? I mean, negative and positive. As I said earlier, properly configured IAM roles on lots of little Lambda functions will give you a tremendous amount of security joy. It will melt away huge swaths of your attack surface. And then at the same time, when you discover that your entire account is using a single role — by the way, sometimes because security people demanded that they control that one role — and it's got IAM wildcard, Cognito wildcard, Dynamo wildcard, S3 wildcard on it, that's going to put you at a huge amount of risk. So I think that's the place where we're saying, “Hey, developers, go do this,” but we're not empowering them to make an easy, good decision, and as security people, we don't necessarily even know what the right decision is, and that's a real struggle.
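The two roles Hillel contrasts can be written out as IAM policy documents (shown here as Python dicts for illustration; the table name, region, and account ID are placeholders):

```python
# The "let's just throw a wildcard at it" role: every DynamoDB action
# on every table -- including DeleteTable and CreateTable -- for a
# function that only ever reads and writes one table.
WILDCARD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}
    ],
}

# The least-privilege version: only the two calls the function actually
# makes, only on the one table it touches.
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        }
    ],
}
```

With the scoped policy, a back door in the function lets an attacker read and write one table; with the wildcard, the same back door can enumerate, drain, and delete every table in the account.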
Jeremy: And I think that this idea of the star permission too — and I don't know if it's just an education thing, but it almost seems like it's sort of becoming a joke, like everybody's like “don't use stars,” and obviously I totally agree with that — but when you do use star permissions, what are the risks of doing that? Because again, let's say you're running an EC2 instance: whatever permission that EC2 instance has — which is probably a lot, because it has to interact with a bunch of different services and connect to the databases and do all these other things, plus it probably has its own role that has access to different parts of the network, you know — why is that different than opening up permissions on, say, a Lambda function?
Hillel: Sure. So fundamentally, obviously, it's not really different. And a star here and a star there are equally bad, and I've seen organizations that have, you know, policies in place to prevent stars from being deployed, and that's all great. At the same time, we need to recognize that least privilege is always important. But if you asked me whether I would spend 20%, 30%, 40% of my security effort on getting to least privilege on my EC2 roles, I'd probably say no, I want to spend those people on other things, because at the end of the day, those EC2s are still going to have pretty big roles. So I would take a quick stab at, you know, carving out stupidity from those roles, and I'd probably accept the fact that having extra privileges there is probably not terrible compared to what I could use those other people to do: configuring security groups or doing audits or configuring my WAF. But when you go to a Lambda function, which probably does one thing, accesses the same resources every time, probably just reads from one table and writes to another table, the difference between “*” and ListTags is incredible. Right? You know, we've seen functions that literally only need to list the tags on resources, and they get a star because nobody wants to figure out what the right permission was for listing tags — which, by the way, is ListTags. So it's not that hard to figure out in this case. But that star means that that function can not only list tags on, say, IAM or, you know, on an EC2 instance, but it can delete it or create it or spin up 10 or spin up 1,000, right? And so, in a world where this function literally needed to do one or two things and you gave it everything, that's really challenging. The other thing to remember is that wildcards are not categorized by risk, right?
So it's not like someone said, “Hey, there's really risky star and a little bit risky star and pretty benign star and totally benign star,” and you could just say, “I just want totally benign star,” right? There's a little bit of read and write stuff you could do, but for the most part, a star somewhere gives you access to something really benign along with tremendous risk. And so you really have to be mindful about that. And again, yes, you should be mindful about that in EC2 as well. You shouldn't use stars in EC2 either, but your mileage will vary; when you do it on small resources like Lambda functions or Fargate containers, it's going to really, really do a lot more good for you.
Jeremy: And especially with Lambda functions, right? Because they're ephemeral, and they can be used over and over again. So if there is an exploit there, it's sometimes hard to see what somebody may have done to exploit it. And if you give it a star permission, and like you said, DynamoDB is the example I like to use: if you give DynamoDB star permissions, people are like, "Oh, I can get items, I can write items, and oh, I can delete items." But no, you can delete tables. You can create new tables. You can change provisioned capacity. There's a lot you can do, and so IAM is incredibly powerful. But again, those stars make it a little bit dangerous, and I'm not trying to scare people. It's just that if you could tell anybody the one thing that's probably the most important security piece you have full control over, it's IAM permissions. So learn them and use them correctly. Alright, so what about things like public buckets? Are people still doing that? Because S3 now makes them private by default.
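To make the DynamoDB example concrete, here is a hypothetical sketch of the two policies being contrasted, expressed as Python dicts. The table ARN, account ID, and the `has_wildcard` helper are illustrative assumptions, not anything from the episode or from Protego's product.

```python
# Sketch: a wildcard DynamoDB policy vs. a least-privilege one.
# The table name and account ID are made-up examples.

wildcard_policy = {
    "Effect": "Allow",
    "Action": ["dynamodb:*"],   # includes GetItem/PutItem, but also DeleteTable,
    "Resource": ["*"],          # CreateTable, UpdateTable (capacity), on any table
}

least_privilege_policy = {
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],  # only what the function does
    "Resource": ["arn:aws:dynamodb:us-east-1:123456789012:table/orders"],
}

def has_wildcard(policy):
    """Flag a policy whose actions or resources contain a '*'."""
    return any("*" in a for a in policy["Action"]) or any(
        "*" in r for r in policy["Resource"]
    )
```

A check like `has_wildcard` is trivial to run in a deployment pipeline, which is exactly why scoping small, single-purpose function roles pays off so quickly.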
Jeremy: Right. And I think that the mistakes that people make on a regular basis are probably the ones that, like you said, we thought we solved 15 years ago with things like WAFs. And honestly, nobody ever taught me about SQL injection; I actually learned it the hard way very early on in my career. So I like the term "stupidity," because people still tend to do some pretty stupid things like check credentials into GitHub, but a lot of it is probably just ignorance and inexperience as well. For example, there's been a recent movement away from using environment variables to store secrets in Lambda functions. And I think things like this are all smart things that we should be doing to minimize risks, but there are still plenty of posts (probably even some of my older ones) that say that's okay. You mentioned this idea of hygiene and posture, and I think that probably brings us to this idea of mitigation, maybe? So we have tools. There are tools out there that we can use that can help us mitigate some of these things. So what are your thoughts on how we do this? Can we just put tools in place to try to block some of this stuff and save developers from themselves?
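Since SQL injection keeps coming up as one of those "solved 15 years ago" problems, a minimal, runnable sketch of the good-coding-practice fix is worth showing: parameterized queries instead of string concatenation. The table and data below are invented purely for illustration, using Python's built-in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

user_input = "alice' OR '1'='1"  # a classic injection attempt

# Vulnerable: string formatting lets the input rewrite the query itself.
vulnerable = f"SELECT name FROM users WHERE name = '{user_input}'"
leaked = conn.execute(vulnerable).fetchall()  # returns every row in the table

# Safe: the driver treats the input strictly as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()  # returns no rows, since no user has that literal name
```

The same placeholder pattern applies with any database driver; the point is that the query structure is fixed before user input ever touches it.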
Hillel: My number one cliche would be: let's focus less on mitigation and more on prevention. I know that's super cliche in security, but one of the things we see a lot here is that you get a lot more mileage out of making sure the things you're deploying are deployed with least risk than you do chasing after attacks. And again, that's not to say we don't need to do both. We will forever need to do both. No amount of proper configuration hygiene is going to prevent every type of injection attack or cross-site scripting attack, or whatever else, on our infrastructure, right? We need to mitigate all of those things. But in cloud applications, and particularly in serverless cloud applications, the value of hygiene and posture is much greater than it was in the past. You know, whether it's the things we talked about earlier, like just leaving around old stuff that could put you at risk but that you don't need, or it's getting IAM configured properly, or it's things like setting timeouts to their minimum threshold if you can. All those things are going to give you a tremendous amount of value in making it hard for attackers to do what they want to do on your system, before you've even started looking for a SQL injection, right? You still need to look for SQL injection. We still need to run the tools that we're going to run. But before we get there, before you start worrying about blocking and mitigating runtime attacks, spend a significant amount of energy on: What do I have? Where is it? Do I need it? Is it configured in a way that gives me the least risk? Have I isolated the things I can isolate? Am I doing all that continually? That would be my number one focus.
Jeremy: It sounds like a lot of work, though.
Hillel: Well, yeah. I mean, that's why we get employed, right? Security people need to get paid.
Jeremy: So maybe that is a good segue to this question. Whose job is this? Right, so if the application developer is now the one fighting for IAM roles, they want control over these IAM roles, and you've got different types of events coming in that could have SQL Injection, where again, just good coding practices, we know that takes care of some of those things. But then things like configuring what the timeout should be. That's on the developer now. Things like how many RCUs and WCUs do we need for DynamoDB, or how many Kinesis shards do we need? A lot of those decisions now fall on developers. So whose job is this? And maybe it changes with different organizations, but is it Dev? Is it DevOps? Is it AppSec? Is it Ops? Do we need SREs to come in? Who owns this now?
Hillel: Yeah, I think in a lot of ways it's all of the above. And I know we've said that for a lot of years. I think the difference is, to the point you made in the question, there's just a lot of stuff that the people in those layers own now that they need to be responsible for. Now, I'm a big fan of the idea that security owns overall responsibility for security. And if you don't have a security organization in your business, then you're going to find out sooner or later that that was a mistake. You need somebody whose job it is to care, right? But that person can no longer imagine they can solve the problem on their own. You know, I think we were able to imagine for a while that we could throw up a WAF in front of an application and then feel good about ourselves. I think we're recognizing that's not really going to be enough. And so, yeah, we need developers to be empowered to do the right thing in the easiest possible way. We need DevOps to help us automate the process of making sure those things happen. So it's kind of a trust-and-verify model. Sure, you own IAM. Sure, you go ahead. But at the same time, there's stuff in the pipeline that's going to make sure that if you go too far left or right, there are guardrails in place to help put you back on track. Then security needs to know that even though they put some of that stuff in the DevOps pipeline, and that's supposed to give them a lot of good hygiene and posture, stuff will still make it into the cloud in a way that's not ideal, whether it's because it looked okay when it was deployed and then later on we discovered it had a third-party vulnerability we didn't know about, or because it was grandfathered in, it got some waiver, it bypassed something, it's been there, etc. We still need to worry about: okay, what's actually happening at runtime? Can we know where our risk is, and can we go deal with that? And can we still, yeah, look for SQL injection. Look for code injection.
All those things still have to happen, just in a way that lives well in a serverless application, scales with the serverless application, and doesn't get in the way of the serverless application. So yeah, we need to layer all those things on, empower everybody to own their layer, and make sure somebody else is responsible for verifying everybody else's job.
Jeremy: Yeah, I think that's a good way to approach it. So let's talk about tools for a minute. And I know there are a whole bunch out there; Protego obviously is one of the security tools. What does Protego do? How does it mitigate different parts of this?
Hillel: Sure. So first, let me mention that there are obviously lots of tools out there. There are a bunch of tools that you absolutely should be using that come from the cloud providers. Azure, AWS, and Google all have very robust suites of security tools available to you if you're in the cloud, and you definitely should be taking advantage of them. They're going to help a lot in many of these areas. So, you know, in the AWS world, you should be looking at using things like WAF and Shield where it makes sense. You should be looking at using CloudTrail for auditing certain things, and GuardDuty will give you some visibility into anomalies in certain parts of your account. So those things are all really great. Where we come in is trying to bridge the gap between security and dev, and between infrastructure and code. So we're kind of saying, "Okay, in a world where the key assets you deploy are code and APIs, and where you're going to put security around code and APIs, not at the infrastructure or operating system level, how can we come in and drive security in a way that's meaningful?" Where we can say: okay, developers can do what they want to do. Security people can have their policies. Security people can know that they're driving towards least privilege, optimal security configuration, and runtime policy without having to close their eyes and, you know, hope. Just throwing a WAF in front and not configuring it. I think the single hardest problem to solve in the AppSec world is how do I configure my WAF, and there are usually, you know, three answers. One is: don't bother, it's not going to work, just run it in kind of DDoS-and-basic-attack mode. The second is: put a lot of people on it, have them constantly evaluate what the application does, and try to keep the WAF continuously configured. And obviously that gets more and more challenging as the application changes more and more rapidly.
And the third is kind of these learning-mode WAFs that exist, where it's like, let me learn the application and figure out the right configuration. And the challenge there in serverless and cloud native is that things don't run statically for long enough for that to work. You know, by the time three or four weeks have gone by and you've learned what the API should and shouldn't be doing, those functions have been redeployed seven, eight, 10, 12 times, changing their functionality, so you have to keep trying to relearn before you can block anything. So to a large degree, what Protego is focused on is, on the one hand, automating least privilege and automating risk mitigation. And automating is key. We talked earlier about how much work it is. You mentioned that, right? It's true; it's a lot of work. But really, the way to deal with security in the modern world is to automate 99% of the workload. Set up the right policies, set up the right tools, set up the right processes, and let those do most of what your job is. And you spend all of your incredibly taxed time dealing with the 1% of things you haven't figured out how to automate yet. So we really try to automate IAM role generation, vulnerability scanning, looking for keys in the code, and all sorts of other things that people are doing more and more. Mostly we're trying to make sure that before it hits the cloud, it's configured optimally, and after it hits the cloud, we continue to monitor it and make sure it's configured properly. And the second half of what we focus on is: what is runtime application security in a serverless world, in a way that takes advantage of serverless and scales with serverless? You know, we really tried not to build a WAF for a function. I think that's the easiest cliche to fall into. And that's not to say that part of the solution is not a WAF for a function. So I'm kind of contradicting myself.
But it's really more, to us, about focusing on, "Hey, what is the best way to get security?" And so yeah, part of securing an application in serverless is securing each function, and part of securing each function is making sure the inputs and outputs, you know, are validated and make sense. But a big part of it is really focusing on application behavior. And that's really where we're very powerful, because one of the things we can do is build what I call "layer 8 microsegmentation." It's microsegmentation at the level of API calls, process creation, interaction with third-party resources, things like that, where we can automate the process of figuring out the right configuration by understanding what the code does. We can automatically build the proper microsegmentation around each function of your application simply by having an algorithm understand what the code wants to be able to do and what the code is doing at runtime, and saying, "Okay, I can build a whitelist for every one of those behaviors, and I can literally lock the function in a cage that says you can do exactly what you need to do. Not a drop more. Not a drop less." And then above and beyond that, we let security come in and say, "You know what? I don't care what a developer wrote. I don't want anybody calling iam:CreateUser or iam:DeleteRole. Those are things that need special exceptions from me, so I want to apply a policy above that." So Protego lets you automate all that, build your policies, deploy them, and really kind of sleep at night for the most part.
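The "whitelist of behaviors" idea can be sketched in a few lines. To be clear, this is my illustration of the concept, not Protego's implementation; the allowlist contents, exception name, and `guarded_call` wrapper are all hypothetical.

```python
# Sketch: lock a function into an allowlist of its known behaviors.
# The behavior names and the allowlist itself are hypothetical examples.

ALLOWED = {"dynamodb:GetItem", "dynamodb:PutItem", "s3:GetObject"}

class BehaviorDenied(Exception):
    """Raised when a function attempts a behavior outside its cage."""

def guarded_call(action, allowed=ALLOWED):
    """Permit exactly the behaviors on the allowlist; deny everything else."""
    if action not in allowed:
        raise BehaviorDenied(f"{action} is outside this function's cage")
    return f"executed {action}"
```

The security-team override Hillel describes maps naturally onto this model: a deny-list of actions like `iam:CreateUser` that is removed from every function's allowlist regardless of what the code appears to need.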
Jeremy: Awesome. Alright, well, listen, Hillel. Thank you so much for joining me. If people want to find out more about you and Protego, how do they do that?
Hillel: Yes, so a bunch of places. Obviously you can go to our website, and there's quite a bit of information about what we do and how we do it. And, you know, you can see a demo. You can actually sign up to use the product; in a bunch of clicks, you can start playing with it and see what it does in your environment. I try to post on Twitter, both at @hsolow and @ProtegoLabs, as well as, you know, on LinkedIn. We've got a blog on the Protego website that you can take a look at. We try to really impart our findings and our wisdom, both from a security perspective and a serverless perspective, and just try to keep people interested. We've also got a podcast, which you're going to reciprocate and come on real soon, where we try to focus on some of the things that are new from the security angle in the serverless world and the cloud native world. So there's a link on our website to that as well, and I think hopefully that'll be interesting for people too. I mean, I see the ecosystem growing. I see more people interested, and I think that's reflective of the fact that people are figuring out how to use this stuff to their benefit and, hopefully, how to do security well there too.
Jeremy: Right, and I think the education piece of it is huge. The more content we can put out there that gets people thinking about this stuff, the better. Alright, awesome. I will get all that in the show notes. Thank you again.
Hillel: Really appreciate it, Jeremy. Thanks for the opportunity. See you soon.
What is Serverless Chats?
Serverless Chats is a podcast that geeks out on everything serverless. Each week, Jeremy Daly chats with another serverless champion to explore and do a deep-dive into specific topics in the serverless space.