Screaming in the Cloud

Do we have your permission to share this episode of Screaming in the Cloud with you? Sonrai CTO and Co-Founder Sandy Bird is back on the show to help Corey break down the woes that come with granting permissions in the world of cloud security. As they catch up, the pair touch base on how automation can create major headaches, what goes into navigating the minefield of granting permissions, and if the future of adoption patterns is as grim as Corey predicts. Sandy also answers one of Corey’s long-time questions: how do you pronounce “Sonrai?” Who knows? Maybe Corey will finally learn how to say it properly...


Show Highlights:
(0:00) Intro
(0:30) Breaking down Sonrai’s name
(1:45) Sonrai sponsor read
(2:25) Getting alerts vs. fixing the root of the problem
(4:50) The problems with granting permissions
(7:34) The dangers of automating permissions
(10:10) "Where do I make this change, and how do I enforce it?" 
(13:46) The security concerns that come with tagging automation
(16:12) Sonrai sponsor read
(16:53) Properly deploying permissions access
(21:16) Woes of running reporting in the middle of the night
(23:21) Are adoption patterns getting worse?
(29:01) Where you can find more from Sonrai Security


About Sandy Bird
Sandy Bird is the co-founder and CTO of Sonrai Security, helping enterprises protect their data by securing cloud identities and access. Sandy was the co-founder and CTO of Q1 Labs, which was acquired by IBM in 2011. At IBM, Sandy became the CTO for the global security business and worked closely with research, development, marketing and sales to develop new and innovative solutions to help the IBM Security business grow to ~$2B in annual revenue. He is a trusted and experienced cloud security expert.


Links



Sponsor
Sonrai Security: https://sonraisecurity.com/

What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Sandy Bird: By default, great security, but there's probably some things you could change. And so what do you do centrally versus what you do, um, globally is a good question. And when you expand that out to Amazon, now it's not just one box you're dealing or one VM, now you're dealing with, okay, we've got multiple different accounts, we've got development environments, we've got staging environments, UAT.

What do you want to control centrally?

Corey Quinn: Welcome to Screaming in the Cloud. I'm Corey Quinn.

I am joined on this promoted guest episode by returning guest, Sandy Bird, who is the CTO and co-founder over at Sonrai Security. Though I, the way I, every time I read it, I want to pronounce it San-ry. That's S-O-N-R-A-I. What's the deal with that?

Sandy Bird: It's always one of those things when you're trying to find a domain name for your company, all the good ones are gone, no matter what.

And, uh, at the, when we started this company, we were really about trying to secure all of the data in the cloud. And so data was important to us. But of course, data.com and datasecurity.com and anything with the word "data" in it is gone. And so, but what was interesting, our founder is Irish and Gaelic Irish for the word data is "sonraí" with a little fada over the "I."

And, uh, that was amazingly enough free in the domain world, at least. So we, uh, ended up as Sonrai Security. So always good stories.

Corey Quinn: Indeed, the stories are important. People always want to understand, Alright, wait, there's got to be, is there a joke that I'm missing hidden somewhere in there that's kind of important?

Uh, last week I wound up seeing a Miata driving down the road in front of me with a license plate of "HTTP507." That's an error code that means "insufficient storage," which for a Miata, I thought that was genius, but if you don't- if you're not in on the joke, it just sort of sails past.

Sponsor: Did you miss this year’s epic ACCESS virtual summit by Sonrai Security? 

The summit was jam-packed with industry experts sharing their best practices and technical strategies for securing AWS environments! I’m talking about everything from scaling permission management for both human and non-human identities to reducing cloud attack surfaces through automated SCPs.

If you missed the live event, don’t worry, Sonrai is granting FREE access to all of the sessions on-demand at sonrai.co/access-on-demand. 

That’s right, ALL of the sessions are available to you now at sonrai.co/access-on-demand.

Corey Quinn: So let's speak a little bit about, I guess, in on, I would say it'd be a joke, except it's not that funny.

Uh, something that I've heard a recurring refrain about, from you among others, is that people are getting lots of alerts, but they're not fixing anything. What's the deal with that?

Sandy Bird: You know, cloud security over the last five years has been largely about visibility, right? We spend a lot of time finding out about all the problems that we've created in the cloud.

And the longer we sit in cloud, the more problems that exist. And it's just this kind of continual- developers, you know, have lots of words for it. You know, backlog, technical debt. There's, there's many words for the issues that you have to fix. We generate lots of tickets for them to find them. We fix the high priority ones, right?

And, uh, you know, we've done a lot of work in the last five years as companies to try to say these are the ones that are most important and you have to prioritize them. But the backlog grows. It just gets bigger and bigger and bigger. And so, we actually, again, this is back to those origin stories- there's always a story, Corey.

I was working with, you know, certain customers that were very successful. They were fixing thousands of tickets a year. But we were looking at this backlog of tickets that was growing, and it was like, there's no end to this. And we spend a lot of time on identity. You can widen this to the broader cloud security thing of, you know, vulnerabilities and all those things.

But when we were just looking at identity and getting to least privilege, there was no end in sight. And so we really said, look, we got to fix something here as opposed to just having everybody try to, you know, update every policy on every role in AWS and fix all of the RBAC assignments in Azure. And we said, well, Amazon has given us this great way to do denies centrally across the entire organization.

You don't actually have to fix every role if you just control the, uh, the, you know, permissions at this central level. You can't do that for every permission. There's 10,000 plus permissions at Amazon and the other clouds all have 10,000 or more, but there's certain permissions that are. And so what we decided to do was we take the most sensitive permissions, the things that destroy your cloud, right?

You know, creating presigned URLs that bypass your network controls, changing the route tables, changing the DNS entries. Those are bad things. Take those and control those centrally. Make it really easy for developers that need access to them when they use them to get those permissions. But then, you know what?

If you deploy a whole bunch of stuff with EC2 star, it doesn't really get star anymore. And, you know, I think that's, that's a way to actually fix some of the things versus just generating a lot of tickets. And then things are less impactful when they're run that way.
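As a rough illustration of the central deny described here, the sketch below builds a service control policy document that blocks a short list of sensitive actions for everyone except a named set of roles. The action shortlist, the function name, and the example role ARN are assumptions chosen for illustration from the examples in the conversation; they are not Sonrai's actual list or product behavior.

```python
import json

# Hypothetical shortlist of "sensitive" actions, based on the examples
# mentioned in the conversation (route tables, DNS entries, access keys,
# attaching policies to roles). A real list would be much longer.
SENSITIVE_ACTIONS = [
    "ec2:CreateRoute",
    "ec2:ReplaceRoute",
    "ec2:DeleteRoute",
    "route53:ChangeResourceRecordSets",
    "iam:CreateAccessKey",
    "iam:AttachRolePolicy",
]


def build_sensitive_deny_scp(exempt_role_arns):
    """Return an SCP document denying the sensitive actions to every
    principal whose ARN does not match one of the exempt patterns."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySensitiveUnlessExempt",
                "Effect": "Deny",
                "Action": SENSITIVE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": exempt_role_arns}
                },
            }
        ],
    }


if __name__ == "__main__":
    # The role name below is a placeholder, not a real convention.
    scp = build_sensitive_deny_scp(["arn:aws:iam::*:role/NetworkAdmin"])
    print(json.dumps(scp, indent=2))
```

With a policy like this attached at the organization or OU level, a role that was deployed with `ec2:*` would still be unable to change route tables unless it matches one of the exempt patterns.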

Corey Quinn: The problem that I've had with a lot of these approaches has been that when I, when I'm too restrictive with a permission, the thing doesn't work.

When I wind up being too permissive with permissions, things work and, and the negative, the negative aspect of that, the downside to that is very ethereal. It's one of those theoretical issues and, well, I'd already have to make a series of mistakes before that could really wind up being exploited. Who would make a second mistake?

Certainly not me. And it becomes this weird, I guess, cyclical problem where it's, there's always, in a world of infinite things to do where the backlog is always going to grow faster than you can tackle it, it's difficult to get people to sign off on making changes that potentially could break things.

Sandy Bird: You know, this, the theoretical thing is an interesting one because we always have this problem. Selling security has been like this forever, right? It's like selling insurance. But the, the issue is, is that if you work with enough customers over time, what you discover, there's, there's two patterns. One is, you know, of course there's a big breach and we all go and fix those problems.

Okay, we all know that pattern. The second pattern actually comes from internal, um, scenarios. And it's often by red team exercises, pen tests, whatever it is. And what happens is, is they discover that I, and I always pick on GitHub Actions, but whatever. Well, you know, there's a Git runner, GitHub Actions, some sort of a pipeline process, which are always way over-permissioned.

They usually have star permissions, and no one wants to admit that, but the reality is they usually do. But when the pen tester or the red team exercise somehow grabs a hold of the thing that can control those sometimes as being able to do a check in or PR, whatever it is, somehow they can get a control of this thing.

They can make the cloud do crazy things. But when we actually look at the history, of course, those things were never used. They didn't need the ability to create an access key on an IAM user. They didn't need the ability to attach the administrator role to some other identity. They didn't need the ability to change the route tables or the DNS entries.

And so what happens is the, the red team or whatever it is, kind of tweaks them in a new way. And so that becomes the evidence, right, to say, yeah, we actually do need to fix these things, um, because otherwise they can be used against us. And so I find that that's a, that's kind of that hook that starts people thinking, okay, we got to do something about this.

But again, to your point, like trying to fix every role to make it work so that you don't break something is really important. And so, again, we spend a huge amount of time looking at evidence in, use Amazon's examples, CloudTrail data, activity logs in Azure, whatever it is, to say, well, what does the Git runner that's doing this thing actually need to do every time it runs?

Yeah, maybe it has to make some changes in EC2, it has to edit security groups for some, you know, workload that gets put in there or something like that. But it's probably pretty rare that it uses the other 800 sensitive permissions, right? And so by, you know, gaining control of those ones when you know it doesn't use them is, is a good way to do it.
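The evidence-gathering step described here can be sketched in a few lines: given CloudTrail-style records, work out which actions a principal has actually used and which sensitive ones it has never touched. This is a simplified sketch. It assumes records carry the standard `eventSource`, `eventName`, and `userIdentity.arn` fields, and it approximates the IAM action as the service prefix plus the event name, which does not hold for every service. The sample events and ARNs are invented.

```python
from collections import defaultdict


def used_actions(events):
    """Map each principal ARN to the set of 'service:EventName' strings
    it invoked, derived from CloudTrail-style records."""
    usage = defaultdict(set)
    for event in events:
        # "ec2.amazonaws.com" -> "ec2" (an approximation of the IAM prefix)
        service = event["eventSource"].split(".")[0]
        principal = event["userIdentity"]["arn"]
        usage[principal].add(f"{service}:{event['eventName']}")
    return usage


def unused_sensitive(events, principal, sensitive_actions):
    """Return the sensitive actions the principal never used, sorted."""
    used = used_actions(events).get(principal, set())
    return sorted(set(sensitive_actions) - used)


if __name__ == "__main__":
    runner = "arn:aws:sts::111122223333:assumed-role/build-runner/job-1"  # made up
    sample_events = [
        {
            "eventSource": "ec2.amazonaws.com",
            "eventName": "AuthorizeSecurityGroupIngress",
            "userIdentity": {"arn": runner},
        },
        {
            "eventSource": "ec2.amazonaws.com",
            "eventName": "RunInstances",
            "userIdentity": {"arn": runner},
        },
    ]
    sensitive = [
        "iam:CreateAccessKey",
        "ec2:CreateRoute",
        "ec2:AuthorizeSecurityGroupIngress",
    ]
    print(unused_sensitive(sample_events, runner, sensitive))
```

Anything that comes back in the unused list is a candidate for a central deny, since the history suggests the runner would not notice the change.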

Corey Quinn: Part of the challenge, too, is especially with everything being, I guess, embracing automation, which you sort of need to do. There's the danger of, all right, if I enable this without any context or paying too close of attention, the odds are terrific that something somewhere in my environment is going to break.

Probably badly. And that is, at some level, it's sort of like, I fix bills. When people go in and inadvertently cause a production outage in an effort to reduce costs, suddenly they're not allowed to save money anymore. Because step one is keep the site up, making it more cost efficient takes a definite backseat to that, as it should.

Security also takes a backseat to that. Whether it should or not is significantly more debatable.

Sandy Bird: Every time you break something, that will never happen again, whatever you were trying to do. We actually, we were onboarding a customer a couple of weeks ago, and they had the right thing to do. Start in development, roll it out across all development, then actually move and you know, do it across, you know, UAT environments and then production.

But they knew that they were going to roll this out on this day and they were going to control, I forget what in their world, how many it is, but say it's 20 different development teams and some of them are in different time zones, including Asia-Pacific. And so they knew that the team deploying this out of New York or wherever they were would not be up in the middle of the night when the problem that they were going to cause happened.

And so they were, one of the things that they did was they found, you know, whoever they deemed to be responsible in that other, in that other world and said, you are the approvers of these things. And so we, again, in the cloud permissions firewall, we do this, but I've seen other companies do this too.

When they, when they create these triggers, when you run into one of the blocks, it generates a- and we use chat ops, so it's Slack or Teams or whatever it is, but it could be email. It could be however you do your notifications. It drops that team an email and by them actually clicking on something, it removes the thing that was the breaker, right? And so if it was, you know, in this case, you know, controlling a whole bunch of sensitive permissions in EC2 and things like that, by them being able to press that button, they could get them out of jail immediately without the people in North America having to be up and aware of doing it.

And that works quite well, but it only works if you can get the people that are actually building and doing enough control that they can get themselves out of that mess. And I think this is where AWS sometimes did a bit of a disservice to themselves in this concept of SCPs is that you can't even see the SCPs in these sub accounts.

You don't even know what they are. So you're, you're fighting blind in, in terms of what's there, right? And so, it's sometimes very hard for teams to know this, and we can talk about lots of other examples where they also don't make the error messages the same and all these other things, but it's really this visibility problem into those errors, which is sometimes problematic for people, so.

Corey Quinn: It also leads to a question of, at least whenever I'm dealing with AWS in particular, it's just a question of, okay, where do I make this change and enforce it? Now, in my test accounts, you know, it's sort of academic because I am the person that has access to these. I, whether I have access from one end or the other doesn't really matter.

Do I do it in the central org? Do I do it individually? Do I do it on the resource itself? Do I do it on the, on a service wide basis? Et cetera, et cetera. That becomes a, um, a much more academic question. But when you start dealing with real environments, this becomes a very important decision point, and it's certainly not helped by the fact that when an SCP prevents something, it is not intuitively obvious to someone in a member account why that thing didn't work.

It reminds me of yesterday, I pissed some people off on the internet by saying that SELinux, "Great tool. Such a terrible interface that the first thing everyone does is disable it because when suddenly this, this-" The failure mode of, "Oh, I tried to do something." It blocked whatever I was trying to do, silently, unless I happen to know it exists and know to look in the right log to understand what happened. It just shows up as a failed attempt to do something that I should, by all reasonable accounts, be able to do.

Sandy Bird: That's exactly it. And figuring out which one, you know, SELinux is a good example, probably too much stuff blocked by default. Great security, but there's probably some things you could change. And so what do you do centrally versus what you do globally is a good question. And when you expand that out to Amazon, now it's not just one box you're dealing or one VM, now you're dealing with, okay, we've got multiple different accounts. We've got development environments. We've got staging environments, UAT. What do you want to control centrally? And we always, you know, we actually did an interesting blog, that's probably been three months ago or something, where we talked about like different levels of SCPs.

And the first one we always talk about is like the, the "Dirty Dozen" one, which is basically like no one in the world should ever be able to run these commands no matter what. It takes, you know, an act of God to make this happen and things like disabling your org CloudTrail should not be allowed to happen.

Corey Quinn: Or deleting some of the production accounts.

Sandy Bird: Or deleting some of the production. Termination of account is one answer.

Corey Quinn: That's sort of, because on the other side of it, Oh, I'm going to spin up a developer test account to see how something works. I don't want to have to go to the board to get permission to delete that thing.

But you can't express an intent today when, unless they change something while I was on sabbatical, with, uh, being able to say, "Oh, part of my AWS org, this account, anything in this OU is designed to be ephemeral. Just let me blow it away and don't, don't bother me about it." Whereas these other things, yeah, don't ever delete them under any, for, under any circumstances for any reason.

Sandy Bird: Again, it becomes so tiered. Your example is so perfect. Like in dev, you really want to give the developer that's in that kind of in zone, almost all of the permission they need to, kinda do anything that they need within reason, but then, you know, at the central level, you want to have control of things like, you know, whatever.

And I always use the example of the presigned URLs because they have a tendency to poke through all the networking rules you put in place to help you secure things. You want control over those in some services and in some places, and so, how do you balance those two things, right? And there is this, you have to have some way to separate.

We do it using a product that we write where, you know, you can actually give the approvals and controls at those levels using things like attribute based access. Attribute based access is super powerful in AWS because you can actually do these very quick changes using things that are not editing SCPs, but using the SCPs to define the controls, and so you can separate these two into a, you know, an environment where you're giving the developer power to do things, but still controlling it centrally. You can also implement similar things, you know, yourself, even outside of our product doing that. It's just, you have to write code and things to do it, but it's, it's a good way to do it.

Corey Quinn: It does presuppose that you have an efficient tagging automation system that winds up addressing things appropriately. I admit, I forecasted back when they announced that you could suddenly start using tags for other things than cost allocation, that this would be a massive security issue.

I haven't seen that emerge. So either it's not as big of a concern as I thought, or it is and we just aren't seeing that happen much. I mean, there've been some small ones. I found an issue with the managed SageMaker policy that let it, uh, alter tags automatically on anything in Secrets Manager, but it could only read ones with a certain tag.

So, yeah, because originally when it was just cost allocation, we tried to get everyone to tag everything. Anyone's allowed to tag things. Please just do more of it. Now, inadvertently with attribute based access control, we inadvertently gave people the keys to the kingdom by doing that, and it's hard to roll back permissions once they're out there.

Sandy Bird: I was really, in some ways, against attribute based controls when they released it. I was like, "This is crazy." Literally, "add tag" now becomes the most sensitive permission in your whole org, and everyone has it. So, why would we ever allow that? However, again, and now, you know, over time, right, and we build these solutions, you realize that the same controls that you're using to control those tags can be used to protect the tag.

So now you can do things like say, yes, but if I'm going to use this very special tag called, you know, "My Permissions Control" tag, you can also write an SCP that says nobody can alter the "My Permissions" tag as well. But again, if you're using a lot of automation, you got to make sure your automation tools are aware of those things too.

So there's definitely this kind of balance between, you know, what can you do in automation using these things that are rolling the tags out, but then also central controls to protect them so that the SageMaker policy that gets applied to way too many things can't also screw with your tags that are controlling your security.
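The tag-protection idea can be sketched as a second policy document: deny tagging calls that touch one special tag key unless the caller is an approved automation role. This is a minimal sketch. The tag key `MyPermissionsControl`, the function name, the role ARN, and the four tagging actions listed are assumptions for illustration; a real policy would need to cover the tagging actions of every service in use.

```python
import json


def build_tag_protection_scp(tag_key, exempt_role_arns):
    """Return an SCP document that denies adding or removing the given
    tag key, except for principals matching the exempt ARN patterns."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProtectControlTag",
                "Effect": "Deny",
                # Assumed subset of tagging actions; extend per service.
                "Action": [
                    "ec2:CreateTags",
                    "ec2:DeleteTags",
                    "iam:TagRole",
                    "iam:UntagRole",
                ],
                "Resource": "*",
                "Condition": {
                    # Matches when any tag key in the request is the protected one.
                    "ForAnyValue:StringEquals": {"aws:TagKeys": [tag_key]},
                    # Approved automation is allowed to manage the tag.
                    "ArnNotLike": {"aws:PrincipalArn": exempt_role_arns},
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_tag_protection_scp(
        "MyPermissionsControl",
        ["arn:aws:iam::*:role/TagAutomation"],  # placeholder role name
    )
    print(json.dumps(policy, indent=2))
```

Both conditions must hold for the deny to apply, so ordinary tagging with other keys keeps working and only the control tag is locked down.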

We've actually seen, this is where I think it came to me, and I started to see some light at the end of the tunnel was, we saw people with big identity teams in AWS do it effectively and do it well. Now, most of those teams have seven or eight people, seven or eight people that are actually dealing with IAM in their cloud, right?

It's not like the one guy that gets one hour a week to deal with IAM, so they could do it. And so one of the reasons when we were building the cloud permissions firewall, we thought this was useful is that you could take somebody that had 12 accounts and didn't even understand ABAC access, utilize it in a programmatic way, in an automated way, where they could actually get the type of scale that these very large, highly mature customers were getting out of it.

Corey Quinn: It's a hard thing to approve. It feels like it's the sort of thing that you have to roll out in stages and gradually and everyone's watching it like a hawk. I guess the dangerous part is when you have something that- you'll catch the obvious stuff early on that way, but there's always those corner cases that mean one day, middle of the night, something doesn't work correctly.

And I think that your permissions firewall offering gets it right in that it doesn't fail silently. It alerts people that there was a thing that this explicitly blocked. What do you want to do about it? And that, that's helpful. I think that's something that most tools miss. It tends to assume that computers will either be perfect or humans will improve their crappy environments to the point where computers can now handle it for them.

I've never known either to be true.

Sandy Bird: It's this interesting, you know, we talk about the non-human identities versus the human identities, right? When it's a human identity, it's kind of easy because you can send a note to the human. You can say, hey human, you just tried to do this and it failed. That's our fault.

If you want it approved, go talk to Corey. Corey will approve it. That's super easy. When it's the build role that does that, like, who do you talk to, right?

Corey Quinn: Right. It's the old story of, okay, great. Like there were, there were some early approaches to cost optimization that tried to do, um, spend allocation based upon the person that provisioned it.

And like the question, like the joke was always the CFO would come in. "Who the hell is 'Jankins'?" "That's the CI/CD server." And they're, they're, they're mollified and go away. And then John Ankins crawls out from under the desk. "Thanks for covering for me." But yeah, ideally you don't have individual users doing resource provisioning.

There's some sort of deployment process that will handle this for them just because once you get into the position of Wild West, anyone can provision resources. Suddenly, you have a, you have a library management problem of all of your existing resources, discoverability, observability, security, maybe who thought.

Sandy Bird: So again, it's, it's just as important to have an ad, we always just call them owners, right? You need some sort of an owner in some boundary, you know? You can use account in AWS, you can use subscription in Azure, you know? You can get super granular, you can get wide, but you need somebody that, you know, if all else fails, the build role fails somewhere, you've got to be able to tell somebody immediately that that happened, right? Because without being able to tell them, then, as you say, it just goes into the dark, everyone gets frustrated. It's broken. No one knows how to fix it. They call security. Blame security, which is what always happens. Uh, and so you need a way to tie that together, and it's important. And we ended up building whole hierarchy trees where we force at least somebody at the top to know every time that it happens.

It is interesting though, we found this was again, feedback from product. You release a product. It's never 1.0 is never perfect. That's the way. But you, you get this thing, we, we had originally done this with these approvers, and what we discovered was when humans make mistakes, they don't want the approver to know right away.

They would rather have an option to opt out. So when I accidentally spin up Bedrock for the first time and want to, you know, relaunch some crazy new model and it's whatever happens. When I get the Slack message that says, Would you like to justify that? Sometimes they don't, actually.

Corey Quinn: "This user is not in the sudoers file. This incident will be reported." Same approach.

Sandy Bird: Exactly. But it's what allows, I think, you know, some of these, we, I think we, last time we were here, Corey, we talked about these zombies, right? These identities that are two years old that have never been used.

Corey Quinn: Oh, the older it gets, the more load bearing it becomes, because who knows what happens?

Like, it's always weird when you have, like, in a code base, for example, there's this block of code that is commented out. Whenever we delete it, things act differently, and we do not know why. And the obvious answer is, of course, ghosts.

Sandy Bird: Yeah, of course, ghosts. Exactly. You need some ghosts in your system. It's good.

Anyway, the zombies and the, it's, it's the perfect example of that scenario. It's like, no one knows who this is. No one knows why it is, right? But we, we need to somehow have in a way where we can short circuit it. It doesn't work anymore. But yet, if it wakes up, because it's the yearly report that everybody talks about, that they have only runs once a year.

Um, and it runs once a year, somebody needs to know that it failed and can wake it back up and make the thing go. Because if they deleted the identity, they'd never be able to put it back, right? And I think that's the, that's the scenario that you have to make sure still works in these clouds when you start to sweep them of their, of their history.

Corey Quinn: Oh, that's the thing that I think people could benefit from a lot. Like, I don't think anyone's using this resource, so I'm going to delete it. It's like, how about instead? Instead, you wind up changing your security policy so nothing can, it can't do anything anymore and nothing can talk to it. Or if it's an EC2 instance, stop it, don't terminate it, because then when someone screams off in the background, oh, you can revert it very quickly and not, and not have led everyone to disaster.
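One way to picture "disable it, don't delete it" for an IAM role is to attach an inline policy that denies everything, which can be removed later to restore the role exactly as it was. The sketch below assumes a boto3 IAM client is passed in; `put_role_policy` and `delete_role_policy` are the boto3 calls used, while the function names, policy name, and role name are made up for illustration.

```python
import json

# An explicit deny overrides any allow the role already has, so the role
# keeps its existing policies but can no longer do anything.
QUARANTINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Quarantine", "Effect": "Deny", "Action": "*", "Resource": "*"}
    ],
}

DEFAULT_POLICY_NAME = "quarantine-deny-all"  # hypothetical name


def quarantine_role(iam_client, role_name, policy_name=DEFAULT_POLICY_NAME):
    """Attach a deny-all inline policy instead of deleting the role."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(QUARANTINE_POLICY),
    )


def release_role(iam_client, role_name, policy_name=DEFAULT_POLICY_NAME):
    """Remove the quarantine policy so the role works as before."""
    iam_client.delete_role_policy(RoleName=role_name, PolicyName=policy_name)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   iam = boto3.client("iam")
#   quarantine_role(iam, "yearly-report-role")
#   release_role(iam, "yearly-report-role")
```

Because nothing is deleted, the once-a-year job that wakes up and fails can be brought back with a single call once someone approves it.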

Getting back to your owner approach, a lot of folks I talk to have, you know, from the old world of batch processing jobs, we're going to run the reporting or whatnot in the middle of the night. Whoever owns this is probably going to get pretty tired of getting woken up at 3am as things like that start, start getting, uh, getting found.

Some of those workloads incidentally are business critical, so they need to be woken up. It's not just improper alerting. But I feel like people don't have a great sense of humor about those things when everyone else in their time zone is asleep.

Sandy Bird: This is, again, one of the reasons in the world we looked at the most sensitive permissions.

Generally, the report that runs yearly is not going to create a new VPC for that, hopefully. Probably not going to create an access key on an IAM user and probably not going to assign something new administrator rights. Those are the things that it's not going to do, so it shouldn't bump into these scenarios.

Doesn't mean it shouldn't be least privileged. It doesn't, you know, there's lots of other things you should do to it. However, there is always this scenario where that thing does wake up and it, it hasn't been used in a year, and somebody did quarantine, and it's denied. Most times those can be re-kicked quite easily because honestly they often fail anyway, for other reasons.

So, once you approve it in the morning, you can kick the thing again. It will run again and it, it will make it through its daily process or whatever it needs to. But it shouldn't generally be one of those permissions that's super denied anyway. Um, it can happen, and if it does, you need somebody to own it.

But you're right. If it's the middle of the night, it's going to have to wait till morning. If it doesn't, you know, people can always choose to, you know, send those Slack messages through PagerDuty and wake somebody up in the middle of the night. They don't usually do that.

Corey Quinn: You'd like to hope not. The problem too is when, oh, that really needed to run before market open, and no one caught it until after market open.

And now there's a, there's a responsibility, uh, in some cases, not even actual damages, but rather regulators want to have conversations about it, etc, etc. Now, ideally, by the time you get to the point of being a regulated entity, you can staff multiple shifts around the clock, but having worked at a startup that was also a regulated entity, not always.

Sandy Bird: It said we did.

Corey Quinn: Yeah.

Sandy Bird: We were asleep when we were supposed to be working.

Corey Quinn: "Oops a doozy!" is one of those things that regulators have no sense of humor for either. And it's the problem, once a regulator slaps you across the wrist for something, you're not allowed to do whatever it is you were doing that got your wrist slapped again.

It's an adoption pattern problem.

Sandy Bird: I think, again, it's hard in the startup world, of course, because you don't have round the clock. But once you get to a certain size, you definitely can give somebody responsibility that's in another time zone, right? You've got somebody at least there that, you know, they can press the emergency button if they need to.

In a small startup, that's not always the case. If you're all sitting in North America and you don't have 24/7, it's, uh, you got to choose what are the most important things to get alerted on.

Corey Quinn: Yeah, I'm curious to figure out exactly how this adoption pattern starts to grow, because it's, this is increasingly a problem that is getting worse instead of better.

And I don't know, it obviously is not tenable as it is, or maybe it is. I mean, I could have had that exact same observation back in the late 80s. And now it's, well, things have gotten worse since then instead of better. Now what?

Sandy Bird: Look, I think there is a huge possibility with these cloud vendors to make things better.

I've always believed that. When we started this company, you know, more than five years ago now, we said cloud would be better. It was the first time ever that you had a real CMDB that was real-time and runtime, and you knew exactly which pieces of compute you had. You could find them, you could inventory them. You know, that never happened in a data center, ever.

You know, there was always something hidden behind somebody's desk that you didn't know was there. And so, in cloud, we have that inventory. I think the other thing is, theoretically, everything's audited. That's not completely true, but it's close to it. And so you have this time, finally, where you can see everything.

And the mistake I made in the early days was I really believed that that allowed us to do perfection. Right? If we can find it, we can see everything that it does. We can build perfect least privilege. We can put that on there, and we can automate that, and everything will be perfect. And I don't think at the time I truly realized that while that sounds amazing in your head, it involves a scenario where, in most processes, a developer must take that ticket.

They must go to a Terraform file that's sitting in GitHub. They need to check it in. Somebody has to do a PR. They need to roll that thing out into dev. They need to test it again. They need to move it to whatever their process is, UAT or something, run a bunch of automation tests, and then put it in prod. And that probably takes, in the best of companies, 30 minutes.

In the worst of companies, 30 days. If you take that and think about how quickly you just roll up new identities in the cloud: everybody has a thousand identities. Most people have 10,000. A number of large companies have hundreds of thousands or millions of identities. Multiply that by 30 minutes, and there is no way. It is an impossible task to ask people to do that. And so, you know, you have to, I think, find other ways to do this. And so we can get better, but we now have that split: some stuff has to become centralized again, and some stuff has to be given back to the developers to be part of the security process.
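To put rough numbers on that multiplication, here is a back-of-the-envelope sketch. The identity counts and the 30-minute best case come from the conversation; the 8-hour working day, the one-change-per-identity simplification, and the function name `person_days` are assumptions made purely for illustration.

```python
# Back-of-the-envelope estimate of the manual effort described above.
# Assumptions (not from the conversation): an 8-hour working day, and
# exactly one policy change per identity.

MINUTES_PER_CHANGE_BEST = 30   # "in the best of companies, 30 minutes"
WORK_HOURS_PER_DAY = 8         # assumption for illustration


def person_days(identities: int, minutes_per_change: int = MINUTES_PER_CHANGE_BEST) -> float:
    """Working days needed if every identity needs one manual policy change."""
    total_hours = identities * minutes_per_change / 60
    return total_hours / WORK_HOURS_PER_DAY


if __name__ == "__main__":
    for count in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{count:>9,} identities -> {person_days(count):>10,.1f} person-days")
```

Even under the best-case figure, a thousand identities works out to roughly 62 person-days under these assumptions, and the total scales linearly from there.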

You know, we use shift left as the example. We want the developers to fix everything before it ever gets released. I don't know how in the world you ever actually do that with things like identity, where you don't understand that when it goes into this account, there's a bunch of other identities. Those identities have other trust relationships on them.

They have other permissions. There's lateral movement that can happen completely outside the control of the single identity that you're creating. It will never be perfect when you roll it out there until you see it run in runtime. But it doesn't mean that developers should be using star permissions on everything that they do, right?

The standard, whatever, code linter should figure out that what you're actually doing is really overprivileged. You should do better, and it should make them do better for those types of things. You should have a resource statement on the GetObject of your S3 call, because you know what bucket you're getting it out of. You know, somebody should go in and put that in the policy.
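As a rough illustration of the kind of lint check described here, the sketch below flags wildcard actions and wildcard resources in an IAM-style policy document. It is a minimal example and not any particular tool's implementation: the function name `find_overprivileged` and the bucket name `example-reports-bucket` are hypothetical, and it deliberately ignores `NotAction`, `NotResource`, and conditions.

```python
# Minimal sketch of a policy lint check (hypothetical, for illustration only).
# It inspects only Allow statements and ignores NotAction/NotResource/Condition.


def _as_list(value):
    """IAM policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def find_overprivileged(policy: dict) -> list[str]:
    """Return human-readable findings for wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", f"statement {index}")
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action!r}")
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append(f"{label}: wildcard resource '*'")
    return findings


# Star permissions on everything: the pattern the linter should catch.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"}
    ],
}

# GetObject scoped to one bucket (the bucket name is a made-up example).
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadReports",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-reports-bucket/*",
        }
    ],
}

if __name__ == "__main__":
    for name, policy in (("broad", BROAD_POLICY), ("scoped", SCOPED_POLICY)):
        print(name, find_overprivileged(policy))
```

A check like this could run in a pre-commit hook or build step and fail when the findings list is non-empty; it catches only the obvious cases and says nothing about trust relationships or lateral movement at runtime.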

And so, there is a balance. We need some stuff pushed and done better in this kind of, you know, developer model where they're doing the right thing before it gets released. But we can't let it run wild and say, well, we're just not going to control anything in runtime anymore.

Corey Quinn: I wish there were better answers for some of these things, but I'm not sure that there are.

Sandy Bird: It's a balancing act, right? It has always been a balancing act. But I think, you know, again, you go back to whatever the talk or blog was, the "Dirty Dozen" permissions that you're never supposed to have, you know, just carte blanche across your whole cloud. And there's levels of that for everything.

And I think most companies now probably need to get to the point that the really sensitive stuff has some central control in it, because otherwise, you just get this sprawl of way too many issues that you can never ever fix.

Corey Quinn: Who knows how this is going to wind up playing out, but I think that we are ideally heading on a positive trend line. I just don't know that it's going to happen at anything other than a glacial pace. Unless you see something I don't.

Sandy Bird: Look, for the identity side, you know, this is why we're here as a product company. We build products to try to help solve these things for you, of course.

And the cloud permissions firewall, as you talked about earlier, certainly helps with those at least thousand most sensitive permissions. Finding owners for who owns them, getting the approvals done, allows you to roll it out slowly. I think that helps a lot. On the shift left side, there's tons of tooling out there now that at least tells you you're not doing a good job and maybe, you know, stops the check-in process or stops the build process, stops something and says, you've got to do better.

If we just do those two things, we'll be in better shape, I think. The glacial pace, I agree. I don't know how to make it go faster. You know, I don't even know how many customers AWS is up to now. It's an insane number though. Absolutely insane. And, you know, probably 1% of those are really good at identity.

And, you know, they have seven people doing IAM in their teams, right? It's not like it's a scenario where the average customer does amazing identity in AWS just by default.

Corey Quinn: Yeah. It would be nice if it were, but oof, I don't think we're there yet.

I really want to thank you for taking the time to speak with me.

If people want to learn more, where's the best place for them to find you?

Sandy Bird: You know, sonraisecurity.com, S-O-N-R-A-I, now it's Sonrai. So, Sonrai means data in Irish, as we said at the start. You know, there's free trials there. Certainly anyone looking to try our platform can do that.

There's great demos there, and there's great educational material. So you can go read our blogs on how to write some great SCPs for your company, even without using our product and things like that. So great way to find out about us.

Corey Quinn: And we will, of course, put links to all of this in the show notes. Thank you so much for taking the time to speak with me.

I really appreciate it.

Sandy Bird: Thanks, Corey.

Corey Quinn: Sandy Bird, co-founder and CTO at Sonrai Security. I'm Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas if you hated it, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment, but make sure in that angry rant that you mention the story behind your username.