Screaming in the Cloud

This episode of Screaming in the Cloud focuses on keeping critical data safe and organized, especially when there's a lot of it. Pranava Adduri, the CEO of Bedrock Security, shares the tools and methods Bedrock uses to help other businesses protect their essential information. They discuss how new technologies like AI can help manage vast amounts of data and ensure only the right people can access it.

About Pranava:

Pranava has worked in data protection and security for more than a decade. Before becoming an Entrepreneur In Residence at Greylock Partners in 2020, he was a Software Development Manager for AWS, where he worked with Fortune 500 CISOs to develop innovative products for data risk and compliance. Before that, he was a founding engineer at Rubrik, a SaaS data protection platform. Pranava graduated magna cum laude from the University of California, Berkeley with a triple-major B.S. in Computer Science, Industrial Engineering and Operations Research, and Economics, then obtained an M.S. from Berkeley in Industrial Engineering and Operations Research.

Show highlights:

(00:00) - Introduction

(01:36) - Overview of Bedrock Security's solutions for large-scale data protection

(03:04) - The importance of data classification and access control

(04:47) - Exploring the limitations of current data governance

(05:22) - Pranava details how data is managed in cloud environments

(09:39) - Evolving strategies in data lake management and data volume growth

(12:36) - Impact of generative AI on data creation and the need for retention

(15:50) - Discussion on cost-effective data management solutions

(23:45) - The role of AI in enhancing data security measures at Bedrock

(25:42) - How customer feedback shapes Bedrock’s AI security technology

(27:19) - The growing necessity for sophisticated data security systems

(29:22) - Upcoming events and where to find more about Bedrock Security and Pranava

Links:

Bedrock Security: https://www.bedrock.security/
Bedrock Security X/Twitter: https://twitter.com/bedrocksec
Bedrock Security LinkedIn: https://www.linkedin.com/company/bedrocksec/
Pranava’s LinkedIn: https://www.linkedin.com/in/padduri/
Pranava’s Twitter: https://twitter.com/thenava?lang=en
Innovation Sandbox 2024: https://www.businesswire.com/news/home/20240402284910/en/Bedrock-Security-Named-RSA-Conference-2024-Innovation-Sandbox-Finalist

Sponsor
Panoptica Academy: https://panoptica.app/lastweekinaws

What is Screaming in the Cloud?

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Pranava: When you upload photos or other content there, the most data you're going to generate might be a couple of terabytes. Right? But if you're punching into the petabyte regime, a good chunk of that data is actually coming from machine-generated content or services that are uploading content in predictable ways.

The way that content is laid out on the disk can have structure that you can start uncovering.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. My guest today comes from a different Bedrock than the one that I'm normally ranting about here. Pranava Adduri is the CEO and co-founder of Bedrock Security. Thank you for joining me.

Pranava: Thanks, Corey, for having me on the show. I'm excited for our chat today.

Corey: This episode's been sponsored by our friends at Panoptica, part of Cisco.

This is one of those real rarities where it's a security product that you can get started with for free, but also scale to enterprise grade. Take a look. In fact, if you sign up for an enterprise account, they'll even throw you one of the limited, heavily discounted AWS Skill Builder licenses they got, because believe it or not, unlike so many companies out there, they do understand AWS.

To learn more, please visit panoptica.app/lastweekinaws. That's panoptica.app/lastweekinaws.

Let's start at the very beginning, because I feel that most people are not going to do deep research on who the guest is today; they're just going to listen to the show. What is Bedrock Security?

Pranava: Corey, Bedrock is a solution to help organizations understand what data they have at the largest scales.

And then once they identify and understand what data is most important, then giving them the tools to protect that data. So imagine being able to build a heat map of where across all your cloud environments your red hot data is. Whether that's regulated data, whether that's core IP, whether that's financially sensitive documents.

Being able to find it, and then being able to ring-fence that data, ensuring that only the right people have access to it and that it's not going into the wrong places. Right? So, in a nutshell, that's what Bedrock does.

Corey: One of the things that we've recommended for a long time, just from a cost perspective, is that while you're setting up a tagging strategy, you tag things by data classification.

Is this regulated private health information? Is this financial data? Is it something that is going to cause problems if the rest of the world sees it? It gives people a way to reason about those things. It's not directly aligned with cost, but while you're doing a tagging exercise, go for it.

That always seemed like a very blunt tool, though. Customers generally treat it as almost a binary of whether something is regulated or not, because it's not like you can have something that's only half regulated. Data classification and who should have access to what, on the other hand, turns into an entire access paradigm, and I don't think tags are the right method for approaching that.

Pranava: And to continue off of what you're saying, Corey, ultimately, at the end of the day, the line of business is the owner of what type of data are we dealing with. And so, when that data is being created, that is the optimal time for that context to be added, because it's the creator that knows the intention of that data.

Where we've seen the friction come in, typically, is that it's the governance teams that decide what type of data shall be designated as which classification, right? So let's say you go to a bank. They'll have a data classification matrix of what types of fields and combinations of fields would be considered restricted versus confidential.
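A classification matrix like the bank's can be sketched as a small rule table in Python. The field names, combinations, and tiers below are hypothetical assumptions, not any real institution's policy; the sketch only shows that combinations of fields, not just single fields, can push data into a stricter tier.

```python
from __future__ import annotations

# Hypothetical classification matrix: field names and tiers are illustrative only.
RESTRICTED_COMBINATIONS = [
    {"name", "ssn"},             # a name next to an SSN identifies a person
    {"name", "account_number"},
]
CONFIDENTIAL_FIELDS = {"ssn", "account_number", "salary", "date_of_birth"}


def classify(fields: set[str]) -> str:
    """Return the strictest tier implied by the set of field names in a record."""
    if any(combo <= fields for combo in RESTRICTED_COMBINATIONS):
        return "restricted"
    if fields & CONFIDENTIAL_FIELDS:
        return "confidential"
    return "internal"


print(classify({"name", "ssn", "address"}))  # restricted
print(classify({"ssn"}))                     # confidential
print(classify({"name", "email"}))           # internal
```

In practice the matrix comes from the governance team while the field names come from whoever creates the data, which is where the friction described here shows up.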

And where the friction typically comes into place is that the line of business doesn't necessarily have that context. Right? And the line of business, as you and I both know, wants to get shit done. They want to be productive. As they're going through that work, it might be the case that those tags are not always attached.

People might forget to add the right tags. And so that's one part of the problem. If you are trying to get a heat map to start with, to do a security assessment or a risk assessment of whether your data is exposed or not, the spots that are actually red might not show up as red hot, because the tagging wasn't put in place.

That then complicates the second step, which is who should be having access to this data, right? You want to be able to get a sense, at any given point in time, of whether that data is overexposed. Do the wrong people have access to it? So the first step is a precursor to the second.
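The dependency between those two steps can be made concrete with a minimal overexposure check. The policy table and principal names are hypothetical. Note how the same set of readers produces no finding once the data is mistagged as internal, which is the "red spots not showing up as red" problem in miniature.

```python
from __future__ import annotations

# Hypothetical policy: which principals may read each classification tier.
ALLOWED_READERS = {
    "restricted": {"payroll-team"},
    "confidential": {"payroll-team", "finance-team"},
    "internal": {"payroll-team", "finance-team", "all-employees"},
}


def overexposed(classification: str, actual_readers: set[str]) -> set[str]:
    """Return principals that can read the data but are not allowed to."""
    return actual_readers - ALLOWED_READERS[classification]


readers = {"payroll-team", "all-employees"}
# Correctly tagged restricted data: the broad group is flagged.
print(sorted(overexposed("restricted", readers)))  # ['all-employees']
# The same data mistagged as internal: the exposure silently disappears.
print(sorted(overexposed("internal", readers)))    # []
```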

Corey: Yeah, you have to understand what you have before you can start figuring out how access should be broken down. I imagine you're focusing primarily on data that lives in object stores, given that that seems to be where scale comes from in data these days.

Pranava: Objects will be one of them.

Data warehouses and lakes are another massive destination that data gets moved to. Oftentimes, if it's a data lake, people are absolutely using object stores as a backing store for that. A lot of customers also use Snowflake, which ultimately is backed by object stores as well, but the interface is a little bit different.

Those are some key areas. Other areas where people tend to have a proliferation of sensitive content can be Google Drive and Office 365 as well, right? But where the massive scale comes in is definitely Snowflake, S3, Databricks, etc.

Corey: I am not a particularly well-organized person when it comes to data. I find that invariably I'll have all manner of things strewn over a few directories on my system.

There's very little rhyme or reason, and a half dozen aborted attempts at trying to bring order to chaos. Nowhere is that more obvious than in my personal account's S3 buckets, where, yeah, you sort of wind up with all this gunk stuffed into them, and observability into what lives inside each bucket is not terrific. That has always been the case. S3 Storage Lens, which Amazon came out with, is a good start, but it goes nowhere near far enough for answering questions like that. People always wonder what's there. It's big, it's expensive. Are they using it? Are they not?

And then you have the compounding problem of security issues around that, for scenarios that aren't just my piddly use cases. My wedding photos are not high security, believe it or not, but it's one of those areas that just becomes almost impossible to tackle, and it only gets harder day by day.

Pranava: What I'll add to that is, for the customers that we work with, the complexity is the reason that we start the conversation, right? Despite the complexity happening at the business level, with all the different product lines generating data, the folks on the security side and the GRC side who want to protect it want to start with a simple answer: can you just tell me where the regulated data is? It can be regulated data. It can also be core IP.

Right. If you're in a business line where you're dealing with architectural drawings, or you're dealing with synthetic biology sequences, all of these are critical formats of data to the business. And what these teams really want is a simple way of finding where that data is and being able to answer: are there any active threats that we should be worried about?

Are there any compliance risks that we're carrying? Just to break that down a little bit, there could be several reasons why people are looking for this type of data. The first one could be that they've had an incident before: an audit finding, for example, or a leak where certain types of sensitive data showed up in Snowflake when they shouldn't have.

That could be a reason why people want to say, we don't want that to happen again. We want to make sure that we have a good handle on where all these social security numbers are showing up. And we can find them. If they are showing up where they shouldn't, we can burn them down. That's one example. Cost is definitely another one as well.

We've had a lot of folks articulate the problem as: we have a ton of data we're collecting, and we're not even using most of it, right? What can we get rid of? And in the process, can we save some money? So that definitely tends to be one of the other motivators there. Another big one is actually lifecycle, right?

When we speak to the Fortune 500, the CIOs, the folks in charge of data strategy, and also the folks on the governance side, they're looking at the consumption volumes of data, they're looking at their S3 bill, and their question is, okay, some of this we have to keep, right, because we're legally required to keep it for a certain period of time.

But afterwards, it actually becomes a liability. So there's actually a governance reason for finding that data and decommissioning it. I was just speaking to a healthcare executive yesterday, and they're actually building models off of some of this healthcare data. For them, it's a liability not to purge that data from those models after some time, right? So in addition to cost, another reason for purging data is legal liability, because in his case, the FDA didn't want them doing longitudinal training on that data. So there can be a number of reasons why people are trying to find this data and corral it.
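For the object-store side of lifecycle, one common mechanism is an S3 lifecycle rule that expires objects after a retention period. The sketch below only builds the rule as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, rule ID, bucket name, and roughly seven-year period are hypothetical. It does nothing about purging data from trained models, which is a separate and harder problem.

```python
def expiration_rule(rule_id: str, prefix: str, retention_days: int) -> dict:
    """Build one S3 lifecycle rule that expires objects under `prefix`
    once they are `retention_days` days old."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": retention_days},
    }


# Hypothetical prefix and retention period (7 years, ignoring leap days).
lifecycle = {"Rules": [expiration_rule("purge-claims", "claims/", 7 * 365)]}
print(lifecycle["Rules"][0]["Expiration"])  # {'Days': 2555}

# Applying it requires boto3 and AWS credentials, e.g.:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```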

Corey: I am curious. When I look at my own data sets, I find that my pictures are taking up more space as the years go on. That's not necessarily because I'm taking more, though let's be honest, having a high-quality camera in your cell phone that's always with you is convenient for that, but because the file sizes keep increasing.

And obviously, as time goes on, more of whatever it is gets accumulated. I'm curious what you've seen, not just in the volume of data over time, but in the rate of change of that volume as it grows. What are you seeing in that sense?

Pranava: One of the interesting trends that we're seeing right now in a lot of S3 buckets is massive data lakes, right?

So we see huge environments with a lot of Parquet files, a lot of Delta Lake formats, or some of the newer formats like Iceberg and Hudi as well. So we see a lot of lake-like formats showing up. Depending on the type of environment, we can also see, for example, if you're working at a financial services company or a bank, a lot of images of checks showing up, right?

Or you might see a lot of forms that they're creating show up as well. In terms of the types of changes that we're seeing, on the data lake side, people are introducing a lot of unique types of data into their data lake strategy.

So one rate of change that we're seeing is there. The other area where change comes in a lot, and this is where people face challenges from a data security policy standpoint, is that product teams are also innovating very quickly. So let's assume you're a fast-growing HR company.

You're going to be generating lots of data that is going to be important. You might have offer letters as part of the platform. You might have W-2s as part of the platform. And then let's say that HR platform also decides to do cap table management, and they're doing 83(b) forms. As they introduce these new types of data and store them in S3, keeping up with that data becomes a game of whack-a-mole for the data security and governance teams, right?

They have to be in perfect sync with the product teams. And if there are any gaps, that's how you can start having data run away from you as it changes. And I think the last example of change I'd give is a little bit more forward-looking, but even now we're seeing folks operationalizing internal RAG (retrieval-augmented generation) pipelines to quickly search over their enterprise data and use it, for example, to create a report, right?

Now you have all of these diverse sets of data going into these models. An answer is being generated, and that's going back into documents that are then being persisted yet again, right? So these are all the different ways that we're seeing new types of data getting created, and some of the active areas where those changes are coming from.

Corey: Few things are better for your career and your company than achieving more expertise in the cloud. Security improves, compensation goes up, employee retention skyrockets. Panoptica, a cloud security platform from Cisco, has created an academy of free courses just for you. Head on over to academy.panoptica.app to get started.

It feels like there's definitely an increased focus on data being driven by the generative AI boom. Not just the security aspects and IP ownership pieces of it; on some level, it's also generating a lot of data. And I have to wonder if that's having an impact as well.

I can imagine a scenario where one of these ridiculous emails that are clearly AI-written winds up smacking into me, and I wonder: do they need to retain everything it generates and sends out to people, for liability reasons, for troubleshooting purposes, and the rest? We now have an ability to create data from nothing that can't be created again in exactly the same way, given the non-deterministic nature of these things, so you've got to retain the output rather than just the prompt that spit it out.

I'm wondering if that's going to cause an acceleration of this. I mean, you're obviously a lot closer to this problem than I am. How do you see it?

Pranava: Yeah, I think especially in regulated verticals, if you're operationalizing some of these models in the healthcare domain, or especially in financial services, let's say someone's using some of these models to give client advice, you will definitely need to retain some of those outputs, because, to your point, these can be non-deterministic systems.

That's actually a great point that you just brought up. Not only is there a diversity in the data being created by these models as they consume more data, with a data echo chamber effect going on, but from a legal and regulatory perspective, those responses probably have to be kept as part of an audit trail as well, in order to make sure that you have consistent reporting.

You have accountability into what the organization has been communicating with partners and customers as well. So it's kind of a double whammy: you're generating more data, and you also have to keep a record of that data for some retention period.

Corey: Yeah, it's one of those areas where, when you start having creative-type content created programmatically, that opens up interesting doors. Things that used to be highly human-intensive now potentially might not be. And how well that's going to work out is, of course, still a very open question.

But I think it does lead down a path of: what data is this, exactly? And I have to imagine it also leads to the fun problem of things that look like highly sensitive data but are in fact complete nonsense. I mean, there's nothing stopping me from asking ChatGippity to generate a whole bunch of what appear to be people's Social Security numbers and putting them in an Excel file, even though the people don't exist.

It turns out there's a bounded problem space for those numbers. Who knew? And I can expect that could wind up causing challenges from a regulatory perspective. I don't know as much about your platform as I might like, but I can equate it to, for example, Amazon Macie, which at one point seemed like one of the most hilariously mispriced services on the planet.

It was $5 a gigabyte for ingest. They gave it, I think, an 80 percent cut, which still only makes it ludicrously expensive instead of impossibly expensive. But that's really how I tend to think about this at the moment. Obviously, this is not new information for you. What's the difference between what you're doing and what Macie does, other than, you know, bankrupting people?

Pranava: Just to put that number into perspective, Corey, even after an 80 percent haircut, a dollar a gigabyte means that scanning a terabyte costs a thousand dollars. And if you're scanning a petabyte, that's a million dollars per scan. And when we talk about the Fortune 500, these folks routinely punch above a petabyte, into the tens of petabytes, if not more.
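The arithmetic behind those figures is simple enough to sketch, using decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and the roughly $1 per GB price mentioned above. The 20 PB estate is a hypothetical example of "tens of petabytes".

```python
GB_PER_TB = 1_000        # decimal units, matching the figures in the conversation
GB_PER_PB = 1_000_000


def full_scan_cost(gigabytes: float, dollars_per_gb: float = 1.0) -> float:
    """Cost of classifying every byte once at a flat per-gigabyte price."""
    return gigabytes * dollars_per_gb


print(full_scan_cost(GB_PER_TB))       # 1000.0
print(full_scan_cost(GB_PER_PB))       # 1000000.0
print(full_scan_cost(20 * GB_PER_PB))  # 20000000.0, per full scan
```

Because the cost is linear in bytes scanned, the only real lever at that price is to scan fewer bytes, which is where sampling comes in.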

So knowing what data you have at these scales is, for most people, a non-starter, and has been a non-starter. That is actually one of the inspirations for starting Bedrock. As for where the differentiators come in, the way we think about it, there are two bits to consider.

The first is the growth in data. When you're in the petabyte regime, you need a different approach, and I'll cover that. And then there's also the changing nature of the data, which means dealing with false positives. For example, you mentioned the case of generating a hundred fake SSNs and sticking them in an Excel file.

And Macie will have a field day telling you that you have all sorts of risks when those aren't even real. Right.

Corey: This is why grep doesn't work for anything that resembles, what is it, a 10 digit number? Yeah, sorry, nine digit number. My apologies. How does math even work? Off by one error. They're always going to get me.

Please continue.

Pranava: Yeah. And we can add some more compounding factors to that as well, right? Those nine-digit numbers may sometimes be separated by dashes and sometimes not. And those nine-digit numbers might also look like bank account numbers in certain cases. So trying to learn just from the data itself, taking a rule-based approach, can lead to a lot of false positives.
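The false-positive problem with a rule-based approach is easy to reproduce. This sketch uses a hypothetical naive pattern (nine digits, dashes optional) plus the Social Security Administration's published structural rules (no area number 000, 666, or 900-999; no group 00; no serial 0000). All numbers are made up. The structural check rejects impossible values, but an account number that happens to be SSN-shaped still passes, which is why context matters.

```python
from __future__ import annotations

import re

# Naive rule: nine digits, optionally split 3-2-4 by dashes.
NAIVE_SSN = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def plausible_ssn(digits: str) -> bool:
    """Structural rules only: rules out impossible numbers, but cannot tell a
    real SSN from a made-up one or from an SSN-shaped account number."""
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    return (area not in ("000", "666") and not area.startswith("9")
            and group != "00" and serial != "0000")


def find_candidates(text: str) -> list[tuple[str, bool]]:
    """Return each naive match with whether it is structurally plausible."""
    results = []
    for match in NAIVE_SSN.finditer(text):
        digits = match.group().replace("-", "")
        results.append((match.group(), plausible_ssn(digits)))
    return results


print(find_candidates("SSN: 536-22-1847"))        # [('536-22-1847', True)]
print(find_candidates("SSN: 536221847"))          # [('536221847', True)]
print(find_candidates("Account no. 512349876"))   # [('512349876', True)]  false positive
print(find_candidates("Test value 000-12-3456"))  # [('000-12-3456', False)]
```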

And that's one of the problems in the industry. There's the flip side of that as well, which is false negatives, right? You can spend a lot of time building rules to match your data exactly. And then when something new comes along, like a product team introducing 83(b) forms, just as an example, that might slip right by, and you might miss it because your rules didn't catch it.

So you have a problem of growing data. And as data changes, you have the problem of false positives and false negatives. All of those are challenges to solve if you're going to meet the market where it is, with the solutions that it needs at this point. And so to address those, on the scale side, dealing with petabytes of data, the first thing is that at that scale, to avoid some of the financial downsides of solutions like the one you mentioned, you can't look at everything, right?

That's just a non-starter. So you do have to sample. Where that tends to have issues is if you sample indiscriminately. The traditional way of sampling is, let's go look at every thousandth file, because we can't look at everything, or let's go look at every ten-thousandth file, and that way we'll cover the entire object store.

You can miss important things, right? You can skip right over the file that's actually most important, or you can skip right over an entire group of files that are important. And so, instead of sampling directly, we take an approach we call adaptive sampling. The idea is: take this petabyte-scale bucket and carve it up into partitions, because there might be a portion of that bucket that's being used for data lakes.

There might be a portion of that bucket that's being used for check images. So if you can carve up that bucket into partitions, then you can actually go and look inside each of these partitions and then be more, more focused in terms of, within this partition, I know everything is similar data. I can focus now on sampling a couple points in there, and I can get a much better representation of what I'm looking at.
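The difference between sampling every Nth object and sampling within partitions can be shown on a hypothetical bucket listing. For simplicity this sketch partitions by top-level prefix and takes the first few keys of each partition for reproducibility; the adaptive sampling described here infers partitions rather than assuming prefixes, and a real sampler would pick objects at random.

```python
from __future__ import annotations

from collections import defaultdict


def every_nth(keys: list[str], n: int) -> list[str]:
    """Naive systematic sampling: look at every n-th key in listing order."""
    return keys[::n]


def partitioned_sample(keys: list[str], per_partition: int) -> list[str]:
    """Group keys by top-level prefix, then take a few keys from each group."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        partitions[key.split("/", 1)[0]].append(key)
    sample: list[str] = []
    for prefix in sorted(partitions):
        sample.extend(partitions[prefix][:per_partition])
    return sample


# Hypothetical bucket: two large machine-written prefixes and five check images,
# sorted the way S3 lists keys (lexicographically).
keys = sorted(
    [f"events/{i:05d}.json" for i in range(4_500)]
    + [f"scans/check-{i}.png" for i in range(5)]
    + [f"telemetry/{i:05d}.json" for i in range(5_500)]
)

naive = every_nth(keys, 1_000)
print(len(naive), any(k.startswith("scans/") for k in naive))  # 11 False
print(partitioned_sample(keys, 2))
# ['events/00000.json', 'events/00001.json', 'scans/check-0.png',
#  'scans/check-1.png', 'telemetry/00000.json', 'telemetry/00001.json']
```

Six looks spread across partitions find the check images that eleven evenly spaced looks skip entirely.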

Now, this approach really starts paying dividends at petabyte scale. Coming back to your example of how you're using your own personal buckets: when you upload photos or other content there, the most data you're going to generate might be a couple of terabytes, right?

But if you're punching into the petabyte regime, a good chunk of that data is actually coming from machine-generated content or services that are uploading content in predictable ways, right? And so the way that content is laid out on the disk can have structure that you can start uncovering. Right? The data itself might be unstructured.

It might be an image or something else, but the way it's laid out will have structure. And if you can start understanding that structure, you can do a much better job of sampling in a way that doesn't miss important data. So that's the first bit. The second bit that we're combining on top of that is that our entire architecture is serverless.
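Going back to the point that machine-written keys carry structure even when the objects are unstructured, one simple way to surface it is to collapse the variable parts of each key into a template and count how many keys share it. This is a hypothetical heuristic, not Bedrock's method: it only normalizes runs of digits, whereas real layouts also embed dates, UUIDs, and hashes.

```python
import re
from collections import Counter

DIGIT_RUN = re.compile(r"\d+")


def key_template(key: str) -> str:
    """Collapse every run of digits to '{n}', so keys written by the same
    service share one template."""
    return DIGIT_RUN.sub("{n}", key)


# Hypothetical object keys from one bucket.
keys = [
    "checks/2024/05/img-17.png",
    "checks/2024/06/img-3.png",
    "lake/orders/part-00001.parquet",
    "lake/orders/part-00002.parquet",
    "lake/orders/part-00003.parquet",
    "uploads/q3-board-deck.pdf",
]
templates = Counter(key_template(k) for k in keys)
for template, count in templates.most_common():
    print(count, template)
# 3 lake/orders/part-{n}.parquet
# 2 checks/{n}/{n}/img-{n}.png
# 1 uploads/q{n}-board-deck.pdf
```

Each template is a candidate partition; sampling a handful of objects per template, rather than every Nth object overall, keeps small but important groups in view.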

So think about a bunch of Lambda workers blasting across your environment. All of them are running these experiments of what the partitions are, right? And once you have those partitions, you can start sampling inside them. What that does is really reduce the set of things that you miss.
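The fan-out shape of a serverless scan can be sketched locally. Here a thread pool stands in for the Lambda workers, with one job per partition and the results gathered at the end. The partitions, object bodies, and SSN-like pattern are hypothetical, and a real deployment would invoke one function per partition rather than use threads.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor

SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_partition(job: tuple[str, list[str], int]) -> tuple[str, int]:
    """One worker's job, standing in for a single Lambda invocation: read a
    few sampled object bodies from one partition and count SSN-like hits."""
    prefix, bodies, per_partition = job
    hits = sum(len(SSN_LIKE.findall(body)) for body in bodies[:per_partition])
    return prefix, hits


# Hypothetical partitions mapped to object bodies (numbers are made up).
partitions = {
    "hr/offers/": ["Offer for A, SSN 536-22-1847", "Offer for B, SSN 401-77-2093"],
    "logs/": ["GET /health 200", "GET /index 200"],
}
jobs = [(prefix, bodies, 2) for prefix, bodies in partitions.items()]

# Fan out one job per partition, then fan the results back in.
with ThreadPoolExecutor(max_workers=4) as pool:
    findings = dict(pool.map(scan_partition, jobs))
print(findings)  # {'hr/offers/': 2, 'logs/': 0}
```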

It reduces the false positive rate and the false negative rate. That's the first part. The second part is learning the data, and accounting for the fact that a Social Security number might look like a bank account number, and could also be a fake number. When we saw that problem, what we asked ourselves is: if you or I, Corey, looked at this file of fake Social Security numbers, how would we go about figuring out what to do with it?

Well, we'd start poking around the context, right? Okay, where did these Social Security numbers come from? Who created them? Right. So it's not just the numbers themselves. It's the context around those numbers that can also paint a picture: okay, this is sitting in a test environment.

The file is named in a way that indicates it's a temporary file. And so by factoring in not just the data itself, but the environment, who has access to it, and how it was created, you're mimicking the process a human would go through to figure out whether this was fake data or not, right?
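The context-gathering step can be illustrated with a hand-weighted score. The `Finding` fields, the filename prefixes, and every weight below are hypothetical assumptions for illustration; the conversation describes a learned approach, not fixed weights. The point is that the same SSN-shaped match scores differently depending on where it lives and what it is labeled.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str           # object key where the SSN-shaped value was found
    environment: str    # hypothetical environment tag, e.g. "prod" or "test"
    column_header: str  # label next to the match, if any


def sensitivity_score(f: Finding) -> float:
    """Start from 'matched an SSN-shaped pattern' and adjust using context a
    human reviewer would check. Weights are illustrative only."""
    score = 0.5
    if f.environment == "test":
        score -= 0.2
    name = f.path.rsplit("/", 1)[-1].lower()
    if name.startswith(("tmp", "temp", "fake", "sample")):
        score -= 0.2
    header = f.column_header.lower()
    if "ssn" in header or "social" in header:
        score += 0.3
    elif "account" in header:
        score -= 0.3
    # Clamp to [0, 1] and round away floating-point noise.
    return round(max(0.0, min(1.0, score)), 2)


print(sensitivity_score(Finding("prod/hr/employees.csv", "prod", "SSN")))                # 0.8
print(sensitivity_score(Finding("test/tmp_fake_ssns.xlsx", "test", "ssn")))              # 0.4
print(sensitivity_score(Finding("prod/finance/ledger.csv", "prod", "Account Number")))   # 0.2
```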

Is it actually Social Security numbers, or is it account numbers? Factoring in that context helps us with the false positives. With the false negatives, which is missing things altogether, the traditional rule-based approach actually doesn't work, right? If you're trying to use regular expressions to find the data that's most important, you can quickly end up playing a game of whack-a-mole, because you might have built regexes for W-2 forms, you might have built regexes for 1099s, but the moment a new form shows up, it might completely miss those patterns.

And so this is why the time to value is so high for a lot of the tools out there right now. People are playing this whack-a-mole game with regular expressions, trying to chase the changing data. But when new data comes up, you need an ability to look at that new data and say: even though I haven't seen this before, its contents make it very similar to employee data, which I'm trying to protect and do have a policy for.

So even though I haven't seen it before, it should still be covered by that policy. That requires a different approach, one that's not rule-based but actually looks at similarity of content: using AI to look at the document and understand how similar it is to other documents that we have seen and know the sensitivity of. Then we can make an inference saying, even though we haven't seen this before, it still probably shouldn't be leaving the US, as an example.
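Policy inheritance by similarity can be sketched as a nearest-neighbor lookup over document vectors. To keep the example self-contained, a bag-of-words count vector stands in for a learned embedding model, and the labeled reference documents are hypothetical. A new 83(b)-style document matches no W-2 or 1099 regex, yet it lands closest to the employee documents and inherits their policy; when nothing is similar enough, the function returns `None` instead of guessing.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Reference documents whose policy is already known (hypothetical).
LABELED = [
    ("Form W-2 wage and tax statement employee name social security number employer", "employee-pii"),
    ("Form 1099 miscellaneous income recipient name taxpayer identification number", "employee-pii"),
    ("Quarterly marketing newsletter product launch event recap", "public"),
]


def infer_policy(text: str, threshold: float = 0.2) -> str | None:
    """Inherit the policy of the most similar labeled document, or return
    None when nothing is similar enough to make a call."""
    vec = embed(text)
    score, policy = max((cosine(vec, embed(doc)), label) for doc, label in LABELED)
    return policy if score >= threshold else None


new_doc = "Form 83(b) election: employee name, social security number, taxpayer ID"
print(bool(re.search(r"W-2|1099", new_doc)))  # False: the old rules miss it
print(infer_policy(new_doc))                  # employee-pii
print(infer_policy("GET /health 200 ok"))     # None
```

Only the vectorization step is involved here; nothing is generated, so there is no free text for the system to make up.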

So that's addressing the scale and also the change, the change of the data.

Corey: You mention on your site that a lot of this is AI-powered, which is awesome. That has potential to it. The challenge I have when I try to wrap my head around things like that is that, like the worst employees or candidates you'll ever meet, if it doesn't know the answer, it will cheerfully make things up.

I don't have good experiences working with engineers who bluff when they don't know the answer; that's how things end in disarray. And very often, especially in the generative AI space, it's not actually reasoning. It's effectively a predictive model of which words are likelier to go in a certain place, like a Markov chain on steroids, so to speak.

How are you avoiding those problems, where it very confidently attests that something is something other than what it is?

Pranava: In that particular case, where we're using some of the more recent advances in AI and ML, it's not to generate the next token. We're actually using a portion of that, which is vectorization: the translation of a document into an embedding.

And so coming back to that example: a new form shows up, we haven't seen this form before, and we're trying to figure out whether someone should be allowed to share it with someone else or not. Right. The task is less about generation and more about understanding whether this document is similar to other documents whose classification we do know, or that we do know are bound by a certain policy.

Vectorization is one part of the generative AI workload, and we're using just that part. So in terms of, for example, generating a false answer or false tokens, we're not using that part of the generative AI stack. It's just the vectorization and the creation of embeddings.

Corey: It's one of those problems that I'm seeing that's just fundamentally hard to wrap your head around in a bunch of different ways. People have their consumer-side experience with things like ChatGippity, and then they're trying to map that onto what this is doing at enterprise scale, with data that's actually meaningfully important.

Do you find, when you're talking to prospective customers, that they have trepidation around this sort of thing? Or is it something where, once they start to see how it actually functions under the hood, those fears recede? Or does it not come up at all, which is always an option, I suppose, but seems unlikely when it's corporate data whose sensitivity they care about. "Well, what if the computer is wrong?" is a logical question.

Pranava: Absolutely. And, and so, and this, this really brings it back to, I'll tie it back to something that you mentioned earlier, which is how do you make sense of this data in a manner that's simple, right?

How do you do this in a manner where you're not drowning in every single new data type? How do you prioritize? And so, to answer your question, Corey: when we go through an environment, one of the first things that we do is start with what is important to the business, right?

We take that in from the customer, in terms of them articulating what they value the most, and we codify that into the system. When we do the discovery, we come back and say, here are the top five things you need to worry about. Did we get this right? In the platform, the user can give feedback saying, this is not correct, this is actually something else, or this is actually not important for X, Y, and Z reasons.

We can use that feedback to train the model to update and avoid those types of mistakes going forward. With Bedrock, we call the technology AIR: AI reasoning. The idea of the reasoning is to ask, as a human would, based on the context available: is this actually an SSN?

Or you're reasoning based on the other things around it: should this file be protected the same way its relatives are? So once a user calibrates and says this is important, or this is not important, AIR can iterate based on that and not make the same mistake again. To put it a different way, think of AIR as something that is learning globally across your environment and taking context from people as they tune it.

That way, it rapidly converges on what's most important.
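The feedback loop can likewise be sketched, assuming the simplest possible mechanism: a user correction is stored as a new labelled reference example, so later documents that resemble the corrected one pick up the corrected label. The class name `FeedbackClassifier`, the labels, and the vectors are hypothetical, and this nearest-neighbour approach is a stand-in for illustration, not a description of how AIR is actually trained.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FeedbackClassifier:
    """Hypothetical toy 1-nearest-neighbour classifier whose reference set grows with user feedback."""

    def __init__(self, examples):
        # Each example is an (embedding, label) pair; copy so the caller's lists are untouched.
        self.examples = [(list(emb), label) for emb, label in examples]

    def classify(self, embedding):
        # Return the label of the single most similar reference example.
        _, label = max(self.examples, key=lambda ex: cosine_similarity(embedding, ex[0]))
        return label

    def record_feedback(self, embedding, correct_label):
        # A user correction becomes a new reference example for future lookups.
        self.examples.append((list(embedding), correct_label))


if __name__ == "__main__":
    clf = FeedbackClassifier([([1.0, 0.0], "not_important"), ([0.0, 1.0], "sensitive")])
    doc = [0.9, 0.3]
    print(clf.classify(doc))  # not_important
    clf.record_feedback(doc, "sensitive")  # user: "this is actually sensitive"
    print(clf.classify([0.88, 0.32]))  # sensitive
    print(clf.classify([1.0, 0.05]))  # not_important
```

In this toy run the correction only affects documents near the corrected one; the unrelated document keeps its original label, which is the appeal of learning from examples rather than rewriting a global rule.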

Corey: That makes sense. It's one of those areas where the problems only get harder to solve as the days go by. And it's an increasingly sensitive question: when the business asks, the people responsible for data security had better have a pretty good answer.

It is clear that this is not going to work by having humans do it; the rate of change alone suggests that data is going to grow faster than humans can sort it. The time is, if not here yet, rapidly coming when something like what you're building is no longer going to be optional for most folks.

It's going to be a hard requirement.

Pranava: That's very well said, Corey. At the end of the day, the line of business is going to be the unit that has the ultimate context on what this data is and how it lines up with the data classification policy. What's the intended use of this data?

But to your point, those humans aren't going to scale. The entire reason people were trying to use some of the tools you mentioned earlier, and trying to use regexes as well, is that they're already trying to scale humans. It's just that the technologies they're using right now don't account for that.

They're not able to keep pace with the change in data. That's the problem. So our philosophy is that you want to be working with the line of business, learning from them as they give feedback, and then taking those heuristics.

You want to take that direction and build the right automations that reduce false positives and false negatives as you go through these many petabytes of data and keep up with it. So, absolutely right: humans have the context. We want to help people codify that into the platform using plain English, learn from it, and then help people identify that data consistently.

Corey: It sounds like it's definitely going to be of significant note. You're doing a fair bit of work, and we're getting ready for RSA later this year in San Francisco. That'll be fun.

For people who are not necessarily going to be there themselves, where's the best place for them to find you?

Pranava: First, if you are going to be there, we will be in the Innovation Sandbox, pitching for one of the top 10 slots. We're super excited for that opportunity, so if you are there, come give us a holler.

If you are not at RSA itself, we will be doing a number of events around it, so even if you won't be at the event directly but you'll be in San Francisco, drop us a holler. And if not, we will be doing a series of webinars around that time as well, so please make sure to check us out online.

We'll be doing deep dives on a lot of the topics we covered today, Corey.

Corey: Wonderful. I look forward to hearing how it goes. We'll of course put a link to that in the show notes as well. Thank you so much for taking the time to speak with me today.

Pranava: Likewise, Corey. As you so nicely articulated, this is a growing problem.

I think we're right on the precipice of another inflection point in terms of the amount of data, the growth in data, and the change in data, so we're super excited to work with folks as they deal with that. Looking forward to doing this again soon.

Corey: As do I. Pranava Adduri, co-founder and CEO of Bedrock Security.

I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a 5 star review on your podcast platform of choice. Whereas if you hated this podcast, please leave a 5 star review on your podcast platform of choice, along with an angry, insulting comment that someday I'll probably send a computer to read, because I won't.
