Dive deep into AI's accelerating role in securing cloud environments to protect applications and data. In each episode, we showcase its potential to transform our approach to security in the face of an increasingly complex threat landscape. Tune in as we illuminate the complexities at the intersection of AI and security, a space where innovation meets continuous vigilance.
John Richards:
Welcome to Cyber Sentries from CyberProof on TruStory FM. I'm your host, John Richards. Here, we explore the transformative potential of AI for cloud security. This episode is brought to you by CyberProof, a leading managed security services provider. Learn more at cyberproof.com.
In this episode, I'm joined by Mohit Tiwari, who is co-founder and CEO at Symmetry Systems, and an associate professor at UT Austin. Mohit shares how security ideas move from research to prototypes for early adopters, and eventually become the new standard for security. We also discuss the importance of generative AI governance and where organizations should be looking for the next big breakthrough in AI security.
Hello, everyone. Thank you for listening in to Cyber Sentries today. We're joined by Mohit Tiwari, the co-founder and CEO at Symmetry Systems and faculty at UT Austin. Thank you so much for joining us.
Mohit Tiwari:
Thanks for having me, John. Really appreciate it.
John Richards:
Well, I see here just from your intro, it's obvious you are a person of two roles. I'd love to hear about your beginning in the industry and how you ended up as faculty while also co-founding a company. I mean, you're obviously a busy person, but how do you end up doing both of these things?
Mohit Tiwari:
I think the story is pretty simple. My co-founders and I were all on a research team at UT Austin working on essentially the problems around privacy and data security, and how we can make that part of the infrastructure as opposed to something bespoke done for every application. And the second half was: you can protect all you want, but at some point you've shipped things and you have to detect and respond as well, so we were applying different ML techniques in conjunction with the protect work to see how we could build both a proactive security framework and detection and response models. Those are the two themes that we worked on pretty heavily.
Around 2020, we had already worked with folks at the NSA, Lockheed, General Dynamics, etc., and then with hospitals and cloud providers as well. So we had a pretty good sense that if we take this idea of data flows, of data and identities and how data flows through identities, and operationalize it, it could be a big departure, and a welcome departure, from a very asset-centric world where you're chasing applications and code and infra, and networks, endpoints, etc. So we had a lot of conviction. Then we met with Forgepoint Capital and Prefix Capital, flagged the company off in early 2020, and it's been great going since then. Very happy to update you on how things went, but that's been the core focus over the last few years.
John Richards:
Wow, I love that story. So you talked about seeing success with real businesses. Can you share a little bit about some of these bespoke applications where people were trying to solve it and you were like, "Hey, there's actually a way to systematize this," and how you brought that out?
Mohit Tiwari:
That's a great question. I can take maybe three examples just to show. So one example starts at the really high end. Think of organizations like Lockheed or General Dynamics. They're really pro-grade, intellectual-property-heavy technical orgs, because they're building all sorts of things and they have nation-state attackers.
So for them, the problem definition is like, "Hey, we are building applications on Kubernetes, on OpenShift, etc.," and lots of different types of data go through these applications. Now, how can we put it into the framework itself, the service mesh so to speak, the control plane and the data plane, so that when IP data enters, it's really heavily protected versus other things that are less sensitive? The traditional way to do this would've been bespoke: "Hey, I'm deploying 20 apps on this OpenShift cluster. For each app I have to do the secure software development lifecycle and make sure everything is protected. If this app breaks, the data shouldn't break." So you have to do a lot of app security work, you have to do a lot of infrastructure security work, things like this. But you have to do it for each app, because the data doesn't stay put. Even between us, the data goes through email, calendar, Slack, whatever, right?
John Richards:
Right.
Mohit Tiwari:
So data's flowing across. So that was one use case: "Can we put these data flow controls in, so that wherever data flows, it's protected as part of the service mesh?" That's the really high-end use case. In the middle, you can think of someone like cloud providers. So this was one of the cloud providers we were working with. They were offering MongoDB as a service to their customers and to their internal product teams, but MongoDB had a rash of ransomware way back in the 2018, '19 timeframe, and under the shared responsibility model, every team was basically responsible for protecting their own MongoDB instance and setting up all the controls correctly.
But we realized that across multiple applications, data is flowing through, and you could offer this security layer as a platform service as opposed to everyone doing it for themselves. So all the RBAC, ABAC, and data flow controls, plus detection and response. If a privileged user who has permissions gets compromised and there's ransomware in your environment, you still need detection seat belts for that. So you can apply both the proactive and detection seat belts to MongoDB and offer it as a service to everyone.
So that was our aha moment where we're like, it doesn't need to be a full orchestration framework like in the first example; it can be plumbed into the data layer. That gets you ease of plumbing, because now you don't have to involve the full platform team or the app team, but it still gets you a pretty large fraction of the protection, as if you had protected everything. Because most data, even when apps talk to apps, they'll send an API message, but the message is, "Hey, pick up this blob from S3 or from some block store or something like this." So you can see a lot of data flows by plumbing just into the data layer. So that was another example.
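To make the data-layer plumbing concrete, here is a minimal sketch of a policy check enforced where the data lives, so a read is screened no matter which app issued it. The classification labels, the policy table, and the read function are all hypothetical illustrations, not Symmetry's actual mechanism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataObject:
    key: str
    classification: str  # e.g. "public", "internal", "restricted"

# RBAC-style policy: which roles may read which classifications.
READ_POLICY = {
    "public": {"intern", "engineer", "admin"},
    "internal": {"engineer", "admin"},
    "restricted": {"admin"},
}

def read_object(role: str, obj: DataObject) -> str:
    """Check policy at the data layer, before the blob is ever served."""
    if role not in READ_POLICY.get(obj.classification, set()):
        raise PermissionError(f"role {role!r} may not read {obj.key}")
    return f"<contents of {obj.key}>"  # stand-in for the real fetch

ip_doc = DataObject("s3://designs/wing-spec.pdf", "restricted")
print(read_object("admin", ip_doc))  # served
# read_object("intern", ip_doc)      # would raise PermissionError
```

Because the check sits at the data layer, every application that fetches the blob inherits it for free, which is the "plumbing once" advantage described above.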
John Richards:
What kind of objections do you get from folks the first time when they're trying to figure out if they should switch from this individual model, which obviously has major flaws? What do you push back against to get people into this new mindset?
Mohit Tiwari:
So there are two answers, I guess. One is circa 2018, '19, '20, when we were really pushing. At that time, this was a long-standing field in research, but if you looked around in the industry, you saw CrowdStrike in endpoint security, Palo Alto and Cisco in network. It's very infrastructure-specific. So even though what I say makes sense, it feels awkward from the buyer's side: "Why is this guy telling me this makes all the sense in the world when the world is pointed this other way?" So there's this inertia, this mindset of, "Hmm, let me wait until more people say-"
John Richards:
"We've always done it this way. Why would we do something else?"
Mohit Tiwari:
Yeah. I mean, I don't blame them. It's a big inertia, and if you fast-forward step by step over the last four or five years: we were able to convince the analysts in the first couple of years, we were able to convince the early adopters in the adoption curve, and they renewed and the word started spreading, and then other VCs started investing. We've had three waves of companies come through, and we're on the third wave now. And in parallel, AI took off. So it's becoming even more clear now that infrastructure and assets should basically disappear. They matter less and less over time. When you're talking to Siri, I don't even know what's going on in that endpoint, right?
John Richards:
Yeah, no idea.
Mohit Tiwari:
So I think this is now probably one of, if not the, hottest field in security, and lots of enterprises have budgets, so these days we don't get objections to the field. Right now it's much more a matter of, "Oh, why Symmetry and why not someone else?" So it's a matter of which vendor to bet on.
John Richards:
What are the key metrics that you like to point to that show why this matters and is the right choice for folks?
Mohit Tiwari:
Oh, that's really a great question. In the early stages of a field, I tend to lean towards outcomes, and you can define outcomes per month or outcomes per quarter as a metric, almost from, "Here's a set of outcomes." So I can give you an example. When you have good visibility into the data that you have, the identities, and how it's being used by entities, one of the best things you can do is say, "Hey, here's a whole bunch of data that belongs to identities that have been off-boarded. I don't even need to be sitting on this, or at least not paying a lot of money to sit on this." So you can archive it, which reduces risk and reduces costs. We call this the dormant data or data minimization kind of product. That gives a big fillip, so you can put a metric around it: "How much data did we minimize?"
Then another category of outcomes: a lot of compliance frameworks and security standards today, for good security reasons, ask if you have a great data inventory. Do you know what data you have and who has access to it? This is fundamental, and it powers incident response: "Hey, Okta scenario, some contractor had a breach. What customer data is impacted?" If you had a great inventory of data and access, you would be able to tell right away. Or look at the Ubiquiti breach. I'm not fully recounting all the pieces, but broadly in every scenario, Log4Shell, Log4j, etc., if a problem happens to an identity, what data might be affected? This is probably the longest step in incident response. You end up piecemeal constructing essentially that picture through logs and pointillistic pieces of information that you have here and there. So if you make that directly available as a product, now you can just query your product and it'll tell you, "This identity can get here and this data did get sent out."
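A minimal sketch of the inventory query being described: given a compromised identity, list the data types it could reach. The identities, stores, and edges here are invented for illustration, not a real product schema.

```python
from collections import defaultdict

# identity -> data stores that identity can reach (illustrative edges)
access = defaultdict(set)
access["contractor-7"].update({"crm-db", "support-tickets"})
access["svc-backup"].update({"crm-db", "payroll-bucket"})

# data store -> sensitive data types it holds
contents = {
    "crm-db": {"customer PII"},
    "support-tickets": {"customer PII"},
    "payroll-bucket": {"employee financial data"},
}

def impacted_data(identity: str) -> set:
    """Given a compromised identity, which data types are in scope?"""
    return {d for store in access[identity] for d in contents[store]}

print(impacted_data("contractor-7"))  # {'customer PII'}
```

With the graph maintained ahead of time, the answer to "what data is impacted?" becomes a lookup instead of a weeks-long log reconstruction.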
John Richards:
Trying to do that after the fact is a horrific thing; it takes forever. Whereas if you knew that upfront, you could instantly begin reacting and know what could have been compromised, so you're really lowering that footprint, I guess.
Mohit Tiwari:
Exactly, exactly. So that's on the incident response side. And proactively, you could also be driving down the blast radius, driving down costs, and getting ahead of things: if you're about to deploy AI copilots or agents in your environment, these are automated entities acting on behalf of identities, and you can drive down essentially the data risk that you might be creating by launching a bunch of these.
John Richards:
Yeah, that's one of the things I'd love to dig into a little bit more. So, obviously, with the rise of AI, the importance, and even the understanding, of how identity is going to function is changing, as we have something working on our behalf, and what can it do? So how has that affected the work that you're doing as you try to figure out AI and what kind of controls or things should be in place? What does that look like?
Mohit Tiwari:
I think that's a great question, because one fundamental thing that's really useful to understand, at least in my mental model, is that when enterprises use AI, there are at least three different bins of AI usage that have different personas involved, and hence different types of AI governance controls that might be needed. One is your corporate environment, where you have Microsoft or Salesforce, etc., and you're using those copilots and agents. A second one, on the other side, is data lake teams. So you have Snowflake, Databricks, etc., lots of data sets that you have collected from everywhere, and you're running, again, copilots or agents, etc., on your own data. And the third one is what I would call essentially a product environment. This is classic transactional workloads with blob stores as a backend and caches, and these are folks who would typically deal with customer data in production, dev, and staging.
And so these are three different personas, and they use very different tools, and it's instructive to see that there are AI governance techniques that are almost mandatory in a data lake environment. What does your training data look like? Is it clean and balanced, bias-free? What does your train/test split look like? How often is your model... You are checking for data distribution drift and updating the model. We do none of this in a corporate environment. In a corporate environment, it's "here's my entire SharePoint," and that is your test environment. The folks who trained, say, Microsoft Copilot have never seen your environment. Now you drop it in and it's just used directly. There's RAG, I mean it builds an index, and now you use it, but are you testing what kind of controls are in place? What information flow guarantees are there? Can the interns see the board documents?
All of these things that are expressed in Salesforce, in ServiceNow, in Atlassian Jira, they all have to be translated into your AI tooling. You have to educate your Microsoft Copilot, or your custom internal Llama, whatever it is, about all these authorization policies, and that's basically borderline non-existent. They all just get inherited, and we all know how bad SharePoints and all these things look in companies today, right?
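One way to picture the missing control is filtering a copilot's retrieved documents by the asking user's authorization before they ever reach the model. This is a minimal sketch under that assumption; the ACLs, user groups, and retriever stub are hypothetical, not Microsoft Copilot's internals.

```python
# document -> groups allowed to open it (illustrative ACLs)
DOC_ACL = {
    "board-minutes-q3.docx": {"board", "exec"},
    "eng-onboarding.md": {"board", "exec", "engineering", "intern"},
}

USER_GROUPS = {"alice-intern": {"intern"}, "bob-cfo": {"exec"}}

def retrieve(query: str) -> list:
    """Stand-in for the RAG index lookup; returns candidate documents."""
    return list(DOC_ACL)

def authorized_retrieve(user: str, query: str) -> list:
    """Drop any candidate the user could not open directly."""
    groups = USER_GROUPS.get(user, set())
    return [doc for doc in retrieve(query) if DOC_ACL[doc] & groups]

print(authorized_retrieve("alice-intern", "onboarding docs"))
# ['eng-onboarding.md'] -- the board minutes never reach the model
```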
John Richards:
Yeah.
Mohit Tiwari:
So I would say these are the three bins, and within them you can knock off what the goals for AI governance are. And they differ, because the product teams care a lot about consumer-facing regulations and security. Who was it? Someone had a breach where external inputs coming in were able to affect other users' data or other users' queries. Data lake teams, I think, really care about responding to regulations: "Hey, did I build a bias-free model?" and all this stuff. And for corporate, you really worry about both corporate security, like maybe board documents shouldn't be org-wide, and you worry about insider threats. And there's a whole range of questions that you are concerned about.
John Richards:
Is that largely choosing the right policies, or is it about properly... I don't know if it's tagging, or how do you go about ensuring that happens in these environments? Is that something where Symmetry steps in, or are you providing an interface for folks to be able to properly label that? And how does that tie in? I know you had mentioned before about data. All of this is data at the end of the day. How do you tie these classifications together?
Mohit Tiwari:
I think that's a good point. So, if the policy sucks, the game is up. No amount of mechanisms will protect us. And I think that layer receives incredibly low amounts of attention today. You can see that this is a serious problem and a long-term problem. You can see Microsoft has invested heavily in Purview; they called Purview the product of the last Ignite conference, that kind of thing. And they have this goal of, "Hey, we want to be the governance layer." We, of course, believe that the policy language at minimum should be an open, interoperable language, and that it shouldn't be buried inside some proprietary product.
John Richards:
That makes sense. With all the different agentic models and needing different things to speak together, you're going to need a policy that can work across all of them.
Mohit Tiwari:
That's exactly right. And it prevents lock-in; it leaves control with the organizations, the customer organizations, as opposed to some vendor who may or may not have a different agenda at different points in time. So I think the policy layer should be open and interoperable, built ideally on an open language. And then from there you want to synthesize controls down: "Hey, for Microsoft Copilot, it's useful to talk Purview as a language, Purview labels, tags, tools, all those policies." That way you can control the Microsoft stack, O365, with those.
Now, if you're using Databricks, they have Unity Catalog, so you can use that there. Snowflake has its own language. So my mental model is, ideally there's something that looks like PyTorch, for example: tooling and frameworks to specify policies, maybe in a programming language format, and then these are all backends that you compile down to, synthesizing controls. So anything from old-school Oracle databases to Delta Lake, Databricks, etc., you can treat them as backends, even corporate environments.
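A minimal sketch of that mental model: one neutral policy declaration compiled down to different backends. The output formats below are simplified stand-ins, not exact Purview or Unity Catalog syntax.

```python
# One neutral policy declaration...
POLICY = {"dataset": "finance.ledger", "allow_read": ["finance-team"]}

# ...synthesized down to per-backend controls.
def to_unity_catalog_sql(policy: dict) -> str:
    """Render as a Databricks-style SQL grant (simplified)."""
    grants = ", ".join(f"`{g}`" for g in policy["allow_read"])
    return f"GRANT SELECT ON TABLE {policy['dataset']} TO {grants};"

def to_purview_label(policy: dict) -> dict:
    """Render as a Purview-style label payload (simplified)."""
    return {
        "asset": policy["dataset"],
        "label": "Confidential",
        "allowedReaders": policy["allow_read"],
    }

print(to_unity_catalog_sql(POLICY))
print(to_purview_label(POLICY))
```

The point of the design is that the policy is stated once, in the open layer, and each proprietary system only ever sees its own compiled form.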
John Richards:
And tying that together, you're saying any AI that's interacting can interface with that and understand the security ramifications of any request it's making, so you don't have to worry about board reports, as you mentioned, showing up for a developer trying to find some other report, something like that?
Mohit Tiwari:
That's exactly right. Exactly. And if this is interoperable, you can use a different model for code and a different one for corporate, PDF, PowerPoint-type work, etc., and they're all following the same policies. Because really, we've done a lot of work, we meaning Symmetry, around how you translate the questions that NIST CSF frameworks are asking, or HIPAA is asking, or just different standards. They're asking in plain language, like, "Hey, show me you have segregation of duties in place," meaning you have put proactive controls in place, you have detection and response, and you have audit logs. There's this general set of tools for third-party access, for privileged access, for business purpose controls, all of these things, and you can translate them into exactly these information flow policies.
Now, if you have that, that's kind of the home base that you can use to synthesize controls from. That's why I feel like this is such an exciting space: between data, identity, and AI tying all this together, it's raising the level of abstraction closer to what we want when we want governance, security, and compliance, and it's abstracting out the endpoint, network, cloud infra, etc., tools. They should just do as they're told. They're just floating blobs, essentially. So that's the most exciting thing.
John Richards:
I love it. So how are you using AI yourselves in this process? What are the ways that you've been able to use it to better put together the governance or monitor the security across these tools?
Mohit Tiwari:
I mean, that's a great question. I think the overarching theme is that we use AI tools or ML or even boring regular expressions, with a best-tool-for-the-job approach. So one of the core things is that we, as a vendor, don't want to own our customers' data. We don't even want to see it, ideally. So we want to live in the same world that we want to help create, meaning we ship functionality to our customers and our product is separate from our company. In that model, I think it's more sensible to create more distilled, smaller language models that are purpose-built: let's tune them to detect healthcare data better, or let's tune them to detect genomic or financial or corporate records better. So that's one big approach. How can we create smaller distilled models that we ship on-prem inside the customer's environment, versus pulling the customer's data to me, where I'm the big Azure-based backend powered by OpenAI or something? So that's a structural difference in how we are approaching this.
This also means that we can deploy globally and we don't have to pull data out of their environments, which helps with data sovereignty, etc. There are a lot of good side effects of this. The second thing we're doing is treating AI tools as a way to augment every attribute. Essentially, our product is building a data flow graph of data and identities: how is data flowing across identities? Every attribute, like "Did this identity read this data?", is something that can only be probabilistically answered in many cases. Is this role or group reachable by a third-party identity, all the way from Active Directory to cloud to data store? So all these attributes, we use AI to help enrich. That's one more part: how do you make a better data flow graph using AI tools?
And then the third, much more studied, area is using it for detection and response. There's the classic MITRE ATT&CK framework that says attackers have to do recon, then lateral movement, then exfiltrate or exploit. There's a whole life cycle for these attacks, so how can we train abstractly on the life cycle, but then translate that into detectors that trace attackers through long-term paths? We do purple team engagements where we throw attackers at it to test this stuff. That's one side; that's all in the product and available today. We also have a pretty healthy research agenda around this problem. A really good question is, "Sure, I have this hairball mess of a permissions graph. How should I incrementally edit it so that I drive down the blast radius most effectively?"
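A minimal sketch of lifecycle-based detection in that spirit: flag an identity only when its event stream walks through ATT&CK-like stages in order, rather than alerting on any single noisy signal. The stage names and events are simplified illustrations, not Symmetry's detectors.

```python
# Ordered ATT&CK-like stages (simplified; real tactics are richer).
STAGES = ["recon", "lateral-movement", "exfiltration"]

def kill_chain_progress(events: list) -> int:
    """How many consecutive lifecycle stages has this identity hit?"""
    stage = 0
    for event in events:
        if stage < len(STAGES) and event == STAGES[stage]:
            stage += 1
    return stage

observed = ["recon", "recon", "lateral-movement", "exfiltration"]
if kill_chain_progress(observed) == len(STAGES):
    print("alert: full attack lifecycle traced for this identity")
```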
John Richards:
Yeah, that's huge. So many folks don't know where to start. The first time you get a report back, it's overwhelming, and you say, "I don't know what to do. How do I not make this worse while I'm trying to fix it?"
Mohit Tiwari:
That's exactly right. Especially when you account for transitive permissions. We see this all the time: "Hey, I want to remove Alice from this financial data lake." You go and keep removing, but Alice still has a different path to it, because you can assume a role, be part of a group, and transitively get from Alice to the data store through too many paths. So even disentangling this in a way that you're actually creating islands with moats and drawbridges is a pretty non-trivial problem.
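The transitive-permission problem, sketched minimally: Alice's direct grant is removed, but a role-assumption edge still gives her a path to the data. This is plain breadth-first search over a toy edge list; real graphs span Active Directory, cloud IAM, and data-store grants.

```python
from collections import deque

# identity/role/group/data-store nodes; edges mean "is member of",
# "can assume", or "has grant on" (toy example)
EDGES = {
    "alice": {"analysts", "deploy-role"},  # group + assumable role
    "analysts": set(),                     # direct grant already removed
    "deploy-role": {"finance-lake"},       # ...but this path survives
}

def can_reach(src: str, dst: str) -> bool:
    """Breadth-first search over the permission graph."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in EDGES.get(node, set()) - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

print(can_reach("alice", "finance-lake"))  # True, via deploy-role
```

Enumerating and cutting every such path without breaking legitimate access is exactly the constraint-solving problem described next.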
John Richards:
How does the AI then help solve that? Is it mapping out all those different paths?
Mohit Tiwari:
That's exactly right. So we model this as a constraint-solving problem and try to solve it. It's essentially impossible to solve perfectly, but you can do better and better search-space exploration, and that's where the AI tools are really helpful. And when we say AI, we don't just mean language models. For analyzing logs, language models are great; for analyzing unstructured files to classify data into tags and such, great. But for looking for attacks in a graph, we build graph neural networks, which give a lower-dimensional, much more purposeful representation on which the networks are trained to detect. So, again, it's different types of tools, a best-fit-for-the-job approach.
John Richards:
I like the best-fit approach. Now, I'm curious. You mentioned embracing a model where the product goes out to the team, because you don't want to see or view that data. But a lot of folks centralize that because, one, they can keep their AI proprietary and control it, but also it can take a lot of processing to run. How do you help make sure that those folks can run it in a reasonable environment? Is it that once it's trained, it's pretty lightweight for them to run these models locally, or is it slightly different?
Mohit Tiwari:
No, that's exactly right. That's the whole goal. We spend a lot of energy on training these models, or tuning them, or building RAG-style support to make them more tailored to the customer's org. But once it gets there, we want to make very few assumptions. Over time, we can add, "Hey, if you happen to have GPUs handy, here's, again, a different backend that we would generate." But for now, our goal is to assume as little as possible about the hardware layer.
And when you create really small, tailored models for different problems, you don't end up in the situation where there's one billions-of-parameters model that's used for every problem I have. I think that's great for OpenAI-style use cases. One interesting thing I saw was that in the early 2000s, the entire web that Google was indexing was under a billion documents. And right now, each of our customers has way more than a billion data objects.
John Richards:
Yeah.
Mohit Tiwari:
So it's almost like the scale of the problem is similar, but now we have to stamp out one per customer as opposed to one for everyone.
John Richards:
Yeah, it used to be like, "Oh, maybe one group had solved this." And now, I mean, there's just the scale of logging and stuff that's being stored, but then the power that gives you, if you have AI able to use your own data to find these anomalies and things like that, is huge.
Mohit Tiwari:
I mean, the other part that I don't think most folks realize when they try to build a product in this space is that if someone has 20 or 50 petabytes of data, you can't just shovel all of it into OpenAI to classify. Even the data transfer costs out of the customer's environment, even the number of firewalls that have to be reconfigured to be able to move it, I mean, it's just crazy. So I almost feel like our space is one that, and this is a hypothesis, maybe I'll be wrong, almost doesn't admit this notion of one foundation model that will do it all. You have to purpose-build smaller things.
John Richards:
Yeah, it's more efficient. And, again, it lets you have a little more fine-grained control there. I'd love to hear a little bit about this process of going from research to product that you mentioned. What does that look like? How do you begin researching and then eventually get to a spot where you're like, "Hey, this is something people need and want," and move into that?
Mohit Tiwari:
So I can trace that journey. For us, it initially started off with: this is a really fundamental security primitive, and it's so beautiful that it applies to many different problems that all look different, but in this framework they all converge into data flows at the data-identity layer. So there was just curiosity and potential for impact that drove the research, and that was several years. And at some point we just started transferring this. One of the stories: we worked with a hospital that had a complex care clinic, and all the healthcare happened outside the hospital, because these are children on the go; social workers, school nurses, everyone is collaborating to keep the kids healthy, but all the health data is locked up inside an EMR/EHR, an electronic health record chart. So we collaborated with them. They raised funds, ran a study, showed the value, all of this stuff.
And at the end, the hospital CISO asked us, "Which banks use your product?" And I'm like, "Hey, we are just a university team trying to do some good stuff." So that was us incrementally just trying to have impact. Once we had figured out the open-ended research questions, the next open question was, "How do you distribute these findings, or how do you turn them into a product?" And what we learned building a product is that you have to get really granular on who the persona is, how they will try it, and how they will convince the rest of the company to use it. So we really narrowed in on security teams, like the chief security officer. If they have a data security person, great. If not, let's make a regular security person a great data security person.
So we just narrowed down on that persona. We could have done something else. We could have said, "Oh, for data lakes, the data governance person gets a super product." That could have been an alternative model, but we picked this one because it seemed like security owns the responsibility even if they don't have the power to control change and enforce. And incrementally we built out: okay, we have to give the security team ground truth evidence for the problems they're seeing, arm them with the fixes, and templatize the problems so that each new problem isn't a unique snowflake. There's a privileged access problem, there's third-party access, there's dormancy, there's inventory, stuff like this, so that they all sound cookie-cutter. They sound like things they have done before, for networking or for cloud assets, but now they're being done for fuzzier assets, like a data type: manage your PCI information, but PCI information is a fuzzy [inaudible 00:28:43].
So I think we just mentally built this out. And at an abstract level, doing research and crafting a product feel very similar, because both are very open-ended. You don't have a well-defined KPI that someone is giving you; you sort of have to make it and discover it as you go along. So that was the stage so far. I feel like now that the market has hit, the next kind of problem is how you find the efficient, non-linear ways of getting distribution. So that's where we are today.
John Richards:
Amazing. Are there any problems that you're still getting a chance to dig into and try to solve, things that excite you to dig into? Or are you pushed so much on the product side right now, or the marketing side maybe, that you don't really get a chance to touch too much of that now?
Mohit Tiwari:
No, no. I mean, as a team, we know we have some set of problems. I think you can view this almost as a pyramid. There's a base set of problems that I think are broadly applicable and well understood now for us, and that's a distribution problem. But this is a very fast-moving space, specifically if you look at, "Hey, AI workloads are going to be creating all these data flows." I mean, a lot of AI security is about "let's make an AI firewall," so there are prompt shields, prompt guards, all these techniques, and they're really focused on safety: "let's prevent the model from teaching how to make a bomb" kind of stuff. It doesn't really work for the problems we were talking about, like some random contractor should not be reading board documents. That's an authorization and information flow in the enterprise problem.
So being able to build these probabilistic data types, being able to transform what business purpose means into this information flow framework, that's an open problem right now. What should this policy language be, and can it represent business use cases? And how can we even mine that from people saying things like, "Yeah, I would like this: whenever I cancel a contract with my customer, their data should be deleted across our real estate"? This is a very common request, but building the actual tooling to make it happen, that's an open problem we are working pretty hard on. Or doing detection and response when an attacker is doing ransomware-style exfiltration plus encryption: how do you detect it early and disrupt the attack without causing damage internally or blasting people with false positives? There are a lot of open problems here. It's kind of a whole new world of working on security.
John Richards:
Wow. Well, any of our audience out there who is interested in that, you should definitely check out Symmetry Systems. They are innovating and doing amazing stuff. Mohit, thank you so much for coming on here and sharing your expertise. Before I let you go, tell us a little bit about how folks can find you, and what's the best way to engage with Symmetry for folks that are interested to know what's going on?
Mohit Tiwari:
Perfect. We're on the internet at symmetry-systems.com. There's a little chatbot there that's not an AI robot. You can even tag me in it and that message will get routed to us. So that's a great way to engage as you browse. We put all our white papers and everything on our website, so browse the resources section. The other option is going on LinkedIn, Symmetry Systems. If you DM us, or pretty much DM any colleague of ours, it's a great way to get in touch with us.
John Richards:
Amazing. I'll make sure to include a link to the site in our show notes so that folks can check it out. Thank you for being on here. It's been an absolute pleasure. Glad to have you. Thank you again.
Mohit Tiwari:
Thanks, John. Really appreciate it, and good luck. Great podcast.
John Richards:
This podcast is made possible by CyberProof, a leading managed security services provider, helping organizations manage cyber risk through advanced threat intelligence, exposure management, and cloud security. From proactive threat hunting to managed detection and response, CyberProof helps enterprises reduce risk, improve resilience, and stay ahead of emerging threats. Learn more at cyberproof.com.
Thank you for tuning into Cyber Sentries. I'm your host, John Richards. This has been a production of TruStory FM. Audio engineering by Andy Nelson. Music by Ahmet Sahin. You can find all the links in the show notes. We appreciate you downloading and listening to the show. Take a moment and leave a like and review; it helps us get the word out. We'll be back September 10th right here on Cyber Sentries.