Dive deep into AI's accelerating role in securing cloud environments to protect applications and data. In each episode, we showcase its potential to transform our approach to security in the face of an increasingly complex threat landscape. Tune in as we illuminate the complexities at the intersection of AI and security, a space where innovation meets continuous vigilance.
John Richards:
Welcome to Cyber Sentries from Paladin Cloud on TruStory FM. I'm your host, John Richards. Here, we explore the transformative potential of AI for cloud security. In this episode, our guest is Shreyans Mehta, the CTO and co-founder of Cequence Security, an early adopter of AI and machine learning. Shreyans walks us through using AI for real world security use cases, including detecting suspicious traffic by studying legitimate users' behavior and identifying exposed data based on context. Let's get started.
Hello, Shreyans, and welcome to Cyber Sentries. We're so glad to have you here. Now, you're co-founder and CTO at Cequence, an API security company that you started about 10 years ago. The exciting thing about what you all are doing is that you began to invest in AI a lot earlier than many folks, so I'm curious: why did your organization start embracing AI so early?
Shreyans Mehta:
Well, first of all, thanks, John, for having me here. Yes, good assumption: we started investing in AI in the very early days, right from the beginning of Cequence. In fact, if you look at our domain, it's actually Cequence.ai, because the founding of the company goes back to detecting adversaries using AI. So, I'll tell you a story. A large financial institution was one of our first customers; they're an API-first company.
They were getting attacked by adversaries creating fake accounts, moving money from account to account, even laundering money outside the institution, and the attackers were using tools that mimicked real devices like mobile devices and browsers. When we started looking into it, we had to detect these adversaries, which looked like real devices and browsers, and that's where we brought in AI: to detect the behaviors of these threat actors and separate them from real users. Because if you end up impacting real users, no matter how good your security is, it will never be adopted. So that, to us, was the beginning of the use of AI in our solution.
John Richards:
I know folks can feel like AI started with the recent buzz around it, but it's interesting to hear that you were finding such value early on, even before the recent wave. I'm sure you've got plenty of thoughts on how it has matured over the last year or so as it's risen up. But before we dive more into that, you talk about API security here. Can you explain a little bit more about what you mean when you say, "Hey, we're doing API security. We're applying AI to this."
Shreyans Mehta:
We started with a very specific use case for a very large financial institution, where there was abuse going on on top of their APIs. As we worked through that, and then through retail, telecom, and a bunch of other verticals, we always worked very closely with customers as partners, hearing what their real needs are and so on and so forth. Ultimately, what our customers are asking us to do is three simple things. Where are my APIs? What risks do they pose in my environment? Are they being abused or not? To me, that is the full suite of API protection. It starts with API attack surface discovery: what's actually exposed out there?
Then comes cataloging those APIs: what risk are they posing in terms of PII, PCI, or any of those things? If they're exposing such data, are they properly authenticated? Then the later part is: are they being abused? Is somebody scraping data? Is somebody mimicking a real user and trying to take over an account, move money, or go after gift cards? That depends on the business logic the enterprise is exposing through its APIs. And then, ultimately, stopping that abuse natively in the environment. So you need a holistic view of API protection, and all these pieces are needed in a single platform to be able to do that.
John Richards:
Do you see a lot of folks not even understanding their landscape? I've been at organizations that used lots of APIs, and it feels like that first piece is so often missing: we think we're okay, and we really have no idea how many different API endpoints we have out there and what information they're really exposing.
Shreyans Mehta:
We have many such stories to share. We were working with a large dating platform that had protection on a couple of APIs that allowed users to log in, and they were still getting abused. They were scratching their heads: where are these attacks actually coming in from? What we uncovered for them was that instead of the five entry points they thought they had into the environment, there were actually 29. So knowing the unknown is the starting point. Otherwise, you cannot protect what you don't even know exists, and that is one such story.
There's another one, a potential customer who said, "Oh, we are good. Everything for us flows through one single CDN, there is no other attack surface outside of that, and we have a WAF in there, so we should be good." Then, with the Spyder tool that we have, we showed them that they actually have 29 different environments.
John Richards:
Wow.
Shreyans Mehta:
Not everything flows through their CDN, and in some cases their origin API servers are publicly exposed as well. So that was an eye-opener and the start of a journey: "Okay. We need to have an API protection program in place."
John Richards:
Wow. It sounds like so many people don't even understand the scope of that. They've got a false sense of security because they're like, "Well, this is the area we're focused on," and they aren't aware of the rest. Now, I know a lot of folks right now are looking into ways they can begin to use AI in their products. It's a trending topic. What is the level of dependence on APIs as folks move to this? Is there going to be less API usage? It seems like a lot of this is driven by APIs. So do you see a lot of customers coming in and suddenly needing to secure new API endpoints from their integrations with different AI models or generative AI?
Shreyans Mehta:
Absolutely. I think you use APIs for pretty much everything. They are the backbone of the internet today. The way we see it, if you are hosting an AI model, your consumers are interacting with that model over APIs. So let's say I am, say, a Bard or a ChatGPT, and I'm hosting these services. My consumers are sending me a lot of information over APIs: the prompts, the data, any of those things. Number one, you need to worry about protecting all of that. These are new channels that you're opening up.
Second, let's say I as a company am a consumer of a model, and I'm relying on third-party services to answer questions or respond to prompts. My data, my customer IP, any of those things are leaving my environment to connect to other environments, and guess how that's happening? All over APIs as well. Two entities don't really communicate any other way than over APIs, so that's where you have to worry about a lot of things to be able to expose this and adopt it in a secure fashion.
John Richards:
Are you seeing any unique threats or new attack vectors that folks are using to target specifically some of these AI endpoints, things like that?
Shreyans Mehta:
The actors are evolving as well. There is AI, and then there is gen AI, generative AI. There are threats that are yet to be explored around the misinformation that can be spread by tampering with a model, and the creation of deepfakes. So that's one aspect, one threat. The second is the data privacy and security issue that we just touched upon. What kind of data is being used to train certain models? Is it going to end up in the responses when customers ask certain questions in certain ways? Are my passwords and PII going to be exposed or not? How is that data going to be exposed?
So the first thing is: what are those models being trained on? It is okay if a model is trained on a publicly available dataset. But once we get to verticals, where my data is used to train certain models, and I'm not talking about the publicly available LLMs but vertical LLMs that could be trained on my enterprise data, is that data going to be exposed in any way? Can Customer One's data be exposed to Customer Two when serving certain prompts or questions?
So, is it exposed securely? You need to make sure it's not overexposing things. Is it properly authenticated or not? Is it only exposed to users who are authenticated as well as authorized? Only certain users should have access to certain kinds of data. So all the standard techniques for protecting your assets apply, the way you would do it anywhere else. On top of that, other things like model manipulation, [inaudible 00:11:33], and so on will also come in. But when we talk specifically about APIs: is it authenticated, is it authorized, is it secure? What kind of data is exposed? All those things come into play.
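A minimal Python sketch of the tenant-isolation and authorization checks described above, applied before any enterprise data is placed into a prompt for a vertical LLM. The `Record` type, the role names, and the naive keyword match are hypothetical placeholders, not part of any Cequence product or any specific LLM framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    tenant_id: str      # which customer owns this record
    required_role: str  # role a caller must hold to read it
    text: str


def authorized_context(records: list[Record], tenant_id: str,
                       roles: set[str], query: str) -> list[str]:
    """Return only the records this caller may see, filtered before any text reaches a model prompt."""
    terms = query.lower().split()
    return [
        r.text
        for r in records
        if r.tenant_id == tenant_id                  # tenant isolation: Customer One never sees Customer Two's data
        and r.required_role in roles                 # authorization: caller must hold the required role
        and any(t in r.text.lower() for t in terms)  # placeholder relevance check; a real system would rank results
    ]


records = [
    Record("acme", "support", "Acme refund policy: 30 days"),
    Record("globex", "support", "Globex refund policy: 14 days"),
    Record("acme", "finance", "Acme refund ledger totals"),
]
print(authorized_context(records, "acme", {"support"}, "refund"))
# ['Acme refund policy: 30 days']
```

The design choice worth noting is that filtering happens before results are handed to the model, so the model never sees data it could later leak in a response.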
John Richards:
So what do you recommend for folks that are trying to get that visibility, stay ahead of the curve with their APIs? Obviously check out Cequence, but are there other best practices that you're recommending to folks as they try to adopt these APIs and use them that will make their life easier?
Shreyans Mehta:
I think you're hinting at this too. It's a double-edged sword, but adoption, whether you like it or not, is going to happen. No matter how much you try to block it, users are already using it. That ship has sailed. You just have to expose it in a controlled fashion, so that you know exactly what is being exposed. I listed some of this earlier as well. Number one, you have to prioritize secure data sharing: know exactly who your data is getting shared with and how. You need complete visibility into that. So you might as well bring it into your circle of trust, saying these are the apps and APIs that I trust and have visibility into, rather than saying you will not expose this at all.
Then, whichever partner or set of partners you choose, you need to know that they have the capability to protect your data as well, and there are standard compliance frameworks around that, like SOC 2, PCI, and a bunch of others. You should be able to trust the third party you're relying on. But even after that, you need to make sure you have proper credentials for access and that the API traffic is being properly monitored. If users are exposing certain kinds of data, you need proper contracts and training in place with them: this is the data you're allowed to expose, and this is something you should never expose. Your customers' private data, for example, should not leave your environment.
It should be scrubbed. Then the last aspect is trust but verify. Even though you have contracts in place with the end users who are using these services, you make sure at runtime that this is exactly what they're doing, because sometimes they might make a mistake. So you need to be able to flag that, bubble it up, and potentially even block it when it happens.
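As a rough illustration of scrubbing data before it leaves your environment for a third-party AI service, here is a small Python sketch. The deny-list of field names and the email regex are assumptions made for illustration; a real policy would come from the organization's own data classification, and regex-based redaction is best-effort.

```python
import re

# Hypothetical deny-list of field names; each organization would define its own.
SENSITIVE_KEYS = {"password", "ssn", "card_number", "email", "api_key"}
# Simple email pattern; regex-based redaction is best-effort, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(value):
    """Recursively redact sensitive fields and email addresses before a payload leaves the environment."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value  # numbers, booleans, None pass through unchanged


payload = {
    "prompt": "Summarize the ticket from jane@example.com",
    "customer": {"name": "Jane", "SSN": "123-45-6789"},
    "tags": ["vip", "bob@example.org"],
}
print(scrub(payload))
```

The runtime "trust but verify" step could sit alongside this, for example by flagging outbound requests whose scrubbed payload differs from the original instead of relying on users to remember the rules.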
John Richards:
Great advice. Thank you. Now, you mentioned earlier how early on you were using AI to identify legitimate versus illegitimate traffic. How has that changed over time, and what are you looking at as AI continues to grow and as you grow? Where do you see new possibilities for AI in advancing API security? Are there any areas you're currently working on, already seeing value in, or looking to jump into?
Shreyans Mehta:
I mean, as you already know, security is a complex world. There are so many different tools. Figuring out alerts and making sense of them is a very, very hard problem to solve. So the first area for us is really simplifying the product for our end users, who could be anyone from a SOC analyst to an API developer, and making sure they can get to proper API protection very quickly. That could be SOC alerts: can they make sense of them easily and quickly and take automated actions? That is one area where we're continuing to invest in AI. The second, which we announced last year, is the use of LLMs to generate API security test cases, where the models can automatically detect API specs, work out what kinds of tests make sense for them, and auto-generate those tests.
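Cequence describes using LLMs for this; the Python sketch below swaps in a plain rule-based generator just to show the shape of the output. It assumes an OpenAPI-style `paths` dictionary in which an operation's own `security` list marks it as authenticated (global security is ignored here). The two rules, missing credentials and access to another user's identifier, are illustrative choices, not the product's test catalog.

```python
from dataclasses import dataclass

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


@dataclass
class GeneratedTest:
    method: str
    path: str
    description: str
    expected_status: set[int]


def generate_tests(paths: dict) -> list[GeneratedTest]:
    """Derive simple negative security tests from an OpenAPI-style `paths` mapping."""
    tests = []
    for path, operations in paths.items():
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            verb = method.upper()
            # An operation that declares `security` should reject anonymous callers.
            if operation.get("security"):
                tests.append(GeneratedTest(verb, path, "request without credentials is rejected", {401}))
            # A path parameter suggests an object identifier: one user must not read another's object.
            if "{" in path:
                tests.append(GeneratedTest(verb, path, "another user's identifier is not accessible", {403, 404}))
    return tests


spec_paths = {
    "/accounts/{id}": {"get": {"security": [{"bearerAuth": []}]}},
    "/health": {"get": {}},
}
for t in generate_tests(spec_paths):
    print(t.method, t.path, "-", t.description, sorted(t.expected_status))
```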
John Richards:
Building out tests is people's least favorite job, so that's an incredible spot to automate.
Shreyans Mehta:
Yeah. Those make life easier for our end customers. And ultimately, another use of gen AI is understanding the adversary, because the adversary is not sitting still. They have access to the same tools, so how can we stay ahead of the curve? When they use tools like this, how can we detect and protect against that?
John Richards:
That is happening now. We're seeing hackers, at the very least, scaling up their attacks in a lot of other spaces. Are you seeing the same thing happen, where they're using AI to evolve how they attack in the API space, or is it the same thing, just more volume?
Shreyans Mehta:
I think the level of sophistication has increased, for sure. The ability to hide behind real-looking traffic has also improved a lot, and this has happened in the last year alone. So we are seeing some of these things. We are constantly mining the black market for the tools that are available, and I'm sure attackers are using some of them, because they're upping their game. We don't have hard evidence just yet, since we can't see things from the threat actor's side, but looking at their tactics and sophistication, I can guarantee you they're using such tools today.
John Richards:
How do enterprises try to stay ahead of that? Is there anything they can do, or what does that look like? If you're a CISO out there at an organization and you know this is coming, how do you get ahead of the threats that are out there?
Shreyans Mehta:
So one of the things I see, and again this goes back to the beginning of Cequence, is that these models are all based on data. Without data, these models are useless. If you give junk as input to a model, the output will also be junk. What these bad actors don't have is your users. They don't have access to them: their behaviors, how they interact on a daily basis from morning to night, the overall pattern of access in your environment. For example, if you are a retailer, your customers might typically wake up at 9:00, shop until noon, and then go back to work, with certain behaviors around that.
What kind of access patterns do they have? The attackers are blind to all of that. They only have access to your mobile app and your web application, because those are publicly available, and they try to reverse-engineer those to get in. They can be very good at mimicking a real browser or a real mobile device, but they will not be good at understanding your customer interactions. That's where you can stay ahead of the curve: applying AI to learn those behaviors and then differentiate the bad actors from the good ones, because that is your asset, and they won't have access to it.
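The idea of learning how real users behave and flagging traffic that doesn't fit can be sketched with a single feature: time of day. This is a deliberately tiny Python illustration, assuming request hours are the only signal and using a Laplace-smoothed histogram as the baseline; production behavioral models, including whatever Cequence uses, combine many more signals, and any cutoff for "anomalous" would have to be tuned on your own traffic.

```python
import math
from collections import Counter


def hourly_profile(hours: list[int], alpha: float = 1.0) -> list[float]:
    """Laplace-smoothed probability that a legitimate request arrives in each hour (0-23)."""
    counts = Counter(hours)
    total = len(hours) + 24 * alpha
    return [(counts[h] + alpha) / total for h in range(24)]


def surprise(profile: list[float], session_hours: list[int]) -> float:
    """Average negative log-probability of a session's request hours; higher means less like real users."""
    return -sum(math.log(profile[h]) for h in session_hours) / len(session_hours)


# Hypothetical baseline: real shoppers active mid-morning and early evening.
baseline = hourly_profile([9, 10, 11, 10, 9, 11] * 50 + [19, 20] * 20)
print(round(surprise(baseline, [10, 11]), 2), round(surprise(baseline, [3, 4]), 2))
```

Smoothing keeps never-seen hours from having zero probability, so a 3 a.m. session gets a high but finite surprise score instead of an error.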
John Richards:
Wow. I love that insight. So it reminds me, I heard years ago that when they want to train bank tellers on identifying counterfeit money, what they don't do is give them a bunch of counterfeit money to use. Instead, they take real money and they just handle it so much that when something feels off, they know there's something wrong. Anyway, your response there reminds me of that. You know how your real users interact, and if you know that well enough, if you analyze that, you can use that data then to identify, "Oh, here's an anomaly. Why is this different? Oh, this could be a bad actor."
Shreyans Mehta:
Yeah.
John Richards:
Wow. Oh, wow. Well, thank you for sharing that insight. I appreciate it. It's something for folks to think about as they try to identify what's going on. Another thing I wanted to ask you about: I've been hearing the term API posture management. There are a few different kinds of posture management out there, and API is starting to become one of them. What do you all think of when you think about API posture management?
Shreyans Mehta:
So, I spoke about the API security journey: three things. Where are my APIs, what risk are they posing, and are they being abused? The second aspect, which we can expand on, is really API posture management. We get into organizations that are catching up to API security. The APIs are already out there, and maybe less than 20% of them are cataloged or documented, because the developers already knew their APIs, and the specs are just catching up. What that means is people are using those APIs, you catalog them, but then what? You might baseline them by looking at the runtime traffic: okay, for the remaining 80%, let's create a baseline spec and start looking for deviations from it. But what you also need to know is what they are really exposing.
We have customers who have tens of thousands of API endpoints because they're large, complex organizations. Some grew through acquisitions. Some have development happening across the globe. Sometimes the developers have moved on, and the new people don't even know those APIs exist. So first we get a handle on how many APIs there are, but then where do I start? Out of these tens of thousands of API endpoints, what risk are they posing? That can be based on multiple aspects.
One is, what kind of data are they exposing? It could be generic PCI/PII data that you know about, or it could be customer-specific. If you're a telecom, call records are a big deal, and IMEI numbers are a big deal, so it can be very contextual to the vertical you're in. And then, if they're being exposed, did they really need to be? We have to ask that question. That all falls under posture management. Did they need to be exposed? If they do, are they properly authenticated or not? Is there a proper authorization scheme in place?
Are you using a strong form of authentication on those APIs? All those things fall under that bucket, and this is not just a snapshot in time, because developers will be developers. They have to roll out new apps at a very fast pace, so they will roll out new endpoints without your knowledge. You need to be able to bubble up those endpoints, called shadow APIs. Sometimes developers create them to test an endpoint. Say I have a GET endpoint, but during testing I need to be able to delete the data and then repopulate it. So I add a DELETE endpoint that's not supposed to be publicly exposed, but it ends up being exposed.
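A minimal Python sketch of the "baseline from runtime traffic, then look for deviations" idea applied to shadow APIs: collapse observed request paths into templates and diff them, per HTTP method, against the documented inventory. The identifier heuristic (numeric segments and UUIDs only) and the input shapes are assumptions for illustration.

```python
import re

# Segments that look like numeric IDs or UUIDs are collapsed into a placeholder.
ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def to_template(path: str) -> str:
    """Turn an observed path like /users/42?fields=name into a template like /users/{id}."""
    parts = path.split("?", 1)[0].strip("/").split("/")
    return "/" + "/".join("{id}" if ID_SEGMENT.match(p) else p for p in parts)


def find_shadow_apis(observed: list[tuple[str, str]],
                     documented: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return (METHOD, template) pairs seen in runtime traffic but missing from the documented inventory."""
    seen = {(method.upper(), to_template(path)) for method, path in observed}
    return seen - documented


documented = {("GET", "/users/{id}")}
observed = [("GET", "/users/42"), ("get", "/users/7?fields=name"), ("DELETE", "/users/42")]
print(find_shadow_apis(observed, documented))
# {('DELETE', '/users/{id}')}
```

Keying on the method as well as the path is what surfaces the test-only DELETE in the example above, even though the GET on the same path is documented.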
Those are some of the things you have to worry about: being able to detect those endpoints and the exposure around them. There have been cases where test endpoints were backed by real customer data. There have been very well-known public breaches where an unauthenticated test endpoint was backed by real data, and the bad actors just enumerated through identifiers and exfiltrated the entire dataset. That's where API posture management comes into the picture. What are my APIs? What risk are they posing? Do they really need to be there or not? Then you need to start classifying them based on the risk they're posing and zoom into the higher-risk APIs for deeper analysis.
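The enumeration pattern described here, an attacker stepping through identifiers on an unauthenticated endpoint, leaves a recognizable trace in access logs. Below is a simple heuristic sketch in Python, assuming integer object IDs already grouped per endpoint template; the thresholds are hypothetical and would need tuning, and an attacker who randomizes the order of requests or spreads them across clients would evade this particular check.

```python
from collections import defaultdict


def enumeration_suspects(requests, min_distinct: int = 50,
                         min_sequential_ratio: float = 0.8) -> list[str]:
    """Flag clients that walk through many consecutive object IDs on one endpoint template.

    `requests` is an iterable of (client_id, object_id) pairs with integer object IDs.
    """
    ids_by_client = defaultdict(set)
    for client, object_id in requests:
        ids_by_client[client].add(object_id)

    suspects = []
    for client, ids in ids_by_client.items():
        if len(ids) < max(min_distinct, 2):
            continue  # too few distinct objects to call it enumeration
        ordered = sorted(ids)
        consecutive = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
        if consecutive / (len(ordered) - 1) >= min_sequential_ratio:
            suspects.append(client)
    return suspects


traffic = [("scraper-1", i) for i in range(1000, 1100)] + [("user-7", i) for i in (5, 17, 17, 230)]
print(enumeration_suspects(traffic))
# ['scraper-1']
```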
John Richards:
How do you go about understanding what data is being exposed by an endpoint? Is it a manual process of getting dumps of data from each one, or is there a way to classify that more rapidly? Especially since, as you mentioned, this has to be continual: you never know when an API might change or a new one might come online.
Shreyans Mehta:
It's a mix of multiple things. The broad brush we use is, again, an NLP-based AI model, where we can understand the context of the data flowing back and forth within those APIs and say, "Okay, this is a 16-digit number, and based on the model and the context around it, it looks like a credit card number," for example because there are expiration dates and certain keywords around it. So think of it as an NLP-based model that can detect some of those things.
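Cequence describes an NLP-based model for this. The Python sketch below swaps in a much simpler rule-based stand-in to show why context matters: a long digit string is flagged as a likely card number only when it passes the Luhn checksum and the surrounding field names (such as an expiry field) point that way. The token list and function names are illustrative assumptions.

```python
import re

CARD_DIGITS = re.compile(r"^\d{13,19}$")
# Hypothetical context hints; a trained model would learn signals like these rather than hard-code them.
CONTEXT_TOKENS = {"card", "cc", "pan", "cvv", "cvc", "exp", "expiry", "expiration"}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def name_tokens(name: str) -> set[str]:
    """Split field names like 'cardNumber' or 'exp_month' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[^a-z0-9]+", spaced.lower()) if t}


def looks_like_card(field: str, value: str, sibling_fields: list[str]) -> bool:
    """True if the value is a Luhn-valid 13-19 digit number AND nearby field names suggest a card."""
    digits = re.sub(r"[ -]", "", value)
    if not CARD_DIGITS.match(digits) or not luhn_valid(digits):
        return False
    context = name_tokens(field).union(*(name_tokens(s) for s in sibling_fields))
    return bool(context & CONTEXT_TOKENS)


print(looks_like_card("number", "4111 1111 1111 1111", ["exp_month", "exp_year"]))
print(looks_like_card("order_id", "4111111111111111", ["created_at"]))
```

Requiring both signals is what keeps a Luhn-valid order ID from being misreported as card data; tokenizing names (rather than substring matching) avoids false hits such as "cc" inside "account".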
So you don't need to manually figure out what needs to happen. But customizability is the other aspect. You know as an organization what matters to you, and it might not fall into anything we have ever seen. Your environment is your environment. So being able to customize what you care about, come up with patterns for it, and then use those to detect and bubble up those risks is also possible.
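To illustrate the customization point, here is a Python sketch of an organization-defined pattern registry. It uses the telecom example from earlier in the conversation (IMEI numbers, whose final digit is a Luhn check digit) plus a hypothetical internal customer ID format; it is a rule-based illustration under those assumptions, not Cequence's configuration format.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


def luhn_valid(digits: str) -> bool:
    """Luhn checksum; IMEI numbers use it for their final check digit."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


@dataclass
class CustomPattern:
    name: str
    regex: re.Pattern
    validator: Optional[Callable[[str], bool]] = None  # optional extra check on each regex match


# Organization-defined patterns; 'customer_id' and its CUST- format are hypothetical.
PATTERNS = [
    CustomPattern("imei", re.compile(r"\b\d{15}\b"), luhn_valid),
    CustomPattern("customer_id", re.compile(r"\bCUST-\d{8}\b")),
]


def scan(payload: dict[str, str], patterns: list[CustomPattern] = PATTERNS) -> list[tuple[str, str]]:
    """Return (field, pattern name) for every custom pattern found in a flat response payload."""
    findings = []
    for field, value in payload.items():
        for pattern in patterns:
            for match in pattern.regex.finditer(str(value)):
                if pattern.validator is None or pattern.validator(match.group()):
                    findings.append((field, pattern.name))
    return findings


print(scan({"device": "490154203237518", "note": "ticket for CUST-00012345", "serial": "490154203237519"}))
# [('device', 'imei'), ('note', 'customer_id')]
```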
John Richards:
Yeah, there are the organizational, contextual ones you mentioned. Okay. Well, thank you so much for coming on here, Shreyans. I learned a ton, and I really appreciate it. In addition to thanking you for being a guest on our show, I'd love it if you could share with our listeners a little more about Cequence Security: what's going on, anything they should follow, or something of your own you'd like to promote. Any words for our listeners out there?
Shreyans Mehta:
First of all, again, thanks, John, for having me here. A little bit about Cequence, if you've not heard of us before: we are the largest API protection company out there. We protect close to eight billion API requests every day, across multiple verticals ranging from finance, retail, and telecom to social, you name it. Anybody and everybody interacting with their customers and partners is doing it over APIs; that's no news. So if you want to learn more about Cequence, you can reach me directly at shreyans@Cequence.ai, or go to our site and ask for a deeper dive. We can definitely help you there.
John Richards:
We'll definitely include a link to that in the show notes, so be sure to check it out.
Shreyans Mehta:
Again, thank you, John. Great chatting with you today.
John Richards:
This podcast is made possible by Paladin Cloud, an AI-powered prioritization engine for cloud security. DevOps and security teams often struggle under the massive amount of notifications they receive. Reduce alert fatigue with Paladin Cloud. Using generative AI, our model risk scores and correlates findings across your existing tools, empowering teams to identify, prioritize, and remediate the most important security risks. If you'd like to know more, visit paladincloud.io.
Thank you for tuning into Cyber Sentries. I'm your host, John Richards. This has been a production of TruStory FM. Audio engineering by Andy Nelson, music by Amit Sagee. You can find all the links in the show notes. We appreciate you downloading and listening to this show. We're just starting out, so please leave a like and review. It helps us to get the word out. We'll be back March 13th, right here on Cyber Sentries.