Cyber Sentries: AI Insight to Cloud Security

Decoding the Language of Machines: AI's Potential to Revolutionize Cloud Security
In this episode of Cyber Sentries, host John Richards is joined by Murali Balcha, founder and CTO at Trilio, to explore how AI could transform cloud security by understanding the unique language of machines. Balcha brings over 20 years of experience in IT, particularly in storage systems, to the conversation.
Harnessing AI for Proactive Security
John and Murali dive into the potential of AI to enhance cloud security by analyzing the vast amounts of data generated by IT systems. By treating system logs as a language that AI can learn, models could be trained to identify threats and anomalies in real-time, even detecting zero-day attacks that traditional rule-based systems might miss. This shift towards proactive, AI-driven security could significantly reduce the time between a threat emerging and its detection.
Questions we answer in this episode:
  • How can AI be applied to cloud security?
  • What advantages does AI offer over traditional rule-based security systems?
  • How can AI models be trained to understand the unique language of machines?
Key Takeaways:
  • AI has the potential to revolutionize cloud security by learning the language of machines
  • AI models can identify threats and anomalies in real-time, even detecting zero-day attacks
  • Shifting towards proactive, AI-driven security could significantly enhance threat detection and response times
This episode offers valuable insights into the cutting-edge applications of AI in cloud security. Listeners will gain a deeper understanding of how machine learning can be harnessed to protect their systems and data, as well as a glimpse into the future of proactive, intelligent security solutions.
Links & Notes
  • (00:00) - Welcome to Cyber Sentries
  • (00:56) - Meet Murali Balcha
  • (03:29) - AI’s Evolution
  • (06:06) - Transferring Data
  • (14:43) - How Trilio’s Looking at AI
  • (23:36) - Wrap Up

Creators & Guests

Host
John Richards II
Head of Developer Relations @ Paladin Cloud The avatar of non sequiturs. Passions: WordPress 🧑‍💻, cats 🐈‍⬛, food 🍱, boardgames ♟, a Jewish rabbi ✝️.

What is Cyber Sentries: AI Insight to Cloud Security?

Dive deep into AI's accelerating role in securing cloud environments to protect applications and data. In each episode, we showcase its potential to transform our approach to security in the face of an increasingly complex threat landscape. Tune in as we illuminate the complexities at the intersection of AI and security, a space where innovation meets continuous vigilance.

John Richards:
Welcome to Cyber Sentries from Paladin Cloud on TruStory FM. I'm your host, John Richards. Here we explore the transformative potential of AI for cloud security. Our sponsor is Paladin Cloud, an AI-powered prioritization engine for cloud security. Check them out at paladincloud.io. On this episode, I'm joined by Murali Balcha, founder and CTO at Trilio, which provides cloud-native data protection. We discuss how everything has a language, even machines, and how decoding that language could allow AI to stop zero-day exploits or restore system outages in real time. Let's dive in.
Hello, everyone. Thank you for joining us today. We're excited to talk to Murali Balcha from Trilio. He's a co-founder, CTO over there and is coming to talk to us today about data, AI and a lot more. So, excited to have you here. Before we dive into that, though, I'd love to hear a little bit about how you got started, what led you to become a founder and be working at Trilio.

Murali Balcha:
Thanks, John. Thanks for the introduction. My name is Murali Balcha. I'm founder and CTO at Trilio. Yeah, it's been a long journey. We started around 2013, when cloud was the thing just coming up. It was not as mature as it is right now. I had 20-plus years' experience, most of it at EMC Corporation, so I'm very familiar with IT, especially storage and IT systems. In those 20 years, IT has gone through a lot.

John Richards:
It really has.

Murali Balcha:
And cloud is a paradigm shift altogether. It seems obvious now, 10 years on from where I started, but just as VMware transformed IT, we knew cloud was going to transform IT one more time. So, we thought it was a good opportunity for us to start something in the cloud. What was striking for us was data protection, backup and recovery, which has been there for as long as IT has existed. But the challenges with cloud workloads are a lot different than what is doing [inaudible 00:02:46]. We thought it was an exciting place to start a company, so I left EMC and started Trilio, primarily focusing on workloads that are running in the cloud, with the mission of creating applications to provide enterprise-level backup and recovery. The paradigm that we were seeing at the time was the scale and the self-service capabilities of the cloud. These are the things that we thought were going to define what data protection will look like in the future. So that was our mission in starting Trilio.

John Richards:
I love it. So important, and capitalizing on a critical moment there in time, seeing, "Hey, this is going to be a differentiating moment." I'm going to use that here as my segue, though, for this next piece. So, the cloud, at the time people were like, "Will this catch on? Will it not?" You kind of see this opportunity. As we look at AI right now, some people are like, "This is going to keep growing, it's going to be as big, maybe bigger than cloud adoption," and some people say, "Hey, this bubble's about to burst." As you look at this, what are you thinking, kind of incorporating that experience you've had from other critical moments in tech evolution? What are your thoughts?

Murali Balcha:
Yeah. AI is going to have a profound impact everywhere, whether it's in technology or everyday life. We are probably already interacting with AI without even knowing it. If you do a Google search, you're probably interacting with some AI agent there. AI is not something new, but what really changed things was the inflection point a year and a half back, when OpenAI had the breakthrough with generative AI. Up until then, it hadn't really caught on. It hadn't really captured the mainstream imagination of what AI could be.
Enterprises have been using some kind of AI [inaudible 00:04:41] solutions to derive some business intelligence, some predictive analysis and that kind of stuff. But it hadn't really caught on until OpenAI showed what AI can do. That is a transformative force. And for the rest, we can see the excitement around AI for the last year, year and a half, not only in the stock market; the kind of innovation that is happening in AI is also mind-boggling, breathtaking. And more importantly, the foundation model, as they call it, a transformer-based foundation model, will have a profound impact, especially in IT.
And when you look at generative AI, most of the datasets that it uses are public datasets like Wikipedia or some other kind of text, lots of natural-language text that is readily available to train these mammoth models. So, generative AI is going to be profound. The way we do things will change forever. Whether the concept of generative AI can translate directly to the IT problems that we solve, that remains to be seen, right?

John Richards:
Yeah. No, it's a good point. Hey, you've got this public data, but so many enterprises are very protective of their data and so how do you transfer that over? Is that an easy process? "Hey, it understands English, maybe it'll understand my data." Or is that time-intensive to try and get your data in a spot where it can work with these models? Or do you kind of need to just start fresh and have it learn everything from scratch?

Murali Balcha:
Absolutely. You hit the problem on the head. With the public datasets that are available, you can train your model on the works of Shakespeare and then make it create a Shakespearean work, because the data is readily available and you have enough of it to train on. But you can't take those things and simply transport them into IT. There are a few use cases where you can, but you cannot take the models and readily use them in the IT space, because the things that happen within IT are not language, they are not English.
When you look at artificial intelligence, especially the generative AI model, from a mathematical notation point of view... They call it sequence-to-sequence translation. For example, if you have a sequence of tokens, words in English, generative AI or these language models can readily translate it to any other language. So you have a sequence of words that can translate into another sequence of words. Likewise, you can take a sequence of words and classify it. What that means is you can train the model to understand how these words are related to each other and bring out the meaning. It's like sentiment analysis. So you can basically feed some IMDb review of a... and then you can say whether the reviewer was happy about that movie or not so happy about that movie.
So those, you can do. But you can also take the same model and use it to translate and monitor IT events. For example, you have tons and tons of data that is available in IT every day: the instrumentation, the data that is generated by this enormous number of computers. It doesn't make any sense to humans unless you look at the logs and peel the onion, kind of thing. You have zillions of records, but then you have to make sense of a few records, like when something happened in IT.

John Richards:
Yeah, I've had to deal with incidents and it is a nightmare trying to dig through logs and be like, "What am I reading? What happened here? Where's the ones that even matter?" and then a bunch of filler in there. It's so time-intensive.

Murali Balcha:
Absolutely. This is like finding a needle in a haystack. But if you can take those logs, process them and somehow convert them into what the AI models call tokens, and then train those models with the log events, you can analyze the events that are happening in IT a lot more readily with these language models, without having a human operator trying to make sense of things. That's not there yet. A lot of companies are looking at how to use AI in the cybersecurity space, in application performance, all those things. But once we get to a point where we can capture the logs as tokens and then use these language models to find the context, what sequence of events is related to some threat, what sequence of events is related to some application problem, if you can analyze those things, then you can unleash the power of AI to monitor IT more effectively than a human operator would.
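
To make the "logs as tokens" idea concrete, here is a minimal sketch that is not from the episode. It assumes a simple approach: mask the variable parts of each log line (IP addresses, numbers) so that similar lines collapse into one template, then give each template an integer ID. The masking rules, the sample log lines and the names `to_template`, `Vocabulary` and `tokenize` are all hypothetical; a real tokenizer for machine logs would likely be more involved.

```python
import re

# Hypothetical masking rules: each pattern replaces a variable field with a
# placeholder so lines that differ only in those fields share one template.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Mask variable fields and collapse repeated whitespace."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return " ".join(line.split())


class Vocabulary:
    """Assigns a stable integer ID to each distinct template."""

    def __init__(self):
        self.ids = {}

    def encode(self, template: str) -> int:
        # The first time a template is seen it gets the next free ID.
        return self.ids.setdefault(template, len(self.ids))


def tokenize(lines, vocab):
    """Turn raw log lines into a sequence of template IDs."""
    return [vocab.encode(to_template(line)) for line in lines]


# Made-up sample lines, for illustration only.
sample_logs = [
    "Failed password for root from 10.0.0.5 port 22",
    "Failed password for root from 10.0.0.9 port 2222",
    "Accepted password for alice from 192.168.1.20 port 22",
]
vocab = Vocabulary()
tokens = tokenize(sample_logs, vocab)
print(tokens)  # [0, 0, 1]
```

The two failed-login lines differ only in address and port, so they map to the same token, which is the property a sequence model needs in order to see recurring events.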
Because when you look at current IT, the current tools that are available are not exact science, in the sense that they are like heuristics. They are like fuzzy logic. There are a lot of rule-based things where, based on what kind of threat you saw, you research the threat, find out its key parameters, try to create a rule and then monitor for the threat based on that rule.
But when you look at the threats that are happening in the industry, especially with generative AI, with the social engineering that is happening, these threats are created on the fly, too. So, there is not much time between a new threat emerging and the point where a cybersecurity expert can detect the threat, analyze it and generate a rule. You cannot use rule-based systems to analyze these kinds of things, because it is very difficult. The rules could break for a new threat. They cannot detect it if something changes. But by using AI, by tokenizing these events and then finding the contextual information, the dependencies between the events, you can more effectively analyze the threats, even for zero-day attacks.

John Richards:
Wow. Let me make sure I'm understanding this. You're talking about right now our current models are very rule-based, and that's reactive, and as we're dealing with new threats that are coming out that can be kind of dynamically created on the fly as AI might be used by attackers or maybe a zero-day exploit, those rules don't really hold up anymore because you don't know what the requirements are. But by theoretically having an AI model that is monitoring your logs or your traffic, back to your kind of sentiment analysis, instead of saying, "Is this movie review good or bad?" you're saying, "Does this sequence of traffic look threatening or is it safe? Is this normal according to our baseline, or is this something risky we need to deal with?"
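
The "is this normal according to our baseline?" question can be sketched with a deliberately simple stand-in, not from the episode. Instead of the transformer-style language model discussed here, the sketch counts which event follows which in known-good sequences (a bigram model) and reports how surprising a new sequence is. The class name `BigramBaseline` and the event names are hypothetical.

```python
import math
from collections import Counter


class BigramBaseline:
    """Scores event sequences by how surprising their transitions are.

    A bigram count model used as a simple stand-in for a learned
    sequence model; it only looks at adjacent pairs of events.
    """

    def __init__(self):
        self.pair_counts = Counter()
        self.prev_counts = Counter()
        self.vocab = set()

    def fit(self, sequences):
        """Count event-to-event transitions in known-good sequences."""
        for seq in sequences:
            self.vocab.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1

    def surprise(self, seq):
        """Average negative log-probability per transition (add-one smoothed)."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen events
        total, steps = 0.0, 0
        for prev, cur in zip(seq, seq[1:]):
            p = (self.pair_counts[(prev, cur)] + 1) / (self.prev_counts[prev] + v)
            total -= math.log(p)
            steps += 1
        return total / steps if steps else 0.0


# Hypothetical baseline traffic, repeated to mimic a longer history.
baseline = BigramBaseline()
baseline.fit([
    ["login", "read", "read", "logout"],
    ["login", "read", "write", "logout"],
    ["login", "write", "logout"],
] * 20)

normal = ["login", "read", "logout"]
odd = ["login", "delete_backup", "disable_logging", "logout"]
print(baseline.surprise(normal) < baseline.surprise(odd))  # True
```

No rule names `delete_backup` as dangerous here. The sequence scores higher only because its transitions were never seen in the baseline, which is the contrast with rule-based detection drawn in the conversation.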

Murali Balcha:
Absolutely. The AI can do it a lot more effectively than a human or a rule-based system. One more thing: when you look at an AI model, for example ChatGPT, GPT-4o or Llama, they may be trained using terabytes of data, but once the model is trained, you don't need the data to generate new data or do classification, all those things. So the model itself is small. The weights, as they call them, the parameters, together are not as big as the volumes of data that you train on. With traditional systems, for example a database, if you were to do a search, you need access to the entire data.
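
A rough back-of-the-envelope illustration of the size argument, not from the episode: the stored size of a model is roughly its parameter count times the bytes used per parameter. The parameter count, precision and corpus size below are hypothetical round numbers chosen only to show the arithmetic. They do not describe any specific model.

```python
def weights_size_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Approximate storage needed for model weights, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Hypothetical figures, for illustration only.
params = 8_000_000_000   # an assumed 8-billion-parameter model
bytes_per_param = 2      # assumed 16-bit weights
corpus_gb = 10_000       # an assumed 10 TB training corpus

model_gb = weights_size_gb(params, bytes_per_param)
print(model_gb)              # 16.0
print(corpus_gb / model_gb)  # 625.0
```

Under these assumptions the trained weights are hundreds of times smaller than the data they were trained on, which is why only the weights need to be shipped to where detection runs.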

John Richards:
Yeah, yeah, I hadn't thought about this.

Murali Balcha:
When you have these security systems and when you were to basically design these detection systems, the [inaudible 00:13:26], with the traditional set-up, you need to have access to the database, entire database that you... because that's how you basically filter the events and then make sense of the events. With the AI, once the technology becomes more mature, and then the processing power improves and the price point comes down, these things can be effectively deployed at the edge without having access to the data because the knowledge is already baked into the model. Now, it can readily detect these kind of threats a lot more effectively than the traditional models.

John Richards:
Wow. Yeah. Oh, yeah. I had not thought about the advantage of that. And then there's also, you don't have to worry about backwards engineering it or things like that if it's sitting out there.

Murali Balcha:
Yeah. You can push new training models to the edge again. So, that is one aspect of what the AI can do for the IT.
And you can look at other problems that we solve with the conventional... Because the conventional approaches pose limitations, and we tend to accept those limitations. But when you take a new look with AI, a lot of problems can be solved a lot more effectively than with the traditional models.

John Richards:
Yeah. So what are some of the ways that Trilio's looking at AI right now, things you're actively doing, and then I assume some of what you've been talking about is what you're currently researching and developing as well?

Murali Balcha:
We've been looking at AI a lot more seriously. The first impulse is to somehow force-fit it: take gen AI and somehow add it as a cool chat dialogue to some of our products. Remember, like, five or six years back, Alexa was the trend. You got everything integrated with Alexa, including toilets to the mall.

John Richards:
Yes, yes.

Murali Balcha:
But that's force-fitting, because technology should not be in your face. Technology should enable you to do the things that you do a lot more effectively. Very few companies can do those things. So we could have taken some gen AI thing and somehow force-fit it, but we don't see the value in it because it's just a "me, too" kind of thing.
But this year, I think last year, Trilio started with a mission we call intelligent recovery, because recovery is the most important thing in backup and recovery; backup is not. It's like an insurance policy. You can keep paying your premiums, but the proof is whether the insurance really works when that incident happens. So we started with intelligent recovery. When data centers become complex and are deployed multi-site and multi-cloud, outages are also complex, right? Quickly recovering from outages, no matter where the outage happened, is very important. So, the recovery time objective is one of the things the industry measures these recovery solutions by, and getting RTO as close to zero as possible is what pretty much everyone tries for.
We started with this theme called intelligent recovery, which is the ability to recover no matter what the outage is. But what many companies leave undefined or take for granted is what it means when an outage happens. It's as if everyone in IT knows what an outage means, but that's not trivial, especially when you are deploying thousands of applications in a multi-cloud environment. When a computer goes down, okay, that's easy to detect. When a site goes down, it is a physical outage, and you may detect the applications that are affected. But in a complex environment where you have a ransomware attack that is happening on an incremental basis and the threat is more persistent, identifying the threat, identifying which systems are affected and how far you need to go back to eliminate the threat, these are highly complex things.
So, with that, we can do something with AI if we want to extend the mission from intelligent recovery to intelligent monitoring and recovery. We are still in an exploratory phase, but we have done a lot of work around this area. The idea is to take the events, the computer-generated events, the Windows logs, the Linux logs, system logs. For any AI, even though we talk about a model and training the model, what you don't see is that 90% of the work is data preparation. Essentially, you have volumes of data, but the model doesn't really understand the raw data. You need to clean up the data, normalize the data, then tokenize the data and then train the model. Otherwise, it's garbage in, garbage out. It's not effective data.
So, we did a lot of research on how to take these events, whether it's Windows events or Linux events, clean those events, and then turn them into what, in the AI world, in the large language model world, they call tokens. We want to treat these events as a new language. That means, when you are training on a new language with AI, you start with cleaning up, then tokenizing, and then what they call embedding. You need to find the embeddings of those tokens, then train the model and then classify.
This is one thing that we are actively looking at, but there are some challenges, obviously. The challenge is the lack of a huge amount of data. If you look at cyber threats, there are various companies and groups that monitor the threats and catalog them. MITRE is one of the companies that does it, and if you go there, you'll see there are 140 known threats. There are some unknowns, zero-day threats, that are not categorized there. Every threat goes through different phases, maybe 10 phases. Those are documented, but it's very, very difficult to simulate them, generate the logs and then train the model.
So this is one thing that needs collaboration at the industry level, because not everyone has everything. A lot of it is description-based, and even for the logs that are generated, each company has access to its own, but not every company has the same access. We are confident about our model, but I think we need to gradually take it to the next phase of training with real data, and then classify these threats and the stage each threat is at. And if there is a credible threat and it has really affected your IT, then how do we create a recovery plan around it so that those affected systems can quickly recover? These are some of the visions and missions that Trilio is pursuing at this point.
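
The embedding step named above can be sketched as follows. This is an illustration under stated assumptions, not Trilio's method. It treats each event type as one token, builds a vocabulary, gives every token a small vector, and averages the vectors of a window of events into one fixed-size input. The function names are hypothetical, the vectors are random placeholders that training would normally adjust, and the training and classification steps are left out. The tokens are Windows Security event IDs (4624 successful logon, 4625 failed logon, 4688 process creation).

```python
import random


def build_vocab(event_sequences):
    """Map each distinct event token to an integer ID; ID 0 is for unknowns."""
    vocab = {"<UNK>": 0}
    for seq in event_sequences:
        for event in seq:
            vocab.setdefault(event, len(vocab))
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Random starting vectors, one per token; training would adjust these."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed_sequence(seq, vocab, table):
    """Look up each event's vector and mean-pool them into one vector."""
    dim = len(table[0])
    ids = [vocab.get(event, 0) for event in seq]  # unseen events map to <UNK>
    if not ids:
        return [0.0] * dim
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


# Event IDs as tokens: 4624 logon success, 4625 logon failure, 4688 new process.
training = [["4624", "4688", "4688"], ["4625", "4625", "4624"]]
vocab = build_vocab(training)
table = init_embeddings(len(vocab), dim=4)

# "9999" was never seen during vocabulary building, so it maps to <UNK>.
vector = embed_sequence(["4625", "4625", "9999"], vocab, table)
print(vocab)
print(len(vector))  # 4
```

Mean pooling is the simplest way to turn a variable-length window into a fixed-size vector. It discards event order, which a sequence model would keep.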

John Richards:
Yeah, it is funny you started this piece off talking about how used we are to this idea of an outage. This dates the episode a little bit, but we just had the CrowdStrike outage that hit everywhere, and it was like, "Oh, what are we doing?" But people have come to expect that periodically we have these huge events happen. I talked to my friend and he was like, "I had to go to every single machine physically and take care of stuff." What I'm hearing from you is that if we had a way to identify these outages automatically as they occur and then kick off a restore without needing a person to verify and make these changes, we could use AI to monitor this live and not wait for a person to come complain, and then automatically say, "Let's get back to our previous state that we know works," and restore everybody. Is that sounding right?

Murali Balcha:
Yeah. Right. That's exactly it. These are some of the innovations that AI can drive: with more AI, the cost of ownership can be reduced gradually, and the efficiency of IT can be improved, right?

John Richards:
Yeah. And with the challenge, as you were saying, that you're creating a new language here and have to map all these embeddings to do that.

Murali Balcha:
Absolutely. Everything is a language.

John Richards:
You don't like easy problems, I see. You're willing to tackle the hard ones, right?

Murali Balcha:
Yeah. I'm a nature-loving person, and I watch these nature videos. Now, language is not confined to humans. Every species has a language in its own way. You know, whales. We hear that they sing songs for half an hour, and each song is unique. Now, when I look at computer systems, they have their own language. That language is logging. Basically, they beat out these logs, even log events, and there is a set of dependencies, grammars, and context. The language models are so powerful, so why not think of this logging as some process tweeting or some system tweeting, and then discover the inherent structure of that and build some effective monitoring tools? That was my motivation.

John Richards:
Yeah. I think it will be hugely powerful. I'm glad people are tackling that challenge to be able to identify this stuff. Well, I want to thank you so much for your time coming here on the podcast. I would love to give you a moment to shout out any product or anything going on, how folks can find you. So, if you want to take a couple moments to tell us what Trilio's doing and how folks can connect with you?

Murali Balcha:
Yeah, at Trilio we've been very busy building this data protection for cloud. One of the things happening in IT is containerization, and it's been going on for a few years now. But there is also a convergence happening between containers and virtual machines, where people want to bring both containers and virtualization onto one platform and manage them with one set of tooling. That's been quite popular these days, and we've been seeing a lot of uptick in people trying to move away from traditional virtualization platforms to Kubernetes-based virtualization. So we've been working hard to support that transition, and we have a very differentiated product for protecting both containers and virtual machines in Kubernetes. So that is top of our agenda.

John Richards:
Awesome. I know many folks out there running these mixed environments of different systems, and being able to manage that together, instead of having completely different processes, it is really helpful. Well, awesome. What about connecting with you? If anybody wants to follow up, what's the best way to reach out and learn more?

Murali Balcha:
Yeah, yeah. So I can be reached at my email, murali.balcha@trilio.io.

John Richards:
Awesome. Well, check out an excellent blog post that Murali wrote, I read it beforehand, that covers more details on this. If you're really interested in this topic, definitely check that out. We'll include it in the show notes. Thank you so much for being a guest on here. I learned a lot and look forward to the next time we get to chat.

Murali Balcha:
Yep. Thanks a lot, John. Thanks for having me.

John Richards:
This podcast is made possible by Paladin Cloud, an AI-powered prioritization engine for cloud security. DevOps and security teams often struggle under the massive amount of notifications they receive. Reduce alert fatigue with Paladin Cloud. Using generative AI, the model risk scores and correlates findings across your existing tools, empowering teams to identify, prioritize and remediate the most important security risks. If you'd like to know more, visit paladincloud.io.
Thank you for tuning in to Cyber Sentries. I'm your host, John Richards. This has been a production of TruStory FM, audio engineering by Andy Nelson, music by Amit Sagie. You can find all the links in the show notes. We appreciate you downloading and listening to this show. Take a moment and leave a like and review. It helps us get the word out. We'll be back September 11th, right here on Cyber Sentries.