Cyber Sentries: AI Insight to Cloud Security

Kubernetes, AI, and Edge Computing: A Powerful Combination
In this episode of Cyber Sentries, John Richards is joined by Saad Malik, CTO and co-founder of Spectro Cloud, to explore the intersection of Kubernetes, AI, and edge computing. Saad shares his insights on how these technologies are transforming various industries and the challenges organizations face when implementing them at scale.
Unlocking the Potential of Kubernetes and AI
Throughout the episode, John and Saad discuss the growing adoption of Kubernetes and AI across different environments, from public and private clouds to data centers and edge locations. Saad explains how Spectro Cloud's platform simplifies the management of Kubernetes clusters, enabling organizations to leverage the unique capabilities of each environment while maintaining consistency and security.
Questions we answer in this episode:
  • How can organizations manage Kubernetes across diverse environments?
  • What are the primary use cases for edge computing?
  • How can developers scale up their Kubernetes deployments faster?
Key Takeaways:
  • Templating Kubernetes configurations and integrations simplifies management at scale.
  • Edge computing enables data pre-processing, unique experiences, and robotics applications.
  • AI operations (AIOps) can provide actionable insights and automate Kubernetes management.
The conversation also touches on the cultural shift required to embrace AI-driven automation in Kubernetes management. Saad suggests that organizations will gradually adopt these technologies as they gain confidence in the recommendations and actions taken by AI systems.
This episode offers valuable insights for anyone interested in the future of Kubernetes, AI, and edge computing. Whether you're a developer, platform engineer, or IT decision-maker, you'll come away with a better understanding of how these technologies can be leveraged to drive innovation and efficiency in your organization.
Links & Notes
  • (00:04) - Welcome to Cyber Sentries
  • (00:57) - Saad Malik and Spectro Cloud
  • (02:33) - Environments
  • (04:29) - Spread
  • (06:06) - Edge Adoption
  • (08:47) - AI Adoption
  • (12:06) - Scaling Up Faster
  • (15:36) - Security
  • (18:20) - Integrating AI
  • (23:44) - Ownership Models
  • (25:32) - Wrap Up

Creators & Guests

Host
John Richards II
Head of Developer Relations @ Paladin Cloud
The avatar of non sequiturs. Passions: WordPress 🧑‍💻, cats 🐈‍⬛, food 🍱, boardgames ♟, a Jewish rabbi ✝️.

What is Cyber Sentries: AI Insight to Cloud Security?

Dive deep into AI's accelerating role in securing cloud environments to protect applications and data. In each episode, we showcase its potential to transform our approach to security in the face of an increasingly complex threat landscape. Tune in as we illuminate the complexities at the intersection of AI and security, a space where innovation meets continuous vigilance.

John Richards:
Welcome to Cyber Sentries from Paladin Cloud on TruStory FM. I'm your host, John Richards. Here, we explore the transformative potential of AI for cloud security. Our sponsor, Paladin Cloud, is an AI-powered prioritization engine for cloud security. Check them out at paladincloud.io.
On this episode, I'm joined by Saad Malik, CTO and co-founder at Spectro Cloud, a platform designed to manage Kubernetes at scale. Saad walks us through how AI is impacting security in the Kubernetes space, the rise of AI on the edge, and the technology needed to pick a perfectly ripe apple. Let's dive in.
Welcome to the show, Saad Malik. I really appreciate you coming on here. I'm excited to hear from you about what all you're doing over there around Kubernetes, AI, security, all these different things, but before we dive into that, I'd love to hear about how you got involved in co-founding Spectro Cloud.

Saad Malik:
First of all, John, thank you for having me on the show. It's a pleasure to be here. My name is Saad Malik, and I'm the CTO and co-founder for Spectro Cloud. Spectro Cloud focuses on Kubernetes management at scale. Kubernetes, as you know, is very difficult to be able to operate and run, so our goal was to provide that easy button for organizations to be able to run it in any environment, whether it be in public clouds, private clouds, or data centers, and doing so at scale.
The story actually was, about five years ago, we were at Cisco working with many different organizations that were moving towards containers and seeing firsthand the kind of challenges they were having with being able to operate Kubernetes, and especially as we started thinking about multi-cloud, multi-cluster and all the different integrations that went into it. We really thought, "Can we provide a very simple platform that is able to automate all the complexities away, from being able to run it securely, to being able to run it with logging, monitoring, and observability built into the platform?"

John Richards:
Kubernetes feels almost like a gatekeeper to development nowadays. It's just the next tier. It's one of the hardest things that developers encounter as they're trying to really get hold of this, so being able to make that barrier a little easier is super helpful. Now one of the challenges I know comes up in the Kubernetes space is the fact that Kubernetes can be run in so many different areas, folks running this on-premise, but other folks are running this on the cloud, but just using bare metal on the cloud, and then other folks are using these managed Kubernetes providers. How do you approach the fact that Kubernetes can be fractured across so many different environments?

Saad Malik:
I think one of the value props that Kubernetes did really well is it provides a fantastic abstraction layer. It provides abstractions for your compute, your storage, and networking, such that when you run an application workload, you can pretty much develop it once and have it, for the most part, be able to run in any of the environments you mentioned, whether it's managed Kubernetes providers or running it on the cloud at the IaaS layer, bare metal data centers, and also edge environments. But the challenge really comes in how do you configure Kubernetes, the actual implementation of it, to run across these different environments?
And so the approach that some organizations have taken is we want to be able to provide a very simple Kubernetes, vanilla, the exact same cookie-cutter across environments, but the challenge with that generally is you're not able to take advantage of the individual environments. If you look at the cloud, for example, you have all these different amazing additional services, from different databases and message buses and firewalls. You want to be able to leverage all those capabilities also within your Kubernetes distribution.
The approach that we took is we obviously let you run it in any environments, but we do optimize it as well, so you are able to leverage all those different integrations and services as well. Behind the scenes, the technology that we use is an open-source project called Cluster API, which does automate the provisioning of clusters across any of the different environments, but it also still lets you leverage the unique capabilities that they also have to offer.
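
To make the Cluster API piece concrete, here is a minimal sketch of declaratively requesting a workload cluster through a management cluster, using the Kubernetes Python client. The Cluster kind and API group are the upstream Cluster API conventions; the cluster name, CIDR, and the AWS infrastructure reference are purely illustrative assumptions, and this is not Spectro Cloud's platform API.

```python
# Minimal sketch: declaratively request a workload cluster through Cluster API.
# Assumes a management cluster with Cluster API and an infrastructure provider
# installed; all names and values below are illustrative.
from kubernetes import client, config

config.load_kube_config()  # credentials for the Cluster API management cluster
api = client.CustomObjectsApi()

cluster = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "Cluster",
    "metadata": {"name": "demo-cluster", "namespace": "default"},
    "spec": {
        # Pod CIDR is a standard Cluster API field; the value is an example.
        "clusterNetwork": {"pods": {"cidrBlocks": ["192.168.0.0/16"]}},
        # The target environment (AWS, Azure, vSphere, edge, ...) is selected by
        # referencing a provider-specific object; AWSCluster is one provider's kind.
        "infrastructureRef": {
            "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta2",
            "kind": "AWSCluster",
            "name": "demo-cluster",
        },
    },
}

api.create_namespaced_custom_object(
    group="cluster.x-k8s.io",
    version="v1beta1",
    namespace="default",
    plural="clusters",
    body=cluster,
)
print("Cluster object submitted; Cluster API controllers reconcile the rest.")
```

The point of the sketch is the shape of the workflow: the same Cluster object is submitted regardless of environment, and only the infrastructure reference changes per provider.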

John Richards:
What are you seeing in how much folks are spreading across or focusing, or is it most teams pick one area and they just go all in? What's the ratio you see of folks running across a lot of these different environments? I know you mentioned edge is like a big up and coming space for this.

Saad Malik:
We actually conduct an annual research survey called State of Kubernetes, and in this last survey, we did ask the question: what percentage of your clusters are you running on public clouds, what percentage are you running in the data center? And I'm sure you wouldn't be surprised that 80% of organizations, these are large Fortune 500 companies, mentioned that they are running multi-cloud. Some are running multiple clouds and running data centers and also edge. A single team, when they start off, will most likely start on a single environment, I would say most likely a public cloud like Amazon or Azure, but there are a number of different use cases where they have to run it in the data center, for regulations or compliance reasons, and now, with them wanting to provide additional enhanced services and capabilities on premises where their customers are, they're now running it on the edge.
One thing we realize is, when you come to edge environments and data center requirements, you always have some sort of back-end cloud or back-end application that is also running, that is aggregating all your analytics, that is aggregating all the different results you're getting from your edge environments and being able to show a nice dashboard on top. Edge always does come with a data center or cloud use case, and the majority will still have many clouds running at the same time.

John Richards:
It's wild to me how much edge has just exploded. It wasn't that long ago that it was such a thin layer, you couldn't do a lot in it, but now you're like, "Oh, I'm running Kubernetes out here on the edge." What do you see as the trend for this edge adoption?

Saad Malik:
All the different industries we talk to, from telcos to healthcare to FinTech, they are obviously all exploring different edge use cases. It is picking up significantly. Now what are the primary use cases for edge? It generally falls into three different buckets. One is more about data pre-processing. More and more IoT devices are becoming available in these different edge environments. They're collecting lots of megabytes, and sometimes gigabytes, of data over days, over years. How do you aggregate all those results and provide meaningful insights? You can send all the data back up to the cloud, but, John, it becomes very expensive, and there's latency involved. There are connectivity issues. Doing some sort of data pre-processing directly at the edge becomes very relevant.
The other one is about providing more unique experiences. When you are at a factory or store and you have some camera that is looking at people coming in and out and trying to determine whether to activate some machine, those are all AI/ML models that are running and doing local inferencing to determine what is the next action to do. Again, you can't send that data stream all the way back up to the cloud. You have to keep it local.
And the third use case is also regarding robotics, being able to very quickly change, dynamically, other robots, other services within your factories, within your automotive, to be able to respond quickly to changes that are happening. Primarily, the use cases that we see, for example, working very closely with GE HealthCare, they're the leading manufacturer of CT scanners and MRI machines. Now, as you would know it, today, they're actually building all the actual scanning and detection for your tumors, for your cancers. They're not sending this imagery back up to the cloud, because each scan is gigabytes, sometimes terabytes of data. It's just so many different layers. All that actual inferencing to detect whether it's a tumor happens locally in the hospital, and so they're deploying our solution into every single hospital that they have available and they're essentially able to do the scans locally. That's one use case. Yeah.

John Richards:
That's incredible. Wow. I did not know that, and I'm incredibly impressed at the use cases that are exploding here and how you can do stuff that you couldn't do before by putting this on the edge. And you also mentioned the AI/ML that's growing on the edge. What are you seeing about AI/ML adoption at the edge and just overall in the Kubernetes space?

Saad Malik:
AI has become obviously ... everyone has been talking about it since ChatGPT has really picked up, but AI has been there for the last five, six years. We've all been trying to deploy and manage it at scale, because it does provide unique insights and experiences that were never possible before. It makes the human touch and human experience so much nicer. The AI technologies that are being built from the training engines are not custom-built, but they're made in such a good way that they interface really well with Kubernetes. I think we talked about the different abstractions Kubernetes provides at storage, at the compute and networking layers, but it also provides additional integrations sitting on top, everything from your monitoring to your logging and other solutions that sit and integrate really well. These AI solutions that are being built for training interface with any of the integrations you want for your logging and monitoring services.
It makes it really easy to be able to deploy into any environment you have, whether you want to deploy something like Kubeflow or you want to run a serving engine like TensorFlow or PyTorch. We're seeing that, for the most part, the AI that is being run for training is being done in the cloud, because that's where you'd have most of the compute, or in your data center use cases. But when it comes down to your edge environments, generally you can have as big as a one- or two-RU rack server, but you can bring it down, all the way down, to a single Intel NUC device, maybe even a smaller device like a Raspberry Pi, and there you have maybe just a single model, a single inferencing engine that is run.
You mentioned the use case about GE HealthCare. We're also working on a really cool project with a company called Tevel Aerobotics. They're essentially ... think of them as an agricultural technology company. They're building these automated drones that are flying around farms, and are able to do some really cool operations, like trim the actual trees and prune the branches. If they see that one of the apples is ripe, they can actually have a robotic arm that goes out and grabs that actual apple and puts it back into the bin. That removes the actual overhead of having individuals in the summer heat going and pruning the different trees, grabbing the apples and bananas, and putting them into the bin. Those are some real cool use cases for edge.
All of these things, from being able to detect the actual apple and whether it's ripe or not, require a really unique, customized model that is deployed to analyze it. Being able to control the robotics that go and grab the apple and twist it in the right direction and place it back, this is all AI that is now running some really cool use cases. Yeah.
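
As a rough illustration of the "single model, single inferencing engine" pattern Saad describes, here is a minimal PyTorch sketch of local edge inference, where raw imagery never leaves the device. The model file name, preprocessing, and decision threshold are hypothetical assumptions, not Tevel's or Spectro Cloud's actual pipeline.

```python
# Minimal sketch of edge inference: load one exported model and score frames locally.
# "ripeness_model.pt" is a hypothetical TorchScript export assumed to return a single
# ripeness score; the input shape and threshold are illustrative.
import torch

model = torch.jit.load("ripeness_model.pt")  # pre-trained model shipped to the edge node
model.eval()

def is_ripe(frame: torch.Tensor) -> bool:
    """Classify one camera frame (C x H x W, float in [0, 1]) as ripe / not ripe."""
    with torch.no_grad():
        score = model(frame.unsqueeze(0))  # add a batch dimension
    return score.item() > 0.5  # illustrative decision threshold

# Example: a dummy 224x224 RGB frame standing in for a real camera capture.
frame = torch.rand(3, 224, 224)
print("ripe" if is_ripe(frame) else "not ripe")
```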

John Richards:
This is so many different elements of technology coming together to make what, in my mind, at first, seems like the simplest task, go out, grab an apple, pull it off the tree, but you've got the image detection, you've got the training that had to happen, how many different apples did it have to look at to understand what was a ripe one, plus all the work gone into robotics. But then, at the end of the day, you get something that you can replicate out at scale to do this, and I love it. Thank you for these wonderful use case stories here.
Now I did want to ask you a little bit, when I think about doing Kubernetes, it's a lot of work to try and configure this, and as I was looking at some of what you all are doing, you guys really do a lot with template stacks to really help people get up and running fast. For the developers that are out there going, "I love this use case, I want to build something like that," how can they scale up faster?

Saad Malik:
Part of the benefit of Kubernetes is having those different integrations out of the box, whether there's actual infrastructure integrations that come from your logging, monitoring, security integrations, or there's some application services that sit on top. These could be, again, database services. It could be security products, like in your case, your product itself, as well. Now there are 1,800-plus integrations available in the CNCF landscape. For most organizations that are looking at adopting Kubernetes, with all these different integrations, it's just a lot of complexity. They don't know where to start. And even if they were to choose and identify these are the different components, how do you update and manage their lifecycle in individual clusters, how do you make sure it's properly secure? How do you make sure all the auditing is done? Who has access to these different integrations?
But the complexity skyrockets when you now start scaling to additional clusters, as you go from one cluster to 10 clusters to hundreds of clusters. And what if you are not just dealing with a single cloud? You're not only dealing with Amazon, you're also dealing with Azure and Google and VMware and edge environments. It's just a massive, massive undertaking. What we did is what you said, John, is take the concept of being able to templatize everything from your Kubernetes configuration all the way down to your integrations. We provide 100-plus out-of-the-box integrations across the different stack layers you may want to use that are pre-validated and pre-tested across every version of Kubernetes that we provide, and that makes it very easy then to roll out these new clusters based on this template. But, John, as I'm sure you are aware, the complexity doesn't really come from provisioning a new cluster. It's always in day two. It always comes when you're doing an upgrade, an operation, changing things.
For us, the template is a two-way link. If you were to make any modifications in your template in this cluster profile concept, you will very quickly see all the cluster profiles and all the clusters that are now out of sync, that are no longer tied to that version of your new cluster profile. And so you no longer have to control all these snowflakes that are individually managed and potentially misconfigured. And, John, I'm sure you see this a lot too, where you have five environments, I have my dev and stage and dev two, dev three and prod, and I've made some security rule modifications in Amazon to protect against a firewall, but what if I forgot to do it on my last environment? That, I'm sure, happens all the time. Even for our technology, we see the same thing in clusters not being properly configured, but if you have a template controlling the configuration, you know exactly the state of every cluster that is managed out of your fleet.
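
The two-way template link can be sketched roughly like this: each cluster records the profile it was built from, and anything that no longer matches the current template is flagged as drifted. The data structures below are illustrative assumptions, not Spectro Cloud's actual cluster profile API.

```python
# Minimal sketch of the "template as source of truth" idea: compare each cluster's
# applied profile against the current template and flag drift across the fleet.
from dataclasses import dataclass

@dataclass
class ClusterProfile:
    name: str
    version: str          # e.g. "1.4.2"
    layers: dict          # integration -> pinned version, e.g. {"cni": "calico-3.27"}

@dataclass
class Cluster:
    name: str
    environment: str      # "aws", "azure", "edge", ...
    applied_profile: ClusterProfile

def out_of_sync(template: ClusterProfile, fleet: list[Cluster]) -> list[str]:
    """Return the clusters whose applied profile no longer matches the template."""
    drifted = []
    for cluster in fleet:
        applied = cluster.applied_profile
        if applied.version != template.version or applied.layers != template.layers:
            drifted.append(f"{cluster.name} ({cluster.environment})")
    return drifted

template = ClusterProfile("base-k8s", "1.4.2", {"cni": "calico-3.27", "monitoring": "prometheus-58"})
fleet = [
    Cluster("prod-us-east", "aws",
            ClusterProfile("base-k8s", "1.4.2", {"cni": "calico-3.27", "monitoring": "prometheus-58"})),
    Cluster("dev-3", "azure",
            ClusterProfile("base-k8s", "1.3.9", {"cni": "calico-3.26", "monitoring": "prometheus-58"})),
]
print("Out of sync:", out_of_sync(template, fleet))  # flags dev-3
```

The design point is that drift is computed against one declared template rather than by comparing environments to each other, which is what catches the forgotten fifth environment in Saad's example.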

John Richards:
Yeah, I've totally had that happen, and there have been plenty of stories of folks who accidentally revealed sensitive data because they didn't properly configure these other environments. Even though they had production locked down, they missed one of those other steps out there, and all of a sudden, you ran into an issue because you didn't have that synced across your environments.
Security is a big aspect for these different Kubernetes setups, the whole CI/CD pipeline. How do you help make sure that these teams that are deploying out there can know, "Hey, I'm safe deploying regardless of where this Kubernetes environment is, on-prem, on one of these different cloud providers, the edge"? How do people sleep well at night knowing that they're pushing stuff out to so many different places and trying to manage the different security that needs to happen, whether you're securing something local and physical or you're securing something way out there in the cloud that is so abstracted?

Saad Malik:
Obviously, from our perspective, because we don't host our Kubernetes clusters in our cloud accounts, the customer always brings their own cloud account. Now, part of that, the requirement of setting up secure IAM roles, or the equivalent in other clouds, is on the user, but we help with very basic validation tools. By the way, John, we do also use Paladin Cloud in our solution itself to be able to, for our operations, make sure we are keeping our cloud account secure and config drift in check.
Now, when it comes to provisioning the actual cluster itself, all the Kubernetes configuration is already pre-hardened and secured by our distribution. We make sure that all the different rules that come to not exposing your root users, not exposing unprotected socket files, all those are already done in the Kubernetes cluster itself. When it comes to workload management, obviously, the onus is a little bit on the user deploying their application workloads into the cluster, but we do provide operations like SBOM scanning, where we are able to scan an individual cluster across all of its different namespaces and all the different workloads, and we're able to identify what are all the different dependencies that are running inside every one of your containers. If there is a vulnerability detected, we also use another tool to be able to show ... what CVEs there are and what are the actual patched versions you can move to to rectify those situations.
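
For readers who want to try the SBOM-then-CVE flow themselves, a minimal sketch using the open-source syft and grype CLIs might look like the following. This is illustrative tooling, not necessarily what Spectro Cloud's platform runs under the hood, and it assumes both CLIs and cluster credentials are available locally.

```python
# Minimal sketch: enumerate the images running in a cluster, generate an SBOM for
# each (syft), and match it against known vulnerabilities (grype).
import json
import subprocess

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Collect every container image running across all namespaces.
images = set()
for pod in v1.list_pod_for_all_namespaces().items:
    for container in pod.spec.containers:
        images.add(container.image)

for image in sorted(images):
    # Generate an SBOM for the image in SPDX JSON format.
    sbom = subprocess.run(["syft", image, "-o", "spdx-json"],
                          capture_output=True, text=True, check=True).stdout
    with open("sbom.json", "w") as f:
        f.write(sbom)
    # Match the SBOM against the vulnerability database.
    result = subprocess.run(["grype", "sbom:sbom.json", "-o", "json"],
                            capture_output=True, text=True, check=True)
    findings = json.loads(result.stdout).get("matches", [])
    print(f"{image}: {len(findings)} known vulnerabilities")
```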

John Richards:
That's huge, because SBOMs have become so important. We have everything from the White House declaring, "Hey, we need to be investigating, figuring out how we can standardize this," so I'm glad to see that you're able to help folks manage that at a scale, because it gets out of hand really fast. And we've seen so many different exploits that try to find a weak chink somewhere in that long chain and say, "Oh, here's where I can attack." And if you don't know what all is involved in all the libraries and stuff you've pulled in, you can all of a sudden be like, "Oh, no. It wasn't anything I coded up, but I included something down here I wasn't aware of, didn't have it verified, and ran into this problem."

Saad Malik:
Yep, absolutely. Yeah, true.

John Richards:
What about on the AI side of things, what are you all seeing? Are you looking at ways you can employ this inside of Spectro Cloud to assist your users and customers out there? What does that look like for you all?

Saad Malik:
Two aspects. The first I'll talk about is Palette Edge AI. As we talked about, John, the number of use cases for edge is just exploding, especially for AI. Even in the data center, it is also exploding. But what we want to do is, if more and more models are being deployed onto edge environments, what can we do to help our customers deploy and manage the lifecycle of these models? And that comes in two parts. One is being able to not only model the actual AI model, whether it's based on TensorFlow or PyTorch or Seldon, it doesn't matter. How do you model that as part of your template, that cluster profile that we talked about, and then repeatedly deploy it across your hundreds of thousands of locations? Along with that, you also have your inferencing engines that are deployed with that, managing their lifecycle and updating and keeping them secure. That operation is how we provide the ability to model and deploy your AI applications.
But the other flip side is that, again, all this configuration that comes into Kubernetes, how you secure Kubernetes, how you make sure it's running performantly, how you run all the different integrations, if there are any issues, today, it is still a very manual process where I have to make manual configuration changes. I have to go into an actual cluster environment and look at why is this pod failing? Why is it not able to operate? Is it out of memory, is it because of storage quotas, or some other issue? Kubernetes is already very difficult. Asking your development teams, asking your platform engineering teams, to become experts, everyone at this level, it's just too much from our perspective.
We are exploring something called AI Operations, AIOps, and we leverage the power of AI technologies to be able to provide actionable and meaningful insights to the platform engineering users that are actually operating these clusters. When you do see an error, we already have this rich metadata of all the different types of errors that we have seen and learned from. Can we make a recommendation? Oh, it seems that your pod, your workload, isn't in a running condition because it's getting this specific error called a quota error. This quota error is linked to the fact that maybe on Amazon you have no more VPCs, blah, blah, blah. The way you should be fixing it is doing X, Y, and Z.
I think the first level of insights would be based on providing troubleshooting recommendations, but at the same time, we can provide recommendations when it comes to cost optimization. We can provide recommendations when it comes to performance optimizations. You're running an m2.large instance, but your workload seems to always be hitting its peak capacity, so maybe you should be running a bigger, larger instance. This is the part of being able to provide actionable insights to our end users on how they're running clusters.
But over time, John, I think the next stage, why not automate it? If you are able to identify these recommendations over time, why not be able to automate it such that you don't even have to do anything? And only in the exceptional cases where the AI doesn't know for sure which decision you want to go with, it can provide you recommendations and let you click on the one that you want to go with.
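
A minimal sketch of that first level of AIOps, troubleshooting recommendations, might look like the following: read why a pod is not running and map the failure reason to stored advice. The recommendation table is an illustrative assumption; a real system would learn and rank these from observed history rather than hard-code them.

```python
# Minimal sketch: map common pod failure reasons to actionable recommendations.
# The advice table below is illustrative, not a product's actual knowledge base.
from kubernetes import client, config

RECOMMENDATIONS = {
    "OOMKilled": "Container exceeded its memory limit; raise resources.limits.memory or fix the leak.",
    "ImagePullBackOff": "Image cannot be pulled; check the image tag and registry credentials.",
    "CrashLoopBackOff": "Container keeps crashing; inspect its logs and recent config changes.",
    "CreateContainerConfigError": "Referenced ConfigMap/Secret is missing; create it or fix the reference.",
}

def recommend(namespace: str = "default") -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(namespace).items:
        for status in (pod.status.container_statuses or []):
            state = status.state
            reason = None
            if state.waiting:
                reason = state.waiting.reason
            elif state.terminated:
                reason = state.terminated.reason
            if reason and reason != "Completed":
                advice = RECOMMENDATIONS.get(reason, "No stored recommendation; inspect events and logs.")
                print(f"{pod.metadata.name}/{status.name}: {reason} -> {advice}")

recommend()
```

The automation step Saad describes would then be a policy layer on top: apply the fix automatically when confidence is high, and fall back to presenting the recommendation when it is not.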

John Richards:
Now how hard do you think the cultural change that needs to go with that will be ... for instance, we had some automated workflows that can remediate certain vulnerabilities out there, and a lot of folks love that idea. And then the moment you say, "Well, we need permissions to be able to go in there and do this," they're like, "Oh, wait, let's make it a manual." And is it like folks just need to be around it enough, maybe like self-driving cars, that they finally say maybe it's worth risking, or what's your thoughts on how we drive the cultural change to be able to say, "Hey, I'll trust something like this with making an important decision or remediating that, having that level of access?"

Saad Malik:
Yeah. And I think it definitely changes depending on individuals, it depends if they're from different teams, and different organizations have different risk factors. I do feel that, initially, like you said, it's going to be more about, "Hey, let me let you make the recommendations, show me the recommendations, show me the recommendations, and then I'll go ahead and I'll see, oh, this doesn't even make sense. None of this makes sense. [inaudible 00:22:37]." I think, over time, as I feel more and more confident that every time the number one recommendation is something that is applicable to my team, is able to do the right thing, then I'll be like, "Okay, whenever I see this specific rule, go ahead and automate it over and over again."
But obviously, just like with anything, like what you mentioned with Tesla ... I think the first time my wife sat in our car, she was like, "Saad, don't put it on autopilot," and I was like, "Why?" She's like, "I don't trust this yet." But over time, as she saw that we're just relaxing, chit-chatting, and the car is phenomenal at identifying objects and threats that are happening on the road, we all became a lot more comfortable with it. I think it's just one of those things that you have to crawl, walk, run. You're first going to have those recommendations, maybe some basic rules are automated based on my input, and then when you start running and flying, it's like, "Hey, go full speed at it."
I think AI will still make mistakes, as humans do as well, but then it's up to the AI to learn, when I say, "No, actually, this is the wrong thing you did. I would not expect you to have done this. This is the right behavior," it should learn from that and not repeat it the next time.

John Richards:
How do you think folks will be looking at ownership models for something like that? Is it whoever implemented it? How do you map some accountability? Or is it, "Hey, the numbers are just better," so this is just a separate risk category, like, oh, infrastructure went down, but even in that case you're like, "Well, who was managing that? Why didn't we have a fallback plan?"

Saad Malik:
Yeah, I think business continuity and making sure your cluster is healthy, that still has to be number one, because again, there could be exceptional mistakes that happen, whether it's AI or human. Being able to recover from that, I think, is very important. Having a good business continuity plan, disaster recovery, for any infrastructure is always, always required. And, John, I'm sure, even for Paladin Cloud, you do ask customers to make sure they have backups of the configuration so they can recover at any time.
But I do feel that it's still going to be a team. Before, it used to be development teams that would run the Kubernetes clusters, but as Kubernetes has become more of a critical business infrastructure, it's moving up into the platform engineering team's responsibility. They're the team that is managing the lifecycle of the individual clusters. They're the team that's managing the different integrations and the security policy and postures and your performance. All those aspects, the platform engineer team does. If you're using a tool to help with many of those aspects, it's, of course, nice to have, it's going to make things a lot simpler, but the team is still ultimately responsible for operating and running it. If some challenges happen, I think it's going to go back to the team.

John Richards:
Yeah, that's a great point, because we've moved towards this collective ownership as these teams work together. Obviously, there's some reporting chain to your CISO or something like that, but there's so much going on, being able to distribute that is very helpful.
Well, Saad, thank you so much for coming on here. This has been incredibly enlightening. I've loved all the examples that you've shared. It's been a delight. Before I let you go, how can folks learn more about you or connect, and anything about Spectro Cloud you'd like to shout out here to our audience?

Saad Malik:
Well, again, John, thank you for having me. It was a pleasure being here. To learn more about Spectro Cloud, please go to www.spectrocloud.com, that's S-P-E-C-T-R-O cloud.com. And, obviously, we again, focus on multi-cloud Kubernetes management. If you have a single cloud, multi-cloud, just multiple clusters, and you want to be able to manage it at scale, we are the shop for that. Thank you again for having me.

John Richards:
Awesome. Well, it's been a delight. Thank you for being on here, and thank you everyone for listening.
This podcast is made possible by Paladin Cloud, an AI-powered prioritization engine for cloud security. DevOps and security teams often struggle under the massive amount of notifications they receive. Reduce alert fatigue with Paladin Cloud. Using generative AI, the model risk-scores and correlates findings across your existing tools, empowering teams to identify, prioritize, and remediate the most important security risks. If you'd like to know more, visit paladincloud.io.
Thank you for tuning in to Cyber Sentries. I'm your host, John Richards. This has been a production of TruStory FM. Audio engineering by Andy Nelson. Music by Amit Sege. You can find all the links in the show notes. We appreciate you downloading and listening to the show. Take a moment and leave a like and review. It helps us get the word out. We'll be back July 10th, right here on Cyber Sentries.