Practical AI

What does “AI at the edge” really mean in 2026, and why does it matter now more than ever before? In this episode, we’re joined by Brandon Shibley, Edge AI Solutions Engineering Lead at Qualcomm’s Edge Impulse, to discuss the current state and future of Edge AI in 2026. We discuss Gen AI, Small Models, and Cascades of Models, along with real-world constraints like latency, power, and privacy. We also dive into the role of MLOps, evolving hardware, and how developers can start building practical edge AI systems today.


Creators and Guests

Host
Chris Benson
Cohost @ Practical AI Podcast • AI / Autonomy Research Engineer @ Lockheed Martin
Host
Daniel Whitenack
CEO @Prediction Guard & cohost @Practical AI podcast
Guest
Brandon Shibley

What is Practical AI?

Making artificial intelligence practical, productive & accessible to everyone. Practical AI is a show in which technology professionals, business people, students, enthusiasts, and expert guests engage in lively discussions about Artificial Intelligence and related topics (Machine Learning, Deep Learning, Neural Networks, GANs, MLOps, AIOps, LLMs & more).

The focus is on productive implementations and real-world scenarios that are accessible to everyone. If you want to keep up with the latest advances in AI, while keeping one foot in the real world, then this is the show for you!

Narrator:

Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm.

Narrator:

Now onto the show.

Daniel:

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI and autonomy research engineer. How are you doing, Chris?

Chris:

Hey, doing great. You know, looking forward to another show. And like always, we're getting really edgy out there in AI topics, aren't we?

Daniel:

I'm definitely on the edge of my seat for this discussion. I've been thinking about it a lot, because today we have with us Brandon Shibley, who is the Edge AI Solutions Engineering Lead at Edge Impulse, which is a Qualcomm company. Welcome, Brandon. How are you doing?

Brandon:

Doing great. It's an honor to be here. Been a fan of the podcast, so it's great to join.

Daniel:

Oh, that's great to hear. It's always a good connection to make.

Chris:

Anyway, thanks for putting up with our terrible puns as we start the show off. We're famous for terrible puns.

Daniel:

Yeah.

Brandon:

I'm here for it.

Daniel:

Nice. Well, it's been a while since we've had a full episode talking about edge AI, or AI at the edge, or machine learning at the edge, whatever combination of things you want to make. I'm wondering if you could give us a little bit of an update, a kind of state of edge AI in 2026, maybe highlighting first what the edge means in 2026, and then ways in which AI is being applied at the edge differently than it has been in previous years or eras, if you will.

Brandon:

Sure. So allow me to start with the definition of edge. I take a pretty broad view of the edge; practically speaking, in my mind, it's anything that is not in the cloud. Now, depending on who you ask, they have far more specific definitions, and we get into far edge, near edge, edge of network, and all of these things. In my world, we deal with all of it, so edge just means we're taking AI and we're gonna embed it somewhere that's not in a data center, not in the cloud, but usually close to the real world, where real data is captured and where the sensors are. As to, you know, where it's going at the edge?

Brandon:

The good news is, with everything that's going on with AI, we're seeing a lot of innovation around silicon. That's enabling us to embed models at the edge with greater efficiency and more capability, so the industry is adapting to the needs of AI as we bring it into the real world, and that's very exciting. We're also seeing some other pressures or trends. Economically speaking, tons of money has been going into AI research, and at the same time the economy is putting pressure on achieving productive outcomes, an ROI on investment. That pressure has always been there at the edge, by the way, so I think what that means is there's a rationalization happening that's actually pretty healthy: ensuring that when we apply AI, it's doing something productive and ultimately achieving some kind of return on investment. That's a lot of what I end up collaborating with companies on, really understanding what it means to achieve a positive outcome for them, and then we can discuss the technical methods by which we're going to get there.

Daniel:

And would you characterize... I mean, I know the last three or four years have been dominated by certain types of models, specifically generative AI models, and many people are thinking, of course, of these large models. It's even in the name, large language model. And those models people might not think of as living in the physical world, or at the edge. Is that a fair assumption?

Daniel:

Or, I know we've seen people talking about SLMs, small language models. How has that shifted over time? I guess, if we look back five to ten years, what were the types of models being run in disconnected environments, or on a factory floor, at the edge in some sense, versus now? Has there been a shift there as the market has moved to these Gen AI tools?

Brandon:

Yeah, absolutely. I mean, language models are still a relatively new phenomenon, right? But in the last couple of years, we've seen them kind of explode in different directions. They're getting bigger in the cloud, they're getting smaller at the edge, and that's a good thing. It means there's a broader range of possibilities to solve problems with. So we're gonna have trillion-plus parameter models that you can never practically fit at the edge, which are gonna be in a data center somewhere, and then we're gonna have much smaller versions of LLMs, even SLMs, that we can embed into devices. You know, edge devices are growing to accommodate those small to mid-size LLMs. We're talking on the order of single-digit to tens of billions of parameters that can be accommodated in some form of edge hardware, even edge AI appliances.

Brandon:

These are things that have, let's say, 64 or 128 gigabytes of memory, for example. They have powerful NPUs or GPUs to be able to do the inference, and they can be embedded into a premise or even into vehicles to accommodate these kinds of models. Now, the implication of those models being smaller is that they don't have quite the same knowledge capacity, as we call it. It's kind of a rough term, but the idea is they're not necessarily gonna be great at retaining tons of real-world knowledge. Where they shine is where they're specialized and fine-tuned for specific, specialized data. And this, I think, is what the industry is beginning to become more effective at achieving.

Brandon:

That means doing more with less, essentially. And it's not just SLMs, it's all kinds of AI models where we see this. I personally work with a lot of other kinds of neural networks as well, and there, it's always been about, you know, curating datasets, training those specialized models for specialized needs, and if anything, what we're seeing now is a lot more of combining these models into really interesting, you know, cascades or ensembles of models, in order to leverage sort of the best of all of them, and at the edge, we really have to remain pretty lean, so it means, in many cases, what we're doing is a combination of different lean models to get exactly the characteristics we need to solve problems at the edge.

Chris:

I guess I'd like to take a moment and back up a little bit. We've kind of dived into smaller models at the edge, but for listeners who haven't had experience operating at the edge themselves, could you explain some of the characteristics you find at the edge that make it a distinct operating environment you have to cater to, in terms of security, latency, comms between things, just the whole set of characteristics that makes it very distinct from the cloud environment? Because I think most people listening who have done stuff have been operating in cloud environments instead.

Brandon:

Absolutely. I mean, this is a key point. I'm glad you brought it up, because these constraints are what we have to live and die by at the edge. So what are those constraints? Size, power, connectivity, which may or may not be there or be reliable.

Brandon:

We're also dealing with cost constraints at the edge. As I mentioned, many of these products have to be sold into very cost-sensitive markets, and that plays a huge factor. Reliability may be key; latency, in cases where we're dealing with, say, robots or anything that's gotta take immediate action in the real world based on the data it's collecting. And then also privacy. In many cases, we're talking about systems used by people, and the kind of data being captured with cameras and microphones and other sensors is sensitive data that should be kept private. So that's another element we're often dealing with. In fact, you can think of a lot of these as almost double-edged issues.

Brandon:

They're both the challenge you face at the edge and also the opportunity. Privacy is a good example of that. The edge is an opportunity to keep that private data at the edge and not proliferate it out onto the internet and into the cloud, into places where users would prefer it not go. To contrast that with the cloud, obviously we've got far fewer constraints around power and compute there, and latency is less of an issue, although it still may be important. But the pressure to bring things to the edge is often driven by things like latency, privacy, and, in general, the economics as well, because when you're thinking about where it's most efficient to do computation, you'd want to do it near the data. Otherwise, we need connectivity, and we're gonna be paying for cloud services. A lot of these systems already have some compute at the edge, and a lot of times it's underutilized, right? It's being used to do certain things, but if you've already got compute there, you can also use it to do a lot of your data computation and AI. So there's a lot of economic benefit to leveraging the compute at the edge, rather than having to pay a lot to compute at scale in the cloud.

Daniel:

And how does this... we've talked about real people using this technology in the physical, real world, maybe not at their computer screen. I know one of the big topics coming into this year, which I just saw a LinkedIn post about, is physical AI. How does that jargon overlap with edge AI or relate to it, maybe for people who are trying to parse through some of the hype and the jargon?

Brandon:

Yeah. It's difficult, because there is some jargon and there are buzzwords, and in some ways edge AI and physical AI can be buzzwords, but they also refer to a real use case and phenomenon, which is that we can put AI out in the real world. In the case of physical AI, I would say it's sometimes distinguished from edge AI in that it really relates to taking physical action in the real world. Think about robotics or self-driving vehicles. Not only are they sensing the world and making predictions about it, but they're also translating those predictions into taking action at the edge.

Brandon:

So I would say, if there is a distinction, that's generally the distinction between edge and physical, but there's also a ton of overlap. Obviously, any physical AI is essentially also about sensing and making predictions from the data that's out there in the real world.

Daniel:

And could you describe a little bit... I know there are so many people, many developers probably listening to this show, who have primarily interacted with AI through API endpoints over the Internet. Those seem fairly fast in many cases, and there might be some people out there thinking, oh, well, now we have Starlink and we have these endpoints, what are you talking about with latency? Could you drive home on that point, maybe with theoretical examples illustrating how that kind of connectivity sometimes can't be assumed, and then tie that to what might need to be run at the edge in order not to operate in that kind of API endpoint model?

Brandon:

Yeah, absolutely. When we talk about real-time performance, it means we need some kind of response or output of a system within a certain timeframe. Now, what that timeframe is depends very much on the application. So if we're talking about a high-speed manufacturing line, that may be on the order of microseconds. Take a self-driving car.

Brandon:

Again, maybe it's microseconds or single-digit milliseconds. If it's a chat app where I'm chatting with an agent and I need a response, it might be on the order of many milliseconds, or even seconds is acceptable for latency. So the application really drives the requirement, and it all comes down to what's required for the type of behavior we're trying to get out of the system. Based on that, we can make a decision about where the computing should be done, where the models should live, whether it's acceptable to send that data over the Internet or not, whether we need to do it right at the sensor, or whether we can do it somewhere on premise but elsewhere on the network. Those are the kinds of things we can determine based on those latency requirements.

Brandon:

Yeah. Again, there's a wide range of different possibilities. The great thing is, I think of AI, edge AI, and even cloud as many tools in the tool chest. We need to approach this with first-principles design thinking: what are we trying to accomplish at the end of the day? And then that will inform what tools we can use to get there.
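The placement decision Brandon describes can be sketched in a few lines. This is a hypothetical illustration; the tier names and latency numbers below are invented to make the example concrete, not figures from the episode.

```python
# Illustrative worst-case latencies per compute tier, in seconds.
# These numbers are hypothetical, chosen only for the example.
TIER_LATENCY = {
    "cloud": 0.250,       # round trip over the internet to a hosted endpoint
    "on_premise": 0.020,  # gateway or edge appliance on the local network
    "on_sensor": 0.001,   # NPU/MCU right next to the sensor
}

def placement(budget_s, connectivity_ok=True):
    """Pick the least constrained tier whose latency still fits the budget."""
    for tier in ("cloud", "on_premise", "on_sensor"):
        if tier == "cloud" and not connectivity_ok:
            continue  # no reliable link: the cloud is off the table
        if TIER_LATENCY[tier] <= budget_s:
            return tier
    return None  # no tier meets the budget; the requirement must be relaxed
```

With these toy numbers, a chat-style budget of a second lands in the cloud, a robot's few-millisecond budget forces inference down to the sensor, and losing connectivity pushes an otherwise cloud-friendly workload on premise.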

Chris:

You made a comment earlier, as we were getting into the description of that edge environment; I think I can quote you as saying "cascades of models" at the edge. Most people, even outside of the industry itself, people just using Gemini, ChatGPT, Claude, are used to thinking: I'm gonna go to the AI, the large language model that's gonna solve whatever it is I want to solve. And yet at the edge, with all these characteristics you've just described that teams have to address, you potentially have lots of different models coming to bear. Some of those are LLMs, some are moving from large language to small language models, and some may have nothing to do with generative AI. It may be reinforcement learning in a lot of cases, or other types of models we've talked about on the show.

Chris:

Can you talk a little bit about the relationships of having that cascade of models to the types of actions that you need to take, you know, the sensing and the actions that you need to take on platform when you're at the edge to kinda give a sense of, you know, the different architectural thinking that goes into these edge environments that way?

Brandon:

Yeah. Let me start by giving you an example that I think makes clear why combining models and cascading them, or you can think of it as a processing pipeline, is a common pattern you'll see here. The thing about the edge environment is we're often compute constrained, and we're also trying to minimize power in many cases, which means we don't wanna just use the most powerful processing technique we have at all times. If you were using a large language model, or maybe a vision language model, on camera data and running it continuously on every frame that came through, that's a very quick way to burn through a lot of power.

Brandon:

So what we'll do in many cases is have this pipeline or cascade where, on the front end, there's some kind of initial detection that can be done very efficiently. Maybe it's an object detector; listeners may be familiar with YOLO as a common form of object detector. It can be used to detect objects in the frame, and maybe we throw away 99% of the frames that ever come through based on this initial object detection. But when we see an object that looks of interest, we can use the bounding box we've predicted around it, crop the image out, and then cascade it into something like a VLM, where we can do much deeper or more dynamic analysis on the image, and it can give us much more detailed metadata about what's there. That's an example of where these cascades are useful, and we don't just use them for image processing. They get used for audio. Sometimes we're doing multi-stage detection: an initial detection, then other object detectors that can detect different features.

Brandon:

So maybe you detect a vehicle, and once you've detected a vehicle, now I wanna detect the license plate, maybe certain features of the vehicle, and then based on that information, maybe I can perform some retrieval augmented generation, for example, looking up information from a database of documentation, and then combine all that information to request a response from an LLM that will craft a textual reply to a user. So these are all the tools we think about using when we're going to solve a real-world problem and trying to get the best possible performance, meeting the many different constraints and also traits we're trying to get in the solution we build.
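The detector-gates-VLM pattern Brandon walks through can be sketched as a tiny pipeline. The two model functions below are hypothetical stand-ins, not real YOLO or VLM calls; the point is only the control flow, where the cheap stage discards most frames before the expensive stage ever runs.

```python
def cheap_detector(frame):
    """Stand-in for a small object detector (a YOLO-class model in practice).
    Returns (label, bounding_box) pairs found in the frame."""
    return frame.get("objects", [])

def expensive_vlm(label, box):
    """Stand-in for a vision-language model, run only on escalated detections."""
    return f"detailed description of {label} at {box}"

def process(frames, labels_of_interest=("vehicle",)):
    results = []
    for frame in frames:
        for label, box in cheap_detector(frame):
            # Most frames never reach this branch; the cascade stays cheap.
            if label in labels_of_interest:
                results.append(expensive_vlm(label, box))
    return results

frames = [
    {"objects": []},                               # empty frame: dropped early
    {"objects": [("bird", (0, 0, 8, 8))]},         # not of interest: dropped
    {"objects": [("vehicle", (10, 20, 64, 48))]},  # escalated to the VLM
]
```

Only the third frame would trigger the expensive model, which is exactly the power saving the cascade is after.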

Daniel:

And I'm wondering... I'm having flashbacks to earlier in my career, when a lot of what I was doing was running models next to data, a lot of that due to just the size of the models and how we wanted to deploy them. And I remember part of the trauma of that time, not to diminish what people actually experience, obviously, but we were stressed: trying to get all the right dependencies to get TensorFlow to run with this particular model, and debugging that whole chain of things. From another perspective, the developer perspective, what is the state of tooling around edge AI now? You mentioned the advances in hardware, which we can talk about here in a second.

Daniel:

I'd love to hear some of that in a little more detail, but just in terms of the tooling, what is the state of that? I'm guessing things have advanced and changed. How have the toolset and frameworks advanced to support these kinds of pipelines?

Brandon:

Yeah. The good news is the industry has responded with options for tooling. I personally work for a company that builds a platform with this kind of tooling, called Edge Impulse. It's a way of being able to easily work with data, train models, tune and optimize them for target devices, and then generate a deployment that's easy to run on a device. That's the state of the art in terms of simplifying this development. Of course, there are frameworks below that, things like TensorFlow, as you mentioned, and others. Many machine learning developers work directly with those frameworks, but I would say the difference is... and you've seen this in software forever.

Brandon:

Right? Abstraction layers. There are people who specialize at different layers of this stack, and to reach the general developer, somebody who's not necessarily an expert in TensorFlow or these frameworks, they can leverage easy-to-use tools that are out there. Edge Impulse is a great example of one that's specifically designed for the edge and the fragmented hardware ecosystem that's out there. The advantage of the cloud is really that there has largely been, what I want to say, almost a unification around some common hardware. Right?

Brandon:

NVIDIA is obviously very dominant there. It means most developers are using very similar tooling, targeting similar hardware, while at the edge, things are still very fragmented. So this is where using tools like Edge Impulse really does help developers; it helps them develop models that are highly portable but can still be optimized for the specific features of the hardware as well.

Chris:

As you're talking about the tooling there, and recognizing that as we've moved from cloud to edge the workflow is a little bit different... you're trying to develop systems that are planning and executing multiple tasks with some level of autonomy, and there are various supporting frameworks around that. Could you talk a little bit about... most people are so used to hearing about inference in the cloud, and you still have that at the edge, but you hear the word agency a lot more when we get to the edge. Can you talk about what that workflow shift and objective shift are like, and how the tooling impacts that?

Brandon:

Yeah. In a lot of ways, machine learning is math and statistics, and at that level, cloud and edge are very similar. The same concepts apply. The difference comes in generating efficient runtimes that are gonna work on a processor at the edge versus a GPU in a server in the cloud. And there are other differences that I think are pretty important, like: how do you get data from the edge?

Brandon:

How do you continuously deploy newer and greater models? We talk about MLOps as a best practice here: just because you've deployed a model out into the real world doesn't mean it's always gonna be good enough. The world changes, right? And sometimes we're also deploying things into new environments. Those models will need to be adapted and improved, and the way we do that is to collect newer data over time. We talked about this concept called drift, where the world changes for whatever reason, and the model performs less well in these new environments.

Brandon:

So we'll have to get new data, train a new version of the model, and redeploy it out. That can be challenging in the physical world. These devices live out in a world where connectivity may be an issue and the environments vary vastly. Unlike the cloud, where you have a very uniform, centrally managed environment, it's highly distributed and chaotic out in the real world. So that's also one of the major factors that comes into play here.
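The drift-then-retrain loop Brandon describes can be monitored with even a crude statistic. The sketch below flags drift when mean model confidence sags below the deployment-time baseline; real systems use richer tests (population stability index, KS tests on input features), so treat this as an illustration only, with made-up numbers.

```python
from statistics import mean

def drift_alarm(baseline_scores, recent_scores, tolerance=0.1):
    """True when recent mean confidence drops more than `tolerance`
    below the baseline captured at deployment time."""
    return mean(baseline_scores) - mean(recent_scores) > tolerance

baseline = [0.91, 0.88, 0.93, 0.90]  # confidences when the model shipped
recent = [0.72, 0.68, 0.75, 0.70]    # the environment has changed
if drift_alarm(baseline, recent):
    # In an MLOps loop this would queue field data for labeling,
    # retraining, and redeployment rather than just printing.
    print("drift detected")
```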

Daniel:

And when you're thinking about that distributed nature of the environments you're working with, immediately my mind goes to complication and control. How do you govern and manage both the operational component and the governance component of that? What's been learned, I guess, as some of the best practices and thought processes that go into making sure that, as you have more and more of a distributed set of things out there in the world, you have some concept of control or governance, however you put it?

Brandon:

Yeah. Where possible, where these devices are connected to the Internet, we still leverage that connectivity to manage the devices. That means we're still centrally managing a lot of this in the cloud. We're obviously aggregating a lot of data to do training in the cloud, to be able to generate models from data that's been captured from many different devices. It helps us train more generalized models than if we were to try to train a model on a per-device basis.

Brandon:

Right? Because each device has only got a small sliver of the total universe of data. By bringing all the data together, we can train models that are more generalized, that work broadly throughout the whole world. And the same goes for how we manage deployments. If we can bring the connectivity of those devices centrally, it means we can also roll out new versions of the model in a controlled way, often using something like an over-the-air update framework as a way of managing not just the software on the device, but the models as well. So revision control, all these best practices that we have from software and that we at the edge have been dealing with for quite some time, we can now also apply to the models we're deploying to the edge.
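The device-side half of that over-the-air flow can be sketched as a version-and-checksum check. The manifest fields here are hypothetical, not the schema of any particular OTA framework.

```python
import hashlib

def needs_update(installed, manifest):
    """A newer version in the fleet manifest triggers a download."""
    return manifest["version"] > installed["version"]

def verified(artifact_bytes, manifest):
    """Only swap the model in if the artifact's hash matches the manifest."""
    return hashlib.sha256(artifact_bytes).hexdigest() == manifest["sha256"]

installed = {"model": "detector", "version": 3}
artifact = b"model weights v4"  # stand-in for the downloaded model file
manifest = {
    "model": "detector",
    "version": 4,
    "sha256": hashlib.sha256(artifact).hexdigest(),
}

if needs_update(installed, manifest) and verified(artifact, manifest):
    print("installing model version", manifest["version"])
```

The same revision-control discipline used for firmware then applies: keep the previous model on disk so the device can roll back if the new one misbehaves.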

Chris:

You know, as we're talking about models at the edge, one of the things that has definitely been very pronounced is this move to smaller models, in terms of number of parameters, over time, comparing where we're at today with the advances there. Maybe a lot of folks aren't aware; the general public is still focusing very much on the frontier large language models out there. That's what they read about most of the time in the news. And maybe this is one of those topics that gets missed: the advances in smaller models. Can you talk a little bit about what you can do now, as we're discussing this in 2026, when you have some incredibly capable models that are small, that may have 3 billion parameters instead of many times that number, like some of yesterday's large language models? Why have those smaller models gotten so effective, and what are the decisions you have to make when using these small models, both for their strengths and their weaknesses, so you can put them in an architecture that works for the mission you're trying to address?

Brandon:

You're correct. I think what's happening with the state of the art is kind of overshadowing some of the advancements that are also happening at the edge and with small models. The good news is a lot of that is applicable to what we're doing at the edge as well, and one of the techniques we use is knowledge distillation: a way of leveraging big, powerful models and distilling their knowledge out into a small model. We don't need the whole universe of knowledge in a small model that's meant to do something very specialized; we only need the knowledge that's relevant to that specialized thing. These knowledge distillation techniques mean we can use big models and extract their knowledge. If it's a language model, we hit it with a bunch of queries, get responses, and train a simpler model based on that. And there are other techniques we use as well.
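The query-the-teacher idea maps onto the standard distillation loss: soften both models' outputs with a temperature and push the student's distribution toward the teacher's. Below is a minimal sketch of that loss term only, with arbitrarily chosen logits; a real training loop would differentiate it with an ML framework.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature > 1 softens the distribution, exposing the teacher's
    'dark knowledge' about near-miss classes."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened outputs,
    the core term in Hinton-style knowledge distillation."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

During training, the student's gradient comes from this term (often mixed with the ordinary hard-label loss), so the small model inherits only the behavior the specialized task needs.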

Brandon:

There's also fine-tuning: taking a model like this and fine-tuning it specifically on the data it's gonna work on in its specialized task. There are a lot of those techniques, and then of course there are non-generative models too. These have classically been purpose built on datasets targeting specialized use cases, enabling us to generate very small models. At Edge Impulse, we work with a lot of wearable devices, the smallest edge devices you can think of, wearable rings, for example, down to microcontrollers. We've always been able to do that using what's been coined TinyML, right, tiny machine learning models. So yeah, there's a whole spectrum of possibilities there and many techniques that get applied, and I think it's great that we've been able to leverage the advances that continue to come at the frontier of AI.

Daniel:

And I know, Brandon, you mentioned that Edge Impulse has its own take on some of the framework and tooling used to enable edge AI. I'm also curious about Edge Impulse now being part of Qualcomm, a Qualcomm company. There's kind of a vertically integrated component to that, I guess. I'm not gonna put you on the spot to talk through why Qualcomm would want to acquire Edge Impulse or something like that.

Daniel:

But could you talk a little bit about, first, Edge Impulse's unique or opinionated take on how the toolset should look for enabling these kinds of workflows? And then maybe whether there's anything about that vertically integrated approach to edge AI that makes it appealing in certain ways?

Brandon:

Absolutely. Yeah. So Edge Impulse is the leading edge AI platform, and there's a reason why. To be the leading platform, it really had to deal with the diversity and the fragmentation of the silicon in this space, and it continues to do so.

Brandon:

So our opinionated take on how to serve that space... I think of it as kind of a duality when it comes to hardware. Right? In some ways, we're trying to abstract away all the hardware differences. Machine learning is essentially math and statistics, and on some level we want to treat it that way. Then, when it comes time to deploy, we do target-aware optimization, generation, and conversion of the models for those targets.

Brandon:

Right? So, by thinking about it in those two terms, we have the flexibility to go and serve the broad market. Now, how we bring in and empower the processors and platforms from Qualcomm is by making sure that Qualcomm is, of course, supported best in class with this optimization and tuning, leveraging all the competencies Qualcomm has. That means extreme power efficiency and leveraging their accelerators, like the Hexagon NPU, for example, that's in Dragonwing processors used in many different use cases: industrial, and also things like automotive, everything from very low power infrastructure out in the world up to very powerful systems, like the AI appliances I mentioned. These are basically AI servers that go on prem. So it's a broad portfolio, an awesome range of different silicon, and it's not just the NPUs as well. It's DSPs.

Brandon:

It's ISPs. It's a lot of specialized processing, so there's a lot we can tap into and leverage in order to bring the most efficient models out to the edge device. Yeah. So I think in a lot of ways, Edge Impulse hasn't had to change its opinion about the world. We understand how we need to go out and bring ML into the edge space, and it also means being able to accentuate all the different silicon we can serve with our platform.
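[Editor's note: the "duality" Brandon describes, hardware-agnostic modeling plus target-aware deployment, can be sketched as a simple dispatch pattern. Everything below is a hypothetical illustration; the target names and converter functions are stand-ins, not Edge Impulse APIs.]

```python
# Hypothetical sketch of train-once, deploy-per-target. The "model" is
# just a name here; a real platform would apply per-target quantization,
# operator selection, and memory planning inside each converter.

def convert_generic_c(model):
    """Fallback path: portable C inference for any CPU."""
    return f"portable C inference for {model}"

def convert_npu(model):
    """Accelerator path: quantized build offloaded to an NPU."""
    return f"quantized, accelerator-offloaded build of {model}"

# Registry of target-aware backends; adding a new chip means adding
# a converter here, while the modeling side stays unchanged.
CONVERTERS = {
    "cortex-m": convert_generic_c,
    "hexagon-npu": convert_npu,
}

def deploy(model, target):
    try:
        return CONVERTERS[target](model)
    except KeyError:
        raise ValueError(f"unsupported target: {target}")

artifact = deploy("keyword-spotter", "hexagon-npu")
print(artifact)
```

The design point is that the model is defined once, hardware-agnostically, and only the final deployment step branches on the target.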

Chris:

I'm curious to dive into the hardware again a little bit more, because, again, I think this is a little bit of a new topic for folks that are used to big servers you plug in in a data center or a cloud environment. All of these things out at the edge are battery driven, kind of by definition, if they're a moving platform. We're talking autonomous vehicles and stuff like that. When you're looking at trying to do that computation out there, you have neural processing units that have become quite advanced, to your point about Qualcomm, and the number of operations per second per watt has gotten pretty amazing in terms of what they can do.

Chris:

How has that changed the math, or the way you think about operations at the edge, when you're talking about platforms that don't have traditional power available? How has that efficiency yielded new capability at the edge for battery powered devices?

Brandon:

Yeah. It certainly means we can do more, so just the amount of ML we can bring, or the size of the models, obviously allows us to scale out. It means that what we've previously been able to do, we can keep building on in ways that let us bring more intelligence, more processing. I don't think it's anything more than that necessarily. That extreme efficiency means that when developers are building a product and trying to differentiate, trying to do better than the last iteration of the product, they're able to get processors that are very power efficient.

Brandon:

They have the compute so they can go and deploy their models or their software in a way that's gonna give them best in class performance, and that translates to being able to market their end product competitively, or best in class relative to all their competitors. So, yeah, that's the way I look at it, and there's always an opportunity to do more. Once you have AI in the tool chest, it kind of broadens the perspective of what the world could be like if we put intelligence right where the data's at. Suddenly the possibilities start to explode, and the question becomes: what's actually feasible with the devices we have there, or that we're gonna put there in the next generation, and so on? And it's usually power and cost constraints.
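[Editor's note: the operations-per-second-per-watt point Chris raises can be made concrete with back-of-the-envelope arithmetic. Every number below is made up for illustration; none are specs for any particular chip or product.]

```python
# Rough battery-life arithmetic for an always-on edge model.
# All figures are illustrative placeholders, not real device specs.

npu_efficiency_tops_per_watt = 10.0   # accelerator efficiency, TOPS/W
model_ops_per_inference = 50e6        # 50 MOPs per inference
inferences_per_second = 5             # always-on duty cycle
battery_wh = 1.0                      # small wearable-class battery

# Power drawn by the model alone (ignoring sensors, radio, idle draw,
# which usually dominate in a real product):
ops_per_second = model_ops_per_inference * inferences_per_second
watts = ops_per_second / (npu_efficiency_tops_per_watt * 1e12)

hours = battery_wh / watts
print(f"{watts * 1e6:.1f} microwatts for inference, "
      f"~{hours / 24:.0f} days of model compute per charge")
```

The takeaway is the shape of the math, not the exact figures: each order-of-magnitude gain in ops-per-watt either extends battery life proportionally or buys room for a model an order of magnitude larger at the same power budget.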

Brandon:

So, you know, that's how the calculation usually works out. I hear from folks all the time who have really interesting, sometimes crazy ideas about what they want to achieve, and that's exciting, but we also, again, are forced to rationalize a bit. What brings real value to the end users of these products? What can, let's face it, bring revenue for the companies building these products? That helps; it's a forcing function for making sure that what we're building is actually valuable ultimately.

Daniel:

And some of that's really, I mean, I get excited thinking about some of those possibilities.

Daniel:

Also, I love the idea of the kind of creativity that comes when you're working within constraints and trying to work through some of those things. If you think of developers or AI practitioners out there, do you have any recommendations for the person who's maybe inspired by this conversation and says, hey, I want to try an AI at the edge thing? There are all sorts of use cases, as you mentioned, but how could they create some type of lab environment, maybe with an Arduino or whatever that is? Some kind of minimal setup that would help them explore and experiment with some of these Edge AI things? Where should they start, both on the hardware side and on the software or use case tooling side?

Daniel:

What's a good starting point, and how can they get going?

Brandon:

Yeah. What I think is awesome about Edge AI is you can honestly think about any real world problem out there and start to think about how you can go and solve it with AI. We can put these processors anywhere now, so that's the first place to start: is there something interesting, or a pain that somebody's dealing with? There are so many cool projects that people have built just around their homes because there wasn't something that did it for them. So you can take a simple board, maybe it's from Arduino, and create a simple model in Edge Impulse (by the way, it's free to sign up, so in terms of tooling, that's a great place to start), and then go and solve your problem.

Brandon:

Right? Like, does your basement leak and you wanna know when it leaks? Create a leak detector. Super easy. Do you wanna detect when your cat walks by so you can dispense some food from your cat feeder?

Brandon:

You can do that too. It's amazing that so many of these things are readily achievable with commodity maker hardware that's out there. That's a great place to start. And we see this even in enterprises. I've been a developer in enterprise, and I've known many; they also use a lot of this stuff to get started. It's an easy way to generate a proof of concept, and once you've got a working example, then of course you go get some real enterprise hardware.
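[Editor's note: a home project like the leak detector boils down to a tiny sense-infer-act loop running on the device. The sketch below is purely hypothetical; the sensor read and the model are stubbed with toy stand-ins, and a real build would use a trained classifier from a tool like Edge Impulse.]

```python
import random

# Hypothetical sketch of an edge sense-infer-act loop, in the spirit
# of the basement leak detector. Sensor and model are toy stubs.

def read_moisture_sensor():
    """Stand-in for an ADC read from a moisture probe."""
    return random.random()

def leak_model(reading):
    """Stand-in for a trained classifier returning a leak probability."""
    return reading  # a real model maps features to a probability

def alert(prob):
    print(f"Leak suspected (p={prob:.2f}), notifying homeowner")

THRESHOLD = 0.8  # confidence required before acting

def step():
    """One iteration of the loop a microcontroller would run forever."""
    reading = read_moisture_sensor()
    prob = leak_model(reading)
    if prob >= THRESHOLD:
        alert(prob)
    return prob

for _ in range(10):
    step()
```

Because sensing, inference, and action all happen on the device, the loop keeps working with no connectivity, and raw sensor data never leaves the home, which is exactly the latency and privacy argument for edge AI.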

Brandon:

Qualcomm's got a lot of it, so definitely check it out, and you've got tools like Edge Impulse, which also scale into production and can support you when you go up to serving models at scale, with true MLOps, continuous deployments, all of that as well. So, long story short, I think there are some great examples. I'll stick with Arduino here: great options for getting people started on these projects for very little money. Check out edgeimpulse.com.

Brandon:

You can sign up for free. Start using it. And there's also great content to help you get started using these tools.

Chris:

That's a great answer, and I'd point out a very fun one to go implement, actually bringing these capabilities into your own life, into your own world, not just through your work or through an app on your phone, but actually having things happen that you said you wanna go do. So great answer. I guess as we're winding up and looking at the future: you have Edge Impulse and Qualcomm and the kinds of work you're doing, and you also have the larger edge space. One of the things we like to ask, and you may have heard this on other episodes, is: where do you think things are going? The nature of the question is a little less structured, in the sense of, when you're not trying to solve a problem and you're just letting your mind wander, what kinds of things do you think of that might come to pass?

Chris:

They might not, but they might come to pass. As you look at this industry you're in, what excites you and makes you go, that's where I really wanna go? And I know there are other people that would probably want that too. What are those kinds of thoughts you have about where edge compute may be heading over the next few years?

Brandon:

Yeah. The way I think about it is: what if power and cost and compute basically go to almost zero? It means we could put intelligence literally anywhere, right at the edge. Where we're at today, a lot of intelligence is in the cloud, so it's gated by connectivity, the cost of using the cloud, and things like that. I think it's also important to think about biological intelligence. Right? We have these incredible organisms that have sensors and have intelligence directly where the sensors are, and we see the world around us that's been created with that.

Brandon:

It's incredible; it's just the most amazing inspiration. What if we can get closer to that with AI? The realm of possibility is enormous. What I see is that we're gonna continue bringing models to the edge, more of them. We talked about cascades and things like that; I think that's one of the techniques we'll use.

Brandon:

And then there are also world models, VLAs, and things on that spectrum as well, where we're talking about very large models. If those become more economical to bring to the edge, they bring more true intelligence about the broader world, and the ability to act in the world. So I also think these action models are gonna become more prevalent. We're seeing that in robotics and self driving, but there are many other places they could be applied. So that's what I see. I think we're gonna see a lot more robotics, which is gonna be exciting and interesting.
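[Editor's note: the cascade-of-models idea Brandon refers back to, a cheap always-on model gating a heavier one, can be sketched with toy stand-ins. The models and cost figures below are illustrative only.]

```python
# Toy sketch of a two-stage model cascade: a tiny always-on detector
# runs on every input, and an expensive model runs only when the cheap
# one fires. Costs are illustrative operation counts per invocation.

CHEAP_COST = 1          # e.g. a keyword or motion detector
EXPENSIVE_COST = 1000   # e.g. a large vision or language model

def cheap_gate(signal):
    """Tiny first-stage model: is anything interesting happening?"""
    return signal > 0.5

def expensive_model(signal):
    """Heavy second-stage model, invoked only when gated in."""
    return f"detailed analysis of signal {signal:.2f}"

def run_cascade(signals):
    cost, results = 0, []
    for s in signals:
        cost += CHEAP_COST          # cheap model runs on every input
        if cheap_gate(s):
            cost += EXPENSIVE_COST  # heavy model runs only rarely
            results.append(expensive_model(s))
    return cost, results

signals = [0.1, 0.2, 0.9, 0.3, 0.05, 0.7, 0.1, 0.2, 0.15, 0.4]
cost, results = run_cascade(signals)
naive_cost = len(signals) * (CHEAP_COST + EXPENSIVE_COST)
print(f"cascade: {cost} ops vs always-on heavy model: {naive_cost} ops")
```

When interesting events are rare, which is the common case at the edge, almost all of the heavy model's cost is avoided, which is what makes cascades attractive under tight power budgets.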

Brandon:

And hopefully, you know, maybe robotic-like systems, things that just live around us and can help take action in the world using intelligence.

Daniel:

It's awesome. Yeah. Well, I'm certainly excited to see some of those things. And I really encourage folks out there: you have no excuse not to go experiment and try some things, with all the great hardware and tooling available, on projects that fit your own passions and your home environment, or wherever that is. So I really appreciate you coming on the show to inspire those things, Brandon, and the work that you're doing with Edge Impulse. Appreciate that, and we hope to have you back on.

Brandon:

Yeah. It's been a real pleasure. Thank you for having me on.

Narrator:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation.

Narrator:

Thanks to our partner, Prediction Guard, for providing operational support for the show. Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.