Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.
00:00:05:11 - 00:00:30:29
Lori MacVittie
Welcome back to Pop Goes the Stack, the only show where emerging tech meets actual ops and neither walks away clean. I'm Lori MacVittie, and today I'm your snarky Sherpa through the silicon nonsense because we are talking about DPUs. And I'm sure that acronym actually has meaning. But we're going to wait and let our guest explain it. Or maybe Joel.
00:00:31:01 - 00:01:04:00
Lori MacVittie
But DPUs are increasingly in the news. We talk about them. Our recent research is showing that enterprises actually factor them into their buying decisions. So as they're building out these AI factories or farms, whatever you want to call them, 65% told us that DPUs are a buying consideration as they're going about getting all the hardware. Now for the 80% of those that are running inference themselves,
00:01:04:06 - 00:01:32:23
Lori MacVittie
so they're not just consuming AI as a service, they're actually deploying inference and running it locally in the cloud or in the data center, 71% of those consider DPUs an important buying consideration. So the question is, well, what are DPUs and why are these organizations interested in them? Right. To help us,
Joel Moses
Right.
Lori MacVittie
we've got co-host Joel, and he is Joel today,
00:01:32:25 - 00:01:35:21
Lori MacVittie
not AI, I think.
00:01:35:21 - 00:01:37:21
Joel Moses
Yes. Well, you couldn't tell from the suit, could you?
00:01:37:21 - 00:01:45:01
Lori MacVittie
No. That's actually what threw me off is that you had a suit, not a vest. I don't know what's going on or who you are, but welcome.
00:01:45:03 - 00:01:47:25
Joel Moses
Topsy turvy world.
00:01:47:28 - 00:02:03:19
Lori MacVittie
And to really dive into DPUs and all things PU, we brought Tim Michels, who is a distinguished engineer and really an expert in these kinds of hardware discussions. So welcome, Tim.
00:02:03:21 - 00:02:05:04
Tim Michels
Yeah, glad to be here.
00:02:05:06 - 00:02:13:24
Joel Moses
So, Tim, I figure we can probably start by talking about what exactly a DPU is and the fact that they're not always called DPUs, right?
00:02:13:25 - 00:02:15:09
Tim Michels
No, sometimes they're called IPUs.
00:02:15:10 - 00:02:35:20
Lori MacVittie
No.
Joel Moses
Sometimes they are. It depends on who you buy them from. So DPU stands for data processing unit. In some parlance it's called an IPU, or infrastructure processing unit. But effectively what they are is an ultra-smart smartNIC. Can you talk about some of the architecture behind the DPU, Tim?
00:02:35:22 - 00:03:02:20
Tim Michels
Sure. I mean, the DPUs have their roots in smartNICs and smartNICs evolved from, you know, basic networking NICs by adding hardware offload components that sit in front of the CPU and the basic NIC functionality. But DPUs took this a step further. And so DPUs added a compute complex, sometimes lightweight, but often now we're seeing much heavier, beefier compute capabilities.
00:03:02:20 - 00:03:19:07
Tim Michels
And that combination of the standalone compute complex combined with the domain-specific hardware accelerators that come from the smartNIC world, and then plugging that into the smartNIC location, often in a server, gives you a new kind of capability.
00:03:19:09 - 00:03:33:27
Lori MacVittie
A system on chip. Isn't that what we used to call it? Because it's almost an entire system on a card that gets plugged in, which is typically the way that we see these kinds of offload and acceleration capabilities.
00:03:34:05 - 00:03:58:26
Tim Michels
And it's really more than just a system on a chip. While there is a big ASIC, which is a system on a chip, as you say, put on a circuit board that's going to plug into a PCIe slot, that PCIe card brings with it even more capability. It brings with it memory, storage, a BMC, its own network identity, its own self-boot capability.
00:03:58:28 - 00:04:01:27
Tim Michels
It's basically a server on a card now.
00:04:01:29 - 00:04:03:06
Lori MacVittie
Okay.
00:04:03:08 - 00:04:16:09
Joel Moses
So it's a bit like someone saying, hey bro, I heard you like compute, so I put compute inside your compute. Why exactly would they do that sort of thing? And what are the common use cases?
00:04:16:11 - 00:04:41:16
Tim Michels
Yeah, I think there's a lot of good reasons. You're trying to get this separation of concerns. You're trying to get capabilities that maybe shouldn't be in the CPU for security or performance reasons, and moving them off of the host CPU. You're trying to lift entire operational stacks, the networking stack, the storage stack, the security stack. And even in the hyperscalers case, the virtualization stack.
00:04:41:19 - 00:05:01:23
Tim Michels
Let's move the hypervisor off the CPU. Why? Again, we want to give all of our compute to the application, and we want to create security barriers that say, a compromised network or a compromised host can't compromise the infrastructure. And make no mistake, the DPU is where the infrastructure is going.
00:05:01:25 - 00:05:26:21
Joel Moses
Got it. I think it's really important that you noted that hyperscalers are typically using these. In fact, a lot of the hyperscalers are either devising or partnering for their very own hardware design of a DPU to use in their infrastructure. We've seen that inside of Google. I believe that Microsoft has announced the Azure Boost card, which is used exclusively in Microsoft Azure.
00:05:26:23 - 00:05:46:08
Joel Moses
And they find those cards of particular value for offloading infrastructure workloads like tunneling, or setting up mainline services on those cards, or isolating workloads one from another. And what other industries have we seen uptake of DPU cards in?
00:05:46:10 - 00:06:07:24
Tim Michels
Yeah. I think you're right to talk about the hyperscalers. In a sense, the DPUs were sort of born out of the Nitro projects at AWS and all of the major hyperscalers now have a DPU project of some kind, either homegrown or partnered with an external vendor. The tier two, tier three cloud guys are using the merchant silicon, but they're largely copying it.
00:06:07:24 - 00:06:30:21
Tim Michels
And it's because you can get that isolation. And the one point you missed that DPUs allow hyperscalers to provide is bare metal access, right? They can create a tenant now that looks like it's running on bare metal. Well, it's not really. It's running on the DPU abstraction. But to the renter of the node, it looks like bare metal, which was a capability
00:06:30:23 - 00:07:07:15
Tim Michels
the clouds' customers were looking for, and they've been able to deliver it through DPUs. But as you say, DPU penetration has gone far beyond hyperscalers. Places where things like noisy neighbor behavior and separation of functionalities matter, like CNFs, those are places that DPUs really shine. And so we've seen that penetration in the 5G and service provider space, all the way from the vRAN world back into the core, where they're trying to segment things, they're trying to control the deployments.
00:07:07:15 - 00:07:16:23
Tim Michels
They're trying to manage the security and the performance in a reliable, predictable, and resilient way. And DPUs are a big piece of that.
00:07:16:25 - 00:07:44:19
Lori MacVittie
Well, they're a lot like, you know, in the past we had SSL accelerators, way back when they were, well, devices then, and later cards, right, designed to offload that work from the CPU, as you mentioned, in order to optimize performance, because the cards could do it faster. We've seen that with TCP, right? We'll offload that to the NIC. And eventually we get to these, you know, GPUs, and they're pulling certain functionalities off of the CPU to help the CPU
00:07:44:19 - 00:08:01:20
Lori MacVittie
also have more free cycles, right? Let the server serve, right? The inferencing server needs to be able to do its thing without having to worry about all of the network capabilities, the delivery, the security pieces, and that goes on the DPU.
00:08:01:23 - 00:08:26:06
Tim Michels
Yeah. And in particular, we're seeing that beyond the 5G service provider; we're seeing it in these AI clusters, right, where the compute of the CPU is there to feed the GPUs, which are super expensive. And so any compute cycle you waste is potentially causing the GPU to be underutilized, which is wasted TCO on a large scale.
00:08:26:08 - 00:08:38:23
Tim Michels
So using DPUs to free up CPUs as a way to maximize your GPU utilization creates the most efficient cluster overall, if that all makes sense. That's a lot of -U's.
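[Editor's note: Tim's TCO point can be made concrete with a back-of-envelope calculation. Every number below, the GPU hourly cost, cluster size, and utilization percentages, is an illustrative assumption, not a figure from the episode.]

```python
# Back-of-envelope: dollar value of GPU hours recovered when infrastructure
# work moves off the host CPU (e.g., onto a DPU) and GPU utilization rises.
# All inputs are illustrative assumptions.

def recovered_gpu_value(gpu_hourly_cost: float, num_gpus: int, hours: float,
                        util_without_dpu: float, util_with_dpu: float) -> float:
    """Value of the extra GPU utilization gained by offloading to a DPU."""
    recovered_fraction = util_with_dpu - util_without_dpu
    return gpu_hourly_cost * num_gpus * hours * recovered_fraction

# A hypothetical 64-GPU cluster over one month (730 hours), where offload
# lifts average GPU utilization from 60% to 75%:
saved = recovered_gpu_value(gpu_hourly_cost=2.50, num_gpus=64, hours=730,
                            util_without_dpu=0.60, util_with_dpu=0.75)
print(f"GPU value recovered per month: ${saved:,.0f}")
```

Even a modest utilization gain compounds quickly at cluster scale, which is why the wasted-TCO framing resonates for expensive accelerators.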
00:08:38:26 - 00:08:44:24
Lori MacVittie
It was a lot of -U's.
Joel Moses
Yeah, a cluster of acronyms.
00:08:44:26 - 00:09:08:21
Tim Michels
If you think about it, if you run your networking stack on your CPU, that means it's not front-ending the workload for the GPU. And if you can push that networking stack off to a DPU, if you can terminate your VPN tunnels, if you can terminate your EVPN BGP routing, that's now work the CPU doesn't need to do, and you can redeploy that capacity to feeding the GPUs.
Joel Moses
Right.
00:09:08:23 - 00:09:25:27
Joel Moses
Now, the DPU can also present itself to the host in various forms. So you can actually stand up, for example, an interface that is tied directly to a VPN tunnel that the device controls. And in that way you can also ensure multi-tenancy, can't you?
00:09:25:29 - 00:09:26:15
Tim Michels
Yes.
00:09:26:18 - 00:09:37:05
Joel Moses
Yeah. With bright lines between them. Now, I think it's safe to say, though, that we haven't really seen an uptake of DPU technology into the enterprise yet.
00:09:37:05 - 00:09:59:29
Tim Michels
Right? So I've long thought, as have others in this market, that the enterprise is ripe for a DPU takeover. But we haven't really seen it emerge at scale. But we do see the right trends, right? We see the large repatriation of workloads back from the public cloud to the enterprise. We see the build-out of private cloud to serve those workloads.
00:10:00:01 - 00:10:20:08
Tim Michels
We see these private clouds being used to engage the emerging AI workloads. And all of this infrastructure and all of this build-out is attracting a lot of forward thinking about, hey, how should we do this? They're looking at how the hyperscalers have built their private clouds, and that leads them right to the DPU use cases.
00:10:20:11 - 00:10:43:20
Tim Michels
And then, you know, in the enterprise, you get an even larger footprint. It isn't just the networking stack now. It is the virtualization offload. It is the storage offload, which I think you touched on, right? This ability to take NVMe-over-fabrics interfaces and treat those as local devices on the DPU, when they're really abstractions that are served out over the network, that's a powerful use case.
00:10:43:20 - 00:11:03:08
Tim Michels
And then there's security, which we also haven't talked about. Enterprises, service providers, everybody needs security. If you can set up your DDoS barriers, if you can set up your firewalls, and you can do that on the DPU, then you can protect the nodes and the applications that are sitting behind it without affecting those workloads' computation.
00:11:03:10 - 00:11:41:14
Lori MacVittie
It really fits into the "context is the perimeter" talk around AI, where you really need to protect, you know, every instance individually, almost. You can't just assume that something external will be enough. You have to bring it closer to the actual workloads. So we see a lot of that thinking going on, and the data is showing that they're interested, but it's also showing that a lot of them are physically isolating their AI clusters, farms, factories, from their data centers or cloud installations.
00:11:41:14 - 00:12:01:21
Lori MacVittie
So, you know, it may be the case that they're still building this out. And I haven't seen a lot of indication that this stuff is scaling out to the point where it would need DPUs at the moment. But they're clearly looking at it and understand that the day is coming when they do need it, because otherwise, why would it be a buying consideration?
00:12:01:24 - 00:12:41:13
Joel Moses
Yeah. You know, I think one of the factors is simply that enterprises don't know enough about DPUs in general. I mean, these things have been effectively isolated to hyperscaler use, where they have been used to reduce cost and deliver tremendous return on investment inside of hyperscaler environments and service provider networks. And those are kind of what I like to call the harbinger networks, the canary in the coal mine, so to speak. The places where innovation is done that eventually enterprises become very, very cognizant of and very desirous of. So, how the hyperscalers orchestrated and ran workloads on demand directly contributed to how
00:12:41:13 - 00:13:05:21
Joel Moses
enterprises adopted and used private cloud technologies later down the road. I think we're probably going to see some interest in how hyperscalers are solving some of these challenges, and DPUs will be a part of that. One thing that I'm very confused about is that enterprises also have a strong interest in microsegmentation technologies, and DPUs are really good at microsegmentation.
00:13:05:21 - 00:13:37:01
Joel Moses
In fact, they're super great at creating hard, defined boundaries around workloads for hosts. But you don't necessarily see microsegmentation and DPUs talked about in the enterprise space today. Instead, people often leverage software that configures host security boundaries to create microsegmentation in the enterprise space. So far, I don't think DPUs have been able to breach that for enterprise uses.
00:13:37:04 - 00:13:42:03
Joel Moses
What else do you think, Tim, is keeping enterprises from adopting DPUs?
00:13:42:09 - 00:14:01:24
Tim Michels
Yeah, I think that one of the major things is just the immaturity of the ecosystem. The out-of-box experience for DPUs is still not great. They're difficult to deploy. They're hard to get up and running. You can't, well, I guess you can in a limited way, buy turnkey systems from people like Dell,
00:14:01:24 - 00:14:18:07
Tim Michels
but again, even when you get those, there are issues as you try to stand them up and use them in the way that you want to use them. So there's that out-of-box piece. The vendors aren't doing enough. They need to partner more deeply with the server vendors to package these things so they're more turnkey.
00:14:18:09 - 00:14:39:01
Tim Michels
The other side of that is: where is the software? To your point, where are the ISVs that are going to bring the software, the firewall software, the routing software? Where are the ISVs that are going to provide those solutions qualified, tested, ready to go, and deploy them on DPUs? Well, F5 is making motions in that direction.
00:14:39:01 - 00:14:58:29
Tim Michels
We have our BIG-IP Next for Kubernetes, which is deployable on a BlueField-3. We hope to bring that to other DPUs in the near future. And that's bringing these application delivery services and deploying them on the DPU in front of the workload that sits behind it on the host. And I think we need to see more of that.
00:14:58:29 - 00:15:19:14
Tim Michels
We've seen Check Point, we've seen Palo Alto, a few other vendors dip their toe in the water. But I think that needs to be more pervasive. And enterprise customers need to feel like there is a deep bench of vendors ready to serve their DPU infrastructure if they're going to spend all this money on the deployments.
00:15:19:16 - 00:15:43:14
Lori MacVittie
Is it the case, I mean, when you talk about software going on a server, we have that down. We know how to do that on hypervisors and even in Kubernetes. Is it a significant challenge to take traditional infrastructure software and get it to run on a DPU? Is that part of the challenge that's kind of holding back the ecosystem?
00:15:43:14 - 00:16:05:00
Tim Michels
Yeah, I think in the lab, you know, you can do it with an engineer. Not everybody has engineers floating around to do that work. And even if you do it on one or two servers, that's not the same as scaling to 100 or 1,000 in a fleet. And so that whole concept of lifecycle and fleet management is largely an unsolved problem at scale.
00:16:05:03 - 00:16:37:16
Tim Michels
And you see, at least on the Kubernetes side of things, people trying to develop operators that will allow you to deploy and operate software on your DPU using a Kubernetes operator. That's probably the right approach. These operators, though, are a little clunky today, and they're a little too opinionated. They're serving a specific vendor's goals as opposed to broadly serving the industry across all vendor types, which is what the end customers would really like to see.
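[Editor's note: a minimal sketch of the operator pattern Tim describes, which uses standard Kubernetes scheduling primitives to keep infrastructure services on DPU nodes and off host CPUs. The node label, taint key, names, and image registry here are hypothetical illustrations, not any vendor's actual API.]

```python
# Sketch: the kind of manifest a DPU-aware Kubernetes operator's reconcile
# loop might generate. Label/taint keys ("dpu.example.com/...") are invented.

def build_dpu_daemonset(service_name: str, image: str) -> dict:
    """Build a DaemonSet manifest that schedules an infrastructure service
    only onto nodes advertising a DPU."""
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": service_name, "namespace": "dpu-system"},
        "spec": {
            "selector": {"matchLabels": {"app": service_name}},
            "template": {
                "metadata": {"labels": {"app": service_name}},
                "spec": {
                    # Only land on nodes that advertise a DPU.
                    "nodeSelector": {"dpu.example.com/present": "true"},
                    # Tolerate the taint that keeps ordinary workloads off
                    # the DPU's own compute complex.
                    "tolerations": [{"key": "dpu.example.com/reserved",
                                     "operator": "Exists"}],
                    "containers": [{"name": service_name, "image": image}],
                },
            },
        },
    }

manifest = build_dpu_daemonset("edge-firewall", "registry.example.com/fw:1.0")
print(manifest["spec"]["template"]["spec"]["nodeSelector"])
```

The "opinionated" part Tim mentions shows up in exactly these choices: which labels and taints the operator expects, whether the DPU joins the host's cluster or its own, and what CRDs drive the reconcile loop.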
00:16:37:18 - 00:17:01:25
Joel Moses
Yeah, an ecosystem of things that can be deployed on DPUs, not a single thing that can be deployed on a single DPU. You mentioned fleet management, and I think that's a fairly critical thing. You know, enterprises are used to managing fleets of compute in very vendor-centric ways, using, you know, on-board lights-out management, for example, from one particular vendor.
00:17:01:25 - 00:17:21:26
Joel Moses
And they get used to how they bootstrap and load system software on those systems. And then when you take and put, say, two DPUs in one of those systems, it changes managing one compute node into managing three independent compute nodes. And so, you mentioned the Kubernetes operators,
00:17:21:28 - 00:17:35:15
Joel Moses
it's not so much getting software onto the DPU as it is bootstrapping the DPU from scratch with nothing on it, loading base system software, and then using the operator to load applications, isn't it?
00:17:35:17 - 00:17:57:06
Tim Michels
Yeah. And you have to solve all that. I mean, even once you've bootstrapped, you have to push updates, you have to push CVE fixes, you have to push patches. So the whole infrastructure of the DPU needs to be treated like a server. You have to be able to push stuff to it, or have it pick up those updates and patches, as well as deploy the applications on top of it.
00:17:57:09 - 00:18:22:27
Tim Michels
So it's the full server problem, now pushed onto the DPU. And there are these opinions about what this should look like. Should the DPU be a peripheral of the host? Should the host be a peripheral of the DPU? Should they be peers? Should the DPU's workloads be in the same cluster as the workloads on the host, or should they be in their own clusters?
00:18:22:29 - 00:18:33:14
Tim Michels
We have an application cluster for the host and a service cluster for the DPU, right? And all of these choices affect how you build your operator.
Joel Moses
Right.
Tim Michels
What's your point of view, right?
00:18:33:17 - 00:18:39:25
Joel Moses
And all of these are things that the hyperscalers have their own way of doing. And
00:18:39:27 - 00:18:44:26
Tim Michels
Yes, they've made their choices and their opinions and they've solved for that specific
Joel Moses
For themselves.
00:18:44:26 - 00:18:55:20
Joel Moses
Yeah, but for the larger enterprise, this is something that as of yet remains unsolved. What is the industry doing to try to help solve these fleet management and standardization issues?
00:18:55:23 - 00:19:21:27
Tim Michels
I think individual vendors are offering their sort of bespoke solutions, even if they say, well, it's open source, but really it's designed for their stuff and in the manner that they think you should use it. There is OPI, the Open Programmable Infrastructure project. OPI is attempting to solve some of these problems, but we're not seeing the level of vendor and customer,
00:19:21:28 - 00:19:30:14
Tim Michels
you know, industry engagement that's really moving the needle there. So, while it's a forum where we discuss many of these issues, we're not really solving them.
00:19:30:17 - 00:19:41:21
Joel Moses
And then I think I've also seen some involvement in the SONiC community, where switching and routing stacks have adopted some DPU mechanisms, but not all.
00:19:41:29 - 00:19:59:01
Tim Michels
Yeah. That's Microsoft-driven: SONiC plus DASH deployed on the DPU. And that's largely being consumed by Microsoft. They make it available. They'd like everyone to use it. But again, it's very oriented toward the Azure use case.
00:19:59:03 - 00:20:24:18
Lori MacVittie
So I'm hearing that there are still a lot of, I guess, big hairy questions that need answers. And I don't think we're going to solve them in our limited time today. But, you know, I think we have learned a few things, like what a DPU is, right? What purposes it serves, and that there's interest, but the market's not really ready.
00:20:24:18 - 00:20:29:12
Lori MacVittie
That's what I learned today. Joel? Tim? What did you learn?
00:20:29:14 - 00:20:52:22
Joel Moses
Yeah, well, I learned that, you know, there is a large, burgeoning market out there. But it seems to be isolated right now to the hyperscalers and service providers, and those are the ones that are particularly interested in this. Some of them are, of course, interested in terms of AI workloads, ensuring multi-tenancy for AI workloads, which is definitely an area where they are investing.
00:20:52:24 - 00:21:17:16
Joel Moses
I think I read a number that right now the DPU market's around $5.5 billion. And that's largely still driven by non-enterprise workloads. I do think that enterprises will cotton to this eventually. They learn from the hyperscalers. And when they learn that the hyperscalers are saving money and ensuring security using certain types of technology, they'll become interested in the same types of technology.
00:21:17:19 - 00:21:22:19
Joel Moses
But I've also learned that the industry has a long way to go to make this enterprise grade. Tim?
00:21:22:22 - 00:21:51:17
Tim Michels
Yeah, you know, people forget the enterprise is often the last to adopt. They are not the early adopters. They're safe, they're secure, they're conservative. So I think, as we saw the spread of DPU technology from hyperscalers to service providers, you're going to see it spread into the enterprise. And like all technologies, you'll think, oh, it's just creeping along, but it'll reach a tipping point and then boom, it'll just be there everywhere.
00:21:51:17 - 00:21:56:28
Tim Michels
So, look for the tipping,
Lori MacVittie
Awesome
Tim Michels
look for the tipping point, it's coming.
Lori MacVittie
Tipping point.
00:21:56:28 - 00:22:07:27
Lori MacVittie
Awesome. Well, that's a wrap for this episode of Pop Goes the Stack. Subscribe now because the next collision needs spectators and maybe a cleanup crew.