HPE news. Tech insights. World-class innovations. We take you straight to the source — interviewing tech's foremost thought leaders and change-makers who are propelling businesses and industries forward.
Aubrey Lovell (00:10):
Hey everyone, and welcome back to Technology Now, a weekly show from Hewlett Packard Enterprise where we take a look at what's happening in the world, and explore how it's changing the way organizations are using technology. We're your hosts, Aubrey Lovell.
Michael Bird (00:22):
And Michael Bird. Right. So in this episode, we are keeping it colder than a mid-January Thursday in the UK and diving into the world of liquid cooling. It's become a topic of much discussion over the course of the last year (we are intentionally avoiding any hot-topic puns) and is being hailed as an essential part of future data center and HPC architecture. But what makes liquid cooling so essential? How is it a major step up over the cooling found in consumer PCs for the last decade or more? And why does it actually matter to the rest of us?
Aubrey Lovell (00:59):
Well, if you are the kind of person who needs to know why what's going on in the world matters to your organization, this podcast is for you. And if you haven't yet, subscribe to your podcast app of choice so you don't miss out.
(01:10):
All right, Michael. New Year, new you. Let's get into it.
Michael Bird (01:14):
Let's do it.
Aubrey Lovell (01:18):
Okay. So liquid cooling has become a significant talking point in HPC and supercomputing design over the last three to four years. In fact, according to a survey by The Register earlier this year, 38% of IT professionals expect to employ liquid cooling infrastructure in their data centers by 2026, up from just 20% at the start of 2024, and we've linked to that in our show notes. But why go to all the trouble? Well, it's all about thermal efficiency. Essentially, liquid can wick away heat a lot faster.
Michael Bird (01:51):
Yeah. Especially as AI workloads take off, the need to cool components running close to their thermal limits has become increasingly pressing, and we're not talking about a few pipes here. Take the HPE-built Frontier exascale computer at Oak Ridge National Laboratory in Tennessee. According to a report from the laboratory, Frontier has to pass between 6,000 and 10,000 gallons of water over its components every minute to cool them. That's roughly 22,000 to 37,000 liters running over those components every minute. The cooling system alone weighs around 500 tons, and we've linked that article in the show notes.
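For a rough sense of scale, a quick back-of-the-envelope conversion bears those figures out, assuming the report quotes US gallons at roughly 3.785 liters each:

```python
# Rough conversion of Frontier's reported cooling flow rate.
# Assumes US gallons (about 3.785 liters each); the exact unit is an assumption here.
LITERS_PER_US_GALLON = 3.785

for gallons_per_minute in (6_000, 10_000):
    liters_per_minute = gallons_per_minute * LITERS_PER_US_GALLON
    print(f"{gallons_per_minute:>6,} gal/min ≈ {liters_per_minute:,.0f} L/min")

# Prints roughly 22,700 and 37,900 L/min, in line with the figures above.
```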
(02:31):
So why is it only now becoming a core part of enterprise level supercomputing architecture? What's changed to encourage this new technology? Well, I recently had the chance to meet with Benjamin Kufahl, senior liquid cooling expert in HPC and AI at Hewlett Packard Enterprise.
(02:48):
Benjamin, welcome to the show. Can you quickly explain what we mean by liquid cooling?
Benjamin Kufahl (02:53):
Yeah, good question. So with traditional cooling infrastructure, we use air. The reason is air is everywhere. It's plentiful. It's easy. If I have a fan, I can move air. If I move that air through what's called a heat sink on a device, I can pull the heat out of that device and reject it to the air. Very simple technology, and that's what we've done in the past.
(03:14):
Liquid cooling is a little bit different. What we do is bring coolant into the server itself. We use the liquid to absorb the heat from the devices, whether it's a CPU or a GPU accelerator, and we absorb that heat into a liquid coolant rather than a gaseous air coolant.
Michael Bird (03:31):
And the reason why we use liquid is because liquid is more efficient at, I guess, absorbing heat and losing heat at the other end?
Benjamin Kufahl (03:38):
Exactly, yeah. So it's really two things. The ability of the fluid to absorb heat is almost four times higher with a water-based coolant versus air. The other thing is density. The liquid is around 1,000 times more dense than air, meaning that per unit volume of fluid moved, it's about 4,000 times more effective at removing heat from a server.
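To see where those ratios come from, here is a minimal sketch using typical room-temperature textbook values for water and air; real coolant mixes and operating temperatures will shift the numbers somewhat:

```python
# Approximate room-temperature properties (textbook values; actual coolants vary).
water_cp_kj_per_kg_k, water_density_kg_per_m3 = 4.18, 997.0
air_cp_kj_per_kg_k, air_density_kg_per_m3 = 1.005, 1.2

# Per kilogram of fluid: ratio of specific heat capacities (roughly 4x).
per_kg_ratio = water_cp_kj_per_kg_k / air_cp_kj_per_kg_k

# Per unit volume: volumetric heat capacity = specific heat x density.
per_volume_ratio = (water_cp_kj_per_kg_k * water_density_kg_per_m3) / (
    air_cp_kj_per_kg_k * air_density_kg_per_m3
)

print(f"Per kilogram, water absorbs ~{per_kg_ratio:.1f}x more heat per degree than air")
print(f"Per unit volume, the gap is ~{per_volume_ratio:,.0f}x")  # ballpark of the ~4,000x quoted
```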
Michael Bird (03:59):
So I've never been into a data center, but I used to work in an IT department that had a server room, and my resounding memory is just the sheer volume of the screaming fans. So that's how most data center equipment is cooled at the moment: by air. So why are we talking about liquid cooling? Air has been working perfectly fine for the last few years.
Benjamin Kufahl (04:18):
Yeah. The reason's very simple. So I've been doing this for about 10 years. I started as an intern in 2015. But in those days, we were selling processors that were about 85 watts, maybe 150 watts in power. Today, we're doing 500 watts at the chip, okay?
Michael Bird (04:19):
Right. Okay.
Benjamin Kufahl (04:35):
The GPUs, the accelerators we were doing 10 years ago at 300 watts, we're doing today almost 1000 watts, and five years from now, we're already looking at 2000, 3000 watts. So the power of the device has gone up so much that air cooling has either become impossible or highly impractical. So we just need those fluid property benefits that I talked about earlier in order to do the heat transfer out of the device.
Michael Bird (04:59):
So liquid cooling has been around for years. I'm quite familiar with it from the PC-building space. There are all-in-one coolers that are a closed loop, or custom cooling systems where you have to make your own pipes and connect them to the GPU and the CPU. So how is the technology different in the data center? Is it like that, but just more extreme?
Benjamin Kufahl (05:18):
Exactly. The core principle is exactly the same. So I also have a liquid-cooled PC and CPU.
Michael Bird (05:25):
Which if you didn't, I would be very disappointed.
Benjamin Kufahl (05:27):
Yeah. Right. But the technology is very similar. In fact, we have servers that we sell today with more or less the same technology.
Michael Bird (05:27):
Right. Okay.
Benjamin Kufahl (05:35):
So the main fundamental difference when you get to the big leagues, so to speak, with some of the high-performance machines we have, is that we both collect the heat with the fluid in the server and then reject that heat to another liquid. So the heat isn't going into the data center air at all. It's going into another liquid, being pumped outside, and then rejected to the environment or reused, which I'll talk about later perhaps.
Michael Bird (05:56):
So you touched on the point that CPUs are using more power. I guess they're getting faster. What is the reason? What is the driver?
Benjamin Kufahl (06:05):
Yeah. Good question. So basically what happened is we got to a plateau in terms of how small we can make the transistors inside the silicon. So the only way to get more performance out of that same silicon is to pack more and more transistors together in the same space. So the power densities have been going up extremely fast. And at the same time, we don't have a lot more, call it, real estate. The size of the chips has not grown that much. So yeah, we need to get more and more heat out of a device that's the same size. The watts per square millimeter, you could say, are going extremely high.
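To put a hypothetical number on that, here is a small heat-flux sketch; the die areas and wattages are illustrative assumptions, not specifications for any particular chip:

```python
# Illustrative heat-flux comparison; die areas and powers are assumptions, not chip specs.
def heat_flux_w_per_mm2(power_w: float, die_area_mm2: float) -> float:
    """Average heat flux across the die, in watts per square millimeter."""
    return power_w / die_area_mm2

# A decade-old ~150 W processor vs. a modern ~1,000 W accelerator on a similar-sized die.
print(f"~150 W on a 600 mm^2 die:    {heat_flux_w_per_mm2(150, 600):.2f} W/mm^2")
print(f"~1,000 W on an 800 mm^2 die: {heat_flux_w_per_mm2(1000, 800):.2f} W/mm^2")  # ~5x the flux
```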
Michael Bird (06:38):
Wow. And presumably the AI world comes into it, because with the rise of AI there's also the need for more GPU density, and I guess more CPU density too. Everything is just running at higher temperatures, higher frequencies. We're asking more of our systems.
Benjamin Kufahl (06:57):
Exactly. The AI workloads tend to use GPUs, and GPUs tend to be very power hungry. A second note I'll make is that as we combine these transistors into a smaller space, there's an industry term called case temperature, or T-case as we call it. Basically, that means the maximum allowable operating temperature of the chip. That number has been going down at the same time the power has been going up. So not only is it more difficult to cool, meaning we have to have better cooling systems, we actually have to cool it to a lower temperature than just a few years ago. So it's a combination of factors that's really driving the need toward liquid cooling.
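One way to see why rising power and falling case temperatures compound each other is the thermal-resistance budget a cooling solution has to hit, roughly R = (T_case - T_coolant) / P. The temperatures and powers below are hypothetical, purely to illustrate the trend Benjamin describes:

```python
# Illustrative case-to-coolant thermal-resistance budget; all numbers are hypothetical.
def required_thermal_resistance(t_case_c: float, t_coolant_c: float, power_w: float) -> float:
    """Maximum allowable thermal resistance (degrees C per watt) from chip case to coolant."""
    return (t_case_c - t_coolant_c) / power_w

# An older part: generous T-case limit, modest power.
older = required_thermal_resistance(t_case_c=95, t_coolant_c=32, power_w=300)
# A newer accelerator: lower T-case limit, much higher power.
newer = required_thermal_resistance(t_case_c=75, t_coolant_c=32, power_w=1000)

print(f"Older part needs R <= {older:.3f} C/W")  # ~0.210 C/W
print(f"Newer part needs R <= {newer:.3f} C/W")  # ~0.043 C/W, roughly five times tighter
```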
Michael Bird (07:32):
So what does the future of cooling look like? Do you see a future where liquid cooling is the default?
Benjamin Kufahl (07:38):
It's a good question. Yeah. Certainly, to meet the performance requirements of these faster and faster chips, you do need liquid cooling. I'll add a caveat: you could theoretically cool with air, but the practicality of doing so is not beneficial, because it takes such a large air-cooled heat sink to remove all of that heat. So that density, again the kilowatts per square meter of your data center, is only really possible with liquid cooling. And so that's the one thing. Yeah.
Michael Bird (08:07):
So you can fit more compute in a rack if it's liquid cooled?
Benjamin Kufahl (08:12):
Yes, exactly. Exactly. And the future of the technology, I would say as we continue to see this trend in power and performance going up, liquid cooling will become a necessity. I do not think that we'll see very much air cooling, especially in the AI space anymore. There will always be some use cases for air cooling for lower powered servers, but with the workloads that we're seeing today, liquid cooling will become the standard. We're already seeing that actually. The vendors are basically specifying a liquid cooling solution directly to us.
Michael Bird (08:41):
And in the air cooling world, there are lots of standards in terms of fan sizes and fan connectors. Are there standards in the world of liquid cooling? Are CPU blocks, if that's the terminology we use in data centers, coming with standardized connectors so you can sort of mix and match?
Benjamin Kufahl (09:01):
Yeah, that's a good question. So there are a few key players in the server cooling world, and there has been a drive towards standardization. Because the chip sizes and locations inside the server change very frequently throughout the different generations, the goal has been to standardize how you connect the cooling to these chips, or cool memory or cool drives or whatever the device is, and then basically provide a server inlet and a server outlet where you can do 100% liquid cooling. That's the goal. I don't think everyone's really there yet, but that's something we care about a lot at HPE as well: how do we iterate quickly as new devices come out, and how do we provide a liquid cooling solution for those devices?
Michael Bird (09:37):
And when we're liquid cooling a server in a data center, are we talking every single component there?
Benjamin Kufahl (09:42):
That's a good question. The traditional way to get into liquid cooling has been to go for the low-hanging fruit, which is the CPUs and the GPUs. These are your biggest power devices. If you just cool those components, you can get to about 70% liquid cooling, and we do that with the Cray XD platform today. The last 30% is the most difficult part. So like you mentioned, memory cooling, for example the DIMMs for the CPUs, drive cooling, NIC cooling, all of that stuff, we can do it. We do it with the Cray EX platform, but that's the challenge: getting to actually 100% fanless, which is something we've had now for some years. We do do the low-hanging-fruit stuff, but we have offerings for more than that, for people who want the latest and greatest.
Michael Bird (10:22):
I read somewhere that different components like different temperatures. Memory, for example, likes to be slightly warmer than a CPU. I don't know if that's true. So does that add some additional challenges?
Benjamin Kufahl (10:33):
It does indeed, because when you have a liquid-cooled server, you have a few dozen different components that each have their own temperature limit. And your job as a thermal engineer is to make sure that the right amount of cooling gets to the right place at the right flow rate, at the right temperature. So it's really a balancing act and that's something that we learn and iterate on as we build the next generation servers and learn from the past and try to do it better the next time. Yeah. So you're right. Each device has a threshold and some devices are more tolerant to heat than other devices. So we always really have to pay attention to that as thermal engineers.
Michael Bird (11:06):
Does the rise of AI mean that liquid cooling will become a necessity rather than a nice to have?
Benjamin Kufahl (11:11):
I do think so. I'll make one more note about that, which is that it will become a necessity, I think, but it's not just that it's a necessity. There are a lot of tangible benefits. We talked earlier about the fluid properties, but it also leads to a lot of savings for the customer in the long term. So that's something we try to do at HPE: educate throughout the entirety of the process. It's easy to do air cooling, we know that, but the long-term benefits come from liquid cooling infrastructure.
Michael Bird (11:34):
Yeah, yeah. So, much to the boredom of all of my colleagues, I've got a heat pump in my house. And what I found really fascinating is how you are able to get heat from what feels like nothing, and transfer that heat around your house and actually do something useful with it. Is that the sort of thing that businesses are starting to do with the waste heat from liquid-cooled data centers?
Benjamin Kufahl (11:58):
Yeah. Good question. Yeah. It's a great question. We actually do do this today with the Cray EX. We have customers that reuse the heat, the waste heat from the computer, and use an intermediate heat pump to upgrade the temperature of that heat essentially. So coming straight from the computer, most of your district heating facilities, especially the older ones, they won't have a modern enough infrastructure to utilize that heat directly. So what you do is you put a heat pump in between that and your district heating. So the heat pump raises the temperature of the coolant that goes out to the district heating. So you can actually reuse the computer's energy to heat your home, for example. So that's all part of the kind of sustainability around liquid cooling.
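As a rough illustration of how that heat reuse works in energy terms (the waste-heat load and the heat pump's coefficient of performance below are assumptions for the sake of the example, not figures from the interview):

```python
# Hypothetical example of upgrading data center waste heat for district heating.
# The waste-heat load and the COP are illustrative assumptions, not measured values.
waste_heat_kw = 1000.0   # low-grade heat leaving the computer's liquid loop (around 40 C)
heat_pump_cop = 4.0      # heat delivered per unit of electricity the heat pump consumes

# For a heat pump, delivered heat = electrical input * COP, and that output is the
# absorbed waste heat plus the electrical input itself.
electrical_input_kw = waste_heat_kw / (heat_pump_cop - 1)
delivered_heat_kw = waste_heat_kw + electrical_input_kw

print(f"Electricity to run the heat pump: {electrical_input_kw:.0f} kW")
print(f"Heat delivered to district heating at a higher temperature: {delivered_heat_kw:.0f} kW")
```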
Aubrey Lovell (12:38):
Thanks, Michael. It's great to hear from Benjamin. Very good insights there.
Michael Bird (12:44):
Now it is time for Today I Learned, the part of the show where we take a look at something happening in the world that we think you should know about.
Aubrey Lovell (12:51):
And it's another EV story from me this week, Michael. Two in a row. Lucky you.
Michael Bird (12:56):
Yeah.
Aubrey Lovell (12:58):
So researchers in Germany have unveiled a paint which can collect enough solar energy to power an EV for 32 kilometers, so that's around 20 miles, on average every single day. The team from a well-known German auto manufacturer have been experimenting with the solar paint since 2022. Now, initially, they covered the roof and hood of a car and hooked it up to the internal 12-volt electrical system, which powers the vehicle's basic electronics. The test was so successful that they've now covered the entire car in the paint, giving it five times more surface area, and hooked it up to the car's high-voltage EV system.
(13:35):
During tests in a Northern European climate, the system produced enough energy to give over 12,000 kilometers, which is nearly 7,500 miles, of free driving every year. Testing in sunnier areas like Los Angeles provided enough power to cover an entire commute of around 35 miles, or 55 kilometers, every day. Now, the hope is that the tech will roll out commercially in five to 10 years, easing a huge burden on electrical grids and consumer utility bills. Very interesting.
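As a quick sanity check on those figures, assuming the usual 1.609 kilometers per mile:

```python
# Quick consistency check of the reported solar-paint figures.
KM_PER_MILE = 1.609

yearly_km = 12_000
daily_km = yearly_km / 365
print(f"{yearly_km:,} km/year ≈ {daily_km:.0f} km/day ({daily_km / KM_PER_MILE:.0f} miles/day)")

commute_miles = 35
print(f"A {commute_miles}-mile commute ≈ {commute_miles * KM_PER_MILE:.0f} km")  # about 56 km
```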
Michael Bird (14:05):
Wow. I bet you get one mile here in the UK because it just rains all the time. It'd be great in Florida, wouldn't it? Great where you are.
Aubrey Lovell (14:13):
Oh, definitely.
Michael Bird (14:14):
Well, thank you for that, Aubrey. That was brilliant. Right. Well now it's time to return to my interview with Benjamin Kufahl to talk about liquid cooling.
(14:25):
So the rise of gen AI and AI-optimized high-performance computing has massively increased the demand for liquid cooling, as well as putting more strain on traditional architectures. How are you adapting to that challenge?
Benjamin Kufahl (14:35):
So one way we're adapting at HPE is understanding that there really is no one-size-fits-all solution with liquid cooling. We do the really high-performance stuff, direct liquid cooled, 100% fanless, but we also offer everything in between. So at one end, like we talked about with your home PC, a CPU cooler that just rejects the heat to the room, we can do that. We have products where you can basically fill a rack with air-cooled servers and put a rear door heat exchanger on there. The rear door heat exchanger takes liquid coolant from the data center and cools a radiator coil on the back of your rack. So you still have air-cooled servers in the rack, but those air-cooled servers reject their heat into a water radiator.
Michael Bird (15:15):
So it doesn't go into the room?
Benjamin Kufahl (15:17):
So it doesn't go into the room. It stays room neutral. We have solutions where we can still use warm-water cooling with the Adaptive Rack Cooling System. This is closed-loop air: instead of rejecting the air to the room like with the rear door, we keep it in a closed space so we can use warm water. We still do air cooling, but capture the heat into water and reject it via a series of radiators, basically, all the way up to 100% fanless. So we've got solutions all the way along.
Michael Bird (15:40):
So it almost feels like it's retrofitted onto the back of a standard, off-the-shelf, air-cooled server.
Benjamin Kufahl (15:45):
Yeah. Exactly. Exactly. Because there has been, understandably, some resistance to liquid cooling, because it is a more complex technology and providers aren't really familiar with it. So we offer very entry-level stuff to get your feet wet, I guess, no pun intended, and also some very advanced stuff with the 100% fanless.
Michael Bird (16:03):
Yeah. Let's talk sustainability. Can we make cooling sustainable, and if so, how?
Benjamin Kufahl (16:07):
So the interesting thing is that liquid cooling, by virtue of the physics of the fluid, is a sustainable option. We talked about the performance benefits that just go back to the physics of the fluid. What that means is that you, as the data center provider, are not spending as much energy to cool your data center if you embrace liquid cooling. You get 15% energy savings at the chassis. There's a study that HPE has done to quantify this with an XD product: depending on where your energy comes from in the data center, we see an 86% cost savings, and, depending on where that energy is sourced from, up to an 87% carbon reduction.
(16:42):
So one of the major benefits of liquid cooling is that it is inherently a sustainable way to have these new technologies and do it responsibly.
Michael Bird (16:51):
Yeah. I guess the thing with air cooling is that the heat just gets wasted into the space, isn't it?
Benjamin Kufahl (16:55):
Yeah.
Michael Bird (16:56):
All right, final question. Why should our organizations care about liquid cooling?
Benjamin Kufahl (17:00):
Yeah. So actually the answer really brings all of that stuff together. We see right now this explosion of interest in AI. We see data centers being built like crazy. We want to try to be responsible with how that happens. Obviously, we have to care about things like climate change. We want to be able to have our cake and eat it too. So we want to be able to have that technology and not feel bad about it. Going into liquid cooling is a way that we can not only be more efficient with how we utilize these energy resources, but also use that energy to do something useful.
Michael Bird (17:31):
Amazing. Benjamin, thank you so much for your time.
Benjamin Kufahl (17:33):
Yeah, absolutely. Thank you.
Michael Bird (17:34):
We really appreciate it. Thank you so much for being on this episode of Technology Now.
Aubrey Lovell (17:37):
Thank you so much, Ben, and thanks for bringing us that, Michael. What a fascinating conversation. And you can find more on the topics discussed in today's episode in the show notes.
(17:48):
Okay. So we're getting towards the end of the show, which means it's time for this week in history, a look at monumental events in the world of business and technology which have changed our lives. What was the clue last week, Michael?
Michael Bird (17:59):
Well, Aubrey, the clue last week was: it's 1861, and this invention really lifted us up and kept us there. Did you get it? I thought it was something to do with a hot air balloon or something, but it's not. It's the anniversary of the patenting of the safety elevator by the American inventor Elisha G. Otis of Yonkers, New York. His idea for a hoisting apparatus hinged on a special safety brake, which would spring out and latch onto a toothed rail on the side of the elevator shaft if the lifting cable broke. The patent application came a decade after Otis first came up with the idea, and four years after the first prototype was tested. Unfortunately, Elisha died later in 1861 and didn't live to see it catch on. Instead, his sons took on the idea, making it a global sensation.
Aubrey Lovell (18:51):
Amazing. Thanks, Michael. That was a very elevating discussion. And the clue for next week, it's 2009, how do we spell extinction backwards?
Michael Bird (19:03):
Aubrey, it's N-O-I-T-C-N-I-T-X-E. Have I got that right, or is that the clue?
Aubrey Lovell (19:10):
It seems like it's on the right path.
Michael Bird (19:11):
That is the clue for next week, isn't it?
Aubrey Lovell (19:12):
I think producer Sam has something up his sleeve for sure. And that brings us to the end of Technology Now for this week. Thank you so much to our guest, Benjamin Kufahl, senior liquid cooling expert in HPC and AI at Hewlett Packard Enterprise. And to you, thank you so much for joining us.
Michael Bird (19:30):
Technology Now is hosted by Aubrey Lovell and myself, Michael Bird. And this episode was produced by Sam Datta-Paulin and Alicia Kempson-Taylor, with production support from Harry Morton, Zoe Anderson, Alison Paisley, and Alyssa Mitri.
Aubrey Lovell (19:41):
Our social editorial team is Rebecca Wissinger, Judy Ann Goldman, Katie Guarino, and our social media designers are Alejandra Garcia and Envar Maldonado.
Michael Bird (19:51):
Technology Now is a Lower Street Production for Hewlett Packard Enterprise. We'll see you the same time, the same place next week. Cheers.