Pop Goes the Stack

Multi-model AI isn’t a buzzword anymore; it’s how organizations actually operate. In this episode of Pop Goes the Stack, Lori MacVittie and Joel Moses dig into fresh findings from F5's State of Application Strategy Report showing that companies run an average of seven models and that more than half are already orchestrating multiple models together. That’s a big shift, and it changes what “infrastructure readiness” even means.
 
Why do teams chain models in the first place? The answer: cost, capability, and risk. The uncomfortable part? Most infrastructure is still built for deterministic systems, and AI routing is not the same problem as load balancing. Model routing isn’t about spreading traffic evenly. It’s about making a decision on every request: which model is best for this job, what will it cost, what’s the risk, and what’s the fallback when the answer is wrong or low quality.
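To make the contrast concrete, here is a minimal Python sketch that assumes a hypothetical fleet of three models with made-up capability tiers, per-token costs, and risk scores. The load balancer rotates through the fleet no matter what the request is. The router filters by what this particular request needs and tolerates, picks the cheapest eligible model, and keeps the remaining eligible models as an ordered fallback list. The model names, numbers, and the "cheapest eligible model wins" rule are illustrative assumptions, not something described in the episode or the report.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Model:
    name: str
    capability: int     # hypothetical reasoning tier, 1 (basic) to 5 (strongest)
    cost_per_1k: float  # hypothetical cost per 1K tokens
    risk: float         # hypothetical risk score, 0.0 (low) to 1.0 (high)


# Hypothetical fleet: names and numbers are illustrative only.
FLEET = [
    Model("small-fast", capability=2, cost_per_1k=0.1, risk=0.2),
    Model("mid-general", capability=3, cost_per_1k=0.5, risk=0.1),
    Model("large-reasoning", capability=5, cost_per_1k=3.0, risk=0.1),
]


def round_robin(models):
    """Load-balancer view: requests are interchangeable, so just rotate."""
    return cycle(models)


def route(required_capability, max_risk, fleet=FLEET):
    """Router view: decide per request and return (choice, fallbacks).

    A model is eligible only if it meets this request's capability need
    and risk ceiling. The cheapest eligible model is chosen; the remaining
    eligible models, ordered by cost, become the fallback list.
    """
    eligible = [m for m in fleet
                if m.capability >= required_capability and m.risk <= max_risk]
    if not eligible:
        raise LookupError("no model satisfies this request's constraints")
    ranked = sorted(eligible, key=lambda m: m.cost_per_1k)
    return ranked[0], ranked[1:]


if __name__ == "__main__":
    balancer = round_robin(FLEET)
    print([next(balancer).name for _ in range(4)])
    # ['small-fast', 'mid-general', 'large-reasoning', 'small-fast']

    choice, fallbacks = route(required_capability=3, max_risk=0.15)
    print(choice.name, [m.name for m in fallbacks])
    # mid-general ['large-reasoning']
```

One design choice worth noticing: `route` can refuse a request by raising `LookupError`, which a load balancer never does. When no model should serve a request, declining it is a legitimate routing outcome.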
 
Joel frames it as a category change, from “where should this request go?” to “what should happen as a result of this request?” That shift forces new requirements: policy enforcement across models, identity-aware access, decision justification, and mechanisms to recover when output quality degrades due to drift, configuration changes, or poisoned inputs like compromised RAG data. Lori ties it back to governance, not just availability, and why “AI workloads” expose gaps that traditional tooling can’t cover.
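The sketch below combines those requirements in Python. It assumes a hypothetical role-to-model allow-list and caller-supplied `call_model` and `score` functions. Each candidate model is checked against policy before it is called, and its output is scored. A low score sends the request on to the next candidate, and every step is recorded so the decision can be explained afterward. The quality scorer is only a stand-in: reliably scoring model output is the open measurement problem the hosts describe, not a solved one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical policy: which roles may use which (hypothetical) models.
ALLOWED_MODELS = {
    "analyst": {"mid-general", "large-reasoning"},
    "guest": {"small-fast"},
}


@dataclass
class Decision:
    """Audit record for one request, kept so the routing can be justified later."""
    user: str
    model: Optional[str] = None
    reasons: List[str] = field(default_factory=list)


def route_with_policy(
    user: str,
    role: str,
    prompt: str,
    candidates: List[str],
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> output
    score: Callable[[str], float],          # hypothetical quality score, 0.0 to 1.0
    min_quality: float = 0.7,
) -> Tuple[Optional[str], Decision]:
    decision = Decision(user=user)
    allowed = ALLOWED_MODELS.get(role, set())
    for model in candidates:
        # 1. Identity-aware policy check before any model is called.
        if model not in allowed:
            decision.reasons.append(f"{model}: denied by policy for role {role!r}")
            continue
        # 2. Call the model and score what comes back.
        output = call_model(model, prompt)
        quality = score(output)
        # 3. Recovery: a low-quality answer falls through to the next candidate.
        if quality < min_quality:
            decision.reasons.append(f"{model}: quality {quality:.2f} below {min_quality}")
            continue
        decision.model = model
        decision.reasons.append(f"{model}: allowed, quality {quality:.2f} accepted")
        return output, decision
    decision.reasons.append("no candidate passed both policy and quality checks")
    return None, decision


if __name__ == "__main__":
    # Stand-ins: pretend mid-general has drifted and now returns empty answers.
    def fake_call(model: str, prompt: str) -> str:
        return "" if model == "mid-general" else f"{model} answer to: {prompt}"

    def fake_score(output: str) -> float:
        return 0.9 if output else 0.0

    answer, decision = route_with_policy(
        "alice", "analyst", "summarize Q3",
        ["small-fast", "mid-general", "large-reasoning"],
        fake_call, fake_score,
    )
    print(answer)
    for reason in decision.reasons:
        print(" -", reason)
```

The `Decision` record keeps the denied and low-quality candidates along with the winner. That history is what makes the outcome explainable after the fact.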
 
Many organizations are operationalizing AI, but that doesn't mean it's manageable yet. If you want to know how to move forward from here, this is an episode you don't want to miss.

Get your copy of the 2026 State of Application Strategy Report

Creators and Guests

Host
Joel Moses
Distinguished Engineer and VP, Strategic Engineer at F5, Joel has over 30 years of industry experience in cybersecurity and networking fields. He holds several US patents related to encryption techniques.
Host
Lori MacVittie
Distinguished Engineer and Chief Evangelist at F5, Lori has more than 25 years of industry experience spanning application development, IT architecture, and network and systems operations. She co-authored the CADD profile for ANSI NCITS 320-1998 and is a prolific author with books spanning security, cloud, and enterprise architecture.
Producer
Tabitha R.R. Powell
Technical Thought Leadership Evangelist producing content that makes complex ideas clear and engaging.

What is Pop Goes the Stack?

Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.

00:00:05:23 - 00:00:29:21
Lori MacVittie
Welcome back to Pop Goes the Stack. The show where innovation meets infrastructure and suddenly you need a rollback plan. I am Lori MacVittie, watching the logs in real time. And they move really fast. I just wanted to say that. Really fast. So everybody loves to say that they're doing multi-model AI. Well, we actually do research every year and we found out that

00:00:29:22 - 00:00:36:26
Lori MacVittie
whoa, yeah, no, they are. So Joel, who is doubling as our guest this week.

00:00:36:28 - 00:00:39:14
Joel Moses
Proud to be here.

00:00:39:16 - 00:00:41:25
Lori MacVittie
Yeah. You had no choice, it was on your calendar.

00:00:41:27 - 00:00:43:27
Joel Moses
That's a good point.

00:00:44:00 - 00:00:52:08
Lori MacVittie
Would you like to guess how many models an organization is running on average? So give me a guess. Unless you've read the data.

00:00:52:09 - 00:00:59:27
Joel Moses
I have absolutely no clue about this. I think I will guess somewhere in the low double digits. Or are we quite there yet?

00:01:00:00 - 00:01:04:26
Lori MacVittie
No. Not quite. You want to guess again? The points don't matter.

00:01:04:27 - 00:01:06:07
Joel Moses
Let's say seven.

00:01:06:13 - 00:01:33:28
Lori MacVittie
You read something.

Joel Moses
Oh, I must have read something.

Lori MacVittie
Yes, it is seven. Yeah, or

Joel Moses
Lucky number,

Lori MacVittie
I told you. Yes, lucky number seven. So they're running an average of seven different models. Now that's interesting, because one of the other data points that we picked up from our annual research, which is the State of Application Strategy, is what they're doing with them, because 52% are actually doing multi-model orchestration.

00:01:33:28 - 00:01:44:07
Lori MacVittie
So they're chaining these models. Which is kind of fascinating because I didn't think we were going to jump like that fast to that kind of a technique.

00:01:44:09 - 00:02:03:26
Joel Moses
Yeah, it

Lori MacVittie
But

Joel Moses
it doesn't surprise me. I mean, when people are making choices about the models that they use, they either use a really high reasoning one that's expensive to operate, and then they usually have some that are less expensive to operate. And so cost containment, cost management, I'm sure that's definitely a concern for a lot of organizations.

00:02:03:26 - 00:02:05:24
Joel Moses
This stuff can be really expensive.

00:02:05:28 - 00:02:32:27
Lori MacVittie
Costs did come up elsewhere in the data as one of the most important mitigating factors for AI as a challenge, right?

Joel Moses
Yep.

Lori MacVittie
So one of the other things that we asked about was where they believed that application delivery and security--because that's what we do--where they thought that was going to have the most impact on AI and the systems that they're building.

00:02:32:27 - 00:03:01:04
Lori MacVittie
And we gave them several options like, you know, the prompt layer--so input. Output layer, right, what's coming out, check for hallucinations. Kinda at the edge, you know, where you do authentication and stuff, because we hear identities coming up a lot as a big challenge. And then model routing. So the prompt and the output layers were actually the most, or at least that's what people said will have the most impact.

Joel Moses
Sure.

00:03:01:12 - 00:03:14:27
Lori MacVittie
But 17% also said that model routing would be the most impactful layer. So it wasn't security. So that's kind of what, you know, was interesting.

00:03:15:00 - 00:03:36:01
Joel Moses
So you know what my gut feeling on that is. Like, first of all, I'm kind of impressed that it's 17% at all. Like when you put a word out there like routing, people's minds automatically go to things that are fairly deterministic. And it, to me, the fact that 17% said that it's the most important is actually kind of low.

00:03:36:07 - 00:03:46:07
Joel Moses
I actually think, you know, when you're building with AI models, it's like saying, "I'm building a fleet of self-driving cars, but steering, eh, that feels optional."

00:03:46:09 - 00:04:11:14
Lori MacVittie
Well, I think you maybe unintentionally or intentionally kind of hit on what that problem is, is that there is a gap in capability around model routing right now. So I think they're thinking ahead. Model routing will be the most impactful, but we have to get there yet. Because we've spent decades training ourselves to think in terms of load balancing.

00:04:11:14 - 00:04:37:28
Lori MacVittie
That's how you scale things. You use a load balancer. This is just how it is, right? You distribute requests, you optimize for latency, you keep systems available. We talked about all those things. But that mental model breaks the moment you introduce AI, because model routing isn't about spreading traffic evenly and distributing it fairly.

Joel Moses
Yeah.

Lori MacVittie
You know, it's really about which model is best for this request.

00:04:38:00 - 00:04:43:16
Lori MacVittie
What's the cost? You mentioned that. What's the risk? What's the fallback if I'm wrong?

00:04:43:19 - 00:05:03:04
Joel Moses
It's interesting you mentioned that. So on one side you have the traditional methods of routing, which are very much based around same input, same output. Meaning I've got a metric and I'm measuring the metric, and then I'm trying to balance that metric against the output that I want to achieve. And literally it's like putting a ruler up against something, right?

00:05:03:07 - 00:05:33:14
Joel Moses
And then creating an outcome that directs the ruler, you know, the measurement up and down the ruler based on need. But now we've got like similar same output, but it's probably similar output unless the model version changed, unless the prompt drifted, unless the context window filled up, unless the temperature got bumped, unless there's some built in problem with the model that, you know, sends it in one direction and inclines it towards releasing data that should be held private.

00:05:33:15 - 00:05:58:00
Joel Moses
And we're still trying to measure that with tools that are designed for certainty. We're trying to measure uncertainty, probabilistic things with tools that are designed for certainty. It's like trying to measure a fog with a ruler. Right? And I think that that's probably the root of this issue, and that we're going to need to get a lot better on the model routing side.

00:05:58:03 - 00:06:00:04
Joel Moses
That's very clear to me.

00:06:00:07 - 00:06:22:24
Lori MacVittie
Yes, yes. And I think you hit on it, right. Load balancing is not routing. To be fair, routing was never load balancing in the first place. Right?

Joel Moses
True.

Lori MacVittie
It was about how do I pick the next, you know, next path.

Joel Moses
That's right.

Lori MacVittie
Well, not the next path, the next destination. We do that in the network. We do that at applications.

00:06:22:24 - 00:06:44:28
Lori MacVittie
We do that throughout the entire stack, actually. Model routing implies a lot more, which you just pulled out, right. This is not about distribution anymore, which is the role of load balancing; it's about control. It's about controlling the traffic and where it goes based on 100 different variables, many of which we still don't know how to measure.

00:06:45:02 - 00:07:16:13
Joel Moses
True. I also think there's a vocabulary change necessary. Instead of thinking distribute, optimize, and failover, which is kind of the stock in trade for a lot of the systems that do this type of thing, this type of delivery control, now you have to get into evaluate, decide, and justify. Right? Which are different things, fundamentally different operations. And, you know, that's an area definitely that we need to improve technologically.

00:07:16:15 - 00:07:38:24
Joel Moses
So it doesn't surprise me, I guess, that only 17% are identifying model routing at this point. I think people aren't really thinking about necessarily the number of models that they have in their toolkit. When it rises past two digits, I think people are really going to start getting interested in this area. But I think people kind of undersell. Like it's good to look at the inputs and outputs,

00:07:38:25 - 00:07:51:12
Joel Moses
absolutely. But looking at the input and choosing the right model to surface the correct output is, I think, the next order of business for what we've got to achieve in AI.

00:07:51:14 - 00:08:20:02
Lori MacVittie
Yes, I like to call them decision points, now. They're points where decisions are made and those decisions are: which model is best for this request?

Joel Moses
Right.

Lori MacVittie
Which is a way different question than this class of requests, which is usually how we look at it, right? Z-scalability is about, right, looking at the request itself, but it's also this request is a class and it's the same and it doesn't really matter.

00:08:20:03 - 00:08:56:12
Lori MacVittie
So.

Joel Moses
Right.

Lori MacVittie
It's all interesting, but most organizations haven't gotten there yet. You're right.

Joel Moses
Yeah.

Lori MacVittie
They're still treating it like a fancier load balancer. But that also kind of explains, you know, another data point that we pulled out, which is 35% say their infrastructure is not ready for AI workloads. And me, I think they're right, because the infrastructure today is designed for deterministic systems and they don't magically handle probabilistic ones just because

Joel Moses
True.

Lori MacVittie
you want them to.

00:08:56:15 - 00:08:58:27
Joel Moses
True.

Lori MacVittie
Right? They're not built for it.

00:08:59:00 - 00:09:15:18
Joel Moses
Now, do they say that because their infrastructure isn't ready for AI workloads, or that they aren't really sure what role infrastructure can play in AI workloads? Is that, what do you think is here? Is it an uncertainty? Or a certainty that it's not ready?

00:09:15:20 - 00:09:40:02
Lori MacVittie
I think it's a certainty that it's not ready because those same people, also 30, no, 53% of them are preparing for agentic AI with identity aware infrastructure. So they're well aware that, right, things need to change. And I think they're recognizing with this 35%, it's not ready. We're not there yet.

00:09:40:04 - 00:10:00:25
Joel Moses
It seems to me that infrastructure in the era of AI has to change the base question that's being asked. So here's how I would ask a question of traditional infrastructure. I would say, "where should this request go?" That's the question I would ask of the infrastructure. With AI infrastructure, the question uses the same verb, but the meaning is different.

00:10:00:26 - 00:10:29:00
Joel Moses
It's, "what should happen as a result of this request?" Same verbs, wildly different existential crisis being served by the infrastructure there. But you really need to think about it in terms of a higher order thing. It's not just things where I'm making a coin flip, it's more like a GPS system where I'm asking the system to tell me: what other traffic is there, what routing mechanisms am I going to use?

00:10:29:00 - 00:10:52:19
Joel Moses
Am I going to use something that's cost based? Do I need to choose the most efficient route, or do I need to choose the most scenic route? Ooo, that's an interesting idea, the most scenic AI route. It's literally the difference between a coin flip and a GPS route plan. Right? And I think that we're going to have to improve infrastructure to get to that point where it surfaces that in a much easier to use fashion.

00:10:52:21 - 00:11:19:22
Lori MacVittie
Yeah. And that's kind of the interesting point. It's not just making a decision about where it needs to go and how it needs to go. It's taking into consideration all of the factors that determine that. Like maybe model A is the best, right, for that request--that's what the GPS would say--but one of the constraints is you can't take highways and that road is a highway.

00:11:19:22 - 00:11:31:28
Lori MacVittie
Or maybe this driver is not ready to drive on that highway, so they can't go to model B. So there's factors on both the input and on the outcomes that have to be combined.

00:11:32:00 - 00:11:45:16
Joel Moses
Exactly. Or the quality of this road has degraded over time, so there's actually a better path forward through the other model.

Lori MacVittie
Oh, yes.

Joel Moses
Right. That's definitely something that the system needs to be able to composite. It's an interesting space.

00:11:45:19 - 00:12:08:07
Lori MacVittie
It's very interesting. And that's why you know treating model routing like traditional load balancing is, it's a category error, right, to go to logic there. And I think that's why so many organizations are unprepared. There's one calling you right now. And they would, they would like advice.

00:12:08:09 - 00:12:33:01
Lori MacVittie
I mean ultimately it's not the models that are the problem. I mean, okay, granted models are kind of wonky. They do, right, context drifts. They can hallucinate. They don't always have all the information necessary. That's a problem, but it's a different problem. It's not really an infrastructure problem. That's the way they operate. So we have to deal with that in a different way.

00:12:33:02 - 00:12:46:24
Lori MacVittie
When you look at things like how to route the traffic to it, the problem isn't the model, it's the assumptions that lead to that choice.

Joel Moses
I agree.

Lori MacVittie
And that's where we really have to step back.

00:12:46:25 - 00:13:07:00
Joel Moses
I agree.

Lori MacVittie
Yeah.

Joel Moses
When you tell me that the survey set found that 35% of survey respondents said that their infrastructure wasn't ready, I think that's 35% of people being honest with themselves. Being ready for AI doesn't mean you can run a model or even scale a model. It means that you can decide between models. It means you can enforce policy across models.

00:13:07:00 - 00:13:23:01
Joel Moses
Explain why a decision was made. Recover when a decision is wrong. Correct for quality issues that may emerge as the model continues to run. That is not infrastructure as we've known it. That's actually governance of the infrastructure.

00:13:23:01 - 00:13:26:09
Lori MacVittie
Ooo. That's, if you say compliance now I'm going to cry.

00:13:26:10 - 00:13:30:22
Joel Moses
I, you know, I'm going to step away from that word just to make you comfortable.

00:13:30:24 - 00:14:04:03
Lori MacVittie
Thank you. Thank you. Because that's a harsh word. But it is, the governance part, is usually associated with security. You know, security and compliance, governance and policies. And the thing with this is, is that model routing has to come live in that same world. Because decisions that it makes about where to send a request to which model are suddenly governed by policies that also live outside, that are not

00:14:04:04 - 00:14:25:19
Lori MacVittie
is it fast, is it available? Right. It's is it allowed?

Joel Moses
Yeah.

Lori MacVittie
Is this user allowed to touch this model? So that's something we haven't traditionally dealt with either. Which is why I go back when I always say, "hey, the control planes are collapsing."

Joel Moses
Yeah

Lori MacVittie
That's really what's happening. Everybody's got to play the same game and use the same policies.

00:14:25:20 - 00:14:56:07
Joel Moses
That's true. And of course you do need to verify the correctness of the output. I mean, a lot of people leverage RAG to improve the response of the responses that are coming back out of the AI system. But the reverse is also true. Like if there's some, if there's an attacker in position to place something in RAG that alternates or reconfigures the AI to deliver bad data. You know these systems are still garbage in, garbage out.

00:14:56:08 - 00:15:11:28
Joel Moses
I mean, as much as we like to think that they are producing valuable and correct data all the time, you're still garbage in, garbage out. The trouble is, a lot of times they will bake you cookies that taste a whole lot like garbage.

00:15:12:01 - 00:15:13:10
Lori MacVittie
But they're still cookies, so,

00:15:13:12 - 00:15:37:19
Joel Moses
Oh sure, sure.

Lori MacVittie
you know.

Joel Moses
You want to put them in your mouth and you want to eat them, but they're still garbage.

Lori MacVittie
Yeah.

Joel Moses
And you know it's, failover or correction of a problem used to mean if system A fails send to system B. Now it's if model A gives a weird answer, try model B or maybe C, or adjust the prompt, or lower the temperature, or adjust what you're feeding in via RAG.

00:15:37:19 - 00:15:43:16
Joel Moses
Now it's less failover and it's more dealing with bad vibes.

00:15:43:19 - 00:15:46:00
Lori MacVittie
Dealing with oh, you had to bring up vibes, didn't you?

00:15:46:01 - 00:15:47:04
Joel Moses
Sorry I couldn't resist that one.

00:15:47:12 - 00:16:10:02
Lori MacVittie
That's terrible. I was just looking up the data point to make sure I had it right. But yeah, 19% are still relying on RAG to do that fine, you know, the adaptation, if you will, to make sure the models are serving their purposes as opposed to the 52% that are doing multi-model chaining. There were other answers in there.

00:16:10:02 - 00:16:47:13
Lori MacVittie
Surprisingly, the top three were multi-model chaining,

Joel Moses
Okay.

Lori MacVittie
distillation was the second, and the third one was just good old prompt engineering.

Joel Moses
I see.

Lori MacVittie
Yeah, right?

Joel Moses
Okay.

Lori MacVittie
Getting the right prompt. So those techniques were very high on the list of how we do this. Things like LoRA, fine tuning, retraining, RAG, were all much lower, which I found interesting because that goes back to the 35% said their infrastructure is not ready for AI workloads.

00:16:47:13 - 00:16:54:24
Lori MacVittie
And a lot of that depends on having the compute and the infrastructure available to be able to, right, handle it.

00:16:54:25 - 00:17:16:15
Joel Moses
Yeah, I think asking them these questions with a lens towards the infrastructure is definitely going to surface that. I also think that based on the change between the questions we asked previously and this year's battery of questions, I think people are a little more certain of some things, and they're less certain than they were than they expected to be.

00:17:16:15 - 00:17:35:13
Joel Moses
So things like distillation as an active topic and LoRA and various other retraining techniques, people are getting up to speed on those, but they don't have enough information to translate those into concrete decisions about what architecture or what infrastructure to develop. That's my suspicion.

00:17:35:15 - 00:17:55:03
Lori MacVittie
I think it's a good suspicion. I think that's headed in the right direction. So we talked through just a few data points in the survey. And there's many, many more of course. But just given what we've discussed, like what would you have people take away from this discussion?

00:17:55:04 - 00:18:16:26
Joel Moses
Yeah. So we need to stop measuring the fog with a ruler. That's one thing.

Lori MacVittie
Okay.

Joel Moses
We need to recognize that the idea of AI is going to fundamentally change how we do a lot of different things from an infrastructure perspective. It's not just about, not just about, now, it is about managing load, managing availability, but it's not just about that.

00:18:16:26 - 00:18:40:00
Joel Moses
There's a lot of other things that you need to sample, like the quality of the information coming out of the systems. You need to look at the data that's going into the systems and score as to whether there's risks there. You need to basically turn yourself into what amounts to a GPS system, rather than just using coin flips to decide everything.

00:18:40:00 - 00:18:52:10
Joel Moses
And that's going to be a learning experience, especially as we move to operationalize these systems. Not just deploy the right architecture, but also make the architecture usable and manageable.

00:18:52:13 - 00:19:17:25
Lori MacVittie
And I think that's I'm glad you said operational and, you know, manageable because most of the data points to, no, they're already operationalizing it. But I don't think that means it's manageable because we're still seeing a lot of these issues. And I think some of the solutions to problems we're just learning about don't exist yet. But they're coming, right?

00:19:17:26 - 00:19:45:16
Lori MacVittie
Everybody's working on it, and we're getting better about at least recognizing, like, okay, we need this information, where can we get it? You know, who needs to use it. And I think that's the next step is figuring out where all these decision points are in architectures, so that they can have the data they need and be able to make the right decisions at the right time, rather than just defaulting to, well, it's available and it's fast, let's use it.

00:19:45:16 - 00:20:02:27
Lori MacVittie
That's not going to work with real scale AI, if you will. So that's a wrap for this episode of Pop Goes the Stack. If your alert fatigue hasn't kicked in yet, subscribe because there's plenty more coming.