Pop Goes the Stack

The web we built—a tangle of HTML, JavaScript, CSS, APIs, and SEO quirks—has always been messy. But with AI agents and real-time apps now consuming the web as data, that mess becomes a liability. Firecrawl is one of the new tools reshaping how apps see and ingest web content, turning web pages into structured JSON, markdown, screenshots—everything you need for your agents to behave intelligently. 

In this episode, F5's Lori MacVittie, Joel Moses, and returning guest Aubrey King dig into how Firecrawl works and why it’s emblematic of a deeper shift: the web is no longer just for browsers. It’s now an ingestion surface, a layer to be crawled, parsed, cleaned, and trusted (or not) by your AI stacks. That means how your app presents itself—not just in UI, but in metadata, APIs, link structure, content semantics—matters more than ever. 

Creators and Guests

Host
Joel Moses
Distinguished Engineer and VP, Strategic Engineer at F5, Joel has over 30 years of industry experience in cybersecurity and networking fields. He holds several US patents related to encryption techniques.
Host
Lori MacVittie
Distinguished Engineer and Chief Evangelist at F5, Lori has more than 25 years of industry experience spanning application development, IT architecture, and network and systems operations. She co-authored the CADD profile for ANSI NCITS 320-1998 and is a prolific author with books spanning security, cloud, and enterprise architecture.
Guest
Aubrey King
Community Evangelist with F5's DevCentral, Aubrey is experienced in internal/external network, application and systems engineering.
Producer
Tabitha R.R. Powell
Technical Thought Leadership Evangelist producing content that makes complex ideas clear and engaging.

What is Pop Goes the Stack?

Explore the evolving world of application delivery and security. Each episode will dive into technologies shaping the future of operations, analyze emerging trends, and discuss the impacts of innovations on the tech stack.

00:00:05:03 - 00:00:32:03
Lori MacVittie
Welcome back to Pop Goes the Stack, the show that unboxes emerging tech like Pandora's laptop: curious, messy, and maybe full of bugs. I'm Lori MacVittie; let's see what breaks today. Turns out it's the web. The web we built. It's a tangle of HTML, JavaScript, CSS, APIs, and SEO quirks. And it's always been messy. But with AI agents and real-time apps now consuming the web as data,

00:00:32:10 - 00:00:57:01
Lori MacVittie
that mess is becoming a liability. So Firecrawl is one of the new tools reshaping how apps see and ingest web content. Joel has thoughts about this he'll share.

Joel Moses
Sure.

Lori MacVittie
It turns web pages into structured JSON, Markdown, screenshots, you name it, anything you need for your agents to behave intelligently. So in this episode, we're going to dig into Firecrawl,

00:00:57:01 - 00:01:11:14
Lori MacVittie
how it works, whether it's good or not, and, you know, what it really means--the web being consumable--for you, whether you're going to go play with this or not. To help us do that, we've got Joel Moses, of course.

00:01:11:16 - 00:01:12:16
Joel Moses
Good to be here.

00:01:12:18 - 00:01:15:25
Lori MacVittie
All right. And we brought back Aubrey King for fun. Hi, Aubrey.

00:01:16:01 - 00:01:17:15
Aubrey King
Hey, how are you guys doing?

00:01:17:18 - 00:01:36:10
Lori MacVittie
Ha, ha. Well, we'll see, I'll tell you after this episode how it went. So, Firecrawl. Joel, you know, you are most likely to dig into just about anything that I give you, so tell us about Firecrawl and why it's the best thing since PHP.

00:01:36:12 - 00:02:06:07
Aubrey King
It's awesome.

Joel Moses
Well, it is pretty, I will say the functionality of it is pretty awesome. You know, all you got to do to think about the web and the relative lack of cleanliness that there is--and by cleanliness, I mean for ingestion into other systems--is to think about the web pages that you made back in the late 90s for GeoCities, or something like that, where everything was ad hoc and nothing was tagged properly, and there was a lot of content, and it was spread around in unstructured ways.

00:02:06:09 - 00:02:31:04
Joel Moses
And to be honest, we haven't really come much farther from those days, except through the addition of things like structured APIs and different types of applications. But the web remains a largely textual, largely unstructured place. And that means that the problem with ingesting that stuff into an LLM is an LLM works best when it's given good data to process.

00:02:31:04 - 00:02:37:24
Joel Moses
And so if you don't have, if you're not feeding it with good stuff, you're going to get bad stuff out of it. That's effectively what it is.

00:02:37:25 - 00:02:41:24
Lori MacVittie
So, why are we letting it read the internet then? I mean, this is like

00:02:41:27 - 00:03:01:03
Joel Moses
Well, I mean, where do you think the LLMs get all their knowledge sets from? And when you want to improve the responsiveness and you want to improve the newness of the information that your LLM talks to you about, you're going to need to feed better and more accurate and more up-to-date data into it.

00:03:01:03 - 00:03:25:20
Joel Moses
And you're going to have to take that unstructured, unclean, strangely formatted data, format it in some way that makes it easily digestible by these LLM systems in order to make it work. And that's kind of what Firecrawl does, it browses a site for you and creates little elements that are well structured out of what it finds on those websites.

00:03:25:20 - 00:03:49:13
Joel Moses
So the website can be horribly crafted and you can get actual good samplable data out of it. It can have lots of different structural elements. It can have a navigation structure that's built on JavaScript, and it will navigate it for you and produce the structured data suitable for importing into an LLM system. It's pretty cool. And I believe the platform is built on open source.

00:03:49:13 - 00:03:59:11
Joel Moses
It's licensed under AGPL and that's also good. It means that you can read, review, and create your own plugins for the system.

00:03:59:13 - 00:04:03:11
Lori MacVittie
Okay. But why? So,

00:04:03:13 - 00:04:10:05
Lori MacVittie
I don't know if we should dig into the, "Okay, but why is it bad?" or "Why would you want this in the first place?" Like,

00:04:10:07 - 00:04:33:08
Aubrey King
Well, I can tell you from my perspective, finding Firecrawl was like a gift. We've been working on these labs at DevCentral in our Git repository, which are just teaching AI kind of principles and basics and then, you know, some safeguards and things like that. But I was working on a particular lab that takes a YouTube page,

00:04:33:11 - 00:04:55:20
Aubrey King
looks at it as RSS, checks to see if there's new entries, and converts it to XML then to be used with whatever else. Finding Firecrawl enabled me to really simplify that by just taking it and instead of converting it to XML, converting it directly to JSON. I was going to convert to XML, then convert to JSON, and it was kind of messy.

00:04:55:23 - 00:05:12:11
Aubrey King
So having something that really just goes into the native, most modern language that we've got seems like, I mean, JSON seems to be "the standard." So I really, really thought it was fantastic from a use case perspective. Saved me a lot of time in coding.
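
As a rough illustration of the workflow Aubrey describes (read a feed, check for new entries, emit JSON), here is a minimal standard-library sketch. It does not use Firecrawl and is not the DevCentral lab code; the sample feed, the `new_entries` function, and the `seen_ids` set are all hypothetical, and a real feed would be fetched over HTTP rather than embedded as a string.

```python
import json
import xml.etree.ElementTree as ET

# Atom namespace prefix used by ElementTree when looking up elements.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed standing in for a channel's Atom/RSS output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Channel</title>
  <entry>
    <id>video-002</id>
    <title>Second video</title>
    <link rel="alternate" href="https://example.com/watch?v=002"/>
    <published>2025-01-02T00:00:00+00:00</published>
  </entry>
  <entry>
    <id>video-001</id>
    <title>First video</title>
    <link rel="alternate" href="https://example.com/watch?v=001"/>
    <published>2025-01-01T00:00:00+00:00</published>
  </entry>
</feed>"""


def new_entries(feed_xml, seen_ids):
    """Return feed entries whose id is not in seen_ids, as plain dicts."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        if entry_id in seen_ids:
            continue  # already processed on an earlier run
        link = entry.find(f"{ATOM}link")
        entries.append({
            "id": entry_id,
            "title": entry.findtext(f"{ATOM}title"),
            "url": link.get("href") if link is not None else None,
            "published": entry.findtext(f"{ATOM}published"),
        })
    return entries


# Pretend video-001 was handled previously; only video-002 is new.
fresh = new_entries(SAMPLE_FEED, seen_ids={"video-001"})
print(json.dumps(fresh, indent=2))
```

Going straight from the parsed feed to dictionaries avoids the intermediate XML-to-XML step, which is the simplification described above.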

Joel Moses
Yeah.

00:05:12:13 - 00:05:36:18
Joel Moses
One of the cool aspects of it is you can, you can do a lot of different things with it, including like a single page scrape, or you can tell it to crawl the entire website, including, like, even without using a sitemap all the included pages, even if those pages are through navigation. You can give it a map of URLs in the site, too. You can search and scrape in a single call.

00:05:36:20 - 00:05:54:06
Joel Moses
It's really cool. You can, you can do lots of different, malleable things. Anybody who's ever worked with a site crawler knows that that functionality is hard enough. To combine it with something that boils the website down into contextual components you can load into an LLM, that's a pretty cool trick.
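
The idea of boiling a messy page down into structured components can be sketched with nothing but the Python standard library. This is not Firecrawl's implementation or API, and it does none of the crawling or JavaScript navigation discussed here; the `PageExtractor` class, the `extract` function, and the sample page are hypothetical names used only to show the shape of the output.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the title, headings, links, and visible text of one page."""

    SKIP = {"script", "style"}      # content that is never visible text
    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.page = {"title": "", "headings": [], "links": [], "text": []}
        self._skip_depth = 0        # >0 while inside <script> or <style>
        self._capture = None        # "title" or a heading tag being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title" or tag in self.HEADINGS:
            self._capture = tag
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.page["links"].append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        text = " ".join(data.split())   # collapse whitespace runs
        if not text or self._skip_depth:
            return
        if self._capture == "title":
            self.page["title"] = text
        elif self._capture in self.HEADINGS:
            self.page["headings"].append(text)
        else:
            self.page["text"].append(text)


def extract(html_source):
    """Parse one HTML document and return a JSON-serializable dict."""
    parser = PageExtractor()
    parser.feed(html_source)
    page = dict(parser.page)
    page["text"] = " ".join(page["text"])
    return page


# Hypothetical late-90s-style page: inline styling, scripts, untidy tags.
MESSY_HTML = """
<html><head><title>Widget Shop</title>
<style>body { color: red }</style></head>
<body><H1>Widgets</H1>
<font size=2>Our widgets are <b>great</b>.</font>
<script>trackVisitor();</script>
<a href="/pricing">Pricing</a>
</body></html>
"""

page = extract(MESSY_HTML)
print(json.dumps(page, indent=2))
```

Even this toy version shows why the output is easier to feed to an LLM than the raw markup: presentation tags and scripts are gone, and what remains is labeled.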

00:05:54:09 - 00:06:16:00
Lori MacVittie
Yep. But most of these sites and most of, right, an enterprise is going to be delivering its site from a CMS. So all of the little constructs are in there. So it's taking structured data, it's turning it into an unstructured web page so that, you know, Firecrawl or GPT can turn it back into a structured page to understand it.

00:06:16:05 - 00:06:38:28
Lori MacVittie
It seems like we're inserting something in the middle here that maybe, maybe we don't need to. Like maybe the, you know, traditional user interface is just unnecessary. Let's just get rid of it, right? Everybody uses video and audio books and everything else today. Do we really need this? I mean, what are we doing with that?

00:06:39:00 - 00:06:41:04
Aubrey King
You know, when I,

00:06:41:06 - 00:06:47:02
Lori MacVittie
Didn't expect that question, did you? Do we even need websites anymore? Let's you know,

00:06:47:02 - 00:07:05:14
Joel Moses
A good point.

Aubrey King
No I, I think we definitely do. But the thing that made me, you know, when you were saying that what I, what I was thinking about was the first time looking at this, I was reminded of another scanner that I used in red teaming stuff. W3af is something that I've always used to kind of pick apart websites to make sure I know where the leaks are going to be.

00:07:05:16 - 00:07:27:00
Aubrey King
You can kind of accomplish the same thing with Firecrawl, so it kind of scares me giving, you know, some bad actors a new utility to really mess with. But you can do the job of w3af and it makes it much more programmable. So if you're looking to automate red teaming and things like that, you're going to be able to do it neatly with that.

00:07:27:02 - 00:07:52:16
Joel Moses
And Lori, you know, you're kind of dead on about, you know, content management systems like WordPress for example. They're effectively taking well ordered data and then they're presenting it as a web page or a web experience to an end customer. And so you're changing from structured to slightly unstructured. But, you know, it's interesting we've used those CMSs to kind of create the way to manage content.

00:07:52:16 - 00:08:08:19
Joel Moses
But for the most part, we strap on a set of user experience modules and things like that that make it unstructured again; that make it hard to navigate. Or the things that make it pleasant for a human to navigate are the things that make it annoying for a site crawler to navigate.

Aubrey King
Yeah.

00:08:08:21 - 00:08:14:20
Lori MacVittie
Isn't that the, do we, you know, I that, then there's the other question of do we even want them to be.

00:08:14:22 - 00:08:38:25
Joel Moses
Yeah, why don't we just do it to the CMS and yeah, I'm sure you could. The thing is, tools like Firecrawl are able to serve the lowest common denominator and everything above it, right. So if a site is unplanned and unstructured, it can do that. If a site is on WordPress behind a very slick, navigable float UI, it can navigate that. It can do all sorts of different things.

00:08:38:25 - 00:08:46:09
Joel Moses
And so it's an easy one size fits all solution for getting things ready for an LLM, which is pretty cool.

00:08:46:11 - 00:09:27:17
Lori MacVittie
Right. But you could also use it to create perhaps the JSON-LD version of a page that has been recommended for consumption in the first place. And if it's just made available, right, then it doesn't have to actually go and parse it, screen, you know, all the

Joel Moses
Yeah.

Lori MacVittie
you know, that, what it's doing. And then it can just grab that file directly, which is going to be faster and you have more control over what it's actually reading. Because that's one of the things that shifts from humans are primarily, you know, the readers of content, to now we're using ChatGPT to search.

00:09:27:19 - 00:09:49:03
Lori MacVittie
We want to make sure that the information it's getting about our product, our business, our services is accurate. And one way to do that is to pre-position the content that is more consumable to it, rather than allow something like Firecrawl to go out and just grab it and go, "oh, that means bad" and now you're stuck with that.
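
Pre-positioning machine-readable content, as Lori suggests, often means embedding a JSON-LD block in the page. The sketch below builds one using the schema.org `Product` type. The `product_json_ld` helper and the sample product are hypothetical, and which properties a given AI system actually reads is an assumption to verify rather than a guarantee.

```python
import json


def product_json_ld(name, description, url):
    """Return an HTML <script> block carrying schema.org Product data."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "url": url,
    }
    # Escape "<" so text such as "</script>" inside a value cannot end
    # the block early; \u003c is still valid JSON for the same character.
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = product_json_ld(
    name="Widget",
    description="A <b>sturdy</b> widget.",
    url="https://example.com/widget",
)
print(snippet)
```

Because the site owner writes this block, the description a crawler picks up is the one the owner chose, not one inferred from page layout.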

00:09:49:03 - 00:09:49:27
Joel Moses
Yeah.

00:09:49:29 - 00:10:06:14
Joel Moses
It is funny to think about this. You know, the old hotness used to be RSS feeds. How do you get a knowledge of what the updated content on a site is without having to navigate the entirety of the site? You would actually hand prepare an RSS feed back in the day; that's how it was done.

00:10:06:21 - 00:10:29:06
Joel Moses
And so Firecrawl effectively allows you to crawl an entire site and create, effectively, a chunked feed of data and analyze it based on what, on the newness of it. You can actually order it by that. Which, you know, this is basically a replacement for RSS feeds for LLMs. It's, I didn't think of it that way, but it sort of is.

Lori MacVittie
Well, it's

Aubrey King
Kinda, yeah.

00:10:29:12 - 00:10:38:13
Lori MacVittie
it's the reader. It's the RSS reader, not the,

Joel Moses
Correct.

Lori MacVittie
cause there were things that generated RSS automatically. There were plug-ins and there were scripts and, right Aubrey?

00:10:38:15 - 00:10:56:12
Aubrey King
Oh yeah. Yeah, plenty. And, you know, it's funny I find myself thinking about the fact that we three have talked about a time when these models end up going and doing the work for us. Like, you know, there's a time in the morning when you get up and say, you know, "hey, model whatever, what's the news?"

00:10:56:12 - 00:11:19:06
Aubrey King
and it just reads off the stuff that it crawled for you. You know, like that kind of thing, like, I can see coming. So to your point, do we need websites anymore? I kind of wonder, you know, are these things just going to read the websites we need and then proxy that information to us? And if so, what happens when your model is alignment switched or something like that

00:11:19:06 - 00:11:25:15
Aubrey King
and suddenly it turns into a murder bot and feeds you the wrong news? You know?

00:11:25:17 - 00:11:44:15
Joel Moses
Yeah. You know, if we're talking about

Aubrey King
It's tough.

Joel Moses
if we're talking about some negatives, there are definitely some negatives with Firecrawl. So there definitely is an implication here. Right now, Firecrawl is kind of in a, what I would call, going through that transition that every open source project goes through when they're trying to really knuckle down and make some money on the business.

00:11:44:18 - 00:12:18:09
Joel Moses
They've kind of let the self-hosted mechanism for Firecrawl atrophy. The documentation, users are reporting the documentation is not being kept up well

Aubrey King
It's bad.

Joel Moses
and they're trying to force people over to a cloud delivered solution. Which is very easy to implement and operate, I will give it that. But it also means that when you tell it to go browse sites, when you're asking for certain site crawls and perhaps, you know, you have sites that are authenticated in context and you want to crawl those, the service is going to have to do that on your behalf and you're going to have to give it the credential and access

00:12:18:09 - 00:12:45:25
Joel Moses
to do that. You know, I myself, I really prefer, especially where data elements are being accessed and boiled down for LLM consumption, I like to have more control over the process and I like a self-hosted option. So if there is anything to critique, it's the fact that that self-hosted AGPL product is being pushed aside for something that the company is going to make money on.

00:12:45:26 - 00:12:52:09
Joel Moses
Now, companies can make money, but, in this particular case, I think there are privacy and security concerns.

00:12:52:11 - 00:13:23:17
Lori MacVittie
So the reason we wanted to talk about it, I mean, aside from being cool and, you know, interesting and whatnot, is that organizations have to be considering how are agents and AI consuming their information. And this is one way to, you know, to format it or get an idea of it and whatnot, you know, as opposed to, I don't know, scraping other people's sites, which, you know, I don't really have a need for that

00:13:23:17 - 00:13:54:06
Lori MacVittie
and I don't think that's necessarily the best use. But you do have to have some tool or some way to understand what's the best way to present my site and my information to this, you know, new class of user, if you will. And this is a good way to do it, is run it against your site and say, well, what are you doing to my information so you can better serve the content directly to the AI rather than letting someone else use a tool like this?

00:13:54:09 - 00:14:03:18
Aubrey King
Yeah, and I mean, how will they simply monetize that data too? I mean, you know, take a look at they could very easily aggregate, hey, these are all the sites we've had to look up and

00:14:03:18 - 00:14:33:14
Joel Moses
Yeah. You know

Lori MacVittie
Yeah, yeah.

Aubrey King
Don't like that.

Joel Moses
well one of the things I'm taking away from this discussion for sure is that, you know, it's good to do this yourself if at all possible. And I mean,

Aubrey King
Yeah.

Joel Moses
I mean that in two ways. Number one, you should probably not turn over the task of browsing all of the data on your site under an authenticated context to a service that you're not really sure about, because it will browse everything and it will allocate everything.

00:14:33:14 - 00:14:55:18
Joel Moses
And so it's up to you to govern the usage of that data. The flip side of that, though, is if you already have the data in a structured format and it's simply being re-represented as a web page. For example, if you have something like WordPress backing the content, go get the data from that. Don't necessarily go through an intermediate service that browses the site.

00:14:55:20 - 00:15:01:28
Joel Moses
It'll be faster, it'll be more efficient, and it'll be better organized.
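
Joel's point about going to the source can be sketched against the WordPress REST API, which serves posts as JSON at `/wp-json/wp/v2/posts`. This is a minimal illustration under assumptions: the site URL and sample payload are hypothetical, `fetch_posts` is defined but not called here, and a real site may disable or restrict the endpoint.

```python
import html
import json
import re
import urllib.request


def fetch_posts(site, per_page=10):
    """Fetch recent posts as parsed JSON (defined for reference, not called)."""
    url = f"{site.rstrip('/')}/wp-json/wp/v2/posts?per_page={per_page}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def to_records(posts):
    """Reduce WordPress post objects to small, LLM-friendly records."""
    records = []
    for post in posts:
        rendered = post["content"]["rendered"]
        # Crude tag stripping; adequate for a sketch, not for hostile HTML.
        text = html.unescape(re.sub(r"<[^>]+>", " ", rendered))
        records.append({
            "id": post["id"],
            "url": post["link"],
            "modified": post["modified"],
            "title": html.unescape(post["title"]["rendered"]),
            "text": " ".join(text.split()),
        })
    return records


# Hypothetical payload in the shape the posts endpoint returns.
SAMPLE_POSTS = [{
    "id": 42,
    "link": "https://example.com/hello/",
    "modified": "2025-01-02T10:00:00",
    "title": {"rendered": "Hello"},
    "content": {"rendered": "<p>Hello &amp; welcome.</p>\n"},
}]

records = to_records(SAMPLE_POSTS)
print(json.dumps(records, indent=2))
```

No page rendering or navigation is involved, which is where the speed and organization benefits come from.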

00:15:02:01 - 00:15:25:23
Lori MacVittie
That's interesting because, and when you said that, I'm like "huh." I remember a stat and I just looked it up. There was an estimate, 67% of the high quality top news sites, I don't know what they are, but they block access to AI models. And, you know, they just say your bots, your crawlers, no, you can't access our site.

00:15:25:25 - 00:15:43:22
Lori MacVittie
And that is increasingly a strategy, I guess, for sites whose primary product is information or news, right, is to just block the AI entirely. So, I mean, that is an option too if you don't want your site crawled.
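
Blocking of this kind is commonly expressed in `robots.txt`. The sketch below uses Python's `urllib.robotparser` to show how a well-behaved crawler would read such a policy. The policy text and URLs are hypothetical, `GPTBot` is used as an example of an AI crawler user agent, and `robots.txt` is advisory only: whether any particular tool honors it is something to verify.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """Return True if the policy lets this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


story = "https://example.com/news/story"
print("GPTBot allowed:", may_fetch("GPTBot", story))
print("Other agent allowed:", may_fetch("ExampleBrowser", story))
```

Sites that need enforcement rather than a request typically pair a policy like this with server-side controls.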

00:15:43:24 - 00:16:16:04
Joel Moses
That is an excellent point. You know, the first time I ever saw Firecrawl, the number one thing that I thought about it is, man, this is great for competitive research.

Lori MacVittie
Yeah.

Joel Moses
Going off and crawling all sorts of your competition's sites and looking for data elements to feed in. So as you can, you can actually ask your LLM guided questions about what your competitors are up to versus what you are up to. That, being able to boil things down like that from the externally exposed public web content,

00:16:16:06 - 00:16:39:18
Joel Moses
this is invaluable for things like market research and competitive research. But it also comes with some drawbacks. You know, the relative legality of sucking in an entire site. You know, that sort of thing I think is left as an exercise for the implementer and people should consult their counsel before they do.

00:16:39:20 - 00:16:44:18
Lori MacVittie
Talk to a lawyer. That's what he means, talk to a lawyer first.

00:16:44:21 - 00:17:03:23
Aubrey King
Definitely. I'm certainly going to keep using this thing, I can tell you that. It's been invaluable. I think, though, to Joel's point, I really don't think we need to let these things just surf the web for us. I think that's a bad plan. Just because I still don't trust the alignment thing at all

00:17:03:24 - 00:17:23:02
Aubrey King
with models. But, aside from that, I think it would be nice if we could find a way to put pressure on them to bring back a more open source, you know, host it yourself option, because there's something really nice and neat about bringing up a container in your environment and, you know, hey, it's there.

00:17:23:02 - 00:17:30:15
Aubrey King
I can use that now and destroy it when I'm done. I don't have to give anyone else my data. I'd much prefer that.

00:17:30:18 - 00:17:58:04
Joel Moses
Yeah. And, you know, if you're running self-hosted, one of the huge benefits of this is the freshness of the content that your LLM can ingest and understand. You know, it's very difficult to go back and retrain something using a large corpus of information. It's a little difficult to set up a RAG scenario where it pulls in content ad hoc and selects things from it. But you can actually create elements with Firecrawl that are very ingestible, that are fully up to date,

00:17:58:06 - 00:18:11:19
Joel Moses
and it really does improve your own implementation of LLM for your purpose. So that's great. So I sincerely hope they do some more work on the self-hosted option. I love it.

00:18:11:21 - 00:18:38:00
Lori MacVittie
Yeah, it's important. You have to do this. I mean that's, you know, you mentioned the competitive views but, I mean, just in terms of the percentage of people that are exclusively using AI to do searches for things like compare products so that I can actually decide which one to buy for my business. Now that's a legitimate use case of doing this kind of research.

00:18:38:03 - 00:19:01:10
Lori MacVittie
And they are doing it, which means that you need to be delivering content in a way that AI can understand and actually present your pros and cons whatever in the best light, or you're going to end up, right, being on the bottom of those lists. Because increasingly people are like, why should I do an RFP and then have to read through them when I can just have AI do it for me?

00:19:01:10 - 00:19:13:14
Joel Moses
Sure.

Lori MacVittie
So it's, this is an important part to get right. And if Firecrawl helps you understand how to construct that, then it's a great tool and you should, you know, definitely check it out and use it.

00:19:13:16 - 00:19:15:00
Aubrey King
It does.

Joel Moses
I agree.

00:19:15:02 - 00:19:36:10
Lori MacVittie
Yeah. Maybe that's one of the takeaways right? This is something you need to be aware of and do, so, you know, definitely check it out. Even if it's not the whole tool you want to use, it's open source. There are probably good pieces and sections within it that will help you achieve what you need to achieve in order to make sure you don't fall behind competitively

00:19:36:18 - 00:19:44:02
Lori MacVittie
as everyone starts moving to let's have AI help me decide what to buy and where to go.

00:19:44:04 - 00:20:04:06
Joel Moses
Yeah, I'd agree with that. I'd also add that one of the takeaways is, you know, if you're going to use a system that does web crawling to help out an LLM, create your plan of attack first. Don't just allow ad hoc use of this. You know, it's the old adage that "garbage in, garbage out,"

00:20:04:06 - 00:20:21:14
Joel Moses
it still applies to LLMs. And one thing Firecrawl does not necessarily do for you is it doesn't necessarily de-duplicate. It doesn't really help you with your prompt engineering. It just gives you better, cleaner data sources. It's up to you to ensure the accuracy.
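
Since de-duplication is left to the implementer, one simple approach is to hash a normalized form of each chunk and keep only the first occurrence. This is a minimal sketch of exact-duplicate removal, not a Firecrawl feature; the `normalize` and `dedupe_chunks` helpers and the sample chunks are hypothetical, and near-duplicates with different wording would need a fuzzier technique.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def dedupe_chunks(chunks):
    """Keep the first occurrence of each chunk, preserving input order."""
    seen = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already kept
        seen.add(digest)
        unique.append(chunk)
    return unique


# Hypothetical crawl output: the same sentence scraped from two pages.
chunks = [
    "Widgets ship in 2 days.",
    "widgets  ship in 2 days.",
    "Returns accepted within 30 days.",
]

unique = dedupe_chunks(chunks)
print(unique)
```

Running a pass like this before loading data keeps repeated boilerplate, such as footers that appear on every page, from crowding out distinct content.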

00:20:21:17 - 00:20:55:05
Aubrey King
Garbage in, garbage out, reminds me of Grok. Oh, sorry, that was a hot mic. No, I wanted to say, you know, as far as takeaways are concerned, I was going to say I love this utility's ability to take unplanned data as it was said here, but things from social media, YouTube, LinkedIn, etc., look at it in easy to manage chunks, and make up things like mailing lists or whatever kind of bullet points.

00:20:55:07 - 00:21:19:09
Aubrey King
In-depth stuff I wouldn't trust it with. But that's not my takeaway. My takeaway is it's integrated cleanly into n8n and that is important. If you're not using n8n, man, you should be. I love it. It makes every task easier when you're talking about making agents and doing AI work for yourself. So that is one great thing about it,

00:21:19:09 - 00:21:24:20
Aubrey King
whether it's web hosting or not. I love the easy integration.

Joel Moses
Yeah.

00:21:24:22 - 00:21:39:12
Lori MacVittie
Sounds like we need another episode just to talk about n8n

Aubrey King
It's hot.

Lori MacVittie
and bring Aubrey back to talk about it since he's so excited. I'm sure Joel has already used it, dissected it, and has very strong opinions about it. Am I wrong?

00:21:39:15 - 00:21:40:23
Joel Moses
No opinions whatsoever.

00:21:40:25 - 00:22:01:00
Lori MacVittie
Ooo, okay,

Aubrey King
It's hot.

Lori MacVittie
you're not allowed to look at it until we have this episode. I want you cold when we do that, because we'll have to come back because we are out of time today. Sorry. I know we could go on and on. So we're just going to say that's a wrap for Pop Goes the Stack this time.

00:22:01:02 - 00:22:11:04
Lori MacVittie
If you enjoyed the breakdown, subscribe. Because the only thing scaling faster than tech hype is our snark. Yeah. See you next time.

00:22:11:06 - 00:22:12:03
Joel Moses
Bye bye.

Aubrey King
Bye.