The Bootstrapped Founder

Building software is getting dramatically easier — so what exactly are we building our businesses on? In this episode, I dig into why real-world data is the only reliable moat left for software founders. I share what I'm seeing at Podscan, where fifty million transcribed podcast episodes matter far more than any algorithm, why purely transformative software is dangerously vulnerable to agents, and how making your business API-first with full platform parity is the move that turns a data advantage into a defensible one. Having data is half the moat. Availing data is the other half.

This episode of The Bootstrapped Founder is sponsored by Podscan.fm

The blog post: https://thebootstrappedfounder.com/data-is-the-only-moat/
The podcast episode: https://tbf.fm/episodes/437-data-is-the-only-moat

Check out Podscan, the podcast database that transcribes every podcast episode out there minutes after it gets released: https://podscan.fm
Send me a voicemail on Podline: https://podline.fm/arvid

You'll find my weekly article on my blog: https://thebootstrappedfounder.com
Podcast: https://thebootstrappedfounder.com/podcast
Newsletter: https://thebootstrappedfounder.com/newsletter

My book Zero to Sold: https://zerotosold.com/
My book The Embedded Entrepreneur: https://embeddedentrepreneur.com/
My course Find Your Following: https://findyourfollowing.com


Here are a few tools I use. Using my affiliate links will support my work at no additional cost to you.
- Notion (which I use to organize, write, coordinate, and archive my podcast + newsletter): https://affiliate.notion.so/465mv1536drx
- Riverside.fm (that's what I recorded this episode with): https://riverside.fm/?via=arvid
- TweetHunter (for speedy scheduling and writing Tweets): http://tweethunter.io/?via=arvid
- HypeFury (for massive Twitter analytics and scheduling): https://hypefury.com/?via=arvid60
- AudioPen (for taking voice notes and getting amazing summaries): https://audiopen.ai/?aff=PXErZ
- Descript (for word-based video editing, subtitles, and clips): https://www.descript.com/?lmref=3cf39Q
- ConvertKit (for email lists, newsletters, even finding sponsors): https://convertkit.com?lmref=bN9CZw

Creators and Guests

Host
Arvid Kahl
Empowering founders with kindness. Building in Public. Sold my SaaS FeedbackPanda for life-changing $ in 2019, now sharing my journey & what I learned.

What is The Bootstrapped Founder?

Arvid Kahl talks about starting and bootstrapping businesses, how to build an audience, and how to build in public.

Speaker 1:

Hey, it's Arvid, and this is The Bootstrapped Founder. Now here's a question that I've been sitting with lately. If building software keeps getting easier, and it clearly is, then what exactly are we building our businesses on? Because conversations about the quality of vibe coded or AI engineered software aside, it is obvious that with LLM originated tooling, AI coding, writing complex software has become significantly easier. It hasn't become completely solved.

Speaker 1:

It's still a process that requires an orchestrator and somebody who knows what the thing is gonna be, what it should look like. And at this point, that isn't just a technical capacity. It's becoming product management and customer development that intersects with engineering. But I think it's pretty clear that as these tools get better, the process of creating code bases that, when deployed, become products that people purchase or services that people pay for, that is clearly moving towards a threshold where software engineering and running software businesses as we know them will have changed quite a bit. It's still going to be a job that requires insight and capacity and skill, but it doesn't require 10 people anymore to build something meaningful.

Speaker 1:

It might just require three and a little bit of AI or two and some AI and maybe just one and a lot of AI. So if it becomes significantly easier to build products and the act of building a software business is not as expensive anymore because it's faster and requires fewer resources, then what will be the moats of the future? Of the immediate future where this change is already happening and we're just trying to adjust? And maybe five or ten years down the line when AI generated software products are so commonplace and easily built and deployed and maintained that that's just what people are going to do. There used to be a lot of things we could point to as moats.

Speaker 1:

How hard it is to build a software product reliably and how hard it is to make it consumable and maintainable, how hard it is to translate your knowledge from an industry that you have as an expert into a product that serves other people in this industry. But AI systems, these tools that we are starting to use, they're taking over a lot of that now. So what is left over here? I find that the one thing that stands out, no matter how much AI you throw at it, is real-world data, data that is generated by humans, by human brains. Because data, all by itself, is right now experiencing this bifurcation, this fork in the road.

Speaker 1:

On one side, there's the data made by humans, by people. They're recording podcast episodes like this one, or they're putting videos out there. People actually still write their own social media posts sometimes, or they write blog posts, books, that stuff. Anything that is genuinely human generated. And then there's the synthetic side: AI generated images, synthetic voices from text to speech systems, completely AI made movies, YouTube clips, and of course every single spam email that is written by agentic systems at this point.

Speaker 1:

One side is becoming more and more valuable, the actual signal, the human data, and the other side is becoming more and more commoditized: the AI generated slop, as you might call it, or even the AI generated stuff that actually works, which is increasingly available because models get faster, cheaper, and better. AI generated data, and let's not forget this, can be valuable, but it is a commodity. Human generated data is valuable just by the sheer fact that it's not AI generated at this point. Plus, it then has its own additional inherent value: the creativity, the effort, the expertise of the person creating it, and the exclusivity that went into creating this piece of data.

Speaker 1:

Human data tends to be generatable only by the person who was capable of generating it in the first place because that's the only entity that has all the knowledge to create that particular piece of data. And since AI, by definition, cannot generate human generated data, I believe that good data, and that is real-world data, mostly human generated, validated, and clean, is the only reliable moat that we have as software founders in the near and midterm future. So let's say the next decade or so is likely going to be all about data. I am experiencing this with Podscan right now. The most value that my customers draw from the system is not Podscan's capacity to ingest all these RSS feeds or to have an API that responds quickly to their requests with the right data.

Speaker 1:

That's something that any minimally instructed agentic system could build for me or for you for that matter. The actual value of the Podscan platform is the 50,000,000 podcast episodes that I have transcribed and had AI systems analyze for content, keywords, themes, sentiment. The value is in the additional work, the transcription work, the transformative work that makes the data accessible to others. Because it used to be just audio files, super hard to parse, you have to listen to the whole thing to get it, you have to transcribe it yourself. Podscan does this and then makes it accessible.

Speaker 1:

I'm working on top of existing public data, which is every podcast episode out there, and I put it into a shape and a form where other people can consume it for whatever needs they might have. Maybe they want to track brand mentions, they want to figure out what people are talking about right now, and maybe they want to see what a particular kind of podcast is discussing so they can check if they should sponsor it or place an ad. There are many different ways this data can be used, and I find that the more effort I make to increase the data fidelity, the completeness, accuracy, and freshness of this data, the more people find value in it. And the shape it comes in doesn't really matter. It doesn't matter to them how exactly they access it, how hard it might even be to access that data.

Speaker 1:

As long as the data exists, they will try to find a way to get to it. It can be a clunky user interface that's kind of hard to use, could be an API route that is restrictive and equally incomplete, doesn't matter. They will find a way because it's the data that is relevant, not the means of accessing it, not the software built around it. Here's the flip side of this: If you run a software business that is purely transformative, that takes incoming data, some kind of thing, some kind of data bunch, it does something to it, and it turns that data back out, creates new data from it, that will be a problem. Because transformative algorithms are something that agentic AI systems are extremely good at right now.

Speaker 1:

When you say, hey, ChatGPT or Claude or whatever, take this Excel sheet, generate a report from it, turn it into a PDF, and then email it to this account, it will do all of this autonomously without your input, without anybody else's input, without needing an external service. It understands how to parse an Excel file. It can run analytical queries on its content. It can render a PDF, and it can send an email.

Speaker 1:

All of these are tiny steps that somebody has already implemented or that the system itself is smart enough to implement on the fly. You don't need to build a SaaS business around taking an Excel sheet and emailing out reports, even though many of those exist. The agentic system can do most of this by itself right now. But when it comes to data collection at scale, agents don't usually do this. It's the ephemeral nature of the agentic mode here, right?

Speaker 1:

If you spawn a Cursor session or a Claude Code instance or you have a conversation in your browser with ChatGPT, only during these brief moments of interaction does the agent actually exist. Everything else is just entries in a database somewhere, just a state in a state machine that's trying to get you to your results. The agent only gets placed on the GPU and runs for you when it's thinking, not in between. So an agent that constantly scans and constantly does work for you would consume so many tokens that it becomes almost prohibitively expensive. If you were to spawn an agent that tries to do what Podscan is doing, taking in and transcribing and analyzing 50,000 podcast episodes a day, that would likely cost you tens of thousands of dollars in tokens and API calls, not per month, but per day. Whereas in my business, I've optimized all these processes so that it happens much more cheaply and reliably in the background and makes the data available to my potential customers. Those are API customers, people using the website, or people using the new MCP integration that I put into the system recently so it can be plugged into agents directly.
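
A back-of-envelope sketch of how a per-day figure like that can be estimated. Only the 50,000 episodes a day comes from the episode; the average episode length, the per-minute transcription rate, the tokens per minute, and the token price are illustrative placeholders, not quoted prices and not Podscan's real numbers. An always-on agent would add orchestration overhead (repeated context, retries, tool calls) on top of this baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    """Every default except episodes_per_day is an illustrative placeholder."""
    episodes_per_day: int = 50_000          # volume mentioned in the episode
    avg_minutes_per_episode: float = 40.0   # assumption
    usd_per_audio_minute: float = 0.006     # assumption: hosted transcription rate
    tokens_per_audio_minute: float = 200.0  # assumption: transcript tokens per minute
    usd_per_million_tokens: float = 3.0     # assumption: LLM analysis input price


def daily_cost_usd(a: CostAssumptions) -> dict[str, float]:
    """Estimate the daily spend for transcription plus one analysis pass."""
    minutes = a.episodes_per_day * a.avg_minutes_per_episode
    transcription = minutes * a.usd_per_audio_minute
    analysis = minutes * a.tokens_per_audio_minute / 1_000_000 * a.usd_per_million_tokens
    return {
        "transcription": transcription,
        "analysis": analysis,
        "total": transcription + analysis,
    }


if __name__ == "__main__":
    for item, usd in daily_cost_usd(CostAssumptions()).items():
        print(f"{item:>13}: ${usd:,.0f} per day")
```

Changing any single assumption moves the total proportionally, which is the point: at this volume, small per-unit costs add up every single day.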

Speaker 1:

So now agents are also consuming Podscan. Being the system of record, having data that is meaningful and isolated to me, only I have this data, that is why it's so valuable to my customers. If all Podscan did was give people the capacity to give me a URL and I'll transcribe and analyze it for you, then I think the software of the product, Podscan itself, would be like two hours away from being completely replaced by a well written skill inside Claude Code or something similar. But since Podscan collects all this data, the audio, the freshness of it, chart rankings, social feeds, additional metadata from all over the place, from sources that most people don't even know exist and can be accessed, that is the system-of-record nature of this business. And that is pulling together data, making it comprehensible and accessible to other systems.

Speaker 1:

So when I say data moat or data is the only moat at this point, I mean you really need to make sure that whatever you offer has its own data source that adds value. And on the other side, you need to not only collect the data inside your system, you also need to use that data and make it available. Having data is half the moat. Availing data is the other half. And you can do this through a couple of means.

Speaker 1:

I think making your software business an API first business is probably one of the smartest things that you can do today as a founder. You should ask yourself, is there anything in my application that I could make available through an API that I don't already do? And when I talk about APIs here, MCP and all that stuff is just another layer on top of an API. Usually, most frameworks that offer MCP capability, I'm thinking about Laravel here because I use that, let you set it on top of the REST API that you already have. So whether we're talking about programmatic access or MCP or APIs or webhooks, whatever it might be, to me, in this context, it's the same thing.
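
A minimal sketch of that layering idea, assuming a made-up `search_podcasts` capability and a plain dictionary as the tool registry. It does not implement the actual MCP protocol or any Laravel feature; every name here is hypothetical. It only shows one core function with a REST-style adapter and an agent-facing tool adapter that sits on top of the REST one.

```python
from typing import Any, Callable

# Hypothetical in-memory data standing in for a real database.
PODCASTS = [
    {"id": 1, "title": "The Bootstrapped Founder"},
    {"id": 2, "title": "Example Cooking Show"},
]


def search_podcasts(query: str) -> list[dict[str, Any]]:
    """Core capability: the single place where the logic lives."""
    q = query.lower()
    return [p for p in PODCASTS if q in p["title"].lower()]


def rest_search(params: dict[str, str]) -> dict[str, Any]:
    """REST-style adapter: query parameters in, JSON-like body out."""
    return {"status": 200, "results": search_podcasts(params.get("q", ""))}


# Tool-style adapter: a registry that an agent-facing layer could expose.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Register a function under a tool name (hypothetical helper)."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("search_podcasts")
def tool_search(query: str) -> list[dict[str, Any]]:
    """Thin wrapper that reuses the REST adapter instead of duplicating logic."""
    return rest_search({"q": query})["results"]


if __name__ == "__main__":
    print(rest_search({"q": "founder"}))
    print(TOOLS["search_podcasts"]("cooking"))
```

Because the tool adapter calls through the REST adapter, anything added to the REST side is one small wrapper away from being available to agents too.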

Speaker 1:

You make your system reliably connectable to other computers. That's the API. That's the programming interface. Right? The connection there.

Speaker 1:

That needs to be in a business. That needs to be a focus of your software strategy. And what I find more and more interesting, and I see a lot of founders talk about this because they themselves are consumers of other software businesses: people are looking for software where there is near parity between what you can do in the user interface and what you can do on the API side of things. The more parity there is, the more people can do the exact same things they do in the interface through the API, the more likely they are to really buy into your product because it means that they can automate anything. They can delegate anything they want.

Speaker 1:

And every software agent, every AI agent out there wants to allow people to automate whatever they need. So making it easy to delegate this kind of work to the agent means offering every feature through the API. So I have this ongoing effort in my system where every couple of days, I run a subagent of Claude Code on my codebase through a skill to update a central file that I call my platform parity tracking file. I have documentation all over the place for every single function in the product, and the skill pulls from those docs to see what functions I have. Every function gets a row in the table, and I note, can I do this in the UI?

Speaker 1:

Can I do this on the REST API? And can I do this through MCP? If it's available on all three, it's complete. And if not, it's a candidate for more work. It could be something as simple as search for a podcast, which is a basic feature in Podscan, or something as complicated as configure a keyword alert that automatically adds new mentions of your brand to a list that then automatically triggers a webhook.
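
The parity table described above could be sketched roughly like this. The structure (one row per function, three yes/no columns, complete versus candidate) follows the description; the sample rows, their yes/no values, and the `impact` score are invented for illustration and say nothing about Podscan's actual state.

```python
from dataclasses import dataclass

SURFACES = ("ui", "rest", "mcp")


@dataclass
class Feature:
    name: str
    ui: bool = False
    rest: bool = False
    mcp: bool = False
    impact: int = 1  # hypothetical priority score, higher means more valuable

    def missing(self) -> list[str]:
        """Surfaces on which this feature is not yet available."""
        return [s for s in SURFACES if not getattr(self, s)]

    @property
    def complete(self) -> bool:
        return not self.missing()


def candidates(features: list[Feature]) -> list[Feature]:
    """Incomplete features, highest impact first."""
    return sorted((f for f in features if not f.complete), key=lambda f: -f.impact)


def render_table(features: list[Feature]) -> str:
    """Render the tracking file as a simple pipe-separated table."""
    rows = ["feature | ui | rest | mcp | status"]
    for f in features:
        marks = " | ".join("yes" if getattr(f, s) else "no" for s in SURFACES)
        status = "complete" if f.complete else "candidate"
        rows.append(f"{f.name} | {marks} | {status}")
    return "\n".join(rows)


# Invented sample rows for illustration only.
FEATURES = [
    Feature("search for a podcast", ui=True, rest=True, mcp=True, impact=3),
    Feature("configure a keyword alert", ui=True, rest=True, impact=5),
    Feature("trigger a webhook from a list", ui=True, impact=4),
]

if __name__ == "__main__":
    print(render_table(FEATURES))
    top = candidates(FEATURES)[0]
    print(f"next candidate: {top.name} (missing: {', '.join(top.missing())})")
```

Keeping the table as data rather than prose makes it easy for a script or an agent to regenerate it and to pick the next gap to close.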

Speaker 1:

If I can do it in any part of the platform, it should be possible in each of the other systems that I offer. And I also have another skill that allows me to then get the biggest candidate, the highest-impact candidate, and work on that. And having an agent that can find candidates to implement to increase my parity, I hope that eventually there will be enough tooling for this kind of stuff that these things can be worked on automatically in the background. That's a bit of wishful thinking at this point, I'm aware, but the principle is important. Usage parity between human users, computer users, and agentic users, which to me are almost a hybrid of the first two, a human telling an automated system what to do with semi-autonomous effort to connect to your system: all three of these should be equally well served.

Speaker 1:

And I have a strong feeling that agentic use of software products is gonna be a big expectation over the next couple of years to come, that people will want to be able to use an agent on your product. They will. Like, they will connect agents to their browser and try to have them click through your application, which is why WebMCP, this new thing that is currently being developed, is such an interesting strategy where you make your capabilities available through the HTML so agents don't have to randomly click, but they can actually pull in the capabilities and then, you know, in some way, I don't know exactly how it works just yet, take that task and execute it. But let's get back to data, maybe to metadata that you have. Because the metadata that you have is your moat as well, and it doesn't have to be podcast data, like in my case.

Speaker 1:

Whatever your product touches, think about the metadata that you collect when people use your platform or when you observe the consequences of using a product. Let's say you have a tool that lets people post to Twitter or Facebook or something. It could be that you track the times when most people post or when most people engage with these posts or what kind of content drives the most engagement in which kind of locale, which geographic region. That kind of stuff, that metadata, is your unique data moat. And maybe you didn't set out to track this data.

Speaker 1:

You just wanted to build a cross posting tool between all these social platforms. But tracking this data that only you have access to because you can kind of collate it into something meaningful, that is really unique. That is something that people aren't able to get easily, and that makes it valuable and that makes people pay money for it. So figure out what this is for you. Make sure you have it.
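
As a rough sketch of what collating that kind of usage metadata might look like for the cross-posting example, assuming a hypothetical event log of (UTC timestamp, region, engagement count) tuples. The events are made up, and a real tool would read them from its own database.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Hypothetical post events: (ISO timestamp in UTC, region, engagement count)
EVENTS = [
    ("2024-05-01T09:15:00+00:00", "DE", 12),
    ("2024-05-01T09:40:00+00:00", "DE", 30),
    ("2024-05-01T17:05:00+00:00", "US", 45),
    ("2024-05-02T17:30:00+00:00", "US", 25),
    ("2024-05-02T17:45:00+00:00", "US", 5),
]


def utc_hour(timestamp: str) -> int:
    """Hour of day (0-23) in UTC for an ISO 8601 timestamp."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc).hour


def busiest_hour(events) -> int:
    """The UTC hour in which the most posts were published."""
    hours = Counter(utc_hour(ts) for ts, _, _ in events)
    return hours.most_common(1)[0][0]


def avg_engagement_by_region_hour(events) -> dict[tuple[str, int], float]:
    """Average engagement per (region, UTC hour) bucket."""
    buckets = defaultdict(list)
    for ts, region, engagement in events:
        buckets[(region, utc_hour(ts))].append(engagement)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


if __name__ == "__main__":
    print("busiest hour (UTC):", busiest_hour(EVENTS))
    for (region, hour), avg in sorted(avg_engagement_by_region_hour(EVENTS).items()):
        print(f"{region} @ {hour:02d}:00 UTC -> avg engagement {avg:.1f}")
```

No single user of the tool can compute these aggregates; only the platform that sees all the posts can, which is what makes them a byproduct worth keeping.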

Speaker 1:

Make sure you grab that data and make sure you have it connected to something internally so you understand what it means. And then, clearly, second half, make sure you actually make it accessible in a way that improves how people use your product, if you can. That's the easiest way to put this data to work. Or in a way that helps them run their own businesses better or run their projects better. That is the data moat. And that's what I will keep focusing on and I highly recommend you focus on as well in the future.

Speaker 1:

And that's it for today. Thank you for listening to The Bootstrapped Founder. If you're running a business and you're wondering who's talking about you out there, check out Podscan. We monitor millions of podcasts in real time, and we alert you when anybody mentions you, customers, competitors, or cool influencers, and you can then turn all this unstructured podcast conversation out there into actual competitive intelligence. And if you're looking for your next venture, you don't know what to do, check out ideas.podscan.fm.

Speaker 1:

We track thousands of podcasts and all the ideas that people mention on there so you can build what people are already asking for. Share this with anybody who needs to turn conversations into competitive advantages. You can find me on Twitter at Arvid Kahl, a r v i d k a h l. Thanks so much for listening. Have a wonderful day and bye bye.