Paul walks through his real-world podcasting workflow—how he records, edits, transcribes, and publishes with help from Python, OBS, Audacity macros, and ChatGPT. Hear how he splits multitrack audio, cleans voices intelligently, generates art + metadata, and moves toward a fully automated pipeline. A practical, curious conversation for creators experimenting with AI-powered production.
Audio conversation with AI chatbots
I am talking to AI.
To have a successful day, you don't need to know all the answers.
Just have good questions.
Chatting to AI is different from normal speech
and I hope you enjoy listening to the show
whilst getting ideas on how to pose your questions
to get the most out of AI.
My name is Paul.
The live conversations you hear are uncut
although sometimes the AI needs time to think.
In those cases, I've cut out the dead space.
Okay, Paul, how are you going?
Good.
I wanted to ask you about your process
with the podcast.
Are you sort of knocking things out via a spreadsheet
and all the rest of it?
Or how are you managing recordings and all that jazz
and what are you doing to get things
onto the conveyor and belt them out
because you're kind of really steaming ahead here?
Yeah, look, I'm trying to get the podcast recorded
and I suppose I'm trying to use sort of tools to do it.
I'll explain my process then at the moment as it is.
I think this is my fun.
Interesting.
And I'll also maybe at the end just explain
how I'm planning to change it in the future.
But yeah, so at the moment I'm doing the audio recording
using my laptop and talking to ChatGPT,
and I've got Signal on the phone
so that I can speak to other people.
I've also got sort of an audio interface
so that I can plug in phones and things like that,
and I needed that so that I could talk to Grok, for example.
It took quite a while to figure out how to do that.
So that's sort of how it's set up.
But then if I was to sort of explain what the process would be,
I suppose I do the audio recording
and at the moment I've gone through a few iterations
since I started the podcast.
Originally I was just recording everything into one channel,
so I was just recording the conversation
and then mixing that down.
And what I realised is, by doing that,
you lose the ability to clean up individual channels.
If I was to make changes to, say, ChatGPT,
well, ChatGPT has got a very different sonic character
to me talking on my microphone,
so they kind of need different treatments
to end up sounding good on the podcast.
If it's all one WAV file
and you try to enhance my vocal,
then it's going to affect ChatGPT's vocal,
so it doesn't really work,
especially because the audio levels were so different;
ChatGPT was louder to begin with than I was.
So I went through a process of tweaking that
and so what I've now got is,
I'm talking to ChatGPT, ChatGPT's in Chrome,
at the moment you're on Signal,
and then I'm using some software called OBS,
which is streaming software,
and that allows me to record all the channels separately.
It records into something called an MKV file,
and that's a multi-track file,
so if there's three or four people on the call,
they'll each have a separate audio track,
but it's all in one file.
So the first step is to split that file into a bunch of WAV files.
Now, I was rather hoping I'd be able to do that in Audacity,
which is the software I use for mixing down the podcast.
Oh yeah, you can't do that in there.
No, I can't load MKV files in.
It might be that I could tweak Audacity somehow;
I tried and it didn't work.
So what I'm now doing is I've got a Python script
that I run over the MKV file,
which splits it into WAV files.
The next step is I take those four files,
because it always creates four files,
one for each participant.
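Paul doesn't share the script itself, but a minimal sketch of that splitting step in Python, assuming ffmpeg is installed and with the file names and track count as placeholders, could look like this:

```python
from pathlib import Path

def build_split_cmd(mkv_path, n_tracks=4):
    """Build an ffmpeg command that extracts each audio track
    of a multi-track MKV into its own WAV file.
    Sketch only: names and track count are assumptions."""
    mkv = Path(mkv_path)
    cmd = ["ffmpeg", "-i", str(mkv)]
    for i in range(n_tracks):
        # -map 0:a:<i> selects the i-th audio stream of input 0
        out = mkv.with_name(f"{mkv.stem}_track{i}.wav")
        cmd += ["-map", f"0:a:{i}", str(out)]
    return cmd

# With ffmpeg installed, running the command would do the split:
# import subprocess
# subprocess.run(build_split_cmd("episode.mkv"), check=True)
```

Each participant's track then lands next to the MKV as its own WAV, ready to import.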
Then I load up Audacity
and import those WAV files,
and I'll go through the files
and find the beginning of the podcast,
because sometimes there might be a few false starts,
and delete everything up to then.
And the thing I like about Audacity
is you can create these macros.
Now, it's been a long time since I've used Audacity,
so I couldn't remember how to set up a macro
to achieve what I wanted.
And what I wanted to achieve
was basically taking the audio from my iPhone headphones,
which is what I used to record this,
and turning it into something that sounds good.
In order to do that,
you've got to reduce the noise,
reduce the clicks, reduce the hissing,
and then do some EQ,
basically bring up the bass a bit.
And then there's something called compression.
What compression does is reduce the variance in volume,
so the audio characteristics are all more similar
and it's just a bit more pleasing on the ear.
So, there's obviously a lot there
and it's quite a complicated thing,
it'll be a lot of trial and error,
which it was originally when I did it,
but this time I just asked ChatGPT how to do it
and it gave me some settings,
and I've been using them ever since.
So I created the macro because I asked ChatGPT how to do it,
and I've got about six effects on the audio
and they've all been created by ChatGPT.
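Paul's actual effect chain lives inside an Audacity macro, but the compression idea he describes, squashing loud peaks so the volume is more even, can be sketched in plain Python. The threshold and ratio values here are illustrative, not his settings:

```python
import math

def compress(samples, threshold=0.5, ratio=4.0):
    """Very simple dynamic-range compression on float samples
    in [-1.0, 1.0]: any level above the threshold is scaled
    down by the ratio, shrinking loud peaks so the overall
    volume varies less. Illustration only, not Paul's chain."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # keep everything up to the threshold, compress the excess
            level = threshold + (level - threshold) / ratio
        out.append(math.copysign(level, s))
    return out

def normalize(samples, target_peak=0.9):
    """Scale so the loudest sample hits target_peak."""
    peak = max(abs(s) for s in samples) or 1.0
    return [s * target_peak / peak for s in samples]
```

Compress-then-normalize is the usual order: first even out the dynamics, then bring the whole track up to a consistent level.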
So it's working quite well then, isn't it, really?
Yeah, yeah, yeah, and with the audio,
I basically also assigned that macro to a key combination,
so all I have to do is click once on the WAV
and then click on the key combination
and it enhances the audio.
So, that process of loading the files in,
improving the audio and clipping off the rubbish at the beginning,
that just takes a couple of minutes,
depending on the size of the audio
because sometimes the data processing,
the signal processing takes a little bit of time,
but it doesn't take very long
and it's no brain power for me whatsoever,
which is great.
So I do that and then I've already recorded
the intro and the outro
and so I then get those and slot them in at the ends.
And one thing I learnt from doing podcasts before,
because I haven't had my own podcast before,
but I've worked on a podcast as an audio person,
is that organising your files is super important.
Always having the same structure
and the same naming convention
just makes life so much easier.
So I've got a very rigid form for the way that I'm doing things,
which helps me not get confused.
So that's the first step.
The first step is to set it all up in Audacity.
The second step is to then export that as an MP3 file.
So, as I was talking about folders and files:
I've got a folder for each episode of the podcast,
and each episode has a sub-directory called Files,
and that's where all my working files are.
So the Audacity project file,
which is like the sequencing file for the show,
that goes in there.
The MKV file that I recorded goes in there first,
and the WAVs that get created from that file go in there.
And then when I create the MP3,
which is the actual episode,
that goes one level up,
so that's in the root of the episode folder.
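That kind of rigid layout is easy to pin down in a small helper. A sketch, with the folder names as assumptions rather than Paul's exact convention:

```python
from pathlib import Path

def make_episode_layout(base, episode_name):
    """Create the per-episode structure described above: a root
    folder for the finished MP3 and transcripts, plus a 'Files'
    sub-directory for working files (the Audacity project, the
    recorded MKV, the split WAVs). Names are assumptions."""
    root = Path(base) / episode_name
    files = root / "Files"
    files.mkdir(parents=True, exist_ok=True)
    return root, files
```

Creating every episode through one function like this is what keeps the naming convention from drifting over time.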
And then the next step is,
and this is where it starts to get quite cool and AI-ish,
I then worked on a prompt,
and this prompt has taken quite a bit of work
and quite a bit of time to develop
to the point at which it works now.
And what the prompt does is it takes my,
oh no, sorry, sorry, missing a step, missing a step.
So I've got my MP3, right?
I've got my MP3, sorry.
And then I've got another Python script,
which I got ChatGPT to write for me.
What that does is transcribe the podcast from the MP3:
it turns it into text and creates a bunch of text files.
Now, I could have got a service to do that,
but I opted to do it this way
because it doesn't cost any money
if it runs on my computer.
Transcription does normally cost money
if you want to get it done with AI.
Doing it this way, it's using my CPU, so I can do it for free.
It does take a little while, though.
Say it's a 20 or 30 minute MP3,
this Python script can take 15 minutes to run, maybe longer.
So what I normally do is I have that running
while I'm then working on maybe another podcast,
doing the same thing or doing some other thing.
So it's working in the background.
So it creates a .txt file and it creates a .srt file,
and they're the two that I really need.
It creates a few other files as well, which I don't need,
but I just haven't had time to change the Python script
so it doesn't do that.
It doesn't really make any difference to the speed or anything,
so I haven't changed it.
So I create those two files,
and those two files go in the root of the folder as well.
So now the root has got the MP3,
it's got the transcription, and it's got the .srt file,
which is also a transcription,
but that's a subtitle file, and Libsyn needs that.
So that's why I create that.
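Paul doesn't name the transcription tool, but local speech-to-text models typically return timestamped segments, and turning those into the .srt file Libsyn needs is mostly formatting. A sketch, with the segment format as an assumption:

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamps SRT requires."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """segments: iterable of (start_seconds, end_seconds, text).
    Returns the contents of an .srt subtitle file: numbered
    blocks separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

The plain .txt transcript is just the text fields joined together; the SRT keeps the timing so players can show subtitles in sync.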
And then what I do is I use this megaprompt, right?
So what this prompt does is it takes the transcription.
I basically paste the prompt into ChatGPT
and I also attach a copy of the transcription.
And when I do that,
ChatGPT is instructed to take that transcription
and create an image based on the episode,
based on the transcription.
Oh yeah, that's very cool, actually.
Great.
So it does that.
And then, unfortunately,
something I have learned,
which I've explained on an earlier podcast,
but I'll just say it
in case other people haven't heard it,
and you won't know it either,
is the way ChatGPT works.
Well, the way ChatGPT version five or version four works, right,
the LLM, yeah, I'll keep this super quick.
Basically, every time they build a new model,
it gets all this data, they train the model,
do that over six months or whatever,
and create ChatGPT version five, version 5.1 or whatever.
And that model is now static
until they create a new model.
But when I have a conversation with it
in a chat window, it remembers stuff,
and it can work with the sum total
of all the information in that chat window.
You've also got these other things called projects,
where you can have multiple chats
and it remembers stuff from all of them.
Oh, I've heard of these projects.
Yeah, and there are also other ways of getting ChatGPT
to remember information,
but it's all stuff that's between you and ChatGPT,
so it can remember that stuff.
So generally ChatGPT is just running the model
plus the data that's in its memory
that you've created with ChatGPT.
However, there is other information that ChatGPT doesn't have.
So say you're asking for more up-to-date information
than what was compiled into the model when it was created.
Then ChatGPT will talk to APIs,
which are machine-to-machine interfaces,
and it will load the data from some website somewhere else.
And this is why sometimes you have to wait
a long time for ChatGPT:
it's actually communicating with a database somewhere
and then loading the data in.
So it will do that.
And it will also use other models as well.
For images, for example,
ChatGPT doesn't create an image for you itself;
it talks to DALL-E,
and DALL-E is another model specifically for image generation.
But when it does these things,
there are obviously limitations,
because these third parties have their own rules
and their own ways of doing things.
And one of the limitations I found
with creating this podcast was
that if you use DALL-E,
you can only get it to create one image at a time,
and you have to approve the image
before it will do the next thing.
And there's no way of getting around that.
Yeah, so what you can't do
is create one prompt and say, create me 10 images.
You can't do that at the moment with DALL-E.
So the first step is for it to create an image,
and then I have to approve it.
So I've had to build into the prompt:
ask me if I accept this image, yes or no.
That's in the prompt that I've given to ChatGPT.
And then I click on yes,
and when I click on yes,
it creates a second image,
because I need a square image and a rectangular image.
The rectangular image is for my WordPress site,
the square one is for Libsyn.
So it creates both of those.
And when it's created those,
it also creates all the other metadata
to populate my WordPress post and my Libsyn post.
So it creates a title.
It creates a summary for WordPress
and a summary for Libsyn;
they're different because the requirements are different.
It creates tags, it creates image tags.
Yeah, that's about it.
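In effect, the megaprompt asks ChatGPT to fill in one fixed record per episode. Sketched as a structure (the field names here are hypothetical, not Paul's exact schema):

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeMetadata:
    """One record the megaprompt produces per episode.
    Two summaries because WordPress and Libsyn have different
    requirements; all field names are hypothetical."""
    title: str
    wordpress_summary: str
    libsyn_summary: str
    seo_keyword: str            # one keyword the post is optimised for
    tags: list = field(default_factory=list)
    square_image: str = ""      # cover art for Libsyn
    rectangular_image: str = "" # banner for the WordPress post
```

Pinning the output down to a fixed record like this is also what would make the later automation step possible, since an agent can map each field onto a WordPress or Libsyn API call.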
And then in the WordPress site,
at the moment I then take that information
and paste it into a new post.
And the same thing with Libsyn.
So that's super good.
So the prompt for the images is quite long,
because I want a certain style,
I want them to be consistent,
and also the tone that it uses to talk about the podcast
is what I want.
But I also want there to be some human intervention as well.
So I've brought in a concept
of Paul's takeaways in the blog post.
So I get AI to create the text,
and then there's a section underneath,
which I don't always populate.
But if I feel that I've really learned something
from one of the episodes,
I'll add something there,
and I'll just manually write that,
so that I'm actually using my own brain.
But generally, with all these texts,
obviously for the WordPress site,
I've requested it to be optimised SEO-wise
for a specific keyword.
So it selects an SEO keyword for the text
and then optimises for that keyword,
so every post has an SEO keyword.
And then the tags are obviously related to the post.
And the great thing about doing a megaprompt like that
is that over time I might have other ideas,
so I can just gradually improve it.
One thing I was working on on Friday, actually,
well, ultimately where I want this podcast creation to go
is I basically want to create an agent,
where I create the MP3 file
and then it does all of this in one go,
and I don't have any interaction with it whatsoever
apart from Paul's takeaways.
In that case, what it would have to do
is connect via an API to WordPress
and to Libsyn and update the data in their databases,
so I wouldn't even have to create the posts myself.
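For the agent idea, WordPress does expose a REST API that accepts posts as JSON at /wp-json/wp/v2/posts. A sketch of building that request with the Python standard library; the site URL and credentials are placeholders, and this only constructs the request rather than sending it:

```python
import base64
import json
import urllib.request

def build_post_request(site_url, username, app_password, title, content, tags=()):
    """Build an authenticated request for the WordPress REST API.
    WordPress expects HTTP Basic auth with an 'application
    password'; POSTing to /wp-json/wp/v2/posts creates a post.
    Placeholder credentials; sketch only."""
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    body = json.dumps({
        "title": title,
        "content": content,
        "status": "draft",   # draft, so a human can still review it
        "tags": list(tags),  # the API expects tag term IDs here
    }).encode()
    return urllib.request.Request(
        f"{site_url}/wp-json/wp/v2/posts",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# urllib.request.urlopen(req) would then actually create the draft
```

Libsyn has its own API, so the agent would need a second, similar call for the episode itself, but the shape of the problem is the same: turn the generated metadata into one authenticated request per platform.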
Oh, that would be amazing.
Yeah, so in order to start getting there,
on Friday I started thinking about this
and I managed to get something together.
I haven't tested it yet;
I think it'll probably be a bit of a mess.
And I'm trying not to get too lost
in the technicalities of things,
I'm focusing on creating the podcast,
so I'll probably give it more of a go in a few weeks.
But at the moment, what it can do
is, well, it creates all of the text for the WordPress site,
but it also creates a file,
and I can import that file,
and that file should basically have a complete blog post in it.
So it's a text file, and that should be enough
to create an entire blog post.
I'm not sure how well that's going to work,
but that was my sort of interim step,
because obviously if we're going to automate it,
you've got to be able to do something like that.
So I thought that would be kind of cool:
instead of me manually copying
and pasting the text into WordPress,
I'll just get it to produce a file,
and then I'll just go Tools, Import
and import the file.
So that's the next step
with the creation.
So that's how I'm creating podcasts
at the moment, my sort of process flow.
In the future, I'm going to try and get an agent
to see if I can automate the whole thing
and it might even be that it's not particularly efficient.
Well, the way I see it...
It just saves some time.
Yeah, and it's time that I hate spending as well.
So it might be a small saving of time,
but if it means I'm more motivated to do the podcast,
if it means it's easier to do,
then I think it's just good.
So I'm going to push on and work on that,
and it's all good fun and learning as well.
And then the next big stage for this
is to take the podcast
and create a good process
for sharing it through social media.
At the moment I'm not doing any of that,
I'm focusing on the podcast creation.
But yeah, that will probably be
the next big thing that I build
to support this podcast.
Do you have any questions?
No, that is awesome.
Thanks. That's great.
I mean, it's just,
it's already so much more efficient now,
just listening to what you're doing,
compared to what I might have been doing years ago.
Let's not go there.
So that's really exciting.
Yeah, brilliant.
Okay, great. Thanks, Colin.
And I hope you guys found that interesting
and we'll speak next time.
Thanks.