Talking to AI

Paul walks through his real-world podcasting workflow—how he records, edits, transcribes, and publishes with help from Python, OBS, Audacity macros, and ChatGPT. Hear how he splits multitrack audio, cleans voices intelligently, generates art + metadata, and moves toward a fully automated pipeline. A practical, curious conversation for creators experimenting with AI-powered production.

What is Talking to AI?

Audio conversation with AI chatbots

I am talking to AI.

To be successful today, you don't need to know all the answers.

Just have good questions.

Chatting to AI is different from normal speech

and I hope you enjoy listening to the show

whilst getting ideas on how to pose your questions

to get the most out of AI.

My name is Paul.

The live conversations you hear are uncut

although sometimes the AI needs time to think.

In those cases, I've cut out the dead space.

Okay, Paul, how are you going?

Good.

I wanted to ask you about your process

with the podcast.

Are you sort of knocking things out via a spreadsheet

and all the rest of it?

Or how are you managing recordings and all that jazz

and what are you doing to get things

into the conveyor and belt them out

because you're kind of really steaming ahead here?

Yeah, look, I'm trying to get the podcast recorded

and I suppose I'm trying to use sort of tools to do it.

I'll explain my process then at the moment as it is.

I think this is the fun part for me.

Interesting.

And I'll also maybe at the end just explain

how I'm planning to change it in the future.

But yeah, so at the moment I'm doing the audio recording

using my laptop and talking to ChatGPT

and I've got Signal on the phone

so that I can speak to other people.

I've also gone to sort of an audio interface

so that I can plug in phones and things like that

and I needed that so that I could talk to Grok, for example,

that took quite a while to figure out how to do that.

So that's sort of how it's set up.

But then if I was to sort of explain what the process would be,

I suppose I do the audio recording

and at the moment I've gone through a few iterations

since I started the podcast.

Originally I was just recording everything into one channel

so I was just recording the conversation

and then putting that, mixing that down.

And what I realized is by doing that

you sort of lose the ability to clean up individual channels

because if I was to make changes to, say, ChatGPT,

and ChatGPT has got a very different sort of sonic character

to me talking on my microphone.

So they kind of need different treatments

so they end up sounding good on the podcast.

So if it's all one WAV file,

if you're trying to enhance my vocal

then it's going to affect ChatGPT's vocal,

and so it doesn't really work,

especially because the audio levels were so different,

like ChatGPT was louder to begin with than I was.

So I went through a process of tweaking that

and so what I've now got is,

I'm talking to ChatGPT, ChatGPT's on Chrome,

at the moment you're on Signal,

then I'm using some software called OBS,

which is streaming software,

and that allows me to record all the channels separately

and that records it into something called an MKV file

and that's a multi-track file

so that allows all the participants,

say if there's three or four people on the call

then they'll all have a different WAV file

so it's all separate,

but it's all in one file.

So the first step that I have to do

is I have to split that file into a bunch of WAV files.
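That split step can be done by shelling out to ffmpeg. Here is a minimal sketch of that kind of script, not Paul's actual one; the four-track layout and the filenames are assumptions:

```python
# Split a multitrack MKV recording into one WAV per audio stream.
# Assumes ffmpeg is installed and the recording carries four audio tracks.
import subprocess


def wav_names(mkv_path: str, n_tracks: int = 4) -> list[str]:
    """Derive one output WAV filename per track from the MKV name."""
    stem = mkv_path.rsplit(".", 1)[0]
    return [f"{stem}_track{i + 1}.wav" for i in range(n_tracks)]


def split_mkv(mkv_path: str, n_tracks: int = 4) -> list[str]:
    outs = wav_names(mkv_path, n_tracks)
    for i, out in enumerate(outs):
        subprocess.run(
            ["ffmpeg", "-y", "-i", mkv_path,
             "-map", f"0:a:{i}",  # select the i-th audio stream
             out],
            check=True,
        )
    return outs
```

Each `-map 0:a:N` call pulls one participant's audio stream out of the container, which is why every person on the call ends up with their own WAV.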

Now, I was rather hoping I'd be able to do that in Audacity,

which is the software I use for mixing down the podcast.

Oh yeah, you can't do that in there.

No, Audacity can't load MKV files,

though it might be that I could tweak Audacity somehow.

I tried doing it and it didn't work,

so what I'm now doing is I've got a Python script

that I run over the MKV file,

which then splits it into WAV files

and then the next step is I take those four files

because it always creates four files,

one for each participant

and I load up my Audacity

and import those WAV files into Audacity

and then I'll go through the files,

find the beginning of the podcast

because sometimes there might be a few false starts

and then I sort of delete everything up to then

and the thing I like about Audacity

is you can create these macros.

Now, it's been a long time since I've used Audacity,

so I couldn't remember how to set up a macro in Audacity

to achieve what I wanted

and what I wanted to achieve

was basically taking the audio from my iPhone headphones,

which is what I used to record this

and turn it into something that sounds good

and in order to turn it into something that sounds good,

you've got to reduce the noise,

you've got to reduce the clicks, reduce the hissing

and then do some EQ,

you've got to basically bring up the bass a bit

and then there's something called compression,

and what the compression does is it sort of

reduces the variance in volume

so that it, well, the sort of audio characteristics

are all more similar,

so it's just a bit more pleasing on the ear.

So, there's obviously a lot there

and it's quite a complicated thing,

it would normally be a lot of trial and error,

which it was originally when I did it,

but for this time, I just asked ChatGPT how to do it

and it gave me some settings

and I've been using them ever since,

so I created the macro because I asked ChatGPT how to do it

and so I've got about six effects on the audio

and they've all been created by ChatGPT.

So it's working quite well then, isn't it, really?

Yeah, yeah, yeah, and with the audio,

I basically also assigned that macro to a key combination,

so all I have to do is click once on the WAV

and then click on the key combination

and it enhances the audio.

So, that process of loading the files in,

improving the audio and clipping off the rubbish at the beginning,

that just takes a couple of minutes,

depending on the size of the audio

because sometimes the data processing,

the signal processing takes a little bit of time,

but it doesn't take very long

and it's no brain power for me whatsoever,

which is great.

So I do that and then I've already recorded

the intro and the outro

and so I then get those and I swap them on the ends.

And one thing I learnt from doing podcasts before,

because I haven't had my own podcast before,

but I've worked on a podcast as an audio person,

is that organising your files is super important,

so always having the same structure,

the same name and convention

and everything just makes life so much easier.

So I've got a very sort of rigid form for that,

the way that I'm doing things,

which helps me not get confused.

So that's the first step.

The first step is to set it all up in Audacity.

The second step is to then export that as an MP3 file.

So as I was talking about folders and files,

I've got a folder for each episode of the podcast

and each episode of the podcast has a sub-directory called Files

and that's where all my working files are,

so the Audacity project file,

which is like a sequencing file

for the show, that goes in there.

The MKV file that I recorded

goes in there first,

and the WAVs that get created from that file go in there.

And then when I create the MP3,

which is the actual episode,

then that goes in the level up,

so that's in the root of the episode folder.

And then the next step is,

this is where it starts to get quite cool and AI-driven.

I then worked on a prompt

and this prompt has been quite a bit of work

and quite a bit of time to develop

to the point at which it works now.

And what the prompt does is it takes my,

oh no, sorry, sorry, missing a step, missing a step.

So I've got my MP3, right?

I've got my MP3, sorry.

And then I've got another Python script,

which I got ChatGPT to write for me.

And this Python script,

what that does is it transcribes the podcast from the MP3,

turns it into text and creates a bunch of text files.

Now, I could have got a service to do that,

but I opted to do it this way

because it doesn't cost any money

if it runs on my computer.

Because transcription does normally cost money

if you want to get it done with AI.

Doing it this way,

It's using my CPU and so I can do it for free.
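A local transcription script in that spirit could use the open-source Whisper model. The package, model size, and filenames here are assumptions, not necessarily what Paul runs:

```python
# Transcribe an MP3 locally on the CPU and write .txt and .srt files.
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def transcribe(mp3_path: str) -> None:
    import whisper  # pip install openai-whisper; runs locally, no API cost
    model = whisper.load_model("base")
    result = model.transcribe(mp3_path)
    stem = mp3_path.rsplit(".", 1)[0]
    with open(stem + ".txt", "w") as f:  # plain transcription
        f.write(result["text"])
    with open(stem + ".srt", "w") as f:  # subtitle file, as Libsyn needs
        for i, seg in enumerate(result["segments"], 1):
            f.write(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n\n")
```

Running the model on the CPU is exactly why a 20 to 30 minute episode can take a quarter of an hour or more, which fits the background-job workflow Paul describes.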

It does take a little while.

I would say, so this Python script on an MP3,

say it's a 20 minute, 30 minute file,

it can take 15 minutes to run, maybe longer.

So what I normally do is I have that running

while I'm then working on maybe another podcast,

doing the same thing or doing some other thing.

So it's working in the background.

So once it's created these files,

so it creates a .txt file and it creates a .srt file.

They're the two that I really need.

It's creating a few other files as well,

which I don't need,

but I just haven't had time to change the Python script

so it doesn't do it,

but it doesn't really make any difference to the speed

or anything like that.

So I haven't changed it.

So I create those two files.

Then those two files go in the root of the folder as well.

So now the root has got the MP3

and it's got the transcription and it's got an .srt file,

which is also a transcription,

but that's a subtitle file and Libsyn needs that.

So that's why I create that.
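Putting the file organisation together, the layout Paul describes looks roughly like this (the episode name and individual filenames are hypothetical):

```
episode-042/
    episode-042.mp3     (the published episode)
    episode-042.txt     (transcription)
    episode-042.srt     (subtitle file for Libsyn)
    Files/
        show.aup3       (Audacity project, the sequencing file)
        recording.mkv   (original OBS multitrack recording)
        track1.wav ... track4.wav   (split participant tracks)
```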

And then what I do is then I use this megaprompt, right?

And the megaprompt, what that does,

yeah, so what this prompt does is it takes the transcription.

So I basically paste this prompt into ChatGPT

and I also attach a copy of the transcription.

And when I do that,

ChatGPT is then instructed to take that transcription

and create an image based on the episode,

based on the transcription.

Oh yeah, that's very cool, actually.

Great.

So it does that.

And then unfortunately,

something I have learned,

which I've explained on an earlier podcast,

but I'll just say it,

just in case other people haven't heard this

and in case you don't know it either,

is the way ChatGPT works,

well, the way ChatGPT version five

or version four works, right?

The LLM, that is. Yeah, just super quick.

Basically, every time you build a new model

and it gets all this data, it trains the model,

does that over six months or whatever

and creates ChatGPT version five,

version 5.1 or whatever.

And that model is now static

until they create a new model.

But when I have a conversation with it

in a chat window, then it remembers stuff

and then it can work with the sum total

of all the information in that chat window.

You've got these other things called projects

where you can have multiple chats

and it remembers stuff from all of that.

Oh, I've heard of these projects.

Yeah, and then you've also got,

there are also other ways of getting ChatGPT

to remember information,

but it's all stuff that's between you and ChatGPT.

So it can remember that stuff.

But then if you ask,

so generally ChatGPT is just running the model

plus the data that's in its memory

that you've created with ChatGPT.

However, there is other information

that ChatGPT doesn't have.

So say if you're asking for more up-to-date information

than what was compiled in the model

when the model was created,

then ChatGPT will then talk to APIs,

which is a machine-to-machine interface,

and it will load the data from some website somewhere else.

And this is why sometimes you have to wait

a long time for ChatGPT,

because it's actually communicating

with a database somewhere,

and then it's loading the data in.

So it will do that.

And it will also use other models as well.

So for doing images, for example,

ChatGPT doesn't create an image for you.

It talks to DALL·E.

And DALL·E is another model specifically

for image generation.

But when it does these things,

there's obviously limitations with the way

that these third parties have their own rules

and their own ways of doing things.

And one of the limitations I found

with training this podcast was

that if you use Dali,

you can only get it to create one image at a time.

And you have to approve the image

before it will do the next thing.

And there's no way of getting around that.

Yeah, so what you can't do

is you can't create one prompt

and say, create me 10 images.

You can't do that at the moment with DALL·E.

So the first step is for it to create an image,

and then I have to then approve it.

So then I've had to create a prompt that says,

ask me if I accept this image, yes or no.

And then that's in the prompt

that I've given to ChatGPT.

And then I click on yes.

And then when I click on yes,

it then creates a second image

because I need a square image

and a rectangular image.

The rectangular image is for my WordPress site.

The square one is for Libsyn.

So it then creates both of those.

And then when it creates those,

then it also creates all the other metadata

to populate my WordPress post and my Libsyn post.

So it creates a title.

It creates a summary in WordPress

and it creates a summary in Libsyn.

They're different because the requirements are different.

It creates tags.

It creates image tags.

Yeah, that's about it.
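A hypothetical skeleton of such a megaprompt, just the shape Paul describes rather than his actual prompt, might look like this:

```
You are producing publishing assets for an episode of my podcast.
Input: the attached transcript.

1. Propose one cover image in my house style, then ask me:
   "Do you accept this image, yes or no?" and wait for my answer.
2. On "yes", create a square version (for Libsyn) and a
   rectangular version (for the WordPress site).
3. Then output:
   - an episode title
   - a WordPress summary, SEO-optimised around one chosen keyword
   - a Libsyn summary, meeting Libsyn's requirements
   - tags and image tags
```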

And then in the WordPress site,

at the moment I then take that information

and paste it into a new post.

And the same thing with Libsyn.

So that's super good.

So the prompt for the images is quite long

because I want a certain style.

I want them to be consistent.

And also the way that I want it to talk,

the sort of tone that it uses in the podcast,

is what I want.

But I also want there to be some human intervention as well.

So I've brought in a concept

of Paul's takeaways in the blog post.

So I get AI to create the text

and then there's a section underneath,

which I don't always populate.

But if I feel that I've really learned something

from one of the episodes,

I'll update something there,

and that will just be mine,

I'll just manually write that

so that I'm actually using my own brain.

But generally, with all these texts,

obviously for the WordPress ones,

I've requested them to be optimized

SEO-wise for specific keywords.

So it selects an SEO keyword for the text

and then optimizes for that keyword.

So every post has an SEO keyword.

And then the tags are obviously related to the post.

And so the great thing about doing a mega prompt like that

is over time, I might have other ideas

so I can just sort of gradually improve it.

And one thing I was working on on Friday actually

relates to this. I mean, ultimately where I want this podcast creation to go

is I basically want to create an agent,

where I create the MP3 file

and then it does all of this in one go

and I don't have any interaction with it whatsoever

apart from Paul's takeaways.

So in that case, what it would have to do

is it would have to connect via an API to WordPress

and to Libsyn and update the data in their databases,

so I wouldn't even have to create the posts myself.

Oh, that would be amazing.
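The WordPress half of that future automation could be sketched against the WordPress REST API. This is not Paul's agent; the site URL and auth header are assumptions, and Libsyn would need its own separate integration:

```python
# Sketch of the agent's final step: publishing an episode post through
# the WordPress REST API (POST /wp-json/wp/v2/posts).
import json
import urllib.request


def build_post_payload(title: str, content: str, tags: list[int]) -> bytes:
    """Assemble the JSON body the WordPress REST API expects for a new post."""
    return json.dumps({
        "title": title,
        "content": content,
        "tags": tags,
        "status": "draft",  # leave room to add Paul's takeaways by hand
    }).encode()


def publish(site_url: str, auth_header: str, payload: bytes) -> None:
    """Send the post to WordPress. Network call; credentials assumed."""
    req = urllib.request.Request(
        f"{site_url}/wp-json/wp/v2/posts",
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": auth_header},
        method="POST",
    )
    urllib.request.urlopen(req)
```

Posting as a draft rather than publishing directly matches the manual-review step Paul wants to keep for his takeaways section.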

Yeah, so in order to sort of start getting there,

on Friday I started thinking about this

and I managed to get something going.

I haven't tested it yet.

I think it'll probably be a bit of a mess.

And I'm trying not to get too lost

in the technicalities of things.

I'm focusing on creating the podcast.

So I'll probably give it more of a go

in a few weeks.

But at the moment, what it can do

is instead of creating...

Well, it creates all of the text for the WordPress site

but it also creates a file

and then I can import that file

and that file should

have basically

a complete blog post in it.

So it's a text file and that should be enough

to create an entire blog post.

I'm not sure how well that's going to work

but that was my sort of interim

because obviously if we're going to automate it

you've got to be able to do something like that.

So I thought that would be kind of cool

if instead of me manually copying

and pasting the text into WordPress

I'll just get it to do a file

and then I'll just go tools, import

and import the file.

So that's the next step

with the creation.

So that's how I'm creating podcasts

at the moment, my sort of process flow.

In the future, I'm going to try and get an agent

to see if I can automate the whole thing

and it might even be that it's not particularly efficient.

Well, the way I see it...

It just saves some time.

Yeah, and it's time that I hate as well.

So it might be a small saving of time

but if it means I'm more motivated to do the podcast,

and it's easier to do,

then I think it's just good.

So I'm going to keep at it and work on that

and it's all sort of good fun

learning as well.

And then the next big stage

for this is to

take the podcast

and then create a good

process for

sort of sharing the podcast

through social media.

At the moment I'm not doing any of that

I'm focusing on the podcast creation.

But yeah, so that will be probably

the next big thing that I build

to support this podcast.

Do you have any questions?

No, that is awesome.

Thanks. That's great.

I mean, it's just,

it's already so much more

efficient now, just listening to what you're doing,

compared to what

I might have been doing years ago.

Let's not go there.

So that's really exciting.

Yeah, brilliant.

Okay, great. Thanks, Colin.

And I hope you guys found that interesting

and we'll speak next time.

Thanks.