Postgres FM

Michael is joined by Chelsea Dole, Staff Software Engineer and Tech Lead of the Data Storage Team at Brex, to discuss bloat. They cover what it is, why it's a problem, strategies for minimising it, and the options when it's really bad.

Here are some links to things they mentioned:

Managing your tuple graveyard (talk by Chelsea) https://www.youtube.com/watch?v=gAgbzvGT6ck
pgstattuple https://www.postgresql.org/docs/current/pgstattuple.html
pg_class https://www.postgresql.org/docs/current/catalog-pg-class.html
Bloat in PostgreSQL: a taxonomy (talk by Peter Geoghegan) https://youtube.com/watch?v=JDG4bMHxCH8
It's Not You, It's Me: Breaking Up with Massive Tables via Partitioning (talk by Chelsea) https://youtube.com/watch?v=TafwSuLNxe8
pg_repack https://github.com/reorg/pg_repack
pg_squeeze https://github.com/cybertec-postgresql/pg_squeeze
VACUUM https://www.postgresql.org/docs/current/sql-vacuum.html
autovacuum https://www.postgresql.org/docs/current/runtime-config-autovacuum.html
CLUSTER https://www.postgresql.org/docs/current/sql-cluster.html
HOT updates https://www.postgresql.org/docs/current/storage-hot.html
Our episode on BUFFERS https://postgres.fm/episodes/buffers-by-default
Our episode on TOAST https://postgres.fm/episodes/toast
Our episode on index maintenance https://postgres.fm/episodes/index-maintenance
Chelsea's website: https://chelseadole.com/

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:

Jessie Draws for the amazing artwork

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Guest

Chelsea Dole

Staff Software Engineer (and Tech Lead of the Data Storage Team) at Brex, and speaker at several prestigious Postgres conferences over the past couple years.

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Michael: Hello, and welcome to
PostgresFM, a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard,
and today I am joined by

Chelsea Dole, staff software engineer
and tech lead of the data

storage team at Brex, and speaker
at several prestigious Postgres

conferences over the past couple
of years.

Thanks so much for joining me,
Chelsea.

Chelsea: Thanks so much for having
me, Michael.

Michael: Wonderful.

Well, I have seen several of your
talks and loved how well you've

explained a few concepts now.

One of those was bloat.

And looking back at our previous
episodes, I realized we hadn't

actually done an episode on bloat
itself.

We've done plenty that mention
it, that are around it, but I

loved your explanations and I'm
looking forward to having a chat

with you about what it is, how
people can think about it, and

some strategies around it.

Chelsea: Thank you.

Well, as a PostgresFM loyal listener,
I've definitely listened

to a couple of those bloat sphere
conversations, let's say.

So it's nice to be addressing it
more directly.

Michael: Yeah, awesome.

So in terms of where to start,
I guess we should cover what is

bloat.

How do you think about it?

Chelsea: So Postgres bloat basically
occurs as a function of

MVCC and it's kind of an extension
of the autovacuum process.

So in MVCC, all the actions you're
doing, inserts, updates, deletes,

those are all actually updates
or editing metadata on a tuple,

not hard-deleting it in place.

This basically allows MVCC to be
both compliant with ACID principles

and still fast, so it doesn't slow
things down too much.

When those tuples are generally
hard-deleted is through vacuum,

which runs every now and again,
depending on your configurations

and autovacuum.

But if autovacuum can't keep up,
and you have lots of those

inserts and updates and deletes,
then you can get into a state

where your Postgres pages are basically
bin-packed with a bunch

of dead or soft-deleted tuples.

And that leads to bloat.

So bloat is the state where you
have these table pages that are

full of basically useless data
that vacuum is hurrying to kind

of run around and catch up.

And so Postgres has to keep on
adding new empty pages to the

end of your heap, which leads to
all sorts of, I'd say, non-optimal

outcomes.

So too long, didn't read.

It's unoptimal tuple density in
your pages.

Michael: Yeah, I really like that.

So if we end up in a situation
where a large proportion of our

table, maybe including the indexes,
is sparsely stored on disk,

that has knock-on effects and that's
referred to as a table with

a lot of bloat.

I've thought about it in the past
as almost the diff between

the table's current state and if
we completely rebuilt that table

with its indexes.

Like the diff I think of as bloat.

I don't think it's the only definition
that's acceptable because

of like fill factor and things
like there are other technical

details but I like it because I
think it's quite practical and

in terms of what it means.

So what but why is this a problem?

Like when have you seen this
cause issues and how bad are

these issues?

Chelsea: There's a huge spectrum
there.

And I would first preface by saying,
I wouldn't call this a hot

take, but maybe a lukewarm take
is that bloat is really not always

a problem.

I think that a lot of people think
of this as this, oh God, our

tables are bloated, what are we
ever going to do?

But there's plenty of situations
where you have a bloated table

that's serviceable and you're able
to get it to a slightly better

state without any drastic measures.

But the main issue that bloat can
cause, which can lead to downstream

issues, is really increased IO.

And IO is sort of the grandfather
of all these downstream issues

that nobody likes, like high latency
on your reads.

Michael: You

Chelsea: know, it leads to IO because
essentially if you think

of it logically, like if I have
10 rows across 2 pages, if I

do a sequential scan, I've now
scanned 2 pages.

That's a certain amount of IO.

But if I have really bad tuple
density, I've got table bloat,

then maybe I'm scanning 10 rows
or 10 tuples across 8 pages.

And so I'm scanning the same amount
of data, but I just had,

you know, 4x the IO.

So that can lead to downstream
negative effects with Reads, of

course.

And you guys have talked a couple
of times about, you know, explain

buffers and things like that.

So that's a really good way to
sort of see off the cuff, you

know, whether you're using an appropriate
amount of IO.

But as I said, there are places
where you can have, you know,

some amount of table bloat and
it's not really causing an issue.

I would say that where I look for
it as the biggest long-term

issue to solve and really address,
is those workloads that are

going to be very update and delete
heavy.

Having some bloat on a normal workload,
if your users aren't

seeing effects on latency through
the IO, I would sometimes just

say shrug, you know, the important
thing is the end-to-end experience

and the long-term maintainability
for you as a dev.

Michael: Yep, love that.

And very practical as well.

I've seen a couple of your talks
on this.

And you mentioned IO, but you mentioned
it quite late on.

I'm like, oh yeah, it is as simple
as that really.

Obviously there are other bad things.

Like it's taking up more disk space,
right?

Like it's taking up more space in
memory, but ultimately the user

experience wise, that's normally
what a lot of us are fighting

with, especially on busy systems.

So slow queries that users are
reporting is a natural sign.

How do you go from that to realizing,
oh, my problem is a bloated

table?

Chelsea: I think there's sort of
2 paths for that.

In my experience, the first one is
before you ever know what bloat

is, how do you discover bloat?

And I would say that usually the
path is you have a really bloated

system and there's really no other
explanation for it until you

go down those paths and try to
figure it out.

For myself, just sort of going
through how I discovered table

bloat, thinking back, you know,
I've worked, I would say, in

Postgres at scale, at the last two companies
I've worked at.

Before that, with it more as like
a passing, just a tool I happened

to use.

And I saw it in great scale when
I first started at Brex.

And the way that I actually saw
it is that we had a developer

reach out.

And they said, hey, you know, I
have this, let's say, 50 gigabytes

of tables total on this server,
there's only one database, and

for some reason, we're almost running
out of disk.

Why do I have so?

Where's all this space going?

Because I can see my table size
and then remaining table space

on disk.

We use RDS on Cloud, so you're
able to see that pretty easily.

I went and I said, I actually don't
totally know.

Let me check this out.

When I went and checked it out,
I could see that there was just

a ton of bloat.

Here, interestingly, the bloat
was actually coming from a TOAST

table bloat.

Michael: Oh, nice.

Chelsea: I know, which is an interesting
little side quest here,

conversationally.

But TOAST is just other tables.

The oversized attribute storage technique
or whatever fun acronym they

made up with it.

Those are just tables under the
hood too, so they can also get

Bloat.

So I'd say that was my first time
needing to go through the entire

song and dance of bloat that needed
to be fixed through, I would

say, like strong measures versus
maybe just tinkering some little

stats here and there.

But I think that once you know
what bloat is, it's pretty easy

to look out for it.

Operationally, there's observability
tools and dashboards.

You can instrument your own.

I think a whole lot of DBAs have
their own private stash or ideally

GitHub open source stash of fun
queries they like to use or extensions

they have in their back pocket.

And we can dive a little more into
those if you want.

But I think that there's sort of
those 2 paths.

There's the path of, oh, God, what's
going on?

And the path of once you've seen
it, you can kind of pattern

match.

Michael: Yeah, nice.

I like the system-level like path
in and I've all, because of

my background and the tool I work
on, I see it more from the other

direction, so somebody looking at
an incredibly slow read query

that has no right being that slow
and using explain analyze buffers.

And as you mentioned, when we say
incredibly bloated, you could

easily have at least before Postgres
14, which had some great

optimizations in it, you could
have an index that was 10 times

bigger than it needs to be.

If you, let's say, you rebuilt the
index concurrently, the result

afterwards might be 10 times smaller.

So that's a significant amount
of extra reads you might

be doing, especially if you're
returning a bunch of rows.

So you mentioned like 8 reads instead
of 1, but if you're looking

at the last thousand rows, if you're
doing 8,000 instead of 1,000,

you start to notice that in latency.

So it's quite cool.

We, in fact, we used to call the
tab bloat likelihood,

and I renamed it a couple of years
ago to read efficiency, partly

because of your, like it goes back
to your density question again.

It's not necessarily bloat, but
it's not necessarily

about the likelihood of it.

It's more about the efficiency
of those reads, and it might be

a locality issue which is somewhat
bloat-related as well.

So yeah, love this.

Great.

So you mentioned, and you've spoken
before about queries to estimate

and other techniques for looking
into like getting more, maybe

accurate or, depending on your definition
of accurate, measures

of this. What's your advice on
minimizing this, dealing with

it?

Chelsea: Yeah, I guess the first
thing to the measure of how

to identify whether you have bloat
or how much you have and decide

What to do with it.

The first step there is really,
again, kind of a trade-off of

what matters to you.

Does the speed and low system impact
matter to you in this sort

of DBA process of figuring out
how much bloat you have?

Or does accuracy of the exact percentage
of bloat or the exact

tuple count and being 100% sure
that your stats are right matter

to you.

And so again, kind of like I said
in the discussion about how

bloat can kind of be okay or not
something you really need to

deal with in certain situations,
this is one where you kind of

get to decide your priorities.

If your priority is 100% accuracy,
or I would say also, if you

have downtime hours on your database
where you can easily do

this without any user effect, or
if you have a small system and

for whatever reason, you happen
to be bloated through your own

write patterns, but it's not actually
that important, then I

would suggest pgstattuple.

pgstattuple is a Postgres contrib
module.

Basically, it gives you some functions
that you're able to run,

and they will basically do a sequential
scan through a target

table, or I think through a target
database if you want to run

all tables, and it will return
to you the count of live dead

tuples, free space map stuff, as
well as a few other statistics.

That one I would say is on the side
of you have resources to spare,

you're not gonna impact users because
CPU does spike during this.

And there's no way to get around
the sequential scan because

you can't exactly index it.

The point is to actually look at
the pages.

So it's always going to be a sequential
scan.

The other option is through using,
I would say, estimation queries.

And these tend to leverage internal
tables already used by Postgres

or kept up to date during the Analyze
process.

So these would be like pg_class
reltuples and things like that

will estimate the number of live
dead tuples you have based on

a sampling during Analyze.

So before you run anything that's
an estimate, you do want to

run Analyze, you know, right before,
but then you're able to

guess a proportion based on a sampling
of your table data.

This is a better option, I would
say, for really high-criticality

systems or extremely large tables.

If you have a 500-gigabyte, 800-gigabyte
table, or even just

one that's 80 gigs, but is very,
very critical for users, and maybe

you're already kind of medium on
CPU, you can't stand those resources,

then there's really no downside
to just using a table sample.

There's no reason that this number
needs to be precise.

Michael: Yeah, so what is alarming?

Well, I guess the answer is it
depends.

But you've in fact, I've rewatched
your talk recently.

So I know I'm stealing from you
there.

But you've got some great rules
of thumb on what you consider

to be bad on the table bloat front.

Chelsea: Yeah.

Well, I don't think I could possibly
stall, it depends.

Cause that's just like everybody
in engineering should be

saying all the time, but you know,
my personal rules of thumb

speaking just for myself is that,
on very small tables, bloat

is not a problem.

A gigabyte, 2 gigabytes, that as
I'm saying is very, very small.

Even if you would see up to 75
percent bloat, out of vacuum we'll

be able to take care of this, the
total impact on your system, there's

just no way that it can be high
or significant, you know, knock

on wood.

I'm sure somebody will comment
with some counterexample, but

at least in my mind.

Beyond that, I would say if you
get to a 1 to 30 gigabyte range,

I would say 25% bloat.

So 25% of your total table space
being taken by dead tuples is

acceptable.

And then as you go higher, from
about 30 gigs, I would say you

want to inch that acceptability
downwards.

So I would say once you get up
to 100 gigabytes, I would aim

for like 18% and then I would flatline
at about 18%.

1 important thing is you should
never expect a 0% dead tuples.

If you completely rebuild a table
or if you tweak every autovacuum

setting or you've used an extension
and repacked it, you still

will have some bloat and that's
okay.

Michael: Yeah, right.

Like unless you're gonna show off
because you've got like an

append-only table with no bloat,
but if you've got updates and

deletes, it's a trade-off of the
system, right?

It's the trade-off for the design
that Postgres has given us.

You've recommended this talk a
few times.

I'm most of the way through watching
it, but Peter Geoghegan's given

another great talk that is criminally
underwatched at 200 views,

so I'm going to share that.

But he mentioned some of the trade-offs
that Postgres is

making in this area.

So it's guaranteed, I think as
soon as we start doing updates

and deletes, it's guaranteed that
at least past a certain frequency

of those updates and deletes, we're
going to have some bloat.

So yeah, it's a really good point.

Chelsea: Yeah, there's no whole way
to avoid it.

It's more about mitigating the downstream
effects and making sure you

don't accidentally look up and
realize you're in a crazy situation

in a couple of months or years.

Michael: Where did, I'm curious
though, I understand totally

as you get larger data
volumes, trying to aim for a smaller

percentage of bloat, because like
in real terms, that's more gigabytes

of bloat or more pages that could
be being scanned through for

reads and things like that.

But did 18% come from trying to
get it lower than that, like

fighting some update-heavy tables?

Or where, that seems quite a specific
number to me for like a

rule of thumb.

Chelsea: Yeah, great question.

And actually, this is the first
time I've been asked this question,

which I agree.

The first answer to why 18% or
why these numbers is, as someone

who watches Postgres talks and
as somebody who takes it in, I

have to admit, there's part of
me that gets annoyed when nobody

gives real numbers.

I realized that it comes from the
reality of it depends as we

already kind of espoused on, but
I really wanted to provide something

concrete.

When I gave those numbers, I thought
about my previous projects

and I thought about, okay, well,
what happens if I completely

rebuild this table or I repack
it and don't insert anything?

Usually, you would see a table bloat around 8 to 10% then still,

at least in my experience, because
if you're rebuilding a table,

at least if you're using an extension
like pg_repack, for example,

or pg_squeeze, you know, it's still
gonna need to kind of build

up and write all the live data
into it as it's rebuilding the

duplicate table.

So you're not gonna end up at like
0% unless you, I believe,

I assume, unless you VACUUM FULL.

And so 18% came out of, I'd say,
like opportunism of what I wanted

to provide, and also just on experience
of when I started to

see a tipping point into performance,
and when you start to get

far enough that you can't really
recover it without rewrites.

So the reason I would say 18% is
not because 20% or 25% even

on a large table is the end of
the world, but because I think

once you start slipping into that
more bloated table space at

large tables, it's harder to recover
from.

And it needs more drastic measures
than a little tuning here

and there.

And also, if you imagine you have
a, you know, 1 terabyte table,

which I know that I've had before
at companies, then if you get

to 20% of dead tuples, you now
have, you know, 200 gigabytes

of just bloat.

So at scale, you know, that becomes
money, whether you're in

the cloud or not, you're paying
for that disk.

That's kind of another downstream
negative effect other than

IO.

So it's kind of, it's money, it's
personal experience, and it's

also just wanting to put a sticker
and a number on something.

So you have a target to aim at.

Michael: Yeah, I really like that.

I agree, it's difficult without
any real numbers, but it also

like you could have said 15% I probably wouldn't

have questioned it.

It's just more it's more practical.
It's more likely to be based

on your real experience, which
I liked a lot. Cool.

So I wanted to come back to the,
you mentioned TOAST and the

acronym or probably backronym.

I don't know if you've got that
phrase here.

That's a really fascinating case
that I don't think I've not

seen, I've not spoken to anybody
about, but it makes intuitive

sense because of the nature
of TOAST, we could have potentially

large values across multiple pages,
but I would have guessed

they would be more easily reused,
but I'm not, but that's a total

guess.

I've yeah.

Do you, Do you have any more details
on that?

Chelsea: Well, I'm sure I'll get
some of this possibly wrong,

but I think, actually, it's surprising
that we don't see TOAST

in more situations of bloat.

Because when you're updating a
value in TOAST, To my knowledge,

it's not able to be as intelligent
as other update methods can

be.

You're actually replacing or updating
more often than not.

TOAST tables themselves, as I guess,
a background in case anybody

listening is not familiar, What
happens is if you have an extended

field, you know, you have extended
and non-extended data types

in Postgres.

If you've an extended field, so
things such as JSONB or, you

know, VAR cars with I think longer
limits or byte A, then all

these are examples when they can
go over the maximum single tuple

size.

If they reach that threshold, then
Postgres will compress it

through some method depending on
your Postgres version, and they'll

just store it in a separate table
called TOAST table.

In the main table, let's say you
have a users table, then that's

basically just a pointer to this
TOAST table.

In the example I mentioned before
when I ran into bloat on the

TOAST table, the reason you would
see this is maybe you have

a small table even, it could be
a 5 gigabyte table, but every

row in this is extremely large
data types.

I'm sure we've all seen those tables
where there's 5 columns

and each of them is like a massive
JSONB value.

They could be updating those all
the time.

You could be doing field updates,
you could be doing anything.

And if you get up to a certain
TPS on that, then every single

time you do an update, it's gonna
be rewriting that value in

the toast table.

And Postgres does look at TOAST
as far as I know, like just any

other table.

So autovacuum is running on it the
same exact way.

You know, I could talk on partitioning
and I kind of say the

same.

Partitioning to me is like UX.

It's DBA UX.

We see it as 1 table.

Postgres just sees tables.

And same thing with, same thing with TOAST.

And so, in that case, we had a small table that was just super

high update for those very large fields.

Correct me if I'm wrong, Michael, maybe you know more about this,

but I think that the nature of how TOAST does those updates is

they're not able to do, like, I think, key-value updates in TOAST

fields.
They have to always rewrite.

Does that sound right to you?

Michael: Yeah, well, it sounds like any other value in Postgres,

right?
Like, it's a single entity.

So like, even in a large text field, without it being toasted,

if you had it pasted in the entirety of War and Peace and then

only changed 1 word, but it was all a single value, I believe

we'd need a new tuple with, well, bad example, because that one

definitely wouldn't fit in a single page, as it would be toasted.

But yeah, if you, let's say you took a 1-kilobyte chunk of text

and changed only like a few characters in it, I think it would

work the exact same way.

So, yeah.

Chelsea: The only caveat to that, I believe, is that when you

TOAST something, toast an object, you toast it in chunks of a

discrete size.

So, if you did toast War and Peace, then, you know, it would

be, you know, you would run into the tuple length issue in TOAST

as well.

TOAST doesn't have a longer, you know, maximum tuple length.

It's just that it will chunk it via some algorithm, compress

it.
And then so War and Peace will actually be a pointer to let's

say 10 tuples all of them compressed.

So I believe that when you edit 1 word, you know, you fix your

spelling mistake, you know, Leo Tolstoy really has to go back

and fix that.

Then when he re-TOASTs it, then he has to recompress and re-chunk

all those 10 values again, not just the one.

So I think it might be, you know, we're figuring this out, really

talking about it, I think it might be something that scales the

issue with depending, with increased size of the object itself.

Michael: Yeah, I'd love to hear from anybody who has dealt with

this or looked into it or written about it.

Sometimes we get some quite in-depth responses, which we love.

The question I guess is, let's say we've got something that's

spanning tens of kilobytes, so multiple pages, your 8-page example

might be good.

And we rewrite it so we get 8 more
pages.

The question is what happens to
the 8 previous ones?

Once they get marked as dead and
if they could get reused easily,

we shouldn't get too much bloat.

But if they're getting, I guess
if autovacuum can't keep up,

that's when it accelerates.

So it would make sense in a system,
for example, where autovacuum

isn't able to keep up with that
toast table for some reason,

or maybe it's been disabled on
that table.

Chelsea: Yeah, totally.

That can also be a place where
you need to tune autovacuum max

workers higher, because autovacuum
max workers defaults to 3.

And this doesn't really have an
impact depending on your table

size, it has impact depending on
your table count.

If you have, I would say, more
than hundreds, so getting into

thousands of tables, that's or
and many of them are large, maybe

the Autovacuum worker takes a long
time on 1 table.

That's where you're going to want
to start tuning that up, giving

extra resources on your server
over to vacuum compared to servicing

queries.

I would say for the greater good.

Michael: Yeah, and just to be clear
when you say like hundreds

of tables We're counting each partition
as a table in that case.

Chelsea: Yeah

Michael: Yeah makes a lot of sense.

Cool So anything I haven't asked
you about that I should have

done or any other tips for this?

Chelsea: Let's see.

I think the only thing we didn't
get around to that I think I

would want to make sure I shill
is some more details about how

to fix a bloated table once you
see it.

So if you have discovered you have
a bloated table, let's say

that you're above that metric,
that rule of thumb, maybe well

above it.

Like I said, when I found that
toast table, it was like 95 percent

bloat.

That explained a lot.

I would say that at that point,
there's a couple routes you need

to go down.

The first is to recognize that
bloat is caused by a combination

of autovacuum configuration and
update and delete heavy workloads.

Just 1 of them won't get you there.

It's both together.

And so if you're only a little
bit bloated, maybe you just kind

of keyed onto it via some, you
know, observability metric or,

you know, warning level ping you
got in some place or other.

At that point, I would recommend
going the route of tuning your

autovacuum configurations, as
well as really talking to the

dev team or working as a dev yourself
to figure out whether there's

anything you can do to reduce the
volume or difficulty of those

updates and deletes.

Some common anti-patterns I see
are things like cron jobs that

run every day or week and delete
a huge amount of data.

And they often think they're being
helpful when in reality they

can be kind of degrading the quality
and I.O.

Performance of that database.

In terms of tuning configurations,
usually you want to go one of

two broad ways.

You either give the server more
resources for Autovacuum through

Autovacuum Max Workers, or you
tune it to run more frequently.

So you tune Autovacuum to actually
run more aggressively, which

I generally recommend based on
system defaults.

Autovacuum did get more aggressive
in more recent versions of

Postgres.

However, it's still generally good
to tune up like Autovacuum

Vacuum Scale Factor, especially
if you have those large tables,

you know, defaults to only triggering
Autovacuum when 20% of

your table is dead rows.

So, you know, that's already beyond,
I would say, my recommended

18% goal.

So if you really wanted to trigger
more proactively, you would

need to tune that down from 0.2
to let's say like 0.1 or far

less.

You know, I see a lot of recommendations
online that I've used

as a user that suggest, you know,
every 1% or 3% of dead tuples.

Michael: Yeah, because for a huge
table, that's still a lot of

tuples.

We could still be talking about
tens of thousands, hundreds of

thousands, even millions of tuples.

Chelsea: And by the time that it's
actually done, you could be

up higher than that because that's
what triggers.

What if it takes hours and hours?

Michael: Yeah, absolutely.

Chelsea: If you do get really,
really far behind, you check the

clock and you're up to that 90
percent bloat table, that is a

good time to consider rebuilding
the table.

If you can afford it, vacuum full.

Most people can't these days if
it's a user-facing application,

so that's the reality.

But I always say that first because
there are situations in which

you can, again, looking from the
user perspective, and it's a

lot easier.

The other thing you could do is
use an extension like pg_repack

or pg_squeeze to rewrite it.

And this basically creates a
shadow table duplicate schema.

It will copy over all the data,
use triggers to update all the

incoming data coming in from one to
another.

And then once it's all caught up,
it will within an access exclusive

lock on the which lasts you know
less than a second.

Definitely.

It will basically switch the table
names to the prod traffic

points towards the new table.

This is something that I have seen
be flaky.

I wouldn't personally recommend
automating pg_repack.

I've seen and heard of cases in
various talks and just through

friends in the industry, people
that try to automate pg_repack.

And, you know, I've seen it lead
to a lot of incidents and issues.

You know, I've personally never
run into an issue where there's

any data loss, because in the case
that you just kill the PID

of pg_repack, for example, you
just have some dead tables you

have to manually drop.

The duplicate ones aren't complete,
so you use the old one.

But I've heard of it causing other
issues, so I would say be careful,

but I wouldn't stray away from
it totally.

Michael: Nice, I've always thought
of it as like, reindexing concurrently,

but for the whole table.

Is that a reasonable comparison?

Or I guess it's missing some subtlety
around potential flakiness,

but that can fail as well, right?

Like if reindexing concurrently
fails, you can end up with some

invalid indexes.

Chelsea: I think that's a really
good analogy, actually, mainly

because they're both nontransactional.

Because concurrently or anything
concurrent, the thing that it's

an unusual move by Postgres, and
I'm sure there was a lot of

discussion on the core team about
this when they first started

releasing concurrent features,
because it's a decision by the

core team to value the user Postgres
experience by DBAs and applications

over the strict rules of MVCC.

Because when you create something
index concurrently, if it fails,

you have the invalid index.

So it's not atomic, it's not able
to roll back.

The same thing with pg_repack.

If it fails, then you have these
invalid tables and valid indexes

that you need to drop.

If you try to rerun it, it's not
a no-op.

You'll have to clean it up first.

Michael: Yep, cool.

Okay.

Oh, wow.

That's good.

So that's a really good point in
terms of autovacuum and in

terms of repacking or vacuum full
if you can afford the heavy

locks, or if your system just doesn't
have any users at a certain

time of day or something like that
right but it is, it's rare

but it is common enough that I
have been caught out by not recommending

it a couple of times which is super
interesting.

CLUSTER is in the same category
right, like same as VACUUM FULL

but you get to order it by an index,
which could be helpful for

like reads.

Definitely.

Cool.

Last, in fact, you mentioned the
right at the beginning and I

had to bite my tongue not to jump
straight on the pun because

that's how my brain works.

You mentioned having a hot take.

So you talked quite a lot in the
past about access patterns.

And one point I loved was the idea
of considering if you've got

an update or delete heavy workload
which could be the reason

you're in this situation.

Do you even need to be doing those
updates and deletes?

That's a question that doesn't
get asked very often, and you

made a really good point.

So there's that angle that I'd
love you to talk on if you want

to.

And there's also the hot update
optimization we have in Postgres

that can be a huge help for some
avoidings.

If you're aware of it, not indexing
a certain column if you don't

have to.

There are some trade-offs there that
might be interesting.

I don't know if you've got experience
with those.

Chelsea: Yeah, I guess so first
address the first part of it.

Glad you brought it up because
this is definitely

a strong opinion of mine.

And I think that's something that
comes from coming to being

what I would describe as somewhere
between the liminal space

of a software engineer and a DBA
through, you know, backend engineer

to data engineer to DBA, just,
you know, sinking my way down

into infrastructure.

And, you know, I think that I still
tend to think from a perspective

of a backend or data engineer a
lot of the time.

And from that, I think that it's
good for us to all remember

that so many of these access patterns
and rights, the biggest

hammer you can use is to just not
do it or to rewrite it.

If you're a DBA managing a couple
or hundreds of databases, speaking

for myself, I'm managing the hardware
and the high-level metrics.

I don't really have access or knowledge
about talking to somebody

into the why.

Why do we have this query?

Why do we have this database itself?

What the heck's in it?

I think that if you really want
to address bloat, often the best

thing you can do is to start a
conversation and say, hey, what's

the value of this?

Can we simplify it?

Do we need to be deleting this?

Do we need to have this table?

You know, it's crazy how many times
that I've dealt with a problem

that way, and I've never needed
to delve into the more, I would

say, extensive measures.

And also, if you can, keep those
relationships with people

at your organization or whoever
you're working on a project with

to try to be able to let them keep
that up by themselves.

You know, at my company, we've
built some automation around let's

say like auto education as far
as we can.

We're still working on it, but
a way to kind of allow developers

to be proactively educated about
how they're building their systems.

And so I think that as much that
you can do that and just, I

would say, change the patterns
from the code side is the quickest

way.
That's a PR.

Michael: Awesome.

I was not expecting it to go that
way.

What's this automatic education
thing?

Is there anything you can share
there?

Chelsea: Yeah.

I will mea culpa here and say
that I wish I could say it were

a more, a more, you know, like
amazing system than it probably

is, but we've used, for example,
git hooks and GitHub webhooks

to automatically comment documentation
that we've written on

PRs.

For example, if we see running
a migration, we pin the migration

guide to your PR rather than requesting
us as reviewers because

I work in an organization of larger
than, you know, a thousand

people.

So I don't wanna be a blocker on
your migration, I want to educate

you.

Same thing with partitioning.

I wrote, after we dealt with partitioning
stuff, you may have

noticed that a lot of the talks
I write are based off whatever

the heck I'm struggling with at
work.

And so I wrote a very in-depth
partitioning migration

guide, both for future people on
my team, as well as people who

might want to partition and need
to understand why, how, whether

it's a good idea.

So I think that creating documentation
is good, but we all know

it falls out so quickly.

You change 1 thing, it's deprecated,
you forget about it, the

person leaves.

So I think that the underappreciated
side of it is figuring out

easy systems where you're auto-commenting
it or you're pushing

it to people in a way that
actually keeps it actively read.

Michael: Awesome.

Sounds great.

Is there anything else you'd like
to plug or give a shout out

to?

Chelsea: Not particularly.

I think that I'm sort of a relative
newbie to the Postgres community

being involved in the open source
side.

I went to my first Postgres
conference last year and then

sort of ran out it at 100 miles
an hour ever since then, which

has been really fun to get involved.

So I guess I would just say thank
you to all you guys for inviting

me in.

It's been a great past year being
more involved in Postgres.

Michael: It's awesome to have you
in the community.

I still feel new, and I feel like
I've been here for like 5 or

6 years.

So it's awesome having you here.

You're a wonderful contributor
to the community.

Your talks are great.

Please keep giving talks on issues
you're hitting with Postgres

that are some of the most fascinating
that the community can have

and also not common enough at conferences,
I think, personally.

So, yeah, appreciate it.

And appreciate the real numbers
as well.

So, thanks so much Chelsea.

Chelsea: Yeah, thank you for hosting this.

I listen to you guys, as many
people do, in the car. So, you know,

keep me going with good things
to read while yelling at

various drivers on the road.

Michael: Well, apologies, you're
probably going to have to skip

a week.

Chelsea: Okay.

Yeah.
Yeah.

Listening to my own voice is too
hard.

Michael: Tell me about it.

Take care, Chelsea.

More episodes

Chapters

Creators and Guests

What is Postgres FM?