A weekly podcast about all things PostgreSQL
Michael: Hello and welcome to Postgres
FM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
This is my co-host, Nikolay, founder
of Postgres.AI.
Hello, Nikolay.
What are we talking about this
week?
Nikolay: Hi, Michael.
We are out of disk space.
What to do?
Michael: Panic.
Nikolay: This is what Postgres does, right?
Michael: Yeah, in some cases, right?
Nikolay: Yeah, well, I saw it a
lot.
I recently saw it again, actually.
And yeah, it's not fun.
Why panic?
Don't panic, right?
Michael: Keep calm and carry on,
right?
Nikolay: Yeah, it's Postgres, you
should, yeah.
Just use Postgres, you know.
Yeah, so as we discussed before
this call, before recording,
let's just talk about possible
reasons, mitigation, and avoidance,
right?
Yeah.
Prevention.
Prevention is better.
Avoidance.
Michael: Avoidance.
Denial.
It's like the stages of grief.
Nikolay: It's not my problem, actually.
Yeah, I'm a DBA.
It's an SRE problem, right?
They should just give us more space.
To start with problems: I mentioned
it before the call, but
let's repeat.
The only true reason for this is
insufficient disk space allocation.
It's always so.
And it's a joke, but we will be
returning to this joke, I'm quite
sure.
Because sometimes you do need more
space, that's it.
Of course, I would deprioritize
this path and consider everything
else.
But sometimes we end up with this
reason and mitigation action,
like just add this space.
But let's consider this as a last
resort, right?
Michael: Well, yeah, I like your
structure.
So, causes, then recovery, and
then mitigation or prevention.
Mitigation is recovery.
Yeah, okay, great.
Prevention being the last one.
So, causes.
What are the most common reasons
you see this happening to people
in the wild?
Nikolay: Good question.
There are many reasons.
I don't have statistics.
Michael: I can tell you the one I
see most blog posts about, because
I don't see this that often, but
the one that comes up time and
time again in blog posts is an
open replication slot.
Nikolay: Oh, that's too narrow.
Let's just say a lot of WAL is written,
and a lot of WAL files remain
in the pg_wal directory.
Or pg_xlog, if we talk about ancient
versions of Postgres.
Michael: Yes, but that's the root
cause of why it's the most
common one I've seen.
Nikolay: I think it's a super popular
reason, but I would structure
this like if we need to provide
a comprehensive list of reasons,
I would put it in second place,
and in first place I would put...
And not only because of slots;
slots are just one subsection here.
The whole section is: the pg_wal
directory takes a lot of
disk space.
This is a big class of reasons,
right?
But the very first class of reasons,
I would say, anything related
to the data directory itself.
Like a lot of disk space consumed
by tables and indexes, right?
It's also like a whole class, so
we have specific reasons inside
it.
Michael: Yeah, so we actually have
more data than the size of
Nikolay: the disk.
Than we expected.
We tried to delete, but the space
is not freed, and so on.
Michael: Lots of bloat, for example,
that we talked about.
Nikolay: Data-related, WAL-related,
and the third class, or category,
is probably anything else.
Anything else is tricky and interesting,
we will talk about it
soon, right?
So how about this
Michael: classification?
Yeah, just to give people an idea
of "anything else": you're including
logs in that, things that can grow
quickly that aren't data, like
main data directory stuff, or WAL.
Nikolay: Base sub-directory is
where data is stored.
So if we think about PGDATA directory
structure, we can think
about base, so tables and indexes,
right?
Materialized views as well.
And then the pg_wal subdirectory,
which can be huge, for example.
And everything else.
Everything else is interesting
as well.
Maybe also outside of PGDATA, but
let's not reveal secrets before
it's time.
So what about tables and indexes?
What do you think?
Like, possible specific reasons?
I deleted 90% of my table, but
it didn't help.
Disk consumption is still at 99%.
What to do?
Time to panic, right?
So, obviously, DELETE doesn't
physically delete data, right?
It just marks tuples as deleted;
it's a two-stage process.
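A quick way to see what actually occupies the space, as a minimal sketch using standard admin functions (run it in the database you suspect):

```sql
-- Top relations by total size (heap + indexes + TOAST):
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm')   -- ordinary tables and materialized views
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;

-- And the database as a whole:
SELECT pg_size_pretty(pg_database_size(current_database()));
```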
Michael: Well, and this is a tricky
one, because a lot of the times
we've discussed how to deal with
issues like this, we actually
require quite a lot of disk space
in order to repack, or to actually
mitigate that problem.
The easiest way to do that in an
online fashion requires
double the size of the relation,
or double the table size, normally.
So it is a tricky one in this case.
Nikolay: Yeah.
Exactly.
Just a few days ago, I was discussing
this exact problem with
one of our customers.
And they mentioned that they have
a huge table which makes up
90% of their database.
They know there is bloat.
And to fight bloat, especially
in the heap, you obviously need
to use pg_repack, and to use it,
you need the same amount of space again.
Temporarily, you need to
have two large tables of the same
size, because it rebuilds the whole
table in the background and
then switches to it.
So yeah, this is a problem, and
if you don't have enough disk
space, oops.
But yeah, I would say this is
an edge case.
Normally there's no single
large table which is like 90%
of the whole database, right?
So normally it's not a big problem.
And if we keep free disk space at
30-40%, or at least 25%, maybe,
we have this space, and we know
it's needed just temporarily, while
we're repacking the table.
Not a problem usually.
But if you're in this situation,
well, not good.
Maybe you should do something else.
There are alternatives to pg_repack.
I think pg_squeeze, the alternative
from Cybertec, also has this issue.
It requires additional disk space
because it rebuilds the table.
But in this particular edge case,
I would return to older solutions
which have existed for quite some
time: several implementations of
not "let's rebuild the table", but
another idea.
It's called pgcompacttable.
Again, there are probably three attempts
from different people to
implement this idea, and all of
them, I think, are in Perl.
But the idea is: let's issue some
updates which don't change data.
When you issue an update, even one saying
UPDATE ... SET id = id WHERE id = some
number, you know a new tuple is
created.
Always, right?
So this tool issues these updates
in a smart way.
It checks which tuples are at the
end of the table, and it knows there
is bloat, so there is empty space
in earlier pages, for example.
So if we update tuples which sit
at the end of the table, they will
move towards the beginning of the table.
And once there are no live tuples left
in the final page, the final block,
it will be truncated.
By default, this is what vacuum
does, unless it's turned off.
And then we do the same with next,
from the end, the next page,
and so on and so on.
Like basically, this is similar
to like, I remember in Windows,
like defragmentation, right?
So it's moving data to different
places.
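A minimal sketch of the no-op-update idea described here; pgcompacttable and similar tools automate it with throttling and page-by-page logic, and the table and column names below are hypothetical:

```sql
-- Current table size in 8 kB pages:
SELECT pg_relation_size('my_table') / 8192 AS pages;

-- A "no-op" update: the value doesn't change, but MVCC still creates a new
-- row version, which lands in free space earlier in the table if bloat left
-- room there. The tools pick victim rows from the table's last pages (by ctid).
UPDATE my_table SET id = id WHERE id = 42;

-- Once the trailing pages contain no live tuples, plain VACUUM truncates
-- them and returns the space to the OS (unless truncation is disabled).
VACUUM my_table;
```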
Michael: Remember that defrag screen
on Windows?
Nikolay: Yeah, yeah.
People love that.
People visualize it.
Yeah, yeah, yeah.
Maybe like red and blue, right?
So, yeah.
And this is a much slower process
than using pg_repack for a table.
But...
Michael: Also, in our case, if
we notice we've got 99% disk full,
maybe this kind of thing helps.
But if we're actually out of disk
space, even this isn't
Nikolay: an option.
Yeah, yeah, yeah.
I'm talking about mitigation here,
maybe too early, but of course,
if we've already reached 100%, we
cannot move anything; it's not what
I would do.
But you just raised a very good
point about the extra disk space
needed, and I just remembered there
are alternative solutions, and
I think they don't require any
extensions, so they can be used
anywhere.
True, true.
Like RDS, Cloud SQL, anything.
It's just simple updates.
Super slow, but not requiring extra
resources, which is great.
Also, you will probably need to
make sure the tool works
well with the current version of Postgres,
because they are quite
old, and again, there are three of
them.
I think Depesz also tried to implement
one of them.
Michael: That makes sense that
it's in Perl then.
Nikolay: This is a small zoo of
ancient tools.
And I like the idea, actually.
But it's not an emergency tool.
It's a tool for right before an emergency,
trying to avoid it.
So yeah, this, right?
But as for indexes, probably yes,
when we rebuild an index concurrently,
like Postgres builds another index.
But I think this should not be
super noticeable because no index
can...
Yes, it requires extra disk space,
but we should have it if we
do it, right?
So anyway, if we delete, we should
think about, okay, those tuples
and bloat.
Keep it in mind.
Try to avoid massive deletes or
design them properly with MVCC
and vacuum behavior in mind.
Michael: We did a whole episode
on that, I think.
Nikolay: Right, exactly.
So yeah, I saw crazy cases when
some tables and indexes were
bloated like 99%.
Michael: Yeah.
And you saw these cause out of
disk issues.
Nikolay: Well, they contributed
a lot.
And if we keep it as is, we will
soon be out of disk space.
So it's definitely worth keeping
an eye on the situation in this
area, bloat and vacuum behavior.
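A rough way to keep an eye on this, as a sketch; dead-tuple counters in pg_stat_user_tables are only a proxy for bloat, not an exact estimate:

```sql
-- Dead tuples per table, as a rough signal that vacuum is not keeping up:
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```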
And yeah, if you do it very infrequently,
then once in 5 years you take care
of it, and then you realize that
out of the 10 terabytes of disk space
you had, now less than 1 terabyte is used.
And what do you do with the other 9
terabytes?
You keep paying the cloud provider,
right?
Because it's easy to add disk space,
but it's not easy to reduce
disk space because they don't have
such a function and it's possible
only if you create a new standby
cluster with a smaller disk and
then just switch to it, switch
over, right?
Michael: Yeah, but it's quite
funny seeing that message,
you know: this is a one-way ticket.
Nikolay: One-way ticket, yeah.
Michael: So I think those make
sense, right?
Normal data exceeding the size
of the disk, whether that's actual
data or whether that's bloat, makes
perfect sense.
Do you want to talk any more about-
Nikolay: Of course.
Actually, ever since VLDB in Los Angeles,
I think in 2018, I still believe
in the keynote there.
I liked that keynote: data
deletion is a huge problem
we engineers need to solve and
find better solutions for.
So sometimes it's not about bloat,
it's about a lot of garbage,
duplicated data also.
And I saw it many times: when we start
paying attention to bloat
and push customers to fight
it, suddenly they say, you
know what?
I had this happen maybe a month ago.
You know what?
We don't need this table.
Let's just not spend time and just
drop it.
Drop it.
Drop this table.
Or truncate it.
We don't need the data.
And it's like, wow, some terabytes
are free.
So...
Michael: Yeah.
I see this.
I see this with like analytics
data, for example, people tracking
everything.
Like they don't know, they never
look at their analytics and
never look at who's using which
feature, but they track everything
because we might need it someday.
Or, you know, there are all these
data use cases.
And even the solution to this,
in terms of...
I've heard the phrase retention
policy.
I know it's not quite the same
topic, but when talking about how long
we should retain data for, even
then we don't mention deletion.
It's not a deletion policy, it's
a retention policy, which is quite
funny.
Nikolay: Also, maybe one of the last
things in this section I would
mention: it's becoming more popular,
instead of deleting old
data, to have a better approach,
tiering, tiers of data storage.
And I know there is an extension
from Tembo called pg_tier.
First of all, Timescale has this,
right?
But in their cloud only; I think it's
not open source, if I'm not wrong.
And so the idea is: let's consider
some data archived, and it still
looks like it's present in Postgres,
but it's not stored on an
expensive disk which also has capacity
limits.
We offload it to object storage,
like S3 on AWS.
And this is a great idea, actually,
but I think some latency
issues will appear, and also maybe
some errors sometimes, inconsistencies
or so.
Like, it's interesting.
But it's a great idea.
And again, like I said, Tembo has
the pg_tier extension, which
I hope to test someday,
when I or my team have an opportunity
and time for it.
It's super interesting to check
how it works.
And I think it's a great idea.
Like, if you have some huge tables,
maybe already partitioned,
you can go either with sharding,
or if some historical data is
not needed all the time, you can
just offload it to object storage,
which is like virtually infinite.
And then, okay, users experience
somewhat worse latency when reading
this old data.
Imagine e-commerce,
Michael: for example.
Just to quickly, I feel like this
is an old, kind of an old-fashioned
solution to this was table spaces.
Nikolay: Yeah, but table spaces
in cloud reality became not popular
at all.
Last time I used table spaces in
a serious project was more than
10 years ago, I think.
Michael: Yeah, but that's what
I mean.
We had this before, it's just it
had a different name.
Nikolay: Maybe, yeah.
Tablespaces, yeah, you can have
a cheaper disk attached to your
machine.
But I still don't like this particular
approach because I like
offloading to object storage more
than that.
Because imagine you have a primary
and multiple standby nodes.
If you use tablespaces and cheaper
disks, you are forced to
use the same structure of disks
on each node, and it's still
expensive, and slower, for example,
right?
While object storage is worse
in terms of uptime compared
to EBS volumes on AWS, for example.
But it's much better in terms of
reliability, or vice versa.
Michael: Durability or other things.
Nikolay: Yeah, availability
and durability; if you check
the SLAs of S3, I forget the exact numbers.
So if you check the characteristics,
you think, okay, this is what
I actually would like to have for
archived data.
Maybe latency will be worse,
right?
But if you think about e-commerce
and order history, for example:
if you run a large e-commerce
website, you have the ability
to offload all the old activity, and
users touch it very rarely,
but it's still good to have the
ability to see the old order
history.
It's not needed often, so we
can keep it outside of Postgres
and evict it from caches and so on.
And we don't pay for it on our main cluster,
we just pay for S3.
There we also can have tiering,
right?
Usually.
And even automated.
Usually providers have these features:
if it's older than,
like, one month, it goes to some colder
and cheaper storage.
So I think this direction will
gain more popularity and
will be better developed and so
on.
Tiering for storage.
Michael: Once again, we've been
sucked into kind of like prevention.
Nikolay: I think yes, it's hard
to talk about reasons and then
yeah, like this is a big flaw of
the structure I proposed.
Michael: Well, no, I like it still,
but it feels like we haven't
really touched on...
Nikolay: The most popular one, let's
talk about it.
Yeah.
WAL, right?
We have some...
So, pg_wal is huge.
We identified it, which is actually
already a good skill.
Not everyone can do that, but if
you manage to identify that
your pg_wal directory is huge, there
are a few specific reasons.
And there is a good article from
Cybertec about this:
why Postgres doesn't delete WAL
files even when it seems it's time to remove them.
Michael: Ah, yeah, the various
reasons.
Nikolay: This is exactly when we
need to use this article as
the list of possible reasons and
just exclude them one by one until
we find our case, our real reason.
So one of them you mentioned: some
replication slot, logical or
physical, not progressing and accumulating
a lot of changes.
Well, it's not really the slot that accumulates.
Postgres doesn't write to slots.
It writes to pg_wal, a lot of WAL files,
right?
But a slot has a position, and if it's
inactive or the position is frozen,
consumers don't consume and don't
shift this position.
It means Postgres must keep those
WAL files in the pg_wal directory
until the consumer catches up.
And a few years ago, in, I think,
Postgres 13 or 14, a setting appeared,
max_slot_wal_keep_size, to limit this:
to have a threshold where we give up
and say, better to invalidate the slot,
to destroy it, but stop this situation,
so the WAL files can be deleted
already.
And I remember a big fear in
a good engineer I worked with;
he was not a Postgres person, more
of an SRE, but with a lot of Postgres
experience.
Long ago, when replication slots for
physical replication were initially
introduced, we had basically a
small fight.
I said, let's use it, it's a great
feature.
He said, no, no, no, I'm not going
to use it.
He said, this is super dangerous.
And he understood that the danger
is: if some replica is somehow
stuck, the slot is not progressing,
the primary runs out of disk
space, and this is the last thing
he wanted.
After lost backups, of course.
But over time, we started using
slots, and then this setting appeared;
I think it's a great setting: you
can understand your disk layout
and how much free disk space you
can afford.
If you approach 90%, it's time
to kill slots.
So you can do some math and understand
that the maximum number
of gigabytes you can allow for
the lag is this.
Of course, over time, the database
grows, and this can still be a
problem, right?
Because if data grows, tables and
indexes grow, and probably
your setting will not save you
from an emergency, right?
Maybe.
But this is definitely one of the
reasons.
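A sketch of how this can be checked and limited; max_slot_wal_keep_size appeared in Postgres 13, and the '100GB' value below is only an example:

```sql
-- How much WAL is each slot holding back? (run on the primary)
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- The safety threshold discussed here; -1 means "keep WAL forever for slots"
SHOW max_slot_wal_keep_size;
-- For example (the value depends on your own disk math):
-- ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
-- SELECT pg_reload_conf();
```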
What else?
What reasons?
Because of a failing archive_command,
right?
If archive_command is configured,
it's important for Postgres to archive.
And it can be failing for any reason.
If Postgres cannot archive, it cannot
remove these WAL files.
And that's a problem.
So you should monitor archiving
lag separately.
And if it's growing, it's very
bad for backups already.
But it's also bad because it can
hit you in terms of disk space.
So yeah, replication and archiving of WAL
are two processes where, if something
is wrong with them, Postgres
cannot remove WAL files.
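A sketch of checking archiving health from SQL; pg_stat_archiver is standard, while the second query needs privileges to call pg_ls_dir and assumes the usual pg_wal layout:

```sql
-- Is archiving failing or lagging?
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- WAL segments still waiting to be archived show up as *.ready files:
SELECT count(*) AS segments_waiting
FROM pg_ls_dir('pg_wal/archive_status') AS f
WHERE f LIKE '%.ready';
```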
Another reason actually...
Michael: And by the way, this can
grow quickly.
I think people don't always realize
how quickly this can grow.
Yeah.
Some database.
Nikolay: Yeah, the database can be small
but very active, and
you can have terabytes of WAL
generated per day.
It's actually good to know how
much WAL you generate per day, per
hour, per second.
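One rough way to measure this from SQL, as a sketch; the two LSNs in the second query are placeholders for values you noted yourself:

```sql
-- Note the current WAL position now...
SELECT pg_current_wal_lsn();

-- ...then again some time later, and see how much WAL was written in between:
SELECT pg_size_pretty(
         pg_wal_lsn_diff('AD/4F000000', 'AC/98000000')
       ) AS wal_written_between_samples;
```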
Michael: So I think even with like
a low activity, as long as
something's happening every time
there's a checkpoint, I've seen
people with toy databases like,
you know, like free tier RDS,
be surprised that it's generating
like a couple of gigabytes
a day, like 64 megabytes per 5
minutes or so.
Adds up quite quickly with small...
Nikolay: A funny reason I can also
mention: you have a not-huge
database, but quite an active one,
and you think, oh, it's time
for checkpoint tuning.
We had an episode about it, checkpoint
tuning.
Let's raise max_wal_size and checkpoint_timeout.
You raise them, but you didn't do
proper math in terms of how much
disk space you have, and you end up
having a too big, normally too
big, pg_wal.
Because the distance between checkpoints
increases, and Postgres
needs to keep more WAL.
And this can be just a configuration
mistake.
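The knobs involved, as a quick sketch; how big pg_wal actually gets also depends on slots, archiving, and wal_keep_size (the Postgres 13+ name):

```sql
-- The settings involved in checkpoint tuning:
SHOW max_wal_size;         -- soft cap on WAL kept between checkpoints
SHOW checkpoint_timeout;
SHOW min_wal_size;
SHOW wal_keep_size;        -- extra WAL kept for streaming replicas

-- And how big pg_wal actually is right now:
SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();
```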
So yeah, after this, I think we
can move on and let's talk about
other reasons.
Michael: Yeah, what else do you
see?
Nikolay: In first place in this category
I would put the log
directory, especially if it's inside
PGDATA, inside the main
Postgres directory.
Actually, not only that.
Especially if the log directory where
Postgres writes logs is
on the same drive as the data directory.
For example, inside PGDATA.
Or just next to it, but on the same
drive.
If logs suddenly grow, like we have
some performance degradation
and log_min_duration_statement
is configured, or auto_explain.
Or, for example, we decided, oh,
we need more auditing, and pgAudit
is a great tool.
Right, let's bring it in.
It starts writing a lot to the logs.
And if you didn't separate your
data from logs, you should separate them.
I'm not talking about pg_wal; I'm
not a big fan of having a separate
disk for pg_wal, actually.
It was a thing in the past, but
recently I don't often see benefits
from doing this.
Benchmarks don't prove it, and so
on.
So, it depends, of course.
It depends if it's your own data
center, maybe it's worth doing
this.
But if you don't offload regular logs
and keep them on the
same disk, then you bring in pgAudit,
and it suddenly starts writing gigabytes
per hour, or maybe there's some problem
with log rotation.
Michael: Yeah.
Nikolay: Like rotation or retention.
So they are not deleted properly,
for example.
And then this can hit you badly,
and you actually get a panic for
your database, and your SELECTs
are not working.
But if logs are offloaded to a different
disk, at least with the
logging collector enabled, I know
that the database actually will
keep working.
Actually, recently, maybe a few
months ago, I had this experience.
I'm used to Postgres being fully
down if we are out of disk
space, and I expected the same
behavior when our log drive was
full.
If it's flooded, if it's full,
I expect big problems.
Postgres almost didn't notice.
So okay, we cannot log, but we
keep working.
That's great, actually.
So this is a super big benefit of
having a different drive for regular
Postgres logs.
And even if that drive is slow, it can affect
performance, right?
But if it's 100% full, it's not that
bad.
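A quick way to see how much space server logs take, assuming Postgres 10+ and enough privileges (for example the pg_monitor role) to call these functions:

```sql
-- Total size of server log files in log_directory:
SELECT count(*) AS log_files,
       pg_size_pretty(sum(size)) AS total_log_size
FROM pg_ls_logdir();

SHOW log_directory;    -- a relative path here means it lives inside PGDATA
SHOW data_directory;
```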
Michael: Yeah, we've talked several
times about excessive logging
having an impact on performance.
Nikolay: Observer effect.
Michael: Yeah, and I've definitely
done benchmarks, you know,
this is the age-old log_min_duration_statement
= 0.
Like, why we recommend not setting it,
or why I recommend never setting
it to 0.
Because you can generate
gigabytes in like a 30-minute
benchmark, with pgbench or
something.
So it surprised me how
much it can grow.
Nikolay: Yeah, it's actually a
good idea, maybe, to ask our bot
to benchmark with regular pgbench, and
then to have sampling for query logging
in one way or another,
and just see how the observer effect
grows and kills your performance.
Only people with small databases
can recommend log_min_duration_statement
= 0.
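The settings involved, for reference; the sampling parameters appeared around Postgres 12-13, so this is a sketch for recent versions:

```sql
SHOW log_min_duration_statement;   -- -1 = off, 0 = log every statement (dangerous under real load)
SHOW log_min_duration_sample;      -- log only a sample of statements above this duration...
SHOW log_statement_sample_rate;    -- ...at this rate (for example, 0.01 = 1%)
SHOW log_transaction_sample_rate;  -- or sample whole transactions
```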
Michael: Or quiet databases, yeah.
Nikolay: Quiet, yeah.
Not active.
Small databases and small workloads.
Michael: That's it.
So yeah.
Or I guess set it super low, like turn
it on like that only very, very briefly,
like for a minute or two or something.
Nikolay: Yeah.
Yeah.
So for serious workloads,
you cannot do that.
I already shared my experience of
putting big, critical production
databases down, and yeah, don't
repeat my mistakes.
Michael: As well as logging, what
else did you have in this category
of things?
Nikolay: Oh, that's a good question.
Well, let's maybe not spend too
much time discussing unknowns
or external things, like we have,
I don't know, something else
installed on the same machine,
using the same disks, and suddenly
we are out of disk space.
It can also happen sometimes.
Or maybe, for example, you do some
troubleshooting, you have
self-managed Postgres, you can
SSH to the box, you started sampling
some useful logs every second, but
write them, for example, to the home
directory, and then it's full, or
something like this.
So you always need to be careful
with that.
But there's an interesting case when, for
example, you try to upgrade, you
have Patroni, and, I don't
remember the exact details, but
in some cases Patroni decides to
retry provisioning of a standby
node, for example, and it fetches
PGDATA the way you configured,
with pg_basebackup or from archives,
but it renames the old directory
to keep it as a kind of local backup, right?
It just renames it, and you suddenly end
up having two full-fledged data directories,
like, wow, it can quickly become
a problem in terms of disk space.
What
Michael: else?
Does that mean, like, if you had,
let's say, your data taking about
40% of your disk size, so
you thought you had loads of
headroom, suddenly you're at 80% plus
whatever else is on there?
Yeah.
Nikolay: Yeah, okay.
Wow.
Maybe large temporary files, but
it's quite exotic.
They can be huge, right?
But it's exotic.
Michael: I saw some Stack Overflow
posts mention that, you know,
people that were out of disk got
the error, but when they checked,
they did have free disk space.
And one of the suggested answers
mentioned check for temp file,
like your queries could be doing
a lot of temp file stuff, but
yeah, I've never seen that myself.
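A sketch of how to check this from SQL; the counters in pg_stat_database are cumulative since the last stats reset:

```sql
-- Temporary-file usage per database:
SELECT datname, temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
ORDER BY temp_bytes DESC NULLS LAST;

-- log_temp_files = 0 logs every temp file created, which helps find the queries responsible:
SHOW log_temp_files;
```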
What about, well, backups, that's an
interesting one.
Because some people recommend
keeping some locally; obviously
keeping all your backups on the same disk
as your database is suboptimal
for recovery purposes, but keeping
some on it makes sense, especially
for huge databases, right?
Nikolay: I don't think so.
I would not recommend doing this.
Michael: But what about for recovery,
like no network?
Nikolay: Regular backups should
be separate, of course.
But if you do some operations manually,
for example, you decided
to back up some tables and just
dump them in some compressed
form and keep them on the same drive,
well, it might happen.
But there are many mistakes you
can make manually.
Or you just do a CREATE TABLE ... AS SELECT,
for example.
You basically just clone the table
and then forget about it.
Well, many such mistakes can be
made.
I think there are many exotic situations
we don't discuss here,
and I'm curious if our listeners
had some interesting situation
we didn't discuss yet.
It will be interesting.
Like I think many exotic things
can happen.
Let's talk about what to do in
an emergency.
Right?
Yeah.
First thing: understand the reasons.
Or maybe not.
This is extremely interesting.
Michael: No, I don't think so.
Yeah.
I mean, bear in mind, if we're
actually in an emergency, our
database is not only no longer
accepting updates and writes
and things, but in a lot of cases
it's also not accepting SELECTs,
which...
Nikolay: Which is ridiculous, right?
Before our call, you shared an Aiven
doc, which says that if a managed
Postgres service has automation and it
understands that we are approaching full
disk space, it converts our database
to a read-only state.
Just by setting a parameter, which
actually any client can override.
So it's weak protection:
default_transaction_read_only turned
on.
This parameter, default_transaction_read_only.
I like the idea, actually, because
it's weak protection, but
it's quite reliable if you don't
have it set to off all the
time in your code.
Usually, there are low chances
you have that in your code.
So it means that it will protect
you.
And then you have alerts, you need
to have proper monitoring
and so on.
But this is like a prevention measure,
maybe.
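What such automation boils down to, as a sketch, and why it counts as "weak":

```sql
-- What a managed service can do when disk usage gets critical:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- ...and why it's weak protection: any session can simply override it.
SET default_transaction_read_only = off;
-- or, per transaction: BEGIN; SET TRANSACTION READ WRITE;
```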
Michael: Yeah.
Again.
Nikolay: Yeah, it's again prevention.
Yeah.
But if you're in an emergency, this
is like... I think we should write
some good how-to in general, a how-to
for any such case.
Michael: I did check your how-tos.
I was surprised not to find one for
this, actually.
But I think also, one thing worth
mentioning: nobody's going to
be listening to a podcast half an
hour in, or whatever,
while they're panicking.
Yeah.
Yeah.
But one thing to remember that does
seem to catch people out is,
I mean, hopefully this won't happen
to anybody listening here, but people
know that WAL stands for Write-Ahead
Logging, assume that it's just logs, and
therefore think they can delete it.
And then, because it's a large
amount of what they think
are logs, they delete those files to
try and recover.
And I think the main advice
from me in terms of recovery,
from every guide you'll read
on this, is: don't do that.
Nikolay: Yeah, I will say the
classic approach is: understand
the reasons and fix them,
mitigate, right?
But you might have time pressure.
In this case, it's good if you
can quickly identify something
you can delete safely without big
consequences.
For example, regular Postgres logs.
If you have a lot of logs, gigabytes,
it's good to delete just
one file or so, start the database if
it's down, and then you do need
to do a full analysis and mitigate
the real reason and so on.
Or maybe you can delete something
outside the Postgres data directory.
Something.
You can quickly check if there
are other directories around
which have some gigabytes, or at
least megabytes, of data you
can quickly delete and let Postgres
behave normally for quite
some time; it buys you some room,
right?
For analysis and proper mitigation.
But mitigation is inevitable; you
need to understand what happened.
Is it WAL, or data, or something
else?
And then you need to mitigate.
Michael: Yeah, I was reading some
guides on this.
There's a good one on the Crunchy
Data blog, and they made a really
good point that we don't all think
of when we're in that
kind of panic mode.
Getting things back
online is my primary objective
in cases like that.
But they recommend taking a file
system level copy of everything
in its current state.
Like take a snapshot of the current
directory as it is before
you start deleting things.
Nikolay: Yeah, it's additional
work in progress.
Michael: Yeah, because it takes
time and we're down.
Nikolay: In the cloud, actually, if
you don't see what you can delete
quickly, like it's managed, for
example, you cannot
go in and run du and so on.
But in this case, probably it's
easier just to add a few percent
of disk space quickly to buy some
room again, and then to investigate.
Obvious approach, right?
Michael: I also think because of
some reasons we're going to
discuss in a bit, this is just
so much less likely to happen
in cloud environments.
Like it just feels like this is
one of those features.
There aren't that many.
Nikolay: This is their job, right?
Michael: Yeah, this is one of those
two or three features that they really
do manage and can manage for you
automatically.
Nikolay: Notify properly and so
on, right?
Michael: Or just auto-scale.
Auto-scale, yeah.
You just don't hit this button.
Nikolay: I don't like auto-scaling
in this area because it's
auto-scaling of budgets.
Michael: Well, but it's also not
solving the root cause of the
problem.
For several of those
problems we talked about, it's
not your actual data growing.
It's not your true tables and indexes.
It's write-ahead logging going
out of control because of an open
replication slot, some reason why there
shouldn't be that much data.
So auto-scaling in those cases is just not
solving the problem.
Exactly.
Yeah.
Nikolay: Like hiding dirt under
the rug, isn't it?
Michael: As long as you've been
told when it has scaled up, then
at least you can go and investigate.
Nikolay: Yeah, so I think we covered
mitigation in general quite well.
Michael: Well, what do you think
about...
Yeah, I think increasing makes
sense.
I saw some warnings not to increase
in place necessarily.
Maybe you should look at setting
up a replica and failing over,
but I thought that would be slower.
Like, I wasn't sure myself.
Nikolay: I don't know.
Like I think if you can add space
or delete something safe, that's
a good first step; then you have the big step
of analysis, proper analysis
and mitigation of the true reason,
the main reason why you approached
this state without noticing, right?
So a big part of mitigation, if
you perform proper root cause
analysis, I'm quite sure, will be:
we didn't notice this, and that's
a problem in itself, so our observability
is bad.
Michael: The process is not...
Either monitoring or, probably, alerting
is, in fact... Yeah, we have to go
to prevention.
Nikolay: Yeah, prevention, I think,
philosophically, is quite
simple.
The best way to prevent it is to keep
a good level of understanding
of how you spend your
bytes of disk, your gigabytes.
So you understand the layout, you
understand WAL data, real
database data, whether it's actual
data or a lot of bloat, and
then you understand logs and everything
else, and you understand:
okay, we are good, and we understand
how we spend our disk bytes.
And if this understanding is good,
everything will be fine and
you will notice problems early.
But it's hard to do if you have
hundreds or thousands of clusters,
right?
In this case, you do need some
rules to be alerted if bloat is
high, if pg_wal grows without good
control, if logs are not rotated properly,
or logs consume too much.
You just need to have good observability,
right?
To have mitigation.
And then, with an understanding
of possible reasons, you
navigate through these reasons and find
the mitigation.
But the key to prevention is understanding,
and understanding
is observability, right?
Michael: Yeah, two other things.
I think, whilst I'm kind of
against auto-scaling in principle,
if you've got the option,
going up in budget and having
to migrate back down to a
smaller size later is so much better
than being down.
I personally would have that on
myself if I had that option.
And I think also, oversizing.
Like, disk is so cheap compared
to everything else we're doing.
Unless you really are in the like
extreme...
Huh?
Nikolay: I mean, disk is not cheap if you have a
lot of terabytes and many standby
nodes; oh, it can be not cheap.
Michael: But most of us aren't
in that case.
So yeah, I agree that it's different
for people with serious, serious data.
Nikolay: Some people pay, like, millions
for disk space.
It can be a headache.
That's why I mentioned pg_tier is
a potentially very
good thing, and tiering is good.
And with some design, understanding
the structure of your
disk space consumption, you design
some rules to offload data
properly and keep everything under
control.
And that's good.
This is good.
But yeah,
Michael: I think...
Imagine advising some
startup, and they had like a
10-gigabyte disk or something,
and their database was only a
few hundred megabytes.
The cost of being on a 100-gigabyte
disk instead is just
so tiny, it's just such a
small thing, but yeah, I take your point.
Nikolay: But again, I think observability
can always be improved.
Especially if we know that this is
WAL, this is data, this is logs,
this is everything else, and we
have some proper
analysis and alerts for it.
Michael: I think that's the bit
that trips people up.
I think a lot of people actually
have quite decent observability,
like they can see what's going
on, but with alerting, people aren't
getting alerted early enough, or
not at all, which is the one I see quite
often.
No alerts at all for, you know:
disk at 50% sends an email,
70% raises something more serious, 90% sets
off a page, you know, these
kinds of serious alarms.
Nikolay: Yeah, you know, I know
we're already running out of
time, but let's mention some small
practical advice at the end.
If you cannot SSH to the box, for
example, it's RDS or another
managed Postgres service, and Postgres
is still alive, but you
wonder why the disk usage goes
up.
At the SQL level, you can check what's
happening in directories using
pg_ls_dir, pg_ls_waldir, and so on.
There is a set of administrative
functions which allow you to
inspect the content of directories.
And this is very helpful when you
troubleshoot and understand
what's happening.
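A few examples of these functions, as a sketch; they typically require superuser, the pg_monitor role, or explicitly granted EXECUTE:

```sql
-- Size and file count of pg_wal:
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS pg_wal_size
FROM pg_ls_waldir();

-- Size of server logs:
SELECT pg_size_pretty(sum(size)) AS log_size
FROM pg_ls_logdir();

-- Generic listing of a directory, relative to the data directory:
SELECT pg_ls_dir('base');

-- And the current database size:
SELECT pg_size_pretty(pg_database_size(current_database()));
```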
And it can also be helpful in mitigation
activities, in observability.
For example, if you have a report
triggered when we are almost
out of disk, when usage exceeded 90%,
then you can include some automated
analysis to have a snapshot
of what's happening in key directories
and understand what the
consumers of disk space are.
Just some advice I thought we might
miss.
Let's not miss it.
That's it, I think, right?
Michael: I think so.
I hope none of you ever actually
have to face this and you can
all mitigate or prevent it from
happening.
And yeah, thanks so much, Nikolay.
Catch you next week.