A weekly podcast about all things PostgreSQL
Michael: Hello and welcome to Postgres
FM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
This is my co-host, Nikolay, founder
of Postgres.AI.
Hello, Nikolay.
What are we talking about this
week?
Nikolay: Hi, Michael.
We are out of disk space.
What to do?
Michael: Panic.
Nikolay: This is what Postgres does, right?
Michael: Yeah, in some cases, right?
Nikolay: Yeah, well, I saw it a
lot.
I recently saw it again, actually.
And yeah, it's not fun.
Why panic?
Don't panic, right?
Michael: Keep calm and carry on,
right?
Nikolay: Yeah, it's Postgres, you
should, yeah.
Just use Postgres, you know.
Yeah, so as we discussed before
this call, before recording,
let's just talk about possible
reasons, mitigation, and avoidance,
right?
Yeah.
Prevention.
Prevention is better.
Avoidance.
Michael: Avoidance.
Denial.
It's like the stages of grief.
Nikolay: It's not my problem, actually.
Yeah, I'm a DBA.
It's an SRE problem, right?
They should just give us more space.
To start with problems: I mentioned
it before the call, but
let's repeat.
The only true reason for this is
insufficient disk space allocation.
It's always so.
And it's a joke, but we will be
returning to this joke, I'm quite
sure.
Because sometimes you do need more
space, that's it.
Of course, I would deprioritize
this path and consider everything
else.
But sometimes we end up with this
reason and mitigation action,
like just add this space.
But let's consider this as a last
resort, right?
Michael: Well, yeah, I like your
structure.
So, causes, then recovery, and
then mitigation or prevention.
Mitigation is recovery.
Yeah, okay, great.
Prevention being the last one.
So, causes.
What are the most common reasons
you see this happening to people
in the wild?
Nikolay: Good question.
There are many reasons.
I don't have statistics.
Michael: I can tell you the one I
see most blog posts about, because
I don't see this that often, but
the one that comes up time and
time again in blog posts is an
open replication slot.
Nikolay: Oh, that's too narrow.
Let's just say a lot of WAL is written,
and a lot of WAL files remain
in the pg_wal directory.
Or pg_xlog, if we talk about ancient
versions of Postgres.
Michael: Yes, but that's the root
cause of why it's the most
common one I've seen.
Nikolay: I think it's a super popular
reason, but I would structure
this like if we need to provide
a comprehensive list of reasons,
I would put it in second place,
and in first place I would put...
And not only because of slots;
slots are just one subsection here.
The whole section is: the pg_wal
directory takes a lot of
disk space.
This is a big class of reasons,
right?
But the very first class of reasons,
I would say, anything related
to the data directory itself.
Like a lot of disk space consumed
by tables and indexes, right?
It's also like a whole class, so
we have specific reasons inside
it.
Michael: Yeah, so we actually have
more data than the size of
Nikolay: the disk.
Than we expected.
We tried to delete, but the space
is not freed, and so on.
Michael: Lots of bloat, for example,
that we talked about.
Nikolay: Data-related, WAL-related,
and the third class, or category,
is probably anything else.
Anything else is tricky and interesting,
we will talk about it
soon, right?
So how about this
Michael: classification?
Yeah, just to give people an idea
of "anything else": you're including
logs in that, things that can grow
quickly that aren't data, like
main data directory stuff, or WAL.
Nikolay: Base sub-directory is
where data is stored.
So if we think about PGDATA directory
structure, we can think
about base, so tables and indexes,
right?
Materialized views as well.
And then the pg_wal subdirectory,
which can be huge, for example.
And everything else.
Everything else is interesting
as well.
Maybe also outside of PGDATA, but
let's not reveal secrets before
it's time.
So what about tables and indexes?
What do you think?
Like, possible specific reasons?
I deleted 90% of my table, but
it didn't help.
Disk consumption is still at 99%.
What to do?
Time to panic, right?
So, obviously, DELETE doesn't
physically delete data, right?
It just marks tuples as deleted;
it's a two-stage process.
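A quick way to see what actually occupies the space, as a minimal sketch using standard admin functions (run it in the database you suspect):

```sql
-- Top relations by total size (heap + indexes + TOAST):
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm')   -- ordinary tables and materialized views
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;

-- And the database as a whole:
SELECT pg_size_pretty(pg_database_size(current_database()));
```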
Michael: Well, and this is a tricky
one, because a lot of the times
we've discussed how to deal with
issues like this, we actually
require quite a lot of disk space
in order to repack, or to actually
mitigate that problem.
The easiest way to do that in an
online fashion requires
double the size of the relation,
or double the table size, normally.
So it is a tricky one in this case.
Nikolay: Yeah.
Exactly.
Just a few days ago, I was discussing
this exact problem with
one of our customers.
And they mentioned that they have
a huge table which makes up
90% of their database.
They know there is bloat.
And to fight bloat, especially
in the heap, you obviously need
to use pg_repack, and to use it,
you need the same amount of space again.
Temporarily, you need to
have two large tables of the same
size, because it rebuilds the whole
table in the background and
then switches to it.
So yeah, this is a problem, and
if you don't have enough disk
space, oops.
But yeah, I would say this is
an edge case.
Normally there's no single
large table which is like 90%
of the whole database, right?
So normally it's not a big problem.
And if we keep free disk space at
30-40%, or at least 25%, maybe,
we have this space, and we know
it's needed just temporarily, while
we're repacking the table.
Not a problem usually.
But if you're in this situation,
well, not good.
Maybe you should do something else.
There are alternatives to pg_repack.
I think pg_squeeze, the alternative
from Cybertec, also has this issue.
It requires additional disk space
because it rebuilds the table.
But in this particular edge case,
I would return to older solutions
which have existed for quite some
time: several implementations of
not "let's rebuild the table", but
another idea.
It's called pgcompacttable.
Again, there are probably three attempts
from different people to
implement this idea, and all of
them, I think, are in Perl.
But the idea is: let's issue some
updates which don't change data.
When you issue an update, even one saying
UPDATE ... SET id = id WHERE id = some
number, you know a new tuple is
created.
Always, right?
So this tool issues these updates
in a smart way.
It checks which tuples are at the
end of the table, and it knows there
is bloat, so there is empty space
in earlier pages, for example.
So if we update tuples which sit
at the end of the table, they will
move towards the beginning of the table.
And once there are no live tuples left
in the final page, the final block,
it will be truncated.
By default, this is what vacuum
does, unless it's turned off.
And then we do the same with next,
from the end, the next page,
and so on and so on.
Like basically, this is similar
to like, I remember in Windows,
like defragmentation, right?
So it's moving data to different
places.
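A minimal sketch of the no-op-update idea described here; pgcompacttable and similar tools automate it with throttling and page-by-page logic, and the table and column names below are hypothetical:

```sql
-- Current table size in 8 kB pages:
SELECT pg_relation_size('my_table') / 8192 AS pages;

-- A "no-op" update: the value doesn't change, but MVCC still creates a new
-- row version, which lands in free space earlier in the table if bloat left
-- room there. The tools pick victim rows from the table's last pages (by ctid).
UPDATE my_table SET id = id WHERE id = 42;

-- Once the trailing pages contain no live tuples, plain VACUUM truncates
-- them and returns the space to the OS (unless truncation is disabled).
VACUUM my_table;
```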
Michael: Remember that defrag screen
on Windows?
Nikolay: Yeah, yeah.
People love that.
People visualize it.
Yeah, yeah, yeah.
Maybe like red and blue, right?
So, yeah.
And this is a much slower process
than using pg_repack for a table.
But...
Michael: Also, in our case, if
we notice we've got 99% disk full,
maybe this kind of thing helps.
But if we're actually out of disk
space, even this isn't
Nikolay: an option.
Yeah, yeah, yeah.
I'm talking about mitigation here,
maybe too early, but of course,
if we've already reached 100%, we
cannot move anything; it's not what
I would do.
But you just raised a very good
point about the extra disk space
needed, and I just remembered there
are alternative solutions, and
I think they don't require any
extensions, so they can be used
anywhere.
True, true.
Like RDS, Cloud SQL, anything.
It's just simple updates.
Super slow, but not requiring extra
resources, which is great.
Also, you will probably need to
make sure the tool works
well with the current version of Postgres,
because they are quite
old, and again, there are three of
them.
I think Depesz also tried to implement
one of them.
Michael: That makes sense that
it's in Perl then.
Nikolay: This is a small zoo of
ancient tools.
And I like the idea, actually.
But it's not an emergency tool.
It's a tool for right before an emergency,
trying to avoid it.
So yeah, this, right?
But as for indexes, probably yes,
when we rebuild an index concurrently,
like Postgres builds another index.
But I think this should not be
super noticeable because no index
can...
Yes, it requires extra disk space,
but we should have it if we
do it, right?
So anyway, if we delete, we should
think about, okay, those tuples
and bloat.
Keep it in mind.
Try to avoid massive deletes or
design them properly with MVCC
and vacuum behavior in mind.
Michael: We did a whole episode
on that, I think.
Nikolay: Right, exactly.
So yeah, I saw crazy cases when
some tables and indexes were
bloated like 99%.
Michael: Yeah.
And you saw these cause out of
disk issues.
Nikolay: Well, they contributed
a lot.
And if we keep it as is, we will
soon be out of disk space.
So it's definitely worth keeping
an eye on the situation in this
area, bloat and vacuum behavior.
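A rough way to keep an eye on this, as a sketch; dead-tuple counters in pg_stat_user_tables are only a proxy for bloat, not an exact estimate:

```sql
-- Dead tuples per table, as a rough signal that vacuum is not keeping up:
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```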
And yeah, if you do it very infrequently,
then once in 5 years you take care
of it, and then you realize that
out of the 10 terabytes of disk space
you had, now less than 1 terabyte is used.
And what do you do with the other 9
terabytes?
You keep paying the cloud provider,
right?
Because it's easy to add disk space,
but it's not easy to reduce
disk space because they don't have
such a function and it's possible
only if you create a new standby
cluster with a smaller disk and
then just switch to it, switch
over, right?
Michael: Yeah, but it's quite
funny seeing that message,
you know: this is a one-way ticket.
Nikolay: One-way ticket, yeah.
Michael: So I think those make
sense, right?
Normal data exceeding the size
of the disk, whether that's actual
data or whether that's bloat, makes
perfect sense.
Do you want to talk any more about-
Nikolay: Of course.
Actually, ever since VLDB in Los Angeles,
I think in 2018, I still believe
in the keynote there.
I liked that keynote: data
deletion is a huge problem
we engineers need to solve and
find better solutions for.
So sometimes it's not about bloat,
it's about a lot of garbage,
duplicated data also.
And I saw it many times: when we start
paying attention to bloat
and push customers to fight
it, suddenly they say, you
know what?
I had this happen maybe a month ago.
You know what?
We don't need this table.
Let's just not spend time and just
drop it.
Drop it.
Drop this table.
Or truncate it.
We don't need the data.
And it's like, wow, some terabytes
are free.
So...
Michael: Yeah.
I see this.
I see this with like analytics
data, for example, people tracking
everything.
Like they don't know, they never
look at their analytics and
never look at who's using which
feature, but they track everything
because we might need it someday.
Or, you know, there are all these
data use cases.
And even the solution to this,
in terms of...
I've heard the phrase retention
policy.
I know it's not quite the same
topic, but when talking about how long
we should retain data for, even
then we don't mention deletion.
It's not a deletion policy, it's
a retention policy, which is quite
funny.
Nikolay: Also, maybe one of the last
things in this section I would
mention: it's becoming more popular,
instead of deleting old
data, to have a better approach,
tiering, tiers of data storage.
And I know there is an extension
from Tembo called pg_tier.
First of all, Timescale has this,
right?
But in their cloud only; I think it's
not open source, if I'm not wrong.
And so the idea is: let's consider
some data archived, and it still
looks like it's present in Postgres,
but it's not stored on an
expensive disk which also has capacity
limits.
We offload it to object storage,
like S3 on AWS.
And this is a great idea, actually,
but I think some latency
issues will appear, and also maybe
some errors sometimes, inconsistencies
or so.
Like, it's interesting.
But it's a great idea.
And again, like I said, Tembo has
the pg_tier extension, which
I hope to test someday,
when I or my team have an opportunity
and time for it.
It's super interesting to check
how it works.
And I think it's a great idea.
Like, if you have some huge tables,
maybe already partitioned,
you can go either with sharding,
or if some historical data is
not needed all the time, you can
just offload it to object storage,
which is like virtually infinite.
And then, okay, users experience
somewhat worse latency when reading
this old data.
Imagine e-commerce,
Michael: for example.
Just to quickly, I feel like this
is an old, kind of an old-fashioned
solution to this was table spaces.
Nikolay: Yeah, but table spaces
in cloud reality became not popular
at all.
Last time I used table spaces in
a serious project was more than
10 years ago, I think.
Michael: Yeah, but that's what
I mean.
We had this before, it's just it
had a different name.
Nikolay: Maybe, yeah.
Tablespaces, yeah, you can have
a cheaper disk attached to your
machine.
But I still don't like this particular
approach because I like
offloading to object storage more
than that.
Because imagine you have a primary
and multiple standby nodes.
If you use tablespaces and cheaper
disks, you are forced to
use the same structure of disks
on each node, and it's still
expensive, and slower, for example,
right?
While object storage is worse
in terms of uptime compared
to EBS volumes on AWS, for example.
But it's much better in terms of
reliability, or vice versa.
Michael: Durability or other things.
Nikolay: Yeah, availability
and durability; if you check
the SLAs of S3, I forget the exact numbers.
So if you check the characteristics,
you think, okay, this is what
I actually would like to have for
archived data.
Maybe latency will be worse,
right?
But if you think about e-commerce
and order history, for example:
if you run a large e-commerce
website, you have the ability
to offload all the old activity, and
users touch it very rarely,
but it's still good to have the
ability to see the old order
history.
It's not needed often, so we
can keep it outside of Postgres
and evict it from caches and so on.
And we don't pay for it on our main cluster,
we just pay for S3.
There we also can have tiering,
right?
Usually.
And even automated.
Usually providers have these features:
if it's older than,
like, one month, it goes to some colder
and cheaper storage.
So I think this direction will
gain more popularity and
will be better developed and so
on.
Tiering for storage.
Michael: Once again, we've been
sucked into kind of like prevention.
Nikolay: I think yes, it's hard
to talk about reasons and then
yeah, like this is a big flaw of
the structure I proposed.
Michael: Well, no, I like it still,
but it feels like we haven't
really touched on...
Nikolay: The most popular one, let's
talk about it.
Yeah.
WAL, right?
We have some...
So, pg_wal is huge.
We identified it, which is actually
already a good skill.
Not everyone can do that, but if
you manage to identify that
your pg_wal directory is huge, there
are a few specific reasons.
And there is a good article from
Cybertec about this:
why Postgres doesn't delete WAL
files even when it seems it's time to remove them.
Michael: Ah, yeah, the various
reasons.
Nikolay: This is exactly when we
need to use this article as
the list of possible reasons and
just exclude them one by one until
we find our case, our real reason.
So one of them you mentioned: some
replication slot, logical or
physical, not progressing and accumulating
a lot of changes.
Well, it's not really the slot that accumulates.
Postgres doesn't write to slots.
It writes to pg_wal, a lot of WAL files,
right?
But a slot has a position, and if it's
inactive or the position is frozen,
consumers don't consume and don't
shift this position.
It means Postgres must keep those
WAL files in the pg_wal directory
until the consumer catches up.
And a few years ago, in, I think,
Postgres 13 or 14, a setting appeared,
max_slot_wal_keep_size, to limit this:
to have a threshold where we give up
and say, better to invalidate the slot,
to destroy it, but stop this situation,
so the WAL files can be deleted
already.
And I remember a big fear in
a good engineer I worked with;
he was not a Postgres person, more
of an SRE, but with a lot of Postgres
experience.
Long ago, when replication slots for
physical replication were initially
introduced, we had basically a
small fight.
I said, let's use it, it's a great
feature.
He said, no, no, no, I'm not going
to use it.
He said, this is super dangerous.
And he understood that the danger
is: if some replica is somehow
stuck, the slot is not progressing,
the primary runs out of disk
space, and this is the last thing
he wanted.
After lost backups, of course.
But over time, we started using
slots, and then this setting appeared;
I think it's a great setting: you
can understand your disk layout
and how much free disk space you
can afford.
If you approach 90%, it's time
to kill slots.
So you can do some math and understand
that the maximum number
of gigabytes you can allow for
the lag is this.
Of course, over time, the database
grows, and this can still be a
problem, right?
Because if data grows, tables and
indexes grow, and probably
your setting will not save you
from an emergency, right?
Maybe.
But this is definitely one of the
reasons.
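A sketch of how this can be checked and limited; max_slot_wal_keep_size appeared in Postgres 13, and the '100GB' value below is only an example:

```sql
-- How much WAL is each slot holding back? (run on the primary)
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- The safety threshold discussed here; -1 means "keep WAL forever for slots"
SHOW max_slot_wal_keep_size;
-- For example (the value depends on your own disk math):
-- ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
-- SELECT pg_reload_conf();
```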
What else?
What reasons?
Because of a failing archive_command,
right?
If archive_command is configured,
it's important for Postgres to archive.
And it can be failing for any reason.
If Postgres cannot archive, it cannot
remove these WAL files.
And that's a problem.
So you should monitor archiving
lag separately.
And if it's growing, it's very
bad for backups already.
But it's also bad because it can
hit you in terms of disk space.
So yeah, replication and archiving of WAL
are two processes where, if something
is wrong with them, Postgres
cannot remove WAL files.
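A sketch of checking archiving health from SQL; pg_stat_archiver is standard, while the second query needs privileges to call pg_ls_dir and assumes the usual pg_wal layout:

```sql
-- Is archiving failing or lagging?
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- WAL segments still waiting to be archived show up as *.ready files:
SELECT count(*) AS segments_waiting
FROM pg_ls_dir('pg_wal/archive_status') AS f
WHERE f LIKE '%.ready';
```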
Another reason actually...
Michael: And by the way, this can
grow quickly.
I think people don't always realize
how quickly this can grow.
Yeah.
Some database.
Nikolay: Yeah, the database can be small
but very active, and
you can have terabytes of WAL
generated per day.
It's actually good to know how
much WAL you generate per day, per
hour, per second.
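One rough way to measure this from SQL, as a sketch; the two LSNs in the second query are placeholders for values you noted yourself:

```sql
-- Note the current WAL position now...
SELECT pg_current_wal_lsn();

-- ...then again some time later, and see how much WAL was written in between:
SELECT pg_size_pretty(
         pg_wal_lsn_diff('AD/4F000000', 'AC/98000000')
       ) AS wal_written_between_samples;
```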
Michael: So I think even with like
a low activity, as long as
something's happening every time
there's a checkpoint, I've seen
people with toy databases like,
you know, like free tier RDS,
be surprised that it's generating
like a couple of gigabytes
a day, like 64 megabytes per 5
minutes or so.
Adds up quite quickly with small...
Nikolay: A funny reason I can also
mention: you have a not-huge
database, but quite an active one,
and you think, oh, it's time
for checkpoint tuning.
We had an episode about it, checkpoint
tuning.
Let's raise max_wal_size and checkpoint_timeout.
You raise them, but you didn't do
proper math in terms of how much
disk space you have, and you end up
having a too big, normally too
big, pg_wal.
Because the distance between checkpoints
increases, and Postgres
needs to keep more WAL.
And this can be just a configuration
mistake.
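The knobs involved, as a quick sketch; how big pg_wal actually gets also depends on slots, archiving, and wal_keep_size (the Postgres 13+ name):

```sql
-- The settings involved in checkpoint tuning:
SHOW max_wal_size;         -- soft cap on WAL kept between checkpoints
SHOW checkpoint_timeout;
SHOW min_wal_size;
SHOW wal_keep_size;        -- extra WAL kept for streaming replicas

-- And how big pg_wal actually is right now:
SELECT pg_size_pretty(sum(size)) FROM pg_ls_waldir();
```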
So yeah, after this, I think we
can move on and let's talk about
other reasons.
Michael: Yeah, what else do you
see?
Nikolay: In first place in this category
I would put the log
directory, especially if it's inside
PGDATA, inside the main
Postgres directory.
Actually, not only that.
Especially if the log directory where
Postgres writes logs is
on the same drive as the data directory.
For example, inside PGDATA.
Or just next to it, but on the same
drive.
If logs suddenly grow, like we have
some performance degradation
and log_min_duration_statement
is configured, or auto_explain.
Or, for example, we decided, oh,
we need more auditing, and pgAudit
is a great tool.
Right, let's bring it in.
It starts writing a lot to the logs.
And if you didn't separate your
data from logs, you should separate them.
I'm not talking about pg_wal; I'm
not a big fan of having a separate
disk for pg_wal, actually.
It was a thing in the past, but
recently I don't often see benefits
from doing this.
Benchmarks don't prove it, and so
on.
So, it depends, of course.
It depends if it's your own data
center, maybe it's worth doing
this.
But if you don't offload regular logs
and keep them on the
same disk, then you bring in pgAudit,
and it suddenly starts writing gigabytes
per hour, or maybe there's some problem
with log rotation.
Michael: Yeah.
Nikolay: Like rotation or retention.
So they are not deleted properly,
for example.
And then this can hit you badly,
and you actually get a panic for
your database, and your SELECTs
are not working.
But if logs are offloaded to a different
disk, at least with the
logging collector enabled, I know
that the database actually will
keep working.
Actually, recently, maybe a few
months ago, I had this experience.
I'm used to Postgres being fully
down if we are out of disk
space, and I expected the same
behavior when our log drive was
full.
If it's flooded, if it's full,
I expect big problems.
Postgres almost didn't notice.
So okay, we cannot log, but we
keep working.
That's great, actually.
So this is a super big benefit of
having a different drive for regular
Postgres logs.
And even if that drive is slow, it can affect
performance, right?
But if it's 100% full, it's not that
bad.
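A quick way to see how much space server logs take, assuming Postgres 10+ and enough privileges (for example the pg_monitor role) to call these functions:

```sql
-- Total size of server log files in log_directory:
SELECT count(*) AS log_files,
       pg_size_pretty(sum(size)) AS total_log_size
FROM pg_ls_logdir();

SHOW log_directory;    -- a relative path here means it lives inside PGDATA
SHOW data_directory;
```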
Michael: Yeah, we've talked several
times about excessive logging
having an impact on performance.
Nikolay: Observer effect.
Michael: Yeah, and I've definitely
done benchmarks, you know,
this is the age-old log_min_duration_statement
= 0.
Like, why we recommend not setting it,
or why I recommend never setting
it to 0.
Because you can generate
gigabytes in like a 30-minute
benchmark, with pgbench or
something.
So it surprised me how
much it can grow.
Nikolay: Yeah, it's actually a
good idea, maybe, to ask our bot
to benchmark with regular pgbench, and
then to have sampling for query logging
in one way or another,
and just see how the observer effect
grows and kills your performance.
Only people with small databases
can recommend log_min_duration_statement
= 0.
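The settings involved, for reference; the sampling parameters appeared around Postgres 12-13, so this is a sketch for recent versions:

```sql
SHOW log_min_duration_statement;   -- -1 = off, 0 = log every statement (dangerous under real load)
SHOW log_min_duration_sample;      -- log only a sample of statements above this duration...
SHOW log_statement_sample_rate;    -- ...at this rate (for example, 0.01 = 1%)
SHOW log_transaction_sample_rate;  -- or sample whole transactions
```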
Michael: Or quiet databases, yeah.
Nikolay: Quiet, yeah.
Not active.
Small databases and small workloads.
Michael: That's it.
So yeah.
Or I guess set it super low, like turn
it on like that only very, very briefly,
like for a minute or two or something.
Nikolay: Yeah.
Yeah.
So for serious workloads,
you cannot do that.
I already shared my experience of
putting big, critical production
databases down, and yeah, don't
repeat my mistakes.
Michael: As well as logging, what
else did you have in this category
of things?
Nikolay: Oh, that's a good question.
Well, let's maybe not spend too
much time discussing unknowns
or external things, like we have,
I don't know, something else
installed on the same machine,
using the same disks, and suddenly
we are out of disk space.
It can also happen sometimes.
Or maybe, for example, you do some
troubleshooting, you have
self-managed Postgres, you can
SSH to the box, you started sampling
some useful logs every second, but
write them, for example, to the home
directory, and then it's full, or
something like this.
So you always need to be careful
with that.
But there's an interesting case when, for
example, you try to upgrade, you
have Patroni, and, I don't
remember the exact details, but
in some cases Patroni decides to
retry provisioning of a standby
node, for example, and it fetches
PGDATA the way you configured,
with pg_basebackup or from archives,
but it renames the old directory
to keep it as a kind of local backup, right?
It just renames it, and you suddenly end
up having two full-fledged data directories,
like, wow, it can quickly become
a problem in terms of disk space.
What
Michael: else?
Does that mean, like, if you had,
let's say, your data taking about
40% of your disk size, so
you thought you had loads of
headroom, suddenly you're at 80% plus
whatever else is on there?
Yeah.
Nikolay: Yeah, okay.
Wow.
Maybe large temporary files, but
it's quite exotic.
They can be huge, right?
But it's exotic.
Michael: I saw some Stack Overflow
posts mention that, you know,
people that were out of disk got
the error, but when they checked,
they did have free disk space.
And one of the suggested answers
mentioned check for temp file,
like your queries could be doing
a lot of temp file stuff, but
yeah, I've never seen that myself.
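A sketch of how to check this from SQL; the counters in pg_stat_database are cumulative since the last stats reset:

```sql
-- Temporary-file usage per database:
SELECT datname, temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
ORDER BY temp_bytes DESC NULLS LAST;

-- log_temp_files = 0 logs every temp file created, which helps find the queries responsible:
SHOW log_temp_files;
```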
What about, well, backups, that's an
interesting one.
Because some people recommend
keeping some locally; obviously
keeping all your backups on the same disk
as your database is suboptimal
for recovery purposes, but keeping
some on it makes sense, especially
for huge databases, right?
Nikolay: I don't think so.
I would not recommend doing this.
Michael: But what about for recovery,
like no network?
Nikolay: Regular backups should
be separate, of course.
But if you do some operations manually,
for example, you decided
to back up some tables and just
dump them in some compressed
form and keep them on the same drive,
well, it might happen.
But there are many mistakes you
can make manually.
Or you just do a CREATE TABLE ... AS SELECT,
for example.
You basically just clone the table
and then forget about it.
Well, many such mistakes can be
made.
I think there are many exotic situations
we don't discuss here,
and I'm curious if our listeners
had some interesting situation
we didn't discuss yet.
It will be interesting.
Like I think many exotic things
can happen.
Let's talk about what to do in
an emergency.
Right?
Yeah.
First thing: understand the reasons.
Or maybe not.
This is extremely interesting.
Michael: No, I don't think so.
Yeah.
I mean, bear in mind, if we're
actually in an emergency, our
database is not only no longer
accepting updates and writes
and things, but in a lot of cases
it's also not accepting SELECTs,
which...
Nikolay: Which is ridiculous, right?
Before our call, you shared an Aiven
doc, which says that if a managed
Postgres service has automation and it
understands that we are approaching full
disk space, it converts our database
to a read-only state.
Just by setting a parameter, which
actually any client can override.
So it's weak protection:
default_transaction_read_only turned
on.
This parameter, default_transaction_read_only.
I like the idea, actually, because
it's weak protection, but
it's quite reliable if you don't
have it set to off all the
time in your code.
Usually, there are low chances
you have that in your code.
So it means that it will protect
you.
And then you have alerts, you need
to have proper monitoring
and so on.
But this is like a prevention measure,
maybe.
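What such automation boils down to, as a sketch, and why it counts as "weak":

```sql
-- What a managed service can do when disk usage gets critical:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- ...and why it's weak protection: any session can simply override it.
SET default_transaction_read_only = off;
-- or, per transaction: BEGIN; SET TRANSACTION READ WRITE;
```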
Michael: Yeah.
Again.
Nikolay: Yeah, it's again prevention.
Yeah.
But if you're in an emergency, this
is like... I think we should write
some good how-to in general, a how-to
for any such case.
Michael: I did check your how-tos.
I was surprised not to find one for
this, actually.
But I think also, one thing worth
mentioning: nobody's going to
be listening to a podcast half an
hour in, or whatever,
while they're panicking.
Yeah.
Yeah.
But one thing to remember that does
seem to catch people out is,
I mean, hopefully this won't happen
to anybody listening here, but people
know that WAL stands for Write-Ahead
Logging, assume that it's just logs, and
therefore think they can delete it.
And then, because it's a large
amount of what they think
are logs, they delete those files to
try and recover.
And I think the main advice
from me in terms of recovery,
from every guide you'll read
on this, is: don't do that.
Nikolay: Yeah, I will say the
classic approach is: understand
the reasons and fix them,
mitigate, right?
But you might have time pressure.
In this case, it's good if you
can quickly identify something
you can delete safely without big
consequences.
For example, regular Postgres logs.
If you have a lot of logs, gigabytes,
it's good to delete just
one file or so, start the database if
it's down, and then you do need
to do a full analysis and mitigate
the real reason and so on.
Or maybe you can delete something
outside the Postgres data directory.
Something.
You can quickly check if there
are other directories around
which have some gigabytes, or at
least megabytes, of data you
can quickly delete and let Postgres
behave normally for quite
some time; it buys you some room,
right?
For analysis and proper mitigation.
But mitigation is inevitable; you
need to understand what happened.
Is it WAL, or data, or something
else?
And then you need to mitigate.
Michael: Yeah, I was reading some
guides on this.
There's a good one on the Crunchy
Data blog, and they made a really
good point that we don't all think
of when we're in that
kind of panic mode.
Getting things back
online is my primary objective
in cases like that.
But they recommend taking a file
system level copy of everything
in its current state.
Like take a snapshot of the current
directory as it is before
you start deleting things.
Nikolay: Yeah, it's additional
work in progress.
Michael: Yeah, because it takes
time and we're down.
Nikolay: In the cloud, actually, if
you don't see what you can delete
quickly, like it's managed, for
example, you cannot
go in and run du and so on.
But in this case, probably it's
easier just to add a few percent
of disk space quickly to buy some
room again, and then to investigate.
Obvious approach, right?
Michael: I also think because of
some reasons we're going to
discuss in a bit, this is just
so much less likely to happen
in cloud environments.
Like it just feels like this is
one of those features.
There aren't that many.
Nikolay: This is their job, right?
Michael: Yeah, this is one of those
two or three features that they really
do manage and can manage for you
automatically.
Nikolay: Notify properly and so
on, right?
Michael: Or just auto-scale.
Auto-scale, yeah.
You just don't hit this button.
Nikolay: I don't like auto-scaling
in this area because it's
auto-scaling of budgets.
Michael: Well, but it's also not
solving the root cause of the
problem.
For several of those
problems we talked about, it's
not your actual data growing.
It's not your true tables and indexes.
It's write-ahead logging going
out of control because of an open
replication slot, some reason why there
shouldn't be that much data.
So auto-scaling in those cases is just not
solving the problem.
Exactly.
Yeah.
Nikolay: Like hiding dirt under
the rug, isn't it?
Michael: As long as you've been
told when it has scaled up, then
at least you can go and investigate.
Nikolay: Yeah, so I think we covered
mitigation in general quite well.
Michael: Well, what do you think
about...
Yeah, I think increasing makes
sense.
I saw some warnings not to increase
in place necessarily.
Maybe you should look at setting
up a replica and failing over,
but I thought that would be slower.
Like, I wasn't sure myself.
Nikolay: I don't know.
Like I think if you can add space
or delete something safe, that's
a good first step; then you have the big step
of analysis, proper analysis
and mitigation of the true reason,
the main reason why you approached
this state without noticing, right?
So a big part of mitigation, if
you perform proper root cause
analysis, I'm quite sure, will be:
we didn't notice this, and that's
a problem in itself, so our observability
is bad.
Michael: The process is not...
Either monitoring or, probably, alerting
is, in fact... Yeah, we have to go
to prevention.
Nikolay: Yeah, prevention, I think,
philosophically, is quite
simple.
The best way to prevent it is to keep
a good level of understanding
of how you spend your
bytes of disk, your gigabytes.
So you understand the layout, you
understand WAL data, real
database data, whether it's actual
data or a lot of bloat, and
then you understand logs and everything
else, and you understand:
okay, we are good, and we understand
how we spend our disk bytes.
And if this understanding is good,
everything will be fine and
you will notice problems early.
But it's hard to do if you have
hundreds or thousands of clusters,
right?
In this case, you do need some
rules to be alerted if bloat is
high, if pg_wal grows without good
control, if logs are not rotated properly,
or logs consume too much.
You just need to have good observability,
right?
To have mitigation.
And then, with an understanding
of possible reasons, you
navigate through these reasons and find
the mitigation.
But the key to prevention is understanding,
and understanding
is observability, right?
Michael: Yeah, two other things.
I think, whilst I'm kind of
against auto-scaling in principle,
if you've got the option,
going up in budget and having
to migrate back down to a
smaller size later is so much better
than being down.
I personally would have that on
myself if I had that option.
And I think also, oversizing.
Like, disk is so cheap compared
to everything else we're doing.
Unless you really are in the like
extreme...
Huh?
Nikolay: I mean, disk is not cheap if you have a
lot of terabytes and many standby
nodes; oh, it can be not cheap.
Michael: But most of us aren't
in that case.
So yeah, I agree that it's different
for people with serious, serious data.
Nikolay: Some people pay, like, millions
for disk space.
It can be a headache.
That's why I mentioned pg_tier is
a potentially very
good thing, and tiering is good.
And with some design, understanding
the structure of your
disk space consumption, you design
some rules to offload data
properly and keep everything under
control.
And that's good.
This is good.
But yeah,
Michael: I think...
Imagine advising some
startup, and they had like a
10-gigabyte disk or something,
and their database was only a
few hundred megabytes.
The cost of being on a 100-gigabyte
disk instead is just
so tiny, it's just such a
small thing, but yeah, I take your point.
Nikolay: But again, I think observability
can always be improved.
Especially if we know that this is
WAL, this is data, this is logs,
this is everything else, and we
have some proper
analysis and alerts for it.
Michael: I think that's the bit
that trips people up.
I think a lot of people actually
have quite decent observability,
like they can see what's going
on, but with alerting, people aren't
getting alerted early enough, or
not at all, which is the one I see quite
often.
No alerts at all for, you know:
disk at 50% sends an email,
70% raises something more serious, 90% sets
off a page, you know, these
kinds of serious alarms.
Nikolay: Yeah, you know, I know
we're already running out of
time, but let's mention some small
practical advice at the end.
If you cannot SSH to the box, for
example, it's RDS or another
managed Postgres service, and Postgres
is still alive, but you
wonder why the disk usage goes
up.
At the SQL level, you can check what's
happening in directories using
pg_ls_dir, pg_ls_waldir, and so on.
There is a set of administrative
functions which allow you to
inspect the content of directories.
And this is very helpful when you
troubleshoot and understand
what's happening.
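A few examples of these functions, as a sketch; they typically require superuser, the pg_monitor role, or explicitly granted EXECUTE:

```sql
-- Size and file count of pg_wal:
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS pg_wal_size
FROM pg_ls_waldir();

-- Size of server logs:
SELECT pg_size_pretty(sum(size)) AS log_size
FROM pg_ls_logdir();

-- Generic listing of a directory, relative to the data directory:
SELECT pg_ls_dir('base');

-- And the current database size:
SELECT pg_size_pretty(pg_database_size(current_database()));
```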
And it can also be helpful in mitigation
activities, in observability.
For example, if you have a report
triggered when we are almost
out of disk, when usage exceeded 90%,
then you can include some automated
analysis to have a snapshot
of what's happening in key directories
and understand what the
consumers of disk space are.
Just some advice I thought we might
miss.
Let's not miss it.
That's it, I think, right?
Michael: I think so.
I hope none of you ever actually
have to face this and you can
all mitigate or prevent it from
happening.
And yeah, thanks so much, Nikolay.
Catch you next week.