Postgres FM

Nik and Michael discuss the various changes to EXPLAIN that arrived in Postgres 18.
 
Here are some links to things they mentioned: 

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

With credit to:
  • Jessie Draws for the elephant artwork

Creators and Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Nikolay: Hello, hello, this is
Postgres FM.

My name is Nik, Postgres.AI, and
as usual, my co-host is Michael,

pgMustard.

Hi, Michael.

Michael: Hello, Nik.

How's it going?

Nikolay: Good, good.

So, you proposed, I proposed this
topic, honestly, right?

I saw you worked on updating the EXPLAIN glossary on the pgMustard website, and I thought maybe it's a good idea, because I have some feeling I'm missing something in understanding what's new in Postgres 18.

So I would like to learn the details, and if you have a fresh look at this, why not, right?

Let's talk about that.

Michael: Yeah, it's a good idea.

It is what I've been working on
the last couple of weeks which

is updating our EXPLAIN glossary.

So the documentation on EXPLAIN
is good in the Postgres docs

but it's not very extensive, and when I was first learning about all this, you know, 7 or 8 years ago, it was really hard to learn what the various phrases — the various node types — exactly were, and what some of the fields were. We basically started looking it up for each one, and then started making a note of it, and that started as a blog post and ended up as about 160 pages of this glossary.

Nikolay: How many?

I think

Michael: it might be a bit more now, actually — it's definitely over 150.

Nikolay: Okay.

Michael: So yeah, it's quite extensive, and as a result I now have to update it — well, I don't have to update it each year, but I try to, when the major versions come out, so that it's up to date.

Nikolay: That's great. Yeah, so it also highlights how complex EXPLAIN plans are, right?

Michael: Yeah, yeah exactly So
yeah, we've got a page for each

parameter, a page for each operation
type and a page for each

field type.

So — they're not all operation types, thankfully — but yeah, it is a little bit complex.

Nikolay: I see, yeah.

So what's new in 18?

Michael: Right, yeah. Well, the first one I put on the list is one we've spoken about many times, which is BUFFERS being on by default with ANALYZE.
Long-term listeners will be very
familiar with this topic but

until this version we had to specify EXPLAIN (ANALYZE, BUFFERS) if we wanted information about the data read and written, and whether it came from the shared buffer cache or not. So yeah, that was the change I was most happy about making.

I think I posted about that.

The only thing that I might not have mentioned before is that this isn't true for auto_explain — although I don't mind that; I think it's not as bad. So in auto_explain, if you want buffers, you still have to specify it.
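To illustrate the difference (a hypothetical table `t` is assumed here; on 18 the buffer counters appear either way):

```sql
-- Postgres 18: BUFFERS is on by default with ANALYZE.
EXPLAIN (ANALYZE) SELECT * FROM t WHERE id = 1;

-- Postgres 17 and older needed it spelled out:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id = 1;

-- Either way, plan nodes now carry lines like:
--   Buffers: shared hit=4 read=1

-- auto_explain is the exception: buffer logging still has to be
-- enabled explicitly there.
SET auto_explain.log_buffers = on;
```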

Nikolay: Yeah. So for regular EXPLAIN ANALYZE, it means that, since BUFFERS is on by default, we can expect to rely more on those numbers being present in analyses. So it should shift methodologies a little bit towards the direction we've discussed over more than 2 years, right?

Michael: I think so I think hopefully people will be curious

why these numbers are showing up if they're used to just running

EXPLAIN ANALYZE or if they're reading a guide or a blog post

that only suggests running EXPLAIN ANALYZE if they're on version

18 or above which is not many people at the moment but obviously

will grow over time then hopefully people will be curious what

those numbers are and what they mean and not only will it help

them, but if they can't solve their problem and have to ask for help — on the mailing list, or from a consultancy, or a tool, yeah, wherever — the people who are more familiar with those numbers, or the tools that use those numbers to give tips, can get them automatically.

So get them on the first try without having to ask the person

to run it again and maybe having to guess at what the problem

is.

So yeah, I think it's a big win for people who are trying to

solve their own problems, but also people that are asking others

for help.

Nikolay: Right.

That's great.

Yeah.

Those who have listened to our podcast for quite some time know how much we appreciate buffers. So it's such a big relief it's now on by default.

We had several episodes about that.

Michael: Yeah, and only 4 or 5 years until we can just stop telling

people to run it with buffers at all.

Nikolay: Yeah, yeah, yeah.

So the TLDR version is: just use BUFFERS, because in most cases a query is slow because of I/O, and BUFFERS brings you I/O-focused analysis.

Yeah.

Without it, it's like you don't see the important piece.

Yeah, great.

And I have a question for you: do you plan to focus on buffers even more in pgMustard — to do something, some changes, with the Postgres 18 release — or not?

Michael: Not with buffers, but with all the others, yes. Because we have a string in pgMustard that you can copy from, we automatically suggest people gather buffers, and most do — it's not quite 95%, but it's close.

So we already focused quite a lot on buffers and I don't think

this is gonna massively change that.

The only change I'm looking forward to making is dropping it

from that string.

In about 4 or 5 years' time, once Postgres 17 is out of support,

I think we'll drop buffers from the string because we'll be fairly

confident that if they've got, well, if they've got ANALYZE

they'll get buffers.

So that's the 1 change I'm looking forward to making.

Nikolay: Yeah, great, great, great, yeah, yeah.

Okay, that's, yeah, pretty straightforward, let's move on, what

else?

Yeah.

This I knew very well.

Michael: Yeah, yeah.

Nikolay: Like, you know I knew, yeah.

Michael: There's 2 that I'm not sure which is the more important or more exciting. Let's go most important first. I think actual rows now being reported to 2 decimal places is really, really big. Until this version — so in 17 in particular — the number of actual rows is an average per loop, and that's mostly fine, except when you have lots of loops, specifically many, many loops — for example, a nested loop that has 10,000 loops. Because the average is rounded, especially at low numbers it can be quite far off, especially between 0 and 1. So if those 10,000 loops return fewer than 5,000 rows in total, it would be reported as 0 actual rows, and that's a big difference from 5,000.

So yeah, that for me is a huge change in terms of seeing the

details in some quite common performance issues.

So it's quite common that a performance issue shows up where you're getting lots of loops, either because looping was a bad idea in the first place, or because the best index isn't available. So that one, I think, will really help in terms of giving a more sensible number once you multiply actual rows by the number of loops.
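As a sketch of what that looks like (the tables and numbers here are made up), the inner side of a nested loop on 18 can now report a fractional per-loop average:

```sql
EXPLAIN (ANALYZE)
SELECT *
FROM orders o
JOIN order_items i ON i.order_id = o.id;

-- Hypothetical 18 output for the inner scan:
--   -> Index Scan using order_items_order_id_idx on order_items i
--        (actual time=... rows=0.40 loops=10000)
-- Total rows returned is rows × loops: 0.40 × 10,000 = 4,000.
-- On 17 the same node would have shown rows=0.
```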

Nikolay: Is it only in nested loops or somewhere else? It's only in nested loops, right?

Michael: Definitely in parallel plans as well. So if you've got, like, 2 workers, then you need to multiply the number of actual rows by 3 — you know, the leader plus the 2 workers — because it gets averaged per worker process as well. I think there are other cases too. Maybe not.

Nikolay: So the bottom line, basically, is that this helps us reduce the error, right? Yeah. That's it, right? Because the error can be big if a lot of loops are happening.

Michael: Exactly, exactly.

More accurate in a lot of cases and a huge improvement for specific

performance issues.

Nikolay: Does it help with some kind of analysis — when, for example, pgMustard recommends to users what to do?

Michael: A little bit, like...

Nikolay: ...with numbers, right?

Michael: I mean, it helps in a few ways, right? You can get a slightly more accurate difference between the estimated and the actual rows — though estimated rows are always an integer, so that's not a huge deal. The times where it's most useful are when the tips need to make sense: like, if you're trying to tell people this many rows were filtered, and you're telling them all of the rows were filtered, that doesn't quite make sense if they're getting some rows returned by the query. So it helps a little bit, but you're still going to get the tip regardless of this; it's just more a case of, like...

Nikolay: ...making sense. Maybe it would be helpful at a different level, when you have a bunch of queries — say, hundreds of queries — and you need to prioritize and choose the top 5 to focus on first.

Michael: Yeah, that's a good point. So let's take our case where we had 10,000 loops. The difference between returning 5,000 rows and returning 5 rows is huge. Depending on the size of the table, it might be that a totally different scan type fits: for one, a bitmap heap scan might be really efficient, and for the other you might actually want an index scan. So it might also matter for how much optimization potential there is: if you're returning 5,000 rows, there's only so fast you can make that; if you're returning 5, there's maybe an order of magnitude more optimization potential. So yeah, that's a good point.

Nikolay: Yeah, so more precise numbers can help you do more precise prioritization when you're dealing with a lot of plans. You know my point of view: more and more, we should think about more automated ways of processing, because we have a lot of instances and so on. And plan flip analysis — we sometimes use pgMustard for that. So it's great if you have more precise numbers. And yeah, it can give you precision, and prioritization as well.

What matters more? What matters less?

Good, good, good.

Okay, what?

That's it in this topic or something
else

Michael: I think so

Nikolay: let's move on then right

Michael: Index Searches I've got next. Do you remember the conversation we had with Peter Geoghegan about skip scan, of course?

Nikolay: Yeah — there are 2 meanings of it, also.

Michael: Right, yeah — well, you mean with loose index scan?

Nikolay: This is not, this is the
other meaning.

Michael: This is skip scan.

Nikolay: Basically, let me say what I remember. In the topic of redundant indexes, before, we always said: if you have an index on column A and another index on columns A and B, this is definitely a redundant case — you can safely drop the index on column A. But if it's column B and you have (A, B), you cannot drop the index on column B, because B is in the second position in that second index; it won't be helpful. Now, in some limited number of cases, it can be helpful: it can skip the first column, basically, right? And use the two-column index on columns A and B — not an index on column A — to serve queries which only have filters on B, not on A at all.

Michael: Yeah, exactly.

If there are relatively few values in A, or if it knows the set of values of A — like, via a different condition — an index on (A, B) can relatively efficiently be used for a query on B, without any information about A, or with a set of values for A.

And there's a couple of optimizations
in the last couple of major

versions of Postgres.

So this Index Searches field tells you how many unique descents of the index there were — how many times it looped through the index to check each value of A.

So one example could be, you know, a phone book, where it's listed by last name and then first name. If you live in a country without that many last names, you can imagine asking: how many people called Nikolay are there in, I don't know, San Diego? And you could look per last name and just jump straight to N each time — straight to N-I-K — and see how many... and then you don't keep searching through all of the Smiths before you move on to the, I don't know...

Nikolay: I like how you avoided
pronouncing my last name.

It's a good idea, okay.

Michael: Yeah, yeah, to get to the Samokhvalovs.

Nikolay: Good, yeah. But I don't understand what's happening in EXPLAIN plans in this area.

Michael: So we now see how many descents happened. In 17, we had some of these optimizations, but we couldn't see where Postgres was using them. I mean, we got clues — the execution time dropped and the buffers were fewer — but only indirectly. Now we can see it: if it is using this, then Index Searches will be greater than 1; if it's not using the optimization, Index Searches will equal 1.
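A minimal sketch of how this surfaces (assuming a hypothetical table `t` whose column `a` is low-cardinality):

```sql
CREATE INDEX t_a_b_idx ON t (a, b);

-- No filter on the leading column a:
EXPLAIN (ANALYZE) SELECT * FROM t WHERE b = 42;

-- Hypothetical 18 output if the B-tree skip scan kicks in —
-- roughly one descent per distinct value of a:
--   Index Scan using t_a_b_idx on t
--     Index Cond: (b = 42)
--     Index Searches: 11
-- "Index Searches: 1" would mean the optimization wasn't used.
```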

Nikolay: Good, good.

And how does it help with analysis?

Michael: Good question.

So, at the moment — by the nature of this work, these are optimizations, right? So most of the time, if you're getting these, your query is already faster than it would have been in previous versions of Postgres, so it tends to come up when the query is relatively fast. Now, that's not always going to be true, but I haven't seen many cases yet where this was the problem. I expect to eventually come across quite a few, but for now, I haven't.

Nikolay: You mean, when you see this mechanism being involved, you think: okay, this may be a sign that we aren't squeezing out the best performance possible, because we could have a different index which doesn't use this feature but uses regular index scan mechanics.

Michael: So yeah, you're right. If you see this in place — I believe if Index Searches is above 1 — you're almost guaranteed (because this only works for B-tree indexes at the moment, I think) that there is a better index definition for that query.

You will be

Nikolay: able to have more.

Michael: Yes, exactly. So I think it is a way of saying that there is potentially more to squeeze out of this, but it's efficient enough in a lot of cases, and the idea of the feature is that it's only used when it's a better option. Mind you, a real query plan is actually quite complex — imagine this is going on in one part of your query, but a different part is actually the bottleneck. So I haven't yet seen a case where this was the problem, the bottleneck. But it will happen, and for very, very simple queries, it's very easy to show. In fact, I blogged about it just last week, showing how you can then optimize it further.

Nikolay: And skip scan was also introduced in Postgres 18.

So the feature comes with its observability bit.

This is a good demonstration of how developers should think about development: not only about features, but also about how they will be observed. I mean, in a regular DevOps approach, you think about how you will monitor your feature. In this case it's not monitoring — it's micro-level optimization, micro-optimization — but still, it's great for transparency. I guess it helps everyone, including developing the feature itself: you need to see it, right? So maybe in tests as well — in the regression tests Postgres has, maybe this is also used, I guess.

Yeah, good, cool.

Michael: By the way, on the monitoring front, you can actually see these...

Nikolay: ...already? Yeah, I missed that.

Michael: Not added in 18 — this was available previously.

Nikolay: This was available previously?

What's the column?

Michael: Let me find it.

Oh, actually — no, maybe not pg_stat_statements, sorry. pg_stat_user_indexes and similar — you know, those views. Yeah, you can see it there.

Nikolay: pg_stat_all_indexes also — it's just all indexes.

Michael: Yeah, exactly.

And there's an idx_scan column.

Nikolay: But this is an old one.

Michael: Yeah, but it now counts these individual descents.
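At the macro level, this is where those counts accumulate — a query like the following against the standard cumulative stats view (the point being the interpretation: descents, not statements):

```sql
-- idx_scan counts individual index descents, so one skip-scanning
-- statement can bump it by more than 1.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 10;
```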

Nikolay: So for one call we can have multiple index scans... That's interesting — that's super interesting, because, you know, we've been developing quite advanced monitoring lately, and we touched this piece very recently.

And now I'm wondering — what do you think about how this could be used? Usually we display two things, in our approach at least: top-N indexes by some metric — by index scans, for example, the top 10 indexes by index scans — and also details for an individual index: okay, we have this index, and we have a lot of graphs, including the scans and all the other stuff. So I don't understand how we could use what you're saying. We basically need to understand how many queries involve this index, and somehow highlight that it's not 1-to-1, it's 1-to-n, right? But how?

Michael: Yeah, but it depends what the problem is you're trying

to solve, doesn't it?

Nikolay: At the macro level I don't see how. I guess it's possible, but we'd need to somehow understand all the query IDs in pg_stat_statements, and their calls, where this index is involved. How to do that is an interesting question, because pg_stat_statements basically forgets about plans. Maybe it's something Lukas Fittl and pganalyze are working on lately — I know they recently presented yet another attempt to have plans properly tracked, persistent plans. Maybe there, if we look at this — this is an interesting direction. Quite advanced, I would say. But at the micro level, it's pretty obvious: we have one call — with EXPLAIN it's always one call — so we just see multiple index searches. Okay, thank you for the advanced stuff here; it's food for additional thoughts.

Michael: Yeah. While we're thinking about the macro level, one idea I had was: if you have, let's say, a really write-heavy workload, and you really want to minimize the number of indexes you have, you might want to expand your search for these overlapping indexes. So, you mentioned, for example, what you define as an overlapping or redundant index.
You might want to expand that search to consider indexes that have the same columns but in a different order. So if you've got an index on (A, B, C) and one on (B, A, C) to serve different reads — especially if any of those columns, like A or B, are low cardinality, so they don't contain that many unique values — there's a chance that you won't have to pay too high a cost on your reads, and you might be able to reduce the number of indexes you have, which I thought was interesting.

Nikolay: So here's what I think: if the idx_scan column in pg_stat_all_indexes / pg_stat_user_indexes registers multiple descents — like, it increments not by 1 but by some number when we have a call which uses skip scan — then that suggests there should probably be another column showing how many distinct queries there were. So we would have two counters: one would be how many statements used this index, and another, how many index scans there were. If these numbers are different — the ratio is more than 1 — that means there is room for... some improvement.

It's interesting that our direction of thought is to avoid skip scans — to avoid using this feature somehow.

Michael: Well, I was just proposing a time when you might want to go the other direction, and actually embrace skip scans: you would lose some performance on some reads, but you'd gain some performance on some writes. So I think there are some folks who would happily pay that trade-off in certain workloads — like, you know, Internet of Things, ingesting loads of data, just wanting minimal indexes on a table. You've probably already got minimal, or very few, but you might be able to reduce it by 1 or 2 more with this.

But yeah, with most workloads being read-heavy, I think it makes sense: normally we can afford 1 extra index if it makes a really important query or a really important endpoint much faster.

Nikolay: Yeah, definitely. There's some feeling that things like observability can be improved here. We are far from it.

Michael: If it helps, I've thought about it a little from the micro level.

You're thinking from the macro level; from the micro level, I'm currently thinking there are wins here, but they're unlikely to be super common, and they're unlikely to be that huge. And I'm still seeing loads of queries out there that can't use any indexes, or, you know, have really abysmal performance. So it depends — it depends how often this comes up, and how much optimization potential there is when it does.

Nikolay: Yeah, well, it depends on the project stage: if the stage is early, usually people focus on adding indexes. We deal more with slightly grown startups, and they come to us with problems, and we observe a lot of unused and redundant indexes, index bloat, and so on. That's why we currently focus in the opposite direction.

Michael: Yeah, but redundant — I've seen, and you probably see this all the time, the exact same index, like, 3 times. So there's often even lower-hanging fruit.

Nikolay: But you see it at the micro level, in EXPLAIN. How to identify this problem in an aggregated state, at the macro level — this I don't understand.

Michael: Probably it's going to show up in the number of buffers.

Nikolay: Sorry — what I do understand is that the meaning of idx_scan has now shifted for me. So it's not how many queries, like it...

Michael: Well, it kind of...

Nikolay: ...used to?

Michael: ...used to be, right? Because with Index Searches, I'm not sure of the exact semantics. I think there have been some optimizations in the past that might have used this, but in the past it was pretty much one-to-one.

Well...

Nikolay: It might happen multiple times even — it was possible even before, if the same index is used in different parts of a query independently: multiple index scans. It can be, for example, a CTE, right? Different stages can use the same index multiple times. Or a UNION. So it was definitely not one-to-one before, but now it's even more not one-to-one.

So it's interesting.

Anyway, I will be thinking about
this, thank you.

As I said, this is the food for
additional thoughts.

Yeah, yeah.

There should be some process here,
yeah.

Michael: Shall we go through a
few more?

Yeah.

Window functions. So, in the past, if you specified VERBOSE, you'd get the output at each stage of a query plan, and window functions would just say, like, OVER (?). So if you've got multiple of them in your query, it could start to get a little bit tricky to see which one was which. Obviously that's quite a niche case, but if you're sharing it with somebody else, they might want to see what the... I think VERBOSE is especially helpful when you're sharing plans with other people or tools. Now that has been improved: we get, like, OVER w1 now, and then there's a separate field that reports what w1 is — which window definition that is.

Nikolay: So, in the past, I remember, we said EXPLAIN (ANALYZE, BUFFERS). But on your website, I actually saw EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS ON), right? SETTINGS ON, or just SETTINGS? You can skip that part.

Michael: Just SETTINGS, yeah. Both work.

Nikolay: And what else?

WAL?

Michael: Yeah — WAL. WAL's good, but look, I don't find it useful that often; like, for most queries...

Nikolay: It's useful.

It's useful.

Michael: Serialize, I think, is...

Nikolay: Ah, this is Postgres 17
feature, right?

Or 16?

17.

Michael: Yeah, 17.

Nikolay: So now the mantra is EXPLAIN ANALYZE... BUFFERS not needed, BUFFERS not needed... SETTINGS, VERBOSE. SERIALIZE? WAL?

Michael: Depends, right?

Like, I think if you're typing
them out by hand and you're in

a rush and there's a production
incident, sure, just go ahead

and grab whichever is quickest
to type.

But if you're not, then why not
just paste them all?

Like...

Nikolay: Which "all"? Like, there's... Like, everything. There should be a pre-made EXPLAIN everything. We discussed that, and I think you proposed something on the hackers mailing list.

Michael: Yeah, I ended up settling on proposing that EXPLAIN (ANALYZE, VERBOSE) should give you everything — like, everything should be on by default with VERBOSE, because it kind of means "I want verbose output". It didn't go very far. But yeah.

Nikolay: So, like, there should be some single recommendation you usually give people when they come with problems.

Michael: Yeah. The one that you copy and paste from our app is EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS). I think we haven't included the 17 ones yet, because not enough people are on 17.

Nikolay: In format JSON.

Michael: I think we dropped that.

Nikolay: That's great, because I hate dealing with JSON. People send us JSON and, okay, I can go to some tools, but I cannot read it myself. We actually have a converter from JSON to text, which is good, because I, myself, as a human, consume text, not JSON. But for tools, I understand JSON is better. OK, good.

So now it's EXPLAIN ANALYZE, VERBOSE, SETTINGS, okay? 4 words only, right?

Michael: Yeah, not too bad.

And you might want to use the others, like WAL, if you're dealing with data modification queries, or if you suspect there's an issue there — like, if you're running out of ideas, why not try them all?
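For reference, a "try them all" string on 17/18 might look like this (a sketch, with a hypothetical query; SERIALIZE and MEMORY exist only on 17+, WAL on 13+):

```sql
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, WAL, SERIALIZE, MEMORY)
SELECT count(*) FROM t;
```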

Nikolay: And VERBOSE also brings the compute query ID thing — the query ID. Yeah, query ID.

Michael: And schema qualifications.

Nikolay: You don't recommend SERIALIZE yet?

Michael: Only because not many
people are on 17 yet.

Nikolay: Okay, so you think you will switch at some point in the future.

Michael: Yeah, in a few years time.

Nikolay: I would definitely include
WAL.

There's also memory.

Michael: Again, not all supported versions have WAL.

Nikolay: I see, yes.

Michael: I think. Or maybe they do now. Maybe it's just that I can't remember which version it was added in.

Nikolay: Well, last week we just said goodbye to version 13, right? So now version 14 is the oldest supported one, and WAL is included there.

Michael: Nice, okay.

Nikolay: I think WAL is worth having.

Michael: Great, and it's nice and short.

Nikolay: Good. Okay, so back to the thing you told us about VERBOSE. While we were chatting about all these options, I already forgot — so VERBOSE brings what, exactly?

Michael: The actual 18 changes don't require VERBOSE, which is a little bit confusing, because it adds another field for window definitions — or the... yeah, the function, the OVER...

Nikolay: okay

Michael: Oh, sorry — what's it called? Yeah, okay, I'll have to go back to my glossary.

Nikolay: There's MEMORY, which is new, but I think it's from 17, maybe.

Michael: Yeah, that's 17. Oh — but this is related. We do now get memory information on, like, a... So, MEMORY as an EXPLAIN parameter gives you information about the memory used during planning time, specifically.

Nikolay: Mm-hmm.

And it's from 17 — MEMORY is from 17. And that's...

Michael: ...from 17.

But we do now get... there was another piece of work that adds memory or disk usage details to a bunch more operation types. We already got them in, for example, sorts and a few other operations, but now you also get them in Materialize nodes, in window aggregates — so for window functions — and CTEs, and maybe others as well.

Nikolay: So memory reporting was extended as well?

Michael: Not the MEMORY parameter, but memory information in EXPLAIN, yes.

Nikolay: Well, I mean, MEMORY in 17 brings you less info than in 18 — is that the right thing to say?

Michael: They're unrelated, yeah.

Nikolay: Unrelated. So it's only about VERBOSE?

Michael: This, actually, I think is on with ANALYZE — not even...

Nikolay: Why is it so difficult?

Michael: It is difficult. This is why I struggle to remember all of this, and why I write it down.

Nikolay: There is UX issue here, definitely.

Michael: Yeah, the field is called Window. So you'll get a WindowAgg operation with a field called Window, with the definitions of each of your windows — anything you've got in the OVER clause.

Nikolay: Why is it called Window?

Michael: Because window functions are OVER something.

Nikolay: Ah, so you need to see details about the window. Okay.

Michael: Exactly.

Nikolay: And this is new — if you use VERBOSE you see it, without VERBOSE you don't see it?

Michael: So, with VERBOSE you see the Output.

Nikolay: Okay.

Michael: I thought you would only see Window with VERBOSE as well — so you'd see the Output and the Window. But in testing, I found that you can see the Window without the Output, by just running EXPLAIN ANALYZE.

Nikolay: Okay,

so it's unclear.

Michael: Yeah. In fact, I've written it up, perhaps confusingly: the Window field is always shown, but Output is only shown when the VERBOSE parameter is used.

Nikolay: I don't understand why we don't have a "give me everything".

Michael: Yeah, it's becoming clear.

Nikolay: There should be some "give me everything". I want it short — EXPLAIN ANALYZE EVERYTHING. It should be done. Because if it's a development server — I mean, not a production server — I would like to see everything, and, in my optimization process, to compare all the details. Even those which I don't understand yet, because maybe if you ask an LLM or some tooling, it will understand them, right?

Michael: Yeah.

Or pgMustard.

Nikolay: pgMustard, exactly.

So why not bring everything if it's possible, if it's not production.

I understand that production, like maybe observer effect might

be an issue and you probably don't want everything sometimes.

Michael: Well yeah, I don't know, like yeah, I can see an argument

either way.

Nikolay: Okay.

Should we crack on?

Yeah, it was hard.

Michael: Do you remember we discussed the enable_* settings — like enable_seqscan and enable_indexscan, these...

Nikolay: There are changes there.

Instead of a big penalty, now it's actual disabling, right?

Michael: Yes, well...

Nikolay: And now you see it in plans if it's disabled.

Michael: Yes.

That's, yeah.

So, a super quick one. But you'll only see it in text format — you'll only see "Disabled: true" for nodes that have been explicitly... I say disabled, but let's say enable_seqscan equals off: then on any sequential scans in that plan, you'll see Disabled: true. Okay. So, a simple one.
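A quick sketch of the behavior (hypothetical table `t` with no useful index for the query):

```sql
SET enable_seqscan = off;
EXPLAIN SELECT * FROM t;  -- no index can serve this, so the seq scan stays

-- Hypothetical 18 text output: the node is flagged rather than
-- being given an astronomically inflated cost:
--   Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
--     Disabled: true
-- With the SETTINGS parameter you'd also see at the bottom:
--   Settings: enable_seqscan = 'off'
```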

Nikolay: Yeah, but I remember I had some thought when I was reading about this — some concern compared to the old behavior. I don't remember which thought it was. You don't see any downsides to this change?

Michael: I really like this.

I think this mitigates the downsides
of the previous functionality.

Nikolay: Especially if you use SETTINGS — so you see the settings and you understand the reason why it's disabled — it's great.

Michael: Yes, exactly.

So, if anybody doesn't know: the SETTINGS parameter adds to the bottom of your query plan any non-default planner-related settings and what they're set to.

Nikolay: Yeah. And if you, for example, really somehow like the old behavior, you can just play with costs, right? Maybe, maybe not. There is a different problem: we don't see the second plan — what the planner had while it was choosing, right? We only see the winner. And I think the old behavior of enable_seqscan set to off — putting a huge penalty on seq scans — sometimes helped to understand planner behavior, like why it switches from this to that.

Michael: Yeah, you can still
use it for that.

Nikolay: Okay, maybe, yeah.

And I want something different: I
just want to see the second and

third plans, maybe, to understand
the difference in costs between

the winner and the losers.

Yeah, but I think you like the
UX, and it's not easy to...

Michael: Wait, this makes it easier.

This makes it easier to see the
difference in the costs.

Nikolay: Okay.

Michael: Because you don't have
those huge numbers.

Nikolay: But it's not what I want.

I want to maybe sometimes always
see the second and third plans.

Michael: Fine, fine, fine.

Nikolay: If it's possible, like,
somehow add one more flag to

EXPLAIN ANALYZE: show me not one
plan, but multiple plans.

So, yeah.

Okay, too much, maybe.

Michael: Let's go through the other
couple of changes that we

have got already.

Yeah, one I think is kind of a bug
fix, which is for parallel bitmap

heap scans. You would get exact
heap blocks and lossy heap blocks

to tell you whether work_mem was
big enough for that bitmap,

basically.

And it turns out they were only
being reported for the leader process,

not the worker processes.

So now that's been changed so that
you see per-worker details

by default with ANALYZE, not just
with VERBOSE.

Parallel query worker details are
normally only shown with VERBOSE,

so this is an exception.

And, from memory, the leader-reported
numbers still don't include

the worker numbers, so that's
kind of maintaining behavior from

previous versions, but it's
different to how most operations

report their details.

Most of the time, operations include
the information from their children,

or from their worker processes.

So, yeah, this is a slightly confusing
one, but a big improvement

for people that know what they're
looking for.
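Roughly what that looks like now (a sketch only; the table, index, and numbers are invented):

```sql
-- Parallel bitmap heap scan: exact/lossy heap blocks now reported
-- per worker with plain ANALYZE, no VERBOSE required:
EXPLAIN (ANALYZE) SELECT * FROM t WHERE some_col < 1000;
--  Gather  (...)
--    Workers Launched: 2
--    ->  Parallel Bitmap Heap Scan on t  (...)
--          Heap Blocks: exact=512          -- leader's own blocks only
--          Worker 0:  Heap Blocks: exact=498
--          Worker 1:  Heap Blocks: exact=503
--          ->  Bitmap Index Scan on t_some_col_idx  (...)
```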

Nikolay: TLDR is parallel awareness
of EXPLAIN plans improved.

Right?

Michael: Oh, and accuracy.

Nikolay: Yeah, it sounds like a
bug to me a little bit.

Michael: Yeah it is, it is.

But it's good that we've got that
and you might notice a change.

And then, probably the least
likely to come up, but for completeness:

did you see we've got a new WAL
buffers full field as well?

So if the WAL buffers happen to
become full during query

execution, the number of times
they became full now gets

reported.
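A sketch of where it shows up (illustrative statement and numbers; the field is part of the existing WAL summary line):

```sql
-- With the WAL option, the summary now includes how many times
-- WAL buffers filled up during execution:
EXPLAIN (ANALYZE, WAL) INSERT INTO t SELECT generate_series(1, 1000000);
--  Insert on t  (...)
--    WAL: records=... fpi=... bytes=... buffers full=42
```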

Nikolay: Ah, I remember reading
about this, Yeah, that's great.

Nice.

Michael: So yeah, a simple one.

Nikolay: Yeah, simple, but it's interesting
how we are going to use

it, because usually this kind of
analysis we really do at macro

level, usually dealing with wait
event analysis: are there lightweight

lock wait events related to
WAL buffers, WALWrite, right?

WALWrite and so on.

This is how we indirectly understand
that probably there's a

problem here, contention, and
so on.

And then we usually go and tune
commit_siblings and such, group

commit basically.

So let it wait a little bit before
commit, slightly, and do the

fsync, or its analog, for multiple
transactions, for multiple commits.

In write-heavy, write-intensive
workloads it's a very good approach.

But when I think about micro level,
like here in EXPLAIN, well, it's

hard, because usually I run
EXPLAIN ANALYZE independently,

maybe multiple times, but I don't
think about the macro state

of our server.
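The group-commit tuning being described uses `commit_delay` and `commit_siblings` (a sketch; the values are illustrative, not recommendations):

```sql
-- Group commit: a committing backend waits commit_delay microseconds
-- if at least commit_siblings other transactions are currently active,
-- so a single WAL flush can cover several commits.
ALTER SYSTEM SET commit_delay = 1000;   -- microseconds, default 0 (off)
ALTER SYSTEM SET commit_siblings = 5;   -- default 5
SELECT pg_reload_conf();
```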

Michael: Yeah, actually, I don't
imagine this is going to come

up much when you're running it
manually.

You'd have to be unlucky, I think.

But imagine in auto_explain you've
got a query that's intermittently

very slow.

So normally it's relatively okay,
and then sometimes it's extremely

slow.

This would help you find out, on
those really slow runs: oh, it might

be related to this issue.

Nikolay: This is great, this is
great. But it immediately provokes

a couple of thoughts. One is p99
for everything in auto_explain.

Michael: Yeah, we talked about
that, right?

Nikolay: Yeah, and another thing
is, what about checkpoints and

buffers written by backends, like
writes initiated by backends

because the guys called bgwriter
and checkpointer don't do

their job properly, so our precious
backends need to do it sometimes.

So I'm thinking maybe EXPLAIN should
start showing that info as

well, and also in the context of
auto_explain, maybe, even more.

Yeah, again, I feel some seeds
for future plans to be grown,

you know, in observability.

It's interesting.

And yeah, thank you so much for
sharing all the details here.

Michael: Oh, you're welcome.

I'm glad it was helpful.

Nikolay: For me, it's very helpful.

We are very deep in observability
recently.

So that's great.

That's great.

And I agree with you.

Maybe it's more for auto_explain,
not for occasional EXPLAIN plans.

But good.

Yeah, maybe.

Anyway, my recommendation is still:
let's keep WAL included, because

we deal with a lot of startups
which suffer from writing too

much WAL.

And again, this brings us to the
topic of: let's clean up indexes,

because indexes write more WAL.

If you have unused or redundant
indexes, and then bloat as well,

you are writing more WAL than
you could.

Eventually, I think it would be
so great to say, like, you know,

let's clean up this and that, and
we predict how much WAL

writing will be reduced. But at micro
level, having WAL info is

very helpful as well.

Although we should always remember
that it depends on the current

situation, how far from a checkpoint
we are.

And also, it depends on parameters,
because with different

parameters you will do a very different
thing, including the WAL

writes, which will be very different.

So maybe.

Michael: Well, I think it's difficult
at micro level, because you

need concurrent workload as well
for it to be super inter-

Nikolay: Not concurrent, but the
state needs to be fixed.

Michael: Same each time.

Nikolay: Same, yeah.

But this is solved by Database Lab.

We can reset and be in the same
state, right?

So that's great.

So database branching helps.

You have various ideas or iterations,
and you always start from the

very same state, including everything:
tuples, the distance

from checkpoint, everything.

So this is not a problem.

The problem is, I see, for example,
if you forget that parameters

matter a lot.

In one case, you can get a full-fledged
update.

In another case, you can have a
HOT update.

And you will see very different
WAL numbers there, right?

Michael: Yeah, true.
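A sketch of that difference (hypothetical table and columns; whether the first update can actually be HOT also depends on free space in the page):

```sql
-- Updating a non-indexed column can be a HOT update: heap-only WAL.
EXPLAIN (ANALYZE, WAL) UPDATE t SET plain_col = plain_col + 1 WHERE id = 1;
--    WAL: records=... bytes=...      -- small: heap change only

-- Updating an indexed column cannot be HOT: heap plus index WAL.
EXPLAIN (ANALYZE, WAL) UPDATE t SET indexed_col = indexed_col + 1 WHERE id = 1;
--    WAL: records=... bytes=...      -- larger: index entries written too
```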

Nikolay: But WAL buffers full,
it's interesting.

Like, why not report the number
of dirty buffers in each table

involved, in this case?

Like, you know, like having
pg_buffercache here.

Well, I'm joking.

Michael: I think you do get dirty
buffers.

If you include BUFFERS, you get
dirtied pages.
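For example (illustrative numbers; note that in Postgres 18, BUFFERS is on by default with ANALYZE):

```sql
-- The Buffers line already includes dirtied pages per node:
EXPLAIN (ANALYZE, BUFFERS) UPDATE t SET plain_col = 0 WHERE id < 100;
--  Update on t  (...)
--    Buffers: shared hit=310 read=12 dirtied=8
```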

Nikolay: What concerns me in this,
in what you said, is that it's not

only what we did, but also what
others did before us recently, right?

In this case, let's report dirty
buffers for each table, fully?

How many buffers are loaded into
the buffer pool, how many of

them are dirty, for this table, for
this index, for each relation?

Oh, this complicates things, it's
too much.

But anyway, like, I'm curious what
was the reason to introduce

this?

I think there was a specific reason,
right?

Michael: I guess so.

Sometimes these things get added
for completeness.

So, like, it was getting added to
some system view and it's like,

oh, why don't we also report this
in EXPLAIN? So I'm not sure

which way around this one was.

But yeah, normally these things
get added because they're needed

for some production incident and
they weren't available.

Nikolay: Well, in defense of this,
there are things that also depend

heavily on previous actions, for
example the status of caches,

right?

If you use buffer numbers, first you
see reads, but next time you see

hits.

Yeah, true.

This is maybe in the same line
of thinking.

Yeah, OK.

Anyway, great.

I think it was very advanced.

I doubt everyone should know everything,
remember everything,

but in general I feel like the
depth of everything is increasing.

There are more and more nuances.

This is great.

Appreciate it.

Michael: Yeah, absolutely.

Nikolay: Yeah, appreciate all this.

Thank you so much.

Michael: Good to speak with you
and catch you soon.