Postgres FM

Nikolay and Michael discuss Index-Only Scans in Postgres — what they are, how they help, some things to look out for, and some advice.

Here are some links to things they mentioned:

Index-Only Scans and Covering Indexes (docs) https://www.postgresql.org/docs/current/indexes-index-only-scans.html
Discussion on Twitter about JIT and Parallel Query defaults https://x.com/jer_s/status/1819749688184373742
Postgres Wiki on Index-Only Scans https://wiki.postgresql.org/wiki/Index-only_scans
Heap Fetches https://www.pgmustard.com/docs/explain/heap-fetches
Rows Removed By Filter https://www.pgmustard.com/docs/explain/rows-removed-by-filter
random_page_cost https://postgresqlco.nf/doc/en/param/random_page_cost/
pg_dump docs change https://x.com/samokhvalov/status/1820539826363588755
Crunchy Bridge changed random_page_cost to 1.1 for new servers https://docs.crunchybridge.com/changelog#postgres_random_page_cost_1_1
Autovacuum Tuning Basics (updated blog post by Tomas Vondra) https://www.enterprisedb.com/blog/autovacuum-tuning-basics
Our episode on over-indexing https://postgres.fm/episodes/over-indexing
Our episode on HOT updates https://postgres.fm/episodes/hot-updates
Our episode on partitioning https://postgres.fm/episodes/partitioning

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:

Jessie Draws for the elephant artwork

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Host

Nikolay Samokhvalov

Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Nikolay: Hello, hello, this is
Postgres.FM.

As usual, I don't remember episode
number.

Actually, I don't care.

I know 100 something.

And my name is Nikolay, founder
of Postgres.AI.

And as usual, my co-host is Michael,
pgMustard.

Hi, Michael.

Michael: Hello, Nikolay.

How are you doing?

Nikolay: I'm doing great.

How are you?

Michael: Yeah, good as well.

Nikolay: I actually feel very good
after therapy session we had

last week.

Michael: Glad to hear it.

Nikolay: Yeah, So I know it's your choice.

I'm glad you chose a very practical
topic.

Technical, practical.

So yeah, what is it?

Michael: Yeah, I chose index-only
scans.

I was aiming for technical and
practical, but I also wanted something

that's still beginner-friendly.

Like, I think there's enough depth
we can go into with index-only

scans.

I'm hoping there's something in
here for everybody.

But I definitely see some beginners
not fully understand index-only

scans, how to get them, or even
knowing that they exist, really

truly understanding their benefits,
and also the downsides when

they go wrong, or when they aren't
so efficient.

So yeah, lots to cover I think,
but we can start nice and gently.

Nikolay: Yeah, and since this topic
is related to query optimization,

it's your field, I would say, because
of pgMustard, so I will

let you to be in the driver's seat.

Michael: Yeah, well, I feel like
you are as much, probably more

of an expert in this field than
me still.

But yes, it's definitely something
I've had to look into a lot

for the product I work on.

And we've got a couple of tips
around this within the product.

But I actually wanted to start
like, super basic, like,

index-only scans are a type of scans.

I think it's worth covering that
we've, we've got other scan

types and Postgres gets to choose
between them in certain situations.

We can cover when it gets
the choice of doing an index-only

scan, but I wanted to start with
even the absolute basics.

If, for example, we don't have
any indexes, the only scan option

we have for a table is a sequential
scan.

Yep.

Going through the pages in order,
one at a time, and then if we

have an index...

Nikolay: Sequential or parallel sequential,
if we distinguish between

the 2.

Michael: You consider that a different
type?

Yeah, interesting.

Nikolay: Well, well, technically
it's different.

The planner has to make a choice.

Michael: True, true.

Oh, actually, this came up, and
maybe this is too off-topic,

but do you think parallelism should
be on by default, like with

Postgres?

Like, maximizing

the work together?

Nikolay: I think JIT should
be off by default.

Michael: This is when it came up.

Yeah.

Nikolay: I got beaten very badly again and
it was in my home.

I had a bad, bad, bad, bad, bad situation.

I don't understand why JIT is on.

I completely don't understand.

It just hurts workloads and that's
it.

So yeah, I think random_page_cost
should be closer to seq_page_cost.

We will discuss this, right?

random_page_cost adjustment, and
we discussed it a few times.

And as for parallelism, I saw Jeremy
Schneider's opinion that

maybe in some cases it's better
to switch it off to have more

predictable behavior of query processing.

Because if you...

Well, there are many things like
that.

If you have parallelism turned
on, which is on by default, 2

additional workers usually.

But there are several such things
and definitely it's our topic

today.

Index-only scans is an unpredictable
thing and heap-only tuple updates

is an unpredictable thing.

Quite recently I saw talking with
customers, I saw opinion, which

is probably a reasonable opinion
from backend engineers,

or like, oh, this is something
quite unpredictable.

We have many, many, many installations,
various situations, different

Postgres infrastructures.

So if we think we cannot, if it
was only 1 cluster, we should

think, oh, this is like our pet.

We will take care of behavior closely,
watch it closely.

But if you have thousands of setups,
you cannot rely on such

things.

And maybe you will prefer, like
let's raise this question

and in the end maybe we'll try
to answer ourselves.

So to have predictable behavior,
maybe you should switch off

parallelism, stop relying on heap-only
tuple updates, HOT updates,

and also stop relying on index-only
scans because they can degrade

to regular index scans, right?

Michael: Yeah, or even slightly
worse, I think.

So yeah, let's go back to basics.

Nikolay: Basics also, question.

Do you feel the name is kind of
off?

If we had name choices made right
now, maybe index-only scans

would be named as index scans,
and index scans would be named

like, I don't know, like index
heap scans or something, because

it's hard to...

You need to explain it, And if
people don't work with Postgres

planner every day, they forget
and they expect index scan is

what deals with only indexes, but
then index-only scan, what

is it?

Michael: Yeah, that's a really
good transition into what these

are.

Yeah, so index scans came first,
and I think that's probably

like why we're stuck with that
name.

And in Postgres... in a lot of other
database management systems,

not including Postgres, you have
indexed organized tables.

But in Postgres we don't have that.

We have heap and indexes are what
is often called secondary.

So to look something up via an
index, we'll check first in the

index and then that will point
us to the right place in the heap

to find that row.

Nikolay: Probably right.

Michael: Yeah, yes.

Nikolay: At least pages are right, but offset
might be wrong, right?

Michael: Yeah, well, back to Heap
again.

Let's keep it simple to start with.

Nikolay: For simple, yes, for index
points, but it doesn't know

if this tuple is alive or dead.

This is why we need to check heap.

Michael: Or even, I guess not even
like alive or dead, but visible,

like it might be alive to new transactions.

Nikolay: Right, visible to current
transactions.

Michael: Visible, exactly.

Cool.

So that becomes important in a
moment when, in fact, until, you

know, when index-only scans were
added to Postgres, I found out

it was...

Nikolay: It was not very long.

Michael: 9.2.

Yeah.

So yeah, not that long, like in
the history

of Postgres.

Nikolay: I remember living without them
very long, of course.

And this mantra, like, indexes
don't store visibility information,

I first heard 20 years ago, and
it was a rule.

Index scan must consult Heap to
understand visibility.

Michael: Yes.

So, and looking back, in fact,
the Wiki includes, the Postgres

Wiki includes a really good entry
on index-only scans and explains

that the majority of the work to
add index-only scans was work

on the visibility map to make it
crash safe.

So that, I found that fascinating
looking into it for this episode.

I didn't know that until yesterday.

So yeah, super fascinating.

So the index-only scan, probably
should actually explain what

it is and does.

If the data that we need for the
query is already in the index,

so very typical for a B-tree index,
not for all index types,

but in a B-tree index, we have
the data that we're indexing in

the index.

If that's all we need for the query
we're running...

Nikolay: If we don't use
SELECT *, if we

only select, for example, columns
which are in the index, this

is conditional, right?

Michael: Yes, exactly.

So if the columns we're selecting
are in the index, or for example,

if we're doing like an aggregation,
like COUNT *, we're not

really selecting a column, but
we... yeah.

So yeah, there are like a couple
of cases.

Yes, exactly.

And yeah.

And if the index type supports
it, like B-trees do, then we can

get an index-only scan.

So yeah, really cool feature.

Nikolay: This is like, okay, we
deal only with values which are

present in index keys.

Index keys maybe can be called.

Maybe not, maybe it's a bad idea
to call that.

But yeah, the planner might choose,
may choose or may not choose

index-only scan.

It still may think index scan is
better because the second component,

like I agree with everything you
said so far, but then you say

we don't consult the heap.

Well, in general case, index-only
scan consults the heap because

we don't have visibility information
still, right?

But in some cases, it skips it.

Michael: Do you count the visibility
map as the

heap?

Nikolay: Yes.

No, no, no, no.

No, I mean, we check the visibility
map and see what bit it has in

terms of all visible bit for the
page and the heap we need.

If that page is marked as all visible,
this is a win.

But it's not guaranteed.

It may not be marked as all visible.

In this case, it's kind of degradation
to the behavior of an index scan

for this particular value reference
to the tuple.

And we need to go to the heap and first
see if this tuple is visible

to our session, and secondly, probably
we will jump inside page

to proper position because of Heap-Only Tuple
chain, right?

But yeah, that's it.

So I mean, everything is right,
but it's not guaranteed that

we won't consult heap.

Michael: Yeah, good.

Yes, exactly.

Good point.

So heap, you'll see this in EXPLAIN
plans as heap fetches, if

you're running EXPLAIN ANALYZE.

Nikolay: Even if BUFFERS is not
used in EXPLAIN (ANALYZE, BUFFERS).

Michael: Yes.

Nikolay: Even if you skip BUFFERS, this
kind of buffers you still get.

Michael: In fact, imagine if you
didn't have heap fetches and

you only had buffers, you'd have
to be running it twice to spot

these, like, once.

Anyway, so, it's I really like
that they call this out explicitly.

It's really helpful.

And as you mentioned, like, let's
say we're scanning many, many

rows for our index-only scan.

So, we're doing a range scan across
maybe returning 100 rows

or something.

And we only have to do 1 or 2 heap
fetches.

That can still be a huge win.

It's only if the proportion grows
to a decent.

So, yeah, 2 big benefits.

Let's go back to why even bother
with all that work to make the

visibility map crash safe?

Why even go to the trouble of doing
this?

It's not only the benefit of not
having to do those reads on

the heap, but also I think data
locality.

Like if we're doing a range scan,
the hundreds values on the

index are likely in a much smaller
range of index pages than

they would be heap pages.

I mean, if they were all inserted
in order and there have been

very few updates, maybe they are,
of course, only a relatively

small number of pages, but they
could be anywhere.

It could be really random reads.

So I think historically, especially
if you consider when this

was added, doing those random reads
on the heap could be quite

a lot more expensive than doing
the reads.

Nikolay: Right, and if there is
no data locality, each tuple

sits in its own page, basically,
buffer, and we have a lot of

buffer operations additionally,
just to check visibility, basically,

in this case.

If we have all values already in
index, there's a lot of buffer

operations just to check visibility.

It feels super inefficient and
index-only scans is just an optimization

of index scans.

But they'd be super good if visibility...

Like If you have, for example,
if you just loaded your data into

a table, run a vacuum once on the
table, and don't do any writes

anymore, this is a perfect situation.

Heap fetches will be 0, the best
case for index-only scans.

But in the worst situation, you'll
have the maximum number of

heap fetches, and this should be
maybe the same number of buffer

operations as index scans.

Michael: It, plus, you've had
to check the visibility map,

which admittedly is not many reads,
like a visibility map is

tiny, but you don't have to check
that if you're doing an index

scan.

it's...

Nikolay: Oh, so you think index-only scan
in this extreme case is worse

than index scan?

Michael: Tiny bit.

Nikolay: I know planner might decide,
I think based on the state

of visibility map, in some cases
it might decide to choose index

scan, even the main condition is
met.

Michael: I don't know that it does.

Like in theory, that would make
sense.

I don't know for sure that it does.

Nikolay: It's worth checking.

It's a simple experiment.

Worth checking and looking at buffers.

Yeah.

But for simplicity, number 1 thing
to memorize, index-only scan.

If you have autovacuum tuned, index-only
scan is the best what

you can get from Postgres, for
example, for aggregate

SELECT COUNT, things like that.

If you get index-only scan, even
if it's like a lot, a lot of

rows, this is the best.

Michael: Well, I completely agree.

Nikolay: Index-only scan and heap
fetches are 0.

This is the ideal situation.

Michael: And 1 more condition,
Rows removed by filter 0 as well.

So I see sometimes people get excited
because they see an index

scanned or an index-only scan and
then they stop because they

think I've got an index-only scan,
but it's returning 100 rows

for every … So

Nikolay: you get 1 row, right?

Michael: Yeah, exactly.

So there is this 1 other gotcha
that sometimes catches beginners

out.

You would never be caught out by
that.

But some people see index-only
scan and think, great, I'm done.

And actually, it's still hideously
inefficient.

So yeah, they're the 2 things we
look out for.

Nikolay: Okay.

So yes, in phrasing this, you really
need those rows that index-only

scan brought to you, right?

You really need them.

Because if you see 99% was filtered
out, it means those rows

were not needed in the first place,
and you just need to find

better index, for example.

Michael: Exactly.

Nikolay: Also, let's touch this
thing.

You create an index, you run vacuum,
no writes yet.

You run SELECT COUNT(*) from
this table, or SELECT column,

indexed column, like ID, for example,
which is technically the

same because it cannot be null.

We know count, won't count, rows,
which are null.

And * cannot be null, even if...

It's complicated.

Maybe we should save it for another
episode about counts, right?

But you run it, but you still see
sequential scan in the plan.

This is happening all the time,
right?

Michael: That's a

good point.

Yeah, good point.

Or parallel sequential scan.

Nikolay: Or parallel sequential
scan.

And you think, why?

Why, like, Index is better, right?

Because index-only scan is better.

I know it.

It's much fewer buffer hits or
reads, or both.

Right?

What to do?

Michael: Yeah, so before we jump
straight to it, you can use,

if you want to ask this kind of
question yourself, there's like

some parameters that we can set
locally to help us diagnose these

things.

So we could use enable_seqscan
and set that to false briefly.

And then hopefully that will encourage
an index-only scan.

We can compare the costs.

And once you compare the costs,
it should become clear.

Nikolay: How to compare the cost?

Disable scan and look?

Michael: Run, EXPLAIN, ANALYZE,
SELECT counts, or maybe with

BUFFERS.

But you don't need it.

In fact, you don't even need ANALYZE.

Nikolay: Yeah, yeah, just because
we want to compare planning

cost.

And if sequential scan was chosen,
it means the cost was lower

than for index-only scan.

Michael: Yes, exactly.

And then I'm guessing where you're
going with this is probably

random_page_cost.

Nikolay: Right.

This happens if you have default
random_page_cost.

And it doesn't make sense if you
have fast disks, random accesses

close to sequential access on SSD
and NVMe, right?

In memory, it's also so.

You might have very slow magnetic
disks, but the database is

well cached and you rely on it
somehow, right?

It also can happen still.

In this case, random_page_cost
is 4 compared to seq_page_cost

is 1, and to work in a B-tree it's
random access compared to page

by page reading of heap sequentially.

So index scan, index-only scan,
the planner looks at

random_page_cost.

If it's 4 times more expensive,
it might prefer seq_page_cost.

Michael: Especially if it can do
it in parallel.

Yeah.

Yes, yes.

Nikolay: Well, index scan and index
only scan also can be parallel.

Michael: True.

Good point.

Nikolay: Like we have duplicated.

Michael: So maybe that doesn't
matter.

Yeah.

Good point.

Nikolay: Bitmap scan, index scan,
index-only scan, seq scan,

and then parallelized versions
of them, right?

Michael: Yeah, I skipped bitmap
scan to keep things simpler.

But we

maybe should do a whole
episode on that because it is

interesting enough.

Nikolay: Yeah.

Yeah.

And for index scan parallelization
cannot be, as I remember,

it cannot be applied to navigating
inside B-tree, inside the

tree, going from root to leaf.

But it can be applied to fetching
leaf pages in tree, and also

accessing heap, and also traversing
leaf nodes.

So it has a bidirectional list for
all the leaf nodes, allowing

not going up and down all the time,
not starting from root for

each entry, right?

We just need the first entry.

And then I think it can be parallelized,
dealing with leaf nodes.

This is what I remember, I might
be slightly wrong about internals

here.

But again, index-only scan is the
best if you need a lot of rows.

For SELECT COUNT(*) you might
still see a sequential scan.

It's a good sign you have not tuned
random_page_cost, which we

recommend to everyone to do sooner
because later it will feel

more risky.

Michael: Scarier, yeah.

Nikolay: Yeah, scarier because you
have a heavy workload and changing

this might flip your plans in some
unpredictable manner.

So yeah, it's a big task, But if
you do it early, live with it,

it feels more reasonable than keeping
defaults.

Did you have cases when you see
this, okay, cost for sequential

scan is lower, that's why it's
chosen, But when you say enable

seq scan to off, giving some huge
penalty to sequential operations

in planner's mind, then you see
timing and buffer operations

so much better for index-only scan.

You think, oh wow, planner is so
wrong here.

You had it, yeah, I have it all
the time.

And this is a good sign we need
to change random_page_cost by

default in all databases, basically.

Michael: Yeah, I think we've discussed
this in previous episodes,

but in the past I've seen, like,
conservatively, people go down

to 2, but

more aggressively.

Nikolay: And you know the
reason, right?

I mean, ah, not 2, I was
thinking, why default is still

You know, like why is it still
4, the default?

Michael: Yeah, I don't know.

I've seen argument, or I've seen
people mention that they don't

want, like people have old systems
and they don't want to change

those.

But yeah, I agree with you, but
that is the argument I've heard.

Nikolay: Defaults are applied
to new systems. New systems are on SSD...

Michael: I understand.
But that's what people say.

And I actually think also, even
if they didn't, like, It's not

about old systems versus new systems,
it's about SSDs versus

spinning disks.

It's about disk type, mostly.

Most people aren't setting up new
systems on spinning disks.

Like most people are setting up.

In that case, why are we forcing
the majority to have to change

their default instead of like default
should be for the majority?

Yeah.

Nikolay: it's second place after
JIT, or maybe default shared

buffers.

Third place.

Third place goes t
random_page_costs.

Let's have maybe some chart of
defaults we don't like, we would

like to be changed.

By the way, things are changing
slowly.

Yesterday I learned that pg_dump
is not a backup tool anymore.

It's off topic, but it's very similar.

So many years of discussing, it's
like you're swimming against

the current all the time.

You say something, but you open
documentation and it says this,

and you know every experienced
guy says this, but still...

So defaults are similar.

So Postgres until version 18, in
documentation stated, pg_dump

is a backup tool.

Like, first sentence about pg_dump.

And finally, like 5 days ago, Peter
Eisentraut committed the

change.

pg_dump is a tool to export PostgreSQL
databases.

Michael: Yay!

Nikolay: Not backup.

But this will change only in a
year, in PostgreSQL 18.

But it's already a win, like so
good.

I hope defaults also, like they
are slowly changing.

Log checkpoint was on, work_mem.

Yeah.

it?

Michael: hash_mem_multiplier has been, like
work_mem's been increasing very

slowly over the years.

hash_mem_multiplier doubled in like
not that long ago.

Like there was...

Nikolay: Autovacuum default throttling
10 times.

But it was many years already ago.

Yeah, slowly it's changing, it's
good.

So random_page_cost, most people
say, decrease it.

1.1 maybe.

You said some guys decide to go
with 2.

Well, I can say some guys decided
to put 1 there, but put 4 to

seq_page_cost.

Michael: What?

Nikolay: Yes.

Michael: Okay.

So I've seen sensible people that
have done testing, I've seen

set anywhere between 1.1 and 2.

And I tended to go closer to 1.1
if the performance results.

Yeah, I

think that's really cool.

Nikolay: Crunchy, after we did
the episode about something, They

tested it and published some details.

Maybe we'll find them a touch.

Their benchmarks showed that 1.1
in their case, I think it's

AWS and GCP or something, right?

It's still In their case, they
found out that 1.1 is slightly

better than 1.0, according to some
benchmarks.

Not mine, I don't know.

So since then, I also kind of put
1.1 before I tried to put

1.0.

So I don't know.

1.1 is a good start, right?

Enough maybe here.

Michael: Let's go through a few
other recommendations.

I think this is 1 recommendation
of something that you can do

to encourage index-only scans or
recommendations that are around

index-only scans.

We did a whole episode, for example,
on over-indexing, and I

think that's relevant here.

Nikolay: Before we go to over-indexing,
1 second.

It's super important to tune to
autovacuum because otherwise heap

fetches will be super high and
you need to, you need to, autovacuum...

autovacuum is not only about vacuuming
or collection of statistics

or fighting with transaction and
MultiXact ID wraparound.

It's also important because it
maintains visibility maps.

Michael: Yes.

And it's the only thing that maintains
visibility, like the only

thing that can set those bits

Nikolay: Directly, yes, yes,
directly.

Michael: I think this is important
because any, let's say we've

got a page in Postgres that has
20 tuples on it.

Like really common to have like
a couple of dozen of tuples on

a page.

If any of those rows are changed,
the visibility of that row,

like the bit gets-

Nikolay: It's out of the game.

Yeah, this page-

Michael: Yeah, it goes from 1 to
0, right?

It goes from all visible to not
all visible, or not guaranteed

to be all visible.

Nikolay: No, it's not all visible,
that's it.

We don't know, but it already cannot
be considered all visible,

that's it.

Yeah, technically, these tuples
might be visible to everyone.

Autovacuum just didn't mark it.

Michael: Didn't get there yet.

Nikolay: Yeah.

So autovacuum needs to do
this as well.

That's why frequent visits of tables
are so important.

Not only because of deleting that
tuples.

Michael: Yes.

Now is, I guess it is possible
that you will never benefit from

index-only scans if you have a table,
but maybe like a small table

that's constant, that the rows
are constantly being updated.

But if you, most of us have tables
where, or at least like the

cases we're thinking about here,
you have like some data that's

pretty old that doesn't change
often that we do still want to

read.

And like in those cases, as long
as you can, yeah, tuning

autovacuum super valuable for maintaining
the performance of index-only

scans.

Is there anything else around that?

I actually shared just recently
in my newsletter an old post

by Tomas Vondra.

I thought I revisited it.

He's updated it for the latest
versions of Postgres.

I'll share it again because I think
it's so good

Nikolay: Oh yeah, yeah, I also saw it.
That's great.

I actually expected this because
after they acquired the second

quadrant, they broke that block
and some links didn't work.

And I used archive.org to fetch
some old pages.

It's great.

These posts needed to be updated.

I also think they should be part
of documentation, honestly.

Michael: Nice.

Well, high praise.

Nikolay: Yeah.

So, but documentation tends to
avoid good how-tos and so far,

only basics.

There are how-to notes in documentation,
but they are scattered

among general documentation, reference
style, and so on.

I wish documentation grew like
a huge how-to section, but it's

a different story.

But let's not lose this important
point in this context.

So autovacuum maintains visibility
map, which has this all visible.

It also has all frozen, but that's
a different story.

All visible, 2 bits for each page.

And if we have a lot of writes,
like many writes are happening,

many pages are having 0 for all
visible bit.

And what's important here to understand
that partitioning is

so good.

Like if you just consider unpartitioned
one terabyte table and

partitioned one where old partitions,
like most of writes go to

the latest or few latest partitions
and old partitions are like

they can be, they can receive writes
but it happens rarely.

Yeah.

And vacuum aggressively fairly
frequently visits such pages.

Oh, okay, some write happened in
old partition but I immediately

mark this page all visible, and
it's only 1 page in this partition.

That's why, like, archive, archive
data locality and archive

kind partitions, old ones and only
new or the latest receives

all inserts, some updates, they're
usually for fresh data.

It means this, like old partitions
most of the time will be all

visible for all their pages.

This is so good if massive reads
are needed for all data.

Without partitioning, any page
can suddenly receive a write,

even if all rows in this page are
5 years old.

And then suddenly, oh, there is
some place here.

Let's insert a new tuple.

And it’s out of...

So everything is shuffled, right?

New data and old data, they can
live in the same page.

And for visibility map, it's kind
of a lot of work for autovacuum,

and visibility map is most of the
time kind of in bad shape.

So what I'm trying to say, partitioning
is needed not only because

indexes become smaller.

This reason is, over years I realized,
is one of the smallest reasons

for me.

Because B-tree height grows very
slow, you know.

But this particular thing, this
may be number one reason, since

recently I realized it.

Michael: Wow,

you think so?

Nikolay: For partitioning, like,
we can say, okay, This partition

is barely touched in terms of writes.

We just read it because people
want to see archived data.

But reading is well organized most
of the time.

If it counts or something, It's
index-only scans with heap fetches

very close to zero.

Without partitioning, impossible.

I mean, and less predictable also.

Michael: Yeah, I mean, in the case
you're talking about where

it's append mostly and it's to
the right of the...

I've definitely seen cases where
you don't touch old data anyway,

whether it's partitioned or not.

But I definitely see this as yet
another benefit of partitioning.

I'm surprised you said it's like,
you see it as the biggest,

but that's really cool.

Nikolay: Well, actually, I think
if we, for example, decide to

create additional indexes to support
index-only scans, and we

will talk about methods right now,
right?

But in the index-write amplification,
I know we have a plan.

But if you, for example, decide...
usually we keep the same sets

of indexes on all partitions, right?

But technically, you can create
an index for a particular partition.

And if, for example, you created
it for archive-style partitions

to support index-only scan, to
prioritize them for reads, and

you maybe don't care about write
amplification, because again,

these partitions barely receive
writes, and HOT updates also

we don't care.

We care about...

So we might decide to create more
indexes for all partitions

to benefit from...

Michael: Yeah,

I've not seen that but
it's a cool idea.

Nikolay: It's just like, I don't
know, maybe it's a crazy idea,

maybe we'll have some walls hit
if we implement it.

But it seems reasonable, and for
fresh partitions, I never did

it.

For fresh partitions, you avoid
creating these indexes because

you want HOT updates occurring.

So, right?

Yeah.

Let's maybe dive into details a
little bit.

Michael: I yeah, well, we could
point people towards the we did

a whole episode on over indexing
and we did another 1 on HOT

updates.

So let's point people at those
because I think there's a couple

more like definitely want to get
your thoughts on the difference

between multi-column index and
include like when you use each

or...

Nikolay: when it was released again
like 11? 12? 10? 11? I don't

remember.

Some many years, already
many years ago, and I was

thinking, is it only for unique
indexes?

Honestly, I remember I had
Anastasia Lubennikova presented on

Postgres TV details, some details.

Yeah, or maybe it was an interview
because she participated in

this include keyword covering indexes.

So my understanding, we only need...

It might be a size difference also,
right?

Because if we want to put in a
new column to have multi-column

index, 1 more column we just included,
it will participate in

structure, right?

But if we say include, it's just
additionally stored in leaf

nodes, right?

Michael: Leaf page, exactly.

Nikolay: Yeah.

And it means that kind of for to
achieve index-only scans, both

cases work, but the first case
can be also used since it's in

structure, it can be used to verify
uniqueness.

Or maybe also constrain exclusion.

Exclusion constraints, right?

Maybe.

For range types.

It's a different story, but definitely
for uniqueness.

I now realize that actually we
cannot put some data type, columns

of some data type to multi-column
index, but we can put them

to include.

This is also a benefit of include.

So unique constraint can, like,
we don't want to put index because

it will break the logic of uniqueness,
right?

But we can put additional column
to include and have index-only

scan, not changing logic of unique
constraint.

But also to include, we can put
data types which are not B-tree

friendly, which are not supported
by B-tree.

Michael: Like not...

Nikolay: Arrays, for example, right?

Michael: Yeah, yeah, yeah.

Nikolay: Polygons or something, like points.

Michael: Although you do need to
be careful with size, right?

Nikolay: Of course, yes.

Of course.

Size matters.

But imagine we have like point, 2 numbers,
right?

So we can put it to include and
the index-only scans will happen.

Michael: I've got 1 more thing
that I wanted to make sure we

mentioned, which was something
practical and very beginner-friendly.

In fact, probably the main thing
that I see stopping index-only

scans in the real world, which
is people selecting more data

than they need.

And I know it's kind of obvious,
but it's really common in ORMs

to have SELECT * be the default,
which is almost always going

to be giving you more columns than
you actually need for that

specific query.

But I see people doing it, like,
people deleting features, for

example.

Like, changing how a page is structured
in an application.

They no longer need all the data
from the queries returning,

but don't change the query.

There are so many ways it can creep
into people's code that they

no longer need all the data they're
returning.

So being careful on that front
and also just remembering that

there are these optimizations if
you can be, even if the application

does currently use all of the columns,
do we need to like what

value is that adding at the moment?

Can you make things much faster
by not including that information

at that point.

Nikolay: Right.

Yes.

And finally, maybe let's mention
also HOT updates.

I already kind of mentioned that
indirectly, but directly, like

if you have, if you need to put
extra, 1 more column to the index,

either to...

Doesn't matter if it's to extend
this multi-column list or to

include, to have coverage for your
selects.

You can lose the HOTness of some updates,
right?

Because even if it's include, still,
if this column is updated

frequently, these updates cannot
be HOT.

They will be much slower.

They will be regular updates.

So it's better sometimes to avoid
this.

So this trade-off here, you want
select only scans but you lose

HOT updates.

What's needed?

Selects or writes, I mean updates.

Michael: Yeah, I know this is very
off topic, but do you know

heap-only tuple updates are no
longer heap-only in some cases?

Like in BRIN, there's some changes
to BRIN indexes.

Have you seen this?

And now, Yeah, now it updates the
BRIN index.

I thought that was funny.

Nikolay: Yeah, yeah.

Okay, so I think we've covered enough,
right?

Michael: Yeah, 100%.

Nikolay: For iOS.

When I hear iOS, I think index-only
scans, right?

Michael: That’s so funny.

Are you on Android or iOS?

Nikolay: I'm on iOS.

My phone, I mean.

Michael: Index-only scans.

Cool, cool.

Nikolay: Yeah.

Yeah.

OK.

Good.

See you next week.

Bye.

Michael: See you next week.

Bye.

More episodes

Chapters

Creators and Guests

What is Postgres FM?