A weekly podcast about all things PostgreSQL
Nikolay: Hello, hello, this is
Postgres.FM.
As usual, I don't remember episode
number.
Actually, I don't care.
I know 100 something.
And my name is Nikolay, founder
of Postgres.AI.
And as usual, my co-host is Michael,
pgMustard.
Hi, Michael.
Michael: Hello, Nikolay.
How are you doing?
Nikolay: I'm doing great.
How are you?
Michael: Yeah, good as well.
Nikolay: I actually feel very good
after therapy session we had
last week.
Michael: Glad to hear it.
Nikolay: Yeah, So I know it's your choice.
I'm glad you chose a very practical
topic.
Technical, practical.
So yeah, what is it?
Michael: Yeah, I chose index-only
scans.
I was aiming for technical and
practical, but I also wanted something
that's still beginner-friendly.
Like, I think there's enough depth
we can go into with index-only
scans.
I'm hoping there's something in
here for everybody.
But I definitely see some beginners
not fully understand index-only
scans, how to get them, or even
knowing that they exist, really
truly understanding their benefits,
and also the downsides when
they go wrong, or when they aren't
so efficient.
So yeah, lots to cover I think,
but we can start nice and gently.
Nikolay: Yeah, and since this topic
is related to query optimization,
it's your field, I would say, because
of pgMustard, so I will
let you to be in the driver's seat.
Michael: Yeah, well, I feel like
you are as much, probably more
of an expert in this field than
me still.
But yes, it's definitely something
I've had to look into a lot
for the product I work on.
And we've got a couple of tips
around this within the product.
But I actually wanted to start
like, super basic, like,
index-only scans are a type of scans.
I think it's worth covering that
we've, we've got other scan
types and Postgres gets to choose
between them in certain situations.
We can cover when it gets
the choice of doing an index-only
scan, but I wanted to start with
even the absolute basics.
If, for example, we don't have
any indexes, the only scan option
we have for a table is a sequential
scan.
Yep.
Going through the pages in order,
one at a time, and then if we
have an index...
Nikolay: Sequential or parallel sequential,
if we distinguish between
the 2.
Michael: You consider that a different
type?
Yeah, interesting.
Nikolay: Well, well, technically
it's different.
The planner has to make a choice.
Michael: True, true.
Oh, actually, this came up, and
maybe this is too off-topic,
but do you think parallelism should
be on by default, like with
Postgres?
Like, maximizing
the work together?
Nikolay: I think JIT should
be off by default.
Michael: This is when it came up.
Yeah.
Nikolay: I got beaten very badly again and
it was in my home.
I had a bad, bad, bad, bad, bad situation.
I don't understand why JIT is on.
I completely don't understand.
It just hurts workloads and that's
it.
So yeah, I think random_page_cost
should be closer to seq_page_cost.
We will discuss this, right?
random_page_cost adjustment, and
we discussed it a few times.
And as for parallelism, I saw Jeremy
Schneider's opinion that
maybe in some cases it's better
to switch it off to have more
predictable behavior of query processing.
Because if you...
Well, there are many things like
that.
If you have parallelism turned
on, which is on by default, 2
additional workers usually.
But there are several such things
and definitely it's our topic
today.
Index-only scans is an unpredictable
thing and heap-only tuple updates
is an unpredictable thing.
Quite recently I saw talking with
customers, I saw opinion, which
is probably a reasonable opinion
from backend engineers,
or like, oh, this is something
quite unpredictable.
We have many, many, many installations,
various situations, different
Postgres infrastructures.
So if we think we cannot, if it
was only 1 cluster, we should
think, oh, this is like our pet.
We will take care of behavior closely,
watch it closely.
But if you have thousands of setups,
you cannot rely on such
things.
And maybe you will prefer, like
let's raise this question
and in the end maybe we'll try
to answer ourselves.
So to have predictable behavior,
maybe you should switch off
parallelism, stop relying on heap-only
tuple updates, HOT updates,
and also stop relying on index-only
scans because they can degrade
to regular index scans, right?
Michael: Yeah, or even slightly
worse, I think.
So yeah, let's go back to basics.
Nikolay: Basics also, question.
Do you feel the name is kind of
off?
If we had name choices made right
now, maybe index-only scans
would be named as index scans,
and index scans would be named
like, I don't know, like index
heap scans or something, because
it's hard to...
You need to explain it, And if
people don't work with Postgres
planner every day, they forget
and they expect index scan is
what deals with only indexes, but
then index-only scan, what
is it?
Michael: Yeah, that's a really
good transition into what these
are.
Yeah, so index scans came first,
and I think that's probably
like why we're stuck with that
name.
And in Postgres... in a lot of other
database management systems,
not including Postgres, you have
indexed organized tables.
But in Postgres we don't have that.
We have heap and indexes are what
is often called secondary.
So to look something up via an
index, we'll check first in the
index and then that will point
us to the right place in the heap
to find that row.
Nikolay: Probably right.
Michael: Yeah, yes.
Nikolay: At least pages are right, but offset
might be wrong, right?
Michael: Yeah, well, back to Heap
again.
Let's keep it simple to start with.
Nikolay: For simple, yes, for index
points, but it doesn't know
if this tuple is alive or dead.
This is why we need to check heap.
Michael: Or even, I guess not even
like alive or dead, but visible,
like it might be alive to new transactions.
Nikolay: Right, visible to current
transactions.
Michael: Visible, exactly.
Cool.
So that becomes important in a
moment when, in fact, until, you
know, when index-only scans were
added to Postgres, I found out
it was...
Nikolay: It was not very long.
Michael: 9.2.
Yeah.
So yeah, not that long, like in
the history
of Postgres.
Nikolay: I remember living without them
very long, of course.
And this mantra, like, indexes
don't store visibility information,
I first heard 20 years ago, and
it was a rule.
Index scan must consult Heap to
understand visibility.
Michael: Yes.
So, and looking back, in fact,
the Wiki includes, the Postgres
Wiki includes a really good entry
on index-only scans and explains
that the majority of the work to
add index-only scans was work
on the visibility map to make it
crash safe.
So that, I found that fascinating
looking into it for this episode.
I didn't know that until yesterday.
So yeah, super fascinating.
So the index-only scan, probably
should actually explain what
it is and does.
If the data that we need for the
query is already in the index,
so very typical for a B-tree index,
not for all index types,
but in a B-tree index, we have
the data that we're indexing in
the index.
If that's all we need for the query
we're running...
Nikolay: If we don't use
SELECT *, if we
only select, for example, columns
which are in the index, this
is conditional, right?
Michael: Yes, exactly.
So if the columns we're selecting
are in the index, or for example,
if we're doing like an aggregation,
like COUNT *, we're not
really selecting a column, but
we... yeah.
So yeah, there are like a couple
of cases.
Yes, exactly.
And yeah.
And if the index type supports
it, like B-trees do, then we can
get an index-only scan.
So yeah, really cool feature.
Nikolay: This is like, okay, we
deal only with values which are
present in index keys.
Index keys maybe can be called.
Maybe not, maybe it's a bad idea
to call that.
But yeah, the planner might choose,
may choose or may not choose
index-only scan.
It still may think index scan is
better because the second component,
like I agree with everything you
said so far, but then you say
we don't consult the heap.
Well, in general case, index-only
scan consults the heap because
we don't have visibility information
still, right?
But in some cases, it skips it.
Michael: Do you count the visibility
map as the
heap?
Nikolay: Yes.
No, no, no, no.
No, I mean, we check the visibility
map and see what bit it has in
terms of all visible bit for the
page and the heap we need.
If that page is marked as all visible,
this is a win.
But it's not guaranteed.
It may not be marked as all visible.
In this case, it's kind of degradation
to the behavior of an index scan
for this particular value reference
to the tuple.
And we need to go to the heap and first
see if this tuple is visible
to our session, and secondly, probably
we will jump inside page
to proper position because of Heap-Only Tuple
chain, right?
But yeah, that's it.
So I mean, everything is right,
but it's not guaranteed that
we won't consult heap.
Michael: Yeah, good.
Yes, exactly.
Good point.
So heap, you'll see this in EXPLAIN
plans as heap fetches, if
you're running EXPLAIN ANALYZE.
Nikolay: Even if BUFFERS is not
used in EXPLAIN (ANALYZE, BUFFERS).
Michael: Yes.
Nikolay: Even if you skip BUFFERS, this
kind of buffers you still get.
Michael: In fact, imagine if you
didn't have heap fetches and
you only had buffers, you'd have
to be running it twice to spot
these, like, once.
Anyway, so, it's I really like
that they call this out explicitly.
It's really helpful.
And as you mentioned, like, let's
say we're scanning many, many
rows for our index-only scan.
So, we're doing a range scan across
maybe returning 100 rows
or something.
And we only have to do 1 or 2 heap
fetches.
That can still be a huge win.
It's only if the proportion grows
to a decent.
So, yeah, 2 big benefits.
Let's go back to why even bother
with all that work to make the
visibility map crash safe?
Why even go to the trouble of doing
this?
It's not only the benefit of not
having to do those reads on
the heap, but also I think data
locality.
Like if we're doing a range scan,
the hundreds values on the
index are likely in a much smaller
range of index pages than
they would be heap pages.
I mean, if they were all inserted
in order and there have been
very few updates, maybe they are,
of course, only a relatively
small number of pages, but they
could be anywhere.
It could be really random reads.
So I think historically, especially
if you consider when this
was added, doing those random reads
on the heap could be quite
a lot more expensive than doing
the reads.
Nikolay: Right, and if there is
no data locality, each tuple
sits in its own page, basically,
buffer, and we have a lot of
buffer operations additionally,
just to check visibility, basically,
in this case.
If we have all values already in
index, there's a lot of buffer
operations just to check visibility.
It feels super inefficient and
index-only scans is just an optimization
of index scans.
But they'd be super good if visibility...
Like If you have, for example,
if you just loaded your data into
a table, run a vacuum once on the
table, and don't do any writes
anymore, this is a perfect situation.
Heap fetches will be 0, the best
case for index-only scans.
But in the worst situation, you'll
have the maximum number of
heap fetches, and this should be
maybe the same number of buffer
operations as index scans.
Michael: It, plus, you've had
to check the visibility map,
which admittedly is not many reads,
like a visibility map is
tiny, but you don't have to check
that if you're doing an index
scan.
Like
it's...
Nikolay: Oh, so you think index-only scan
in this extreme case is worse
than index scan?
Michael: Tiny bit.
Nikolay: I know planner might decide,
I think based on the state
of visibility map, in some cases
it might decide to choose index
scan, even the main condition is
met.
Michael: I don't know that it does.
Like in theory, that would make
sense.
I don't know for sure that it does.
Nikolay: It's worth checking.
It's a simple experiment.
Worth checking and looking at buffers.
Yeah.
But for simplicity, number 1 thing
to memorize, index-only scan.
If you have autovacuum tuned, index-only
scan is the best what
you can get from Postgres, for
example, for aggregate
SELECT COUNT, things like that.
If you get index-only scan, even
if it's like a lot, a lot of
rows, this is the best.
Michael: Well, I completely agree.
Nikolay: Index-only scan and heap
fetches are 0.
This is the ideal situation.
Michael: And 1 more condition,
Rows removed by filter 0 as well.
So I see sometimes people get excited
because they see an index
scanned or an index-only scan and
then they stop because they
think I've got an index-only scan,
but it's returning 100 rows
for every … So
Nikolay: you get 1 row, right?
Michael: Yeah, exactly.
So there is this 1 other gotcha
that sometimes catches beginners
out.
You would never be caught out by
that.
But some people see index-only
scan and think, great, I'm done.
And actually, it's still hideously
inefficient.
So yeah, they're the 2 things we
look out for.
Nikolay: Okay.
So yes, in phrasing this, you really
need those rows that index-only
scan brought to you, right?
You really need them.
Because if you see 99% was filtered
out, it means those rows
were not needed in the first place,
and you just need to find
better index, for example.
Michael: Exactly.
Nikolay: Also, let's touch this
thing.
You create an index, you run vacuum,
no writes yet.
You run SELECT COUNT(*) from
this table, or SELECT column,
indexed column, like ID, for example,
which is technically the
same because it cannot be null.
We know count, won't count, rows,
which are null.
And * cannot be null, even if...
It's complicated.
Maybe we should save it for another
episode about counts, right?
But you run it, but you still see
sequential scan in the plan.
This is happening all the time,
right?
Michael: That's a
good point.
Yeah, good point.
Or parallel sequential scan.
Nikolay: Or parallel sequential
scan.
And you think, why?
Why, like, Index is better, right?
Because index-only scan is better.
I know it.
It's much fewer buffer hits or
reads, or both.
Right?
What to do?
Michael: Yeah, so before we jump
straight to it, you can use,
if you want to ask this kind of
question yourself, there's like
some parameters that we can set
locally to help us diagnose these
things.
So we could use enable_seqscan
and set that to false briefly.
And then hopefully that will encourage
an index-only scan.
We can compare the costs.
And once you compare the costs,
it should become clear.
Nikolay: How to compare the cost?
Disable scan and look?
Michael: Run, EXPLAIN, ANALYZE,
SELECT counts, or maybe with
BUFFERS.
But you don't need it.
In fact, you don't even need ANALYZE.
Nikolay: Yeah, yeah, just because
we want to compare planning
cost.
And if sequential scan was chosen,
it means the cost was lower
than for index-only scan.
Michael: Yes, exactly.
And then I'm guessing where you're
going with this is probably
random_page_cost.
Nikolay: Right.
This happens if you have default
random_page_cost.
And it doesn't make sense if you
have fast disks, random accesses
close to sequential access on SSD
and NVMe, right?
In memory, it's also so.
You might have very slow magnetic
disks, but the database is
well cached and you rely on it
somehow, right?
It also can happen still.
In this case, random_page_cost
is 4 compared to seq_page_cost
is 1, and to work in a B-tree it's
random access compared to page
by page reading of heap sequentially.
So index scan, index-only scan,
the planner looks at
random_page_cost.
If it's 4 times more expensive,
it might prefer seq_page_cost.
Michael: Especially if it can do
it in parallel.
Yeah.
Yes, yes.
Nikolay: Well, index scan and index
only scan also can be parallel.
Michael: True.
Good point.
Nikolay: Like we have duplicated.
Michael: So maybe that doesn't
matter.
Yeah.
Good point.
Nikolay: Bitmap scan, index scan,
index-only scan, seq scan,
and then parallelized versions
of them, right?
Michael: Yeah, I skipped bitmap
scan to keep things simpler.
But we
maybe should do a whole
episode on that because it is
interesting enough.
Nikolay: Yeah.
Yeah.
And for index scan parallelization
cannot be, as I remember,
it cannot be applied to navigating
inside B-tree, inside the
tree, going from root to leaf.
But it can be applied to fetching
leaf pages in tree, and also
accessing heap, and also traversing
leaf nodes.
So it has a bidirectional list for
all the leaf nodes, allowing
not going up and down all the time,
not starting from root for
each entry, right?
We just need the first entry.
And then I think it can be parallelized,
dealing with leaf nodes.
This is what I remember, I might
be slightly wrong about internals
here.
But again, index-only scan is the
best if you need a lot of rows.
For SELECT COUNT(*) you might
still see a sequential scan.
It's a good sign you have not tuned
random_page_cost, which we
recommend to everyone to do sooner
because later it will feel
more risky.
Michael: Scarier, yeah.
Nikolay: Yeah, scarier because you
have a heavy workload and changing
this might flip your plans in some
unpredictable manner.
So yeah, it's a big task, But if
you do it early, live with it,
it feels more reasonable than keeping
defaults.
Did you have cases when you see
this, okay, cost for sequential
scan is lower, that's why it's
chosen, But when you say enable
seq scan to off, giving some huge
penalty to sequential operations
in planner's mind, then you see
timing and buffer operations
so much better for index-only scan.
You think, oh wow, planner is so
wrong here.
You had it, yeah, I have it all
the time.
And this is a good sign we need
to change random_page_cost by
default in all databases, basically.
Michael: Yeah, I think we've discussed
this in previous episodes,
but in the past I've seen, like,
conservatively, people go down
to 2, but
more aggressively.
Nikolay: And you know the
reason, right?
I mean, ah, not 2, I was
thinking, why default is still
4?
You know, like why is it still
4, the default?
Michael: Yeah, I don't know.
I've seen argument, or I've seen
people mention that they don't
want, like people have old systems
and they don't want to change
those.
But yeah, I agree with you, but
that is the argument I've heard.
Nikolay: Defaults are applied
to new systems. New systems are on SSD...
Michael: I understand.
But that's what people say.
And I actually think also, even
if they didn't, like, It's not
about old systems versus new systems,
it's about SSDs versus
spinning disks.
It's about disk type, mostly.
Most people aren't setting up new
systems on spinning disks.
Like most people are setting up.
In that case, why are we forcing
the majority to have to change
their default instead of like default
should be for the majority?
Yeah.
Nikolay: it's second place after
JIT, or maybe default shared
buffers.
Third place.
Third place goes t
random_page_costs.
Let's have maybe some chart of
defaults we don't like, we would
like to be changed.
By the way, things are changing
slowly.
Yesterday I learned that pg_dump
is not a backup tool anymore.
It's off topic, but it's very similar.
So many years of discussing, it's
like you're swimming against
the current all the time.
You say something, but you open
documentation and it says this,
and you know every experienced
guy says this, but still...
So defaults are similar.
So Postgres until version 18, in
documentation stated, pg_dump
is a backup tool.
Like, first sentence about pg_dump.
And finally, like 5 days ago, Peter
Eisentraut committed the
change.
pg_dump is a tool to export PostgreSQL
databases.
Michael: Yay!
Nikolay: Not backup.
But this will change only in a
year, in PostgreSQL 18.
But it's already a win, like so
good.
I hope defaults also, like they
are slowly changing.
Log checkpoint was on, work_mem.
Yeah.
Is
it?
Michael: hash_mem_multiplier has been, like
work_mem's been increasing very
slowly over the years.
hash_mem_multiplier doubled in like
not that long ago.
Like there was...
Nikolay: Autovacuum default throttling
10 times.
But it was many years already ago.
Yeah, slowly it's changing, it's
good.
So random_page_cost, most people
say, decrease it.
1.1 maybe.
You said some guys decide to go
with 2.
Well, I can say some guys decided
to put 1 there, but put 4 to
seq_page_cost.
Michael: What?
Nikolay: Yes.
Michael: Okay.
So I've seen sensible people that
have done testing, I've seen
set anywhere between 1.1 and 2.
And I tended to go closer to 1.1
if the performance results.
Yeah, I
think that's really cool.
Nikolay: Crunchy, after we did
the episode about something, They
tested it and published some details.
Maybe we'll find them a touch.
Their benchmarks showed that 1.1
in their case, I think it's
AWS and GCP or something, right?
It's still In their case, they
found out that 1.1 is slightly
better than 1.0, according to some
benchmarks.
Not mine, I don't know.
So since then, I also kind of put
1.1 before I tried to put
1.0.
So I don't know.
1.1 is a good start, right?
Enough maybe here.
Michael: Let's go through a few
other recommendations.
I think this is 1 recommendation
of something that you can do
to encourage index-only scans or
recommendations that are around
index-only scans.
We did a whole episode, for example,
on over-indexing, and I
think that's relevant here.
Nikolay: Before we go to over-indexing,
1 second.
It's super important to tune to
autovacuum because otherwise heap
fetches will be super high and
you need to, you need to, autovacuum...
autovacuum is not only about vacuuming
or collection of statistics
or fighting with transaction and
MultiXact ID wraparound.
It's also important because it
maintains visibility maps.
Michael: Yes.
And it's the only thing that maintains
visibility, like the only
thing that can set those bits
Nikolay: Directly, yes, yes,
directly.
Michael: I think this is important
because any, let's say we've
got a page in Postgres that has
20 tuples on it.
Like really common to have like
a couple of dozen of tuples on
a page.
If any of those rows are changed,
the visibility of that row,
like the bit gets-
Nikolay: It's out of the game.
Yeah, this page-
Michael: Yeah, it goes from 1 to
0, right?
It goes from all visible to not
all visible, or not guaranteed
to be all visible.
Nikolay: No, it's not all visible,
that's it.
We don't know, but it already cannot
be considered all visible,
that's it.
Yeah, technically, these tuples
might be visible to everyone.
Autovacuum just didn't mark it.
Michael: Didn't get there yet.
Nikolay: Yeah.
So autovacuum needs to do
this as well.
That's why frequent visits of tables
are so important.
Not only because of deleting that
tuples.
Michael: Yes.
Now is, I guess it is possible
that you will never benefit from
index-only scans if you have a table,
but maybe like a small table
that's constant, that the rows
are constantly being updated.
But if you, most of us have tables
where, or at least like the
cases we're thinking about here,
you have like some data that's
pretty old that doesn't change
often that we do still want to
read.
And like in those cases, as long
as you can, yeah, tuning
autovacuum super valuable for maintaining
the performance of index-only
scans.
Is there anything else around that?
I actually shared just recently
in my newsletter an old post
by Tomas Vondra.
I thought I revisited it.
He's updated it for the latest
versions of Postgres.
I'll share it again because I think
it's so good
Nikolay: Oh yeah, yeah, I also saw it.
That's great.
I actually expected this because
after they acquired the second
quadrant, they broke that block
and some links didn't work.
And I used archive.org to fetch
some old pages.
It's great.
These posts needed to be updated.
I also think they should be part
of documentation, honestly.
Michael: Nice.
Well, high praise.
Nikolay: Yeah.
So, but documentation tends to
avoid good how-tos and so far,
only basics.
There are how-to notes in documentation,
but they are scattered
among general documentation, reference
style, and so on.
I wish documentation grew like
a huge how-to section, but it's
a different story.
But let's not lose this important
point in this context.
So autovacuum maintains visibility
map, which has this all visible.
It also has all frozen, but that's
a different story.
All visible, 2 bits for each page.
And if we have a lot of writes,
like many writes are happening,
many pages are having 0 for all
visible bit.
And what's important here to understand
that partitioning is
so good.
Like if you just consider unpartitioned
one terabyte table and
partitioned one where old partitions,
like most of writes go to
the latest or few latest partitions
and old partitions are like
they can be, they can receive writes
but it happens rarely.
Yeah.
And vacuum aggressively fairly
frequently visits such pages.
Oh, okay, some write happened in
old partition but I immediately
mark this page all visible, and
it's only 1 page in this partition.
That's why, like, archive, archive
data locality and archive
kind partitions, old ones and only
new or the latest receives
all inserts, some updates, they're
usually for fresh data.
It means this, like old partitions
most of the time will be all
visible for all their pages.
This is so good if massive reads
are needed for all data.
Without partitioning, any page
can suddenly receive a write,
even if all rows in this page are
5 years old.
And then suddenly, oh, there is
some place here.
Let's insert a new tuple.
And it’s out of...
So everything is shuffled, right?
New data and old data, they can
live in the same page.
And for visibility map, it's kind
of a lot of work for autovacuum,
and visibility map is most of the
time kind of in bad shape.
So what I'm trying to say, partitioning
is needed not only because
indexes become smaller.
This reason is, over years I realized,
is one of the smallest reasons
for me.
Because B-tree height grows very
slow, you know.
But this particular thing, this
may be number one reason, since
recently I realized it.
Michael: Wow,
you think so?
Nikolay: For partitioning, like,
we can say, okay, This partition
is barely touched in terms of writes.
We just read it because people
want to see archived data.
But reading is well organized most
of the time.
If it counts or something, It's
index-only scans with heap fetches
very close to zero.
Without partitioning, impossible.
I mean, and less predictable also.
Michael: Yeah, I mean, in the case
you're talking about where
it's append mostly and it's to
the right of the...
I've definitely seen cases where
you don't touch old data anyway,
whether it's partitioned or not.
But I definitely see this as yet
another benefit of partitioning.
I'm surprised you said it's like,
you see it as the biggest,
but that's really cool.
Nikolay: Well, actually, I think
if we, for example, decide to
create additional indexes to support
index-only scans, and we
will talk about methods right now,
right?
But in the index-write amplification,
I know we have a plan.
But if you, for example, decide...
usually we keep the same sets
of indexes on all partitions, right?
But technically, you can create
an index for a particular partition.
And if, for example, you created
it for archive-style partitions
to support index-only scan, to
prioritize them for reads, and
you maybe don't care about write
amplification, because again,
these partitions barely receive
writes, and HOT updates also
we don't care.
We care about...
So we might decide to create more
indexes for all partitions
to benefit from...
Michael: Yeah,
I've not seen that but
it's a cool idea.
Nikolay: It's just like, I don't
know, maybe it's a crazy idea,
maybe we'll have some walls hit
if we implement it.
But it seems reasonable, and for
fresh partitions, I never did
it.
For fresh partitions, you avoid
creating these indexes because
you want HOT updates occurring.
So, right?
Yeah.
Let's maybe dive into details a
little bit.
Michael: I yeah, well, we could
point people towards the we did
a whole episode on over indexing
and we did another 1 on HOT
updates.
So let's point people at those
because I think there's a couple
more like definitely want to get
your thoughts on the difference
between multi-column index and
include like when you use each
or...
Nikolay: when it was released again
like 11? 12? 10? 11? I don't
remember.
Some many years, already
many years ago, and I was
thinking, is it only for unique
indexes?
Honestly, I remember I had
Anastasia Lubennikova presented on
Postgres TV details, some details.
Yeah, or maybe it was an interview
because she participated in
this include keyword covering indexes.
So my understanding, we only need...
It might be a size difference also,
right?
Because if we want to put in a
new column to have multi-column
index, 1 more column we just included,
it will participate in
structure, right?
But if we say include, it's just
additionally stored in leaf
nodes, right?
Michael: Leaf page, exactly.
Nikolay: Yeah.
And it means that kind of for to
achieve index-only scans, both
cases work, but the first case
can be also used since it's in
structure, it can be used to verify
uniqueness.
Or maybe also constrain exclusion.
Exclusion constraints, right?
Maybe.
For range types.
It's a different story, but definitely
for uniqueness.
I now realize that actually we
cannot put some data type, columns
of some data type to multi-column
index, but we can put them
to include.
This is also a benefit of include.
So unique constraint can, like,
we don't want to put index because
it will break the logic of uniqueness,
right?
But we can put additional column
to include and have index-only
scan, not changing logic of unique
constraint.
But also to include, we can put
data types which are not B-tree
friendly, which are not supported
by B-tree.
Michael: Like not...
Nikolay: Arrays, for example, right?
Michael: Yeah, yeah, yeah.
Nikolay: Polygons or something, like points.
Michael: Although you do need to
be careful with size, right?
Nikolay: Of course, yes.
Of course.
Size matters.
But imagine we have like point, 2 numbers,
right?
So we can put it to include and
the index-only scans will happen.
Michael: I've got 1 more thing
that I wanted to make sure we
mentioned, which was something
practical and very beginner-friendly.
In fact, probably the main thing
that I see stopping index-only
scans in the real world, which
is people selecting more data
than they need.
And I know it's kind of obvious,
but it's really common in ORMs
to have SELECT * be the default,
which is almost always going
to be giving you more columns than
you actually need for that
specific query.
But I see people doing it, like,
people deleting features, for
example.
Like, changing how a page is structured
in an application.
They no longer need all the data
from the queries returning,
but don't change the query.
There are so many ways it can creep
into people's code that they
no longer need all the data they're
returning.
So being careful on that front
and also just remembering that
there are these optimizations if
you can be, even if the application
does currently use all of the columns,
do we need to like what
value is that adding at the moment?
Can you make things much faster
by not including that information
at that point.
Nikolay: Right.
Yes.
And finally, maybe let's mention
also HOT updates.
I already kind of mentioned that
indirectly, but directly, like
if you have, if you need to put
extra, 1 more column to the index,
either to...
Doesn't matter if it's to extend
this multi-column list or to
include, to have coverage for your
selects.
You can lose the HOTness of some updates,
right?
Because even if it's include, still,
if this column is updated
frequently, these updates cannot
be HOT.
They will be much slower.
They will be regular updates.
So it's better sometimes to avoid
this.
So this trade-off here, you want
select only scans but you lose
HOT updates.
What's needed?
Selects or writes, I mean updates.
Michael: Yeah, I know this is very
off topic, but do you know
heap-only tuple updates are no
longer heap-only in some cases?
Like in BRIN, there's some changes
to BRIN indexes.
Have you seen this?
And now, Yeah, now it updates the
BRIN index.
I thought that was funny.
Nikolay: Yeah, yeah.
Okay, so I think we've covered enough,
right?
Michael: Yeah, 100%.
Nikolay: For iOS.
When I hear iOS, I think index-only
scans, right?
Michael: That’s so funny.
Are you on Android or iOS?
Nikolay: I'm on iOS.
My phone, I mean.
Michael: Index-only scans.
Cool, cool.
Nikolay: Yeah.
Yeah.
OK.
Good.
See you next week.
Bye.
Michael: See you next week.
Bye.