A weekly podcast about all things PostgreSQL
Nikolay: Hello, hello, this is
PostgresFM episode 76.
My name is Nikolay and as usual,
together with me is Michael.
Hi, Michael.
Michael: Hello, Nikolay.
Nikolay: So, today's topic is super
interesting and very popular,
right?
Michael: Not exactly.
I chose this one.
Nikolay: Okay.
Michael: Therefore, it's boring.
Nikolay: But we will learn something
new, which is good.
Michael: I hope so.
Nikolay: Even if it's useless,
it's still new.
We're here to learn, right?
Michael: So Nikolay's teasing me
because I've picked the topic
hash indexes, which are, well,
until relatively recently, were
highly discouraged by the Postgres
docs and not very sensible
to use, but-
Nikolay: until PostgreSQL 10.
Michael: Yes, but have been in
Postgres for a long, long time.
I looked into the history of it
and therefore some people must
think they're useful.
And I definitely think there's
one or two use cases for them, but
more to the point, I think they're
very interesting for people.
I think this is the kind of thing
that will make you appreciate
the B-tree index as well, and also
understand a little bit more
about trade-offs we make when we're
choosing things within Postgres.
Nikolay: Indexes, right?
I mean, why doesn't Postgres choose
indexes, index type for us?
We just want to make it fast and
that's it, right?
It should choose.
Okay, a different question.
Are hash indexes more useful than
the money data type?
Michael: Oh, I think so, yes.
Nikolay: Okay, this is already
better.
I hope we will never have an episode
about the money data type.
Michael: Yeah, me too.
Nikolay: Okay.
Michael: So yeah, should we start
with what a hash index is?
Nikolay: Yeah, hash function, hash
index, right?
Michael: Yeah, that's pretty much
it.
Hash table.
Yeah, so in fact that's a good
point.
A lot of people have come across
hashing from, well, looking
at query plans for hash joins or
backend engineers, very familiar
with hash tables, can be a very
efficient way of taking a value,
hashing it, in this case we get
a, well, Postgres behind the
scenes is calculating a 4-byte
integer, storing that, and then
in the future if we want to look up
the value by equality and equality
only, we can hash the value we
seek and look it up in the index
in this case.
So people that like theory talk
a lot about big O notation, don't
they?
So this is one of those things that
I struggle to love, but it has
a low like O(1) is what people often
say.
I'm not sure it's strictly true,
depending on how deep down that
rabbit hole you dive.
But yeah, it can be very, very
efficient for looking up things
by equality, especially in certain
conditions.
So that's like, actually a couple
of things on details.
I didn't realize you could...
And it makes sense, but you can
use a hash index on any value,
any data type, which is quite cool.
I guess the same is true.
Nikolay: Postgres has a lot of
internal hashing functions for
all data types and also for the
record.
So you can say record for some
table and it will give you a hash
integer 4, right?
Regular integer 32 bytes or bits.
Michael: 32 bit, 4 byte, yeah.
Nikolay: Yeah, and I actually recently
rediscovered it and used
it.
It was interesting.
I didn't use the hash index, but
I used one of, probably, hash
text function.
Yeah, it was hash text function.
I needed to, you know, when you
need to reindex a lot of indexes,
and it's a huge database.
For example, after upgrade to Postgres
13 or 14, 14 especially,
you need to re-index a lot of bit
re-indexes to start benefiting
from the optimizations Peter Geoghegan
and others implemented.
And obviously you need to re-index.
You will do re-index concurrently
and so on, it's good, but the
question will be how to move faster.
There are parallel workers, as
we discussed, and so on.
But probably you will decide to
use multiple processes to issuing
the reindex concurrently command.
And in this case, if two of them
try to reindex two different indexes
which belong to the same table,
you will have a deadlock.
So my idea was we just need to
attach each table associated with
a particular worker.
And to do it, I just took a table
name, calculated hash text
from it, and then modulo number
of workers.
If I have, for example, 5 workers,
I just...
So we have table name, hash text
function, the hash text function
produces number, like integer 4
number, and then we just modulo
of 5, like remainder of the division,
it will be like 0, 1, 2,
3, 4, and that's it again, 0, 1, 2,
3, 4.
And each index associated with
a particular table will always go
to a particular worker.
So there will never be a conflict,
no more deadlocks.
So this is how you can redistribute
work.
Maybe also check if you need
to redistribute, you need to check
a lot of indexes for corruption.
You can also use many like 16 workers,
for example, and you redistribute
work to avoid deadlocks.
So, hash text is useful.
I was already like CRC32 or something,
I was thinking, oh, I
need to, oh, by the way, I asked
my bot to advise me, and it
says, oh, there is hash text.
I never used it before, or maybe
I just forgot.
And yeah, then I, if you're in
psql, if you say backslash df,
then hash star, you will see more
than 50 functions.
We checked it before the call.
More than 50 functions, many of
them return int4, some of
them return int8.
So it's good to have a lot of already
implemented functions.
You don't need to care, just use
it.
Good?
Yeah, absolutely.
Yes, Small comment, slightly off
topic.
Michael: Yeah, I mean related to
hashing, but I guess completely
unrelated
Nikolay: to hash indexing.
Michael: Yeah, exactly.
So you mentioned already, there's
a big change in the history
of hash indexes I think is worth
mentioning, and that's the version
10 made hash indexes crash safe
or write-ahead log.
Which is really important, like
before that it meant if you had
like, it made them basically unusable
or unsafe to use, especially
in a high availability setup if
you had replication and then
failed over, your replica wouldn't
have that index.
Nikolay: Ephemeral indexes.
Michael: Yeah, right.
Nikolay: But not anymore, so it's solved.
The same was solved with GIST a very
long ago.
Originally GIST was also not write-ahead log-supported
at all, but it was
implemented like 20 something years
ago.
Michael: Speaking of 20 something
years ago, I was looking up,
I was trying to find out when hash
indexes were added, and the
git history doesn't go far enough
back, I don't think, and the
docs, the online docs 7.2 is the
oldest version on the online
docs and they have hash indexes.
So that's as far back as 2002 already.
So they're more than 20 years old
as well.
Nikolay: Well I'm quite sure they're
older because it's maybe
one of the simplest types of index
you can implement.
But interesting that WAL was not
implemented for so long, right?
Right.
Until 2017, right?
Michael: I think you'd struggle
to get...
I could be wrong, but I think if
we didn't have hash indexes
in Postgres today, they wouldn't
get added and they would be
encouraged to be added via an extension.
That's my guess as to how they
would be implemented if they were
done today.
But I mean, I might be, or maybe
via the contrib modules.
Nikolay: Well, yeah.
Michael: But we have them.
They're first class supported.
They're well logged now.
They're replicated and everything.
Crash safe, even while they're
doing complex operations, it might
get to a bit later, it will recover.
So we can use them if we want to.
I guess the question then is why
would we?
And you mentioned something before
the call I found very interesting
and it's probably very telling.
But should I ask you, have you
ever used the hash index?
Nikolay: Only in experimental environments,
never on production.
I think I saw them in the wild,
you know, saw them.
I don't think I ever seriously
considered them.
But you need to take into account
that version 10 and later,
it's already mostly my consulting
practice.
Before that, I created a lot of
systems, social networks and
so on.
It was before Postgres 10, so I
never considered them.
My memory says stay away from them.
Michael: Yeah, that's a really
good point.
So it's only since 2017 that you
should even be considering using
these.
So I'd like to first cover, I know
there's going to be quite
a few limitations that we need
to talk about, but I'd like to
cover some of the potential benefits
of hash indexes.
Like I think there's a couple that
are worth mentioning.
Nikolay: Size?
Michael: Yes, So they can be, so
for, imagine if you are hashing
a fairly large string or even a number,
or just a piece of data
that is more than 4 bytes.
When you are indexing it with a
hash index, it's only having
to store 4 bytes, or a bit more,
but it's not having to store
that value in the index.
And that means for larger data
types or larger values, it can
be significantly smaller than a
B-tree index with some caveats.
Naturally, there are some optimizations
for B-tree that we've
got in recent versions.
And yeah, this really stands
out for larger values basically
and for indexes that are unique
or mostly unique, like, oh sorry,
for columns that are unique or
mostly unique.
So that's a big difference as well.
But there are cases like that,
right?
Like I did a collaboration with
Haki Benita for his blog and
he did a lot of the work.
He deserves most of the credit
but he put me down as a co-author
which was kind of him and we looked
at quite a few things and
for the use case that he had for
hash indexes which was a URL
shortening service you get some
quite large URLs can be quite
large strings in the grand scheme
of things, especially for URLs.
So they actually came out quite
a lot smaller when you put a
hash index on it versus a B-tree,
which is pretty cool, Even
post deduplication, because these
are largely going to be unique.
We didn't make them completely
unique, but largely your URL shortening
thing is going to be unique strings.
So yeah, they can be significantly
smaller, they can be significantly
bigger, depending on your data.
But I think size is underrated
in terms of index types.
I think not only we're going to
look at performance in a bit,
but base and the cache, you know,
we've talked about this kind
of thing before, but I think it
is worth mentioning.
Nikolay: Yeah, by the way, if we
try to index in a regular way
with B-tree some large text values,
there's some limitation,
I always forget.
Michael: Oh, good
Nikolay: point.
Yeah, so sometimes we just need
to apply hash and use expression
B-tree on top of it, but it's not,
of course, it's not a hash index,
but we use a combination, right?
In this case, we cannot order these
values, but for exact matching,
this is what can work well.
Michael: Why did you, why do you
hash it and then put a B-tree
index on instead of just putting
a hash index on?
Nikolay: Because again, I'm an
old guy.
You know, like again, my mom says
stay away from hash.
By the way, with my bot and GitHub,
we just checked when hash
build, the hash build function
was added and it's 27 years ago,
original Postgres 95.
Michael: 27?
Nikolay: Yeah, original source
code of Postgres 95.
Michael: Wow.
Nikolay: Committed by Mark Fournier.
It had already this hash build,
which builds a new hash index.
So this is super old stuff.
I think maybe it even came from
original Postgres.
Obviously, it came from original
Postgres, maybe even from Ingres,
who knows.
So maybe it's even more than 30
years ago.
Michael: Yeah, very cool.
But yeah, that's another advantage,
right?
There's no limit to the size.
Nikolay: So you think, yeah, that's
great.
So you think I just can create
a hash index if I have very large
text values?
Michael: You're losing the same
things, right?
You can't order by them.
And you're kind of doing a self-rolled
one, which maybe even proves
that we don't need them.
But I think less footprint, right,
because you're having to...
Yeah, I'm not sure.
I think there might be some weird
trade-offs around maintenance,
but I think you'd largely recreate
them, yeah.
Nikolay: So if I say create table
T something as select and then
I say repeat as column C1 and I
repeat some letter a million
times, I have obviously a text value,
a million letters, when I
say create index on this value,
on this table, on this column,
in this case, B-tree will say index
row requires 11,464 bytes, so
11k, right?
Mm-hmm.
Ah, in my case.
But maximum size is 8K, okay, like
block size.
But if I say using hash, right,
using hash, same column, create
index, yeah, it works.
So why do I, okay, I'm an old guy,
I have old habits, I need to get
rid of it.
Actually, this is the perfect case.
Michael: Or consider it.
Nikolay: Large text values, I don't
need to hash first and then
B-tree, I just use a hash index in this case.
I will need to think about it.
Thank you.
This is unexpected.
Michael: There are some reasons
you might not
Nikolay: want to
Michael: do this.
Nikolay: Why?
Michael: Okay, well, let's go through
the positives first.
I promised positives.
Size can be, it can be worse, but
it can be a lot better.
Speed can be better, not just lookups,
but also the average latency
of inserts.
I definitely had some discussions
with Haki when we were reporting
these numbers that we could have
done a better job, I think.
I hold my hands up, this was a
couple of years ago, and I've
learned a lot since then.
But insert performance on a batch
was, or when we were inserting
1 row at a time, was about 10%
slower, I think, for the B-trees
than versus hash, which I was a
bit surprised by, because I knew
there was some overhead around
splits for hash indexes, or higher
overheads some of the time.
But lookup performance...
Nikolay: B-tree also has splits sometimes,
right?
Michael: Yeah, of course.
Nikolay: So we need to compare.
In this case, we can't say hash
index has this disadvantage and
that's it, right?
B-tree also has this disadvantage.
Michael: Yeah, so they're kind
of bigger splits less often.
Yeah, that's a good point.
But lookup speed, I was expecting
a difference and admittedly,
they're both really fast, right?
We're talking about basically key
lookups.
So B-trees, we've always raved
about how good they are at like
key lookups, right?
Nikolay: That's what- Yeah, just
a few pages to find a proper
reference to, like it's just a
few blocks to fetch from disk
or to read from page cache.
Michael: I think we only did about
100,000 lookups and it was
both of them were under a second
to do 100,000 lookups.
Nikolay: So you haven't checked
shared buffers at that time?
Only timing, which is quite volatile.
Michael: Yeah.
Again, I would do some things differently
now.
This is a couple of years ago.
But the lookups, whilst they were
both under a second, I think
it was something like 30 or 40%
faster for the hash lookups,
which was, I thought, actually
quite significant.
So I think there are some cases
where in some extreme, if you've
got long strings and you only care
about uniqueness lookups,
then I think there's a case for
using them.
But that is obviously a very narrow,
very specific use case.
Nikolay: I wonder in these tests
if you considered the planning
time, because I guess if table
is not absolutely huge, like billions
of rows.
In this case, planning time can
be maybe even a bigger contributor
than index scan itself.
So we need to exclude it using
maybe prepared statements and consider
only execution.
Michael: That would be a good test,
but also why would the planning
time differ between index types?
Nikolay: I'm not saying it differs,
I'm saying if you're comparing
these latencies, what percentage
of planning time is there here?
So we need to exclude it to compare
clean numbers, right?
Michael: Yeah, that'd be great.
I don't think we used prepared
statements.
I'm pretty sure we didn't.
Nikolay: And honestly, I wouldn't
look at timing here at the
first, like this first metric.
I would still prefer looking at
shared buffers.
But okay.
Michael: But the point is they
can be more efficient and that's
partly because they're, if you
imagine a B-tree structure, you
might have to go through
Nikolay: a few
Michael: hops, especially once
your table gets really large.
Now we've talked about partitioning
so many times, right?
So you could argue that you could
keep your B-trees relatively
small in the grand scheme of things.
But if we do get large enough,
the hash structure is just so
much more efficient and that will
show up in shared buffers as well.
So that is the argument, I think.
So you've got smaller, potentially
smaller, indexes for certain,
you know, for highly unique, slightly
larger data types and potential
benefits on the insert and lookup
speeds.
Nikolay: Have you considered collisions?
Because if you have only integer
four, obviously you might have
collisions and you need to resolve
this additional hops, right?
Additional comparison and so on.
Have you considered trying to measure
some edge cases?
Michael: I looked into this, I
looked into the internals a bit.
Well, actually there's a couple
of really good internals pages
on the PostgreSQL docs on hash indexes.
And it's only a throwaway word
quite early on in one of them that
mentions that they are a lossy
index type.
So it's not the only lossy index
type, but it means that just
because we get a match for the
hash, the 32-bit integer, doesn't
mean we do actually have a hit.
We could have a collision.
So there is a recheck, and you
can get rows removed by index
recheck type things in your explain,
but I haven't seen it.
Like we're talking, but you'd have
to have, well, I'm not sure you'd have to have billions,
but I'm guessing you'd start,
you know, in the tests I've done,
I haven't seen any.
But I guess you'd pay that overhead
for every lookup, right?
Like you're having to do that recheck
all the time because they're
lossy.
It's not just when there are collisions
that you pay that recheck.
So those numbers I was talking
about include the overhead of
rechecks.
Nikolay: Okay.
So what are the downsides?
Why shouldn't I
Michael: use- So many.
Nikolay: So many, okay.
Index only scan, as I remember,
right?
Yep.
Lack of
Michael: index only scan.
Because we don't have the value
in the index, it's literally
impossible to have an index only
scan.
Right.
Nikolay: And that's bad.
Yeah.
Right.
And that's bad.
Yeah.
Yeah, so.
But if I, in my case, if I have
large text value, and I consider
B-tree versus hash
expression.
Michael: You can't have an index
only scan there either.
Nikolay: Yeah, because I cannot
fetch the original value.
So it's not a reason that would
stop me from using hash index.
Okay.
Right.
Michael: In fact, if you consider
collisions for that.
Nikolay: Yeah, well, yeah, yeah.
Michael: You have to implement
it yourself.
Nikolay: That's true, yeah.
Michael: So I think the big one is
ordering.
Obviously not in the case you're
talking about where you're hashing,
but the big advantage B-trees have
is they order your data and
that supports so many different
things.
Nikolay: And greater than or less
than comparison, obviously.
Yeah,
Michael: range scans, yeah, it
feels like a limitless number
of things that that then enables.
Nikolay: Right, you can deal with
collation issues, it's so cool.
They can be corrupted because of
collation changes.
Michael: I mean that's a good point.
Maybe hash indexes are less likely
to get corrupted.
Nikolay: Right.
Okay.
Michael: Actually, what if the
hash function created a different
value instead?
Nikolay: I don't think it's a good
question.
I don't know how it's implemented
internally.
It should not be super difficult.
But obviously, how will it hash
non-latin characters and so
on?
Michael: Yeah, exactly.
Does it depend
Nikolay: on glibc version change,
for example?
It's an interesting question.
I never had this question because,
again, I'm not using them
in the wild often at all.
Michael: Another huge difference,
and I don't think this is a,
I don't think this is necessarily
based on it's, I don't think
this is like the index only scans
where it's literally impossible,
but you, in Postgres, we can't
use hash indexes to enforce uniqueness.
So for unique constraints, which
is a big deal.
Like, that's a really, like, for
example, we can't use them for
primary keys.
Or like, there's no point because
we've already got a B-tree
index on them.
Nikolay: Okay.
Good to know.
What else?
Multicolumn, right?
I remember now.
Yeah,
Michael: Exactly.
So, hash indexes don't support
multicolumn.
Nikolay: You can combine values,
so you can, I don't know?
Michael: But more to the point, one of
the nice things about multi-column
B-tree indexes is you've also got
the sorting sorted by the first
column and sorted by the second.
So we just don't have advantages
like that.
And it ties into the index-only
scans as well.
One of the best things about multi-column
indexes is we then get
potential for index-only scans
as well.
So these things, a lot of them
are interplay.
One difference that isn't necessarily
an issue, but I found interesting,
I thought you might as well, was
that the fill factor for hash
indexes is 75 by default, whereas
B-tree is 90.
Really interesting.
Nikolay: Some old decision, I guess.
Maybe not, but maybe, I don't know.
Yeah,
Michael: Interesting.
Well, I guess, like, in the blog
post, we looked at what does
it look like if you change the
fill factor to 25, 50 or 100.
And at 100 versus 75, you just
see, you see it not getting larger
as quickly, but when it does, it
bounces at a faster rate.
So as the index grows, it grows
less at first and then more all
at once.
So my guess is it's some trade-off
between general size and performance
and the impact of splits or the
cost we pay at splits.
And in fact, this is probably the
biggest one you'd want to consider
on the difference between doing
a B-tree index on a hash function
versus using a hash index.
And that's how insert performance would look in
terms of probably not average latency,
but maybe the standard
deviation of latency.
So with hash indexes, the way they're
structured, we have buckets
that values can go into, but beyond
a certain size, Postgres
works it all out, and it does all this
for you.
It decides, oh, we need more space.
It'd be best to have another bucket.
Let's split one of the existing buckets.
And then we pay that overhead during
the insert, right?
Like that, it can slow down your
insert.
So my guess is the standard deviation
for inserts would be perhaps
significantly higher than B-trees
but I haven't checked.
It's another thing I would like
to do.
Yeah.
So on high throughput, there is
a note in the internals document
about high throughput.
Let's say you've got a really high
throughput table, you might
actually be better off with your
B-tree index that you mentioned
than a hash one.
Nikolay: Okay.
And multi-column index, I think
we can, I've just checked by
the way, hash index applied to
whole table row.
It works.
Produces interferer number, it
works.
I think we might probably decide
to use it to, I remember I was
always using MD5 hash, converting
row first to text and then
MD5, but maybe hash index is a
better choice when we need to,
for example, compare the content
of two snapshots, row by row,
you know, like to find which rows
were changed, comparing them,
like joining by ID, by primary
key, for example, and then we
compare hashes of whole row, right?
In this case, hash index is working.
I wonder if, like, if we, okay,
multi-column index is not supported.
What if we just combine multiple
columns and produce hash out
of them and almost the same, right?
It's strange that it's not supported.
Michael: I mean, we can still only
use uniqueness lookups, right?
So I guess it's the case where
you want to see if two columns when
combined are equal to the record
you've stored, yeah?
But I'm not seeing the same benefits.
Nikolay: Okay, I will check a couple
of things more.
Any more limitations or that's
it?
Michael: I don't think so.
I might have missed one.
But those for me are substantial
limitations.
And I think if you're, like when
you're, imagine when you're
designing a system where you're
choosing an index type for a
new feature building, and it turns
out that right now, the hash
might just about come out on top.
Maybe you get slightly smaller
index, maybe your lookups are
slightly faster.
I'm not sure I'd still advise using
a hash index.
I think I might say, look, if your
requirements change in the
future, if you ever do need to
look up like a range of these
values, if there's some analytics
you might need to do on it
or something, you might have been
better off with a B-tree index.
You've got that extra flexibility.
You've also got, I think, probably
more effort being put into
optimizations on them going forwards.
If it's a tight call between them,
I might still encourage people
not to use them in general.
Nikolay: I never did it, I just
did it.
I used hash record function, applied
to the whole row in a table,
which has 3 columns.
And then I created an index on top
of this, that hash record.
So, for the first time, I created an index
not specifying columns, but
involving all columns in a table.
This is super strange.
It could be bit-free actually as well because the hash record produces
an int4, right?
So it applies to all columns.
So I said, create an index on T1
using a hash of hash record of
T1.
The whole record.
That's super strange.
I need to think
Michael: about it.
What use case are you imagining
for this?
Nikolay: I don't know.
I'm just exploring, you know, like,
curious.
I don't know, maybe like, again,
like to track which records
changed content or something.
I don't know.
I don't know.
It's just interesting that it works.
I thought we were required to specify
columns, right?
When we like this column, that
column, but here you just specify
the whole record and it works.
This shows the flexibility of Postgres
as usual, right?
But I created a multi-column index
hash, but it's a hash of hash.
So double hashing here.
Okay.
So I don't know.
I'm still thinking about use cases.
I still only, like, a realistic one
is only this, like, really large
text values.
That's it so far for
Michael: me.
Yeah, and when you only need unique
lookups of them.
Nikolay: Yeah, so yeah, interesting.
So I will keep it in mind.
Michael: Yeah, I love that we've
gone, during this episode we've
gone from I never have use cases
for hash indexes.
Yeah, exactly.
There we go.
Cool.
But I also think, you know, you
said you've seen them in the
wild.
I think that might be a result
of people overthinking it.
And I was there a couple of years
ago.
I was thinking, you know, there
might be some cases where that
the difference is meaningful and worthwhile,
but for the flexibility,
I think.
Anyway, I've said that a few times
now.
Cool.
Well, thank you, Nikolay. Thanks
for indulging me.
Hopefully a few people found that
interesting and useful, and
yeah, catch you next week.