Postgres FM

Nikolay and Michael discuss hash indexes in Postgres — what they are, some brief history, their pros and cons vs btrees, and whether or when they recommend using them.

Update: the idea Nikolay mentioned at the end of this episode turns out to be a little fraught (and as such, inadvisable). 
 
Here are some links to things they mentioned:

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Nikolay: Hello, hello, this is
PostgresFM episode 76.

My name is Nikolay and as usual,
together with me is Michael.

Hi, Michael.

Michael: Hello, Nikolay.

Nikolay: So, today's topic is super
interesting and very popular,

right?

Michael: Not exactly.

I chose this one.

Nikolay: Okay.

Michael: Therefore, it's boring.

Nikolay: But we will learn something
new, which is good.

Michael: I hope so.

Nikolay: Even if it's useless,
it's still new.

We're here to learn, right?

Michael: So Nikolay's teasing me
because I've picked the topic

hash indexes, which are, well,
until relatively recently, were

highly discouraged by the Postgres
docs and not very sensible

to use, but-

Nikolay: until PostgreSQL 10.

Michael: Yes, but have been in
Postgres for a long, long time.

I looked into the history of it
and therefore some people must

think they're useful.

And I definitely think there's
one or two use cases for them, but

more to the point, I think they're
very interesting for people.

I think this is the kind of thing
that will make you appreciate

the B-tree index as well, and also
understand a little bit more

about trade-offs we make when we're
choosing things within Postgres.

Nikolay: Indexes, right?

I mean, why doesn't Postgres choose
indexes, index type for us?

We just want to make it fast and
that's it, right?

It should choose.

Okay, a different question.

Are hash indexes more useful than
the money data type?

Michael: Oh, I think so, yes.

Nikolay: Okay, this is already
better.

I hope we will never have an episode
about the money data type.

Michael: Yeah, me too.

Nikolay: Okay.

Michael: So yeah, should we start
with what a hash index is?

Nikolay: Yeah, hash function, hash
index, right?

Michael: Yeah, that's pretty much
it.

Hash table.

Yeah, so in fact that's a good
point.

A lot of people have come across
hashing from, well, looking

at query plans for hash joins or
backend engineers, very familiar

with hash tables, can be a very
efficient way of taking a value,

hashing it, in this case we get
a, well, Postgres behind the

scenes is calculating a 4-byte
integer, storing that, and then

in the future if we want to look up
the value by equality and equality

only, we can hash the value we
seek and look it up in the index

in this case.

So people that like theory talk
a lot about big O notation, don't

they?

So this is one of those things that
I struggle to love, but it has

a low like O(1) is what people often
say.

I'm not sure it's strictly true,
depending on how deep down that

rabbit hole you dive.

But yeah, it can be very, very
efficient for looking up things

by equality, especially in certain
conditions.

So that's like, actually a couple
of things on details.

I didn't realize you could...

And it makes sense, but you can
use a hash index on any value,

any data type, which is quite cool.

I guess the same is true.

Nikolay: Postgres has a lot of
internal hashing functions for

all data types and also for the
record.

So you can say record for some
table and it will give you a hash

integer 4, right?

Regular integer 32 bytes or bits.

Michael: 32 bit, 4 byte, yeah.

Nikolay: Yeah, and I actually recently
rediscovered it and used

it.
It was interesting.

I didn't use the hash index, but
I used one of, probably, hash

text function.

Yeah, it was hash text function.

I needed to, you know, when you
need to reindex a lot of indexes,

and it's a huge database.

For example, after upgrade to Postgres
13 or 14, 14 especially,

you need to re-index a lot of bit
re-indexes to start benefiting

from the optimizations Peter Geoghegan
and others implemented.

And obviously you need to re-index.

You will do re-index concurrently
and so on, it's good, but the

question will be how to move faster.

There are parallel workers, as
we discussed, and so on.

But probably you will decide to
use multiple processes to issuing

the reindex concurrently command.

And in this case, if two of them
try to reindex two different indexes

which belong to the same table,
you will have a deadlock.

So my idea was we just need to
attach each table associated with

a particular worker.

And to do it, I just took a table
name, calculated hash text

from it, and then modulo number
of workers.

If I have, for example, 5 workers,
I just...

So we have table name, hash text
function, the hash text function

produces number, like integer 4
number, and then we just modulo

of 5, like remainder of the division,
it will be like 0, 1, 2,

3, 4, and that's it again, 0, 1, 2,
3, 4.

And each index associated with
a particular table will always go

to a particular worker.

So there will never be a conflict,
no more deadlocks.

So this is how you can redistribute
work.

Maybe also check if you need
to redistribute, you need to check

a lot of indexes for corruption.

You can also use many like 16 workers,
for example, and you redistribute

work to avoid deadlocks.

So, hash text is useful.

I was already like CRC32 or something,
I was thinking, oh, I

need to, oh, by the way, I asked
my bot to advise me, and it

says, oh, there is hash text.

I never used it before, or maybe
I just forgot.

And yeah, then I, if you're in
psql, if you say backslash df,

then hash star, you will see more
than 50 functions.

We checked it before the call.

More than 50 functions, many of
them return int4, some of

them return int8.

So it's good to have a lot of already
implemented functions.

You don't need to care, just use
it.

Good?

Yeah, absolutely.

Yes, Small comment, slightly off
topic.

Michael: Yeah, I mean related to
hashing, but I guess completely

unrelated

Nikolay: to hash indexing.

Michael: Yeah, exactly.

So you mentioned already, there's
a big change in the history

of hash indexes I think is worth
mentioning, and that's the version

10 made hash indexes crash safe
or write-ahead log.

Which is really important, like
before that it meant if you had

like, it made them basically unusable
or unsafe to use, especially

in a high availability setup if
you had replication and then

failed over, your replica wouldn't
have that index.

Nikolay: Ephemeral indexes.

Michael: Yeah, right.

Nikolay: But not anymore, so it's solved.

The same was solved with GIST a very
long ago.

Originally GIST was also not write-ahead log-supported
at all, but it was

implemented like 20 something years
ago.

Michael: Speaking of 20 something
years ago, I was looking up,

I was trying to find out when hash
indexes were added, and the

git history doesn't go far enough
back, I don't think, and the

docs, the online docs 7.2 is the
oldest version on the online

docs and they have hash indexes.

So that's as far back as 2002 already.

So they're more than 20 years old
as well.

Nikolay: Well I'm quite sure they're
older because it's maybe

one of the simplest types of index
you can implement.

But interesting that WAL was not
implemented for so long, right?

Right.
Until 2017, right?

Michael: I think you'd struggle
to get...

I could be wrong, but I think if
we didn't have hash indexes

in Postgres today, they wouldn't
get added and they would be

encouraged to be added via an extension.

That's my guess as to how they
would be implemented if they were

done today.

But I mean, I might be, or maybe
via the contrib modules.

Nikolay: Well, yeah.

Michael: But we have them.

They're first class supported.

They're well logged now.

They're replicated and everything.

Crash safe, even while they're
doing complex operations, it might

get to a bit later, it will recover.

So we can use them if we want to.

I guess the question then is why
would we?

And you mentioned something before
the call I found very interesting

and it's probably very telling.

But should I ask you, have you
ever used the hash index?

Nikolay: Only in experimental environments,
never on production.

I think I saw them in the wild,
you know, saw them.

I don't think I ever seriously
considered them.

But you need to take into account
that version 10 and later,

it's already mostly my consulting
practice.

Before that, I created a lot of
systems, social networks and

so on.

It was before Postgres 10, so I
never considered them.

My memory says stay away from them.

Michael: Yeah, that's a really
good point.

So it's only since 2017 that you
should even be considering using

these.

So I'd like to first cover, I know
there's going to be quite

a few limitations that we need
to talk about, but I'd like to

cover some of the potential benefits
of hash indexes.

Like I think there's a couple that
are worth mentioning.

Nikolay: Size?

Michael: Yes, So they can be, so
for, imagine if you are hashing

a fairly large string or even a number,
or just a piece of data

that is more than 4 bytes.

When you are indexing it with a
hash index, it's only having

to store 4 bytes, or a bit more,
but it's not having to store

that value in the index.

And that means for larger data
types or larger values, it can

be significantly smaller than a
B-tree index with some caveats.

Naturally, there are some optimizations
for B-tree that we've

got in recent versions.

And yeah, this really stands
out for larger values basically

and for indexes that are unique
or mostly unique, like, oh sorry,

for columns that are unique or
mostly unique.

So that's a big difference as well.

But there are cases like that,
right?

Like I did a collaboration with
Haki Benita for his blog and

he did a lot of the work.

He deserves most of the credit
but he put me down as a co-author

which was kind of him and we looked
at quite a few things and

for the use case that he had for
hash indexes which was a URL

shortening service you get some
quite large URLs can be quite

large strings in the grand scheme
of things, especially for URLs.

So they actually came out quite
a lot smaller when you put a

hash index on it versus a B-tree,
which is pretty cool, Even

post deduplication, because these
are largely going to be unique.

We didn't make them completely
unique, but largely your URL shortening

thing is going to be unique strings.

So yeah, they can be significantly
smaller, they can be significantly

bigger, depending on your data.

But I think size is underrated
in terms of index types.

I think not only we're going to
look at performance in a bit,

but base and the cache, you know,
we've talked about this kind

of thing before, but I think it
is worth mentioning.

Nikolay: Yeah, by the way, if we
try to index in a regular way

with B-tree some large text values,
there's some limitation,

I always forget.

Michael: Oh, good

Nikolay: point.

Yeah, so sometimes we just need
to apply hash and use expression

B-tree on top of it, but it's not,
of course, it's not a hash index,

but we use a combination, right?

In this case, we cannot order these
values, but for exact matching,

this is what can work well.

Michael: Why did you, why do you
hash it and then put a B-tree

index on instead of just putting
a hash index on?

Nikolay: Because again, I'm an
old guy.

You know, like again, my mom says
stay away from hash.

By the way, with my bot and GitHub,
we just checked when hash

build, the hash build function
was added and it's 27 years ago,

original Postgres 95.

Michael: 27?

Nikolay: Yeah, original source
code of Postgres 95.

Michael: Wow.

Nikolay: Committed by Mark Fournier.

It had already this hash build,
which builds a new hash index.

So this is super old stuff.

I think maybe it even came from
original Postgres.

Obviously, it came from original
Postgres, maybe even from Ingres,

who knows.

So maybe it's even more than 30
years ago.

Michael: Yeah, very cool.

But yeah, that's another advantage,
right?

There's no limit to the size.

Nikolay: So you think, yeah, that's
great.

So you think I just can create
a hash index if I have very large

text values?

Michael: You're losing the same
things, right?

You can't order by them.

And you're kind of doing a self-rolled
one, which maybe even proves

that we don't need them.

But I think less footprint, right,
because you're having to...

Yeah, I'm not sure.

I think there might be some weird
trade-offs around maintenance,

but I think you'd largely recreate
them, yeah.

Nikolay: So if I say create table
T something as select and then

I say repeat as column C1 and I
repeat some letter a million

times, I have obviously a text value,
a million letters, when I

say create index on this value,
on this table, on this column,

in this case, B-tree will say index
row requires 11,464 bytes, so

11k, right?

Mm-hmm.

Ah, in my case.

But maximum size is 8K, okay, like
block size.

But if I say using hash, right,
using hash, same column, create

index, yeah, it works.

So why do I, okay, I'm an old guy,
I have old habits, I need to get

rid of it.

Actually, this is the perfect case.

Michael: Or consider it.

Nikolay: Large text values, I don't
need to hash first and then

B-tree, I just use a hash index in this case.

I will need to think about it.

Thank you.

This is unexpected.

Michael: There are some reasons
you might not

Nikolay: want to

Michael: do this.

Nikolay: Why?

Michael: Okay, well, let's go through
the positives first.

I promised positives.

Size can be, it can be worse, but
it can be a lot better.

Speed can be better, not just lookups,
but also the average latency

of inserts.

I definitely had some discussions
with Haki when we were reporting

these numbers that we could have
done a better job, I think.

I hold my hands up, this was a
couple of years ago, and I've

learned a lot since then.

But insert performance on a batch
was, or when we were inserting

1 row at a time, was about 10%
slower, I think, for the B-trees

than versus hash, which I was a
bit surprised by, because I knew

there was some overhead around
splits for hash indexes, or higher

overheads some of the time.

But lookup performance...

Nikolay: B-tree also has splits sometimes,
right?

Michael: Yeah, of course.

Nikolay: So we need to compare.

In this case, we can't say hash
index has this disadvantage and

that's it, right?

B-tree also has this disadvantage.

Michael: Yeah, so they're kind
of bigger splits less often.

Yeah, that's a good point.

But lookup speed, I was expecting
a difference and admittedly,

they're both really fast, right?

We're talking about basically key
lookups.

So B-trees, we've always raved
about how good they are at like

key lookups, right?

Nikolay: That's what- Yeah, just
a few pages to find a proper

reference to, like it's just a
few blocks to fetch from disk

or to read from page cache.

Michael: I think we only did about
100,000 lookups and it was

both of them were under a second
to do 100,000 lookups.

Nikolay: So you haven't checked
shared buffers at that time?

Only timing, which is quite volatile.

Michael: Yeah.

Again, I would do some things differently
now.

This is a couple of years ago.

But the lookups, whilst they were
both under a second, I think

it was something like 30 or 40%
faster for the hash lookups,

which was, I thought, actually
quite significant.

So I think there are some cases
where in some extreme, if you've

got long strings and you only care
about uniqueness lookups,

then I think there's a case for
using them.

But that is obviously a very narrow,
very specific use case.

Nikolay: I wonder in these tests
if you considered the planning

time, because I guess if table
is not absolutely huge, like billions

of rows.

In this case, planning time can
be maybe even a bigger contributor

than index scan itself.

So we need to exclude it using
maybe prepared statements and consider

only execution.

Michael: That would be a good test,
but also why would the planning

time differ between index types?

Nikolay: I'm not saying it differs,
I'm saying if you're comparing

these latencies, what percentage
of planning time is there here?

So we need to exclude it to compare
clean numbers, right?

Michael: Yeah, that'd be great.

I don't think we used prepared
statements.

I'm pretty sure we didn't.

Nikolay: And honestly, I wouldn't
look at timing here at the

first, like this first metric.

I would still prefer looking at
shared buffers.

But okay.

Michael: But the point is they
can be more efficient and that's

partly because they're, if you
imagine a B-tree structure, you

might have to go through

Nikolay: a few

Michael: hops, especially once
your table gets really large.

Now we've talked about partitioning
so many times, right?

So you could argue that you could
keep your B-trees relatively

small in the grand scheme of things.

But if we do get large enough,
the hash structure is just so

much more efficient and that will
show up in shared buffers as well.

So that is the argument, I think.

So you've got smaller, potentially
smaller, indexes for certain,

you know, for highly unique, slightly
larger data types and potential

benefits on the insert and lookup
speeds.

Nikolay: Have you considered collisions?

Because if you have only integer
four, obviously you might have

collisions and you need to resolve
this additional hops, right?

Additional comparison and so on.

Have you considered trying to measure
some edge cases?

Michael: I looked into this, I
looked into the internals a bit.

Well, actually there's a couple
of really good internals pages

on the PostgreSQL docs on hash indexes.

And it's only a throwaway word
quite early on in one of them that

mentions that they are a lossy
index type.

So it's not the only lossy index
type, but it means that just

because we get a match for the
hash, the 32-bit integer, doesn't

mean we do actually have a hit.

We could have a collision.

So there is a recheck, and you
can get rows removed by index

recheck type things in your explain,
but I haven't seen it.

Like we're talking, but you'd have
to have, well, I'm not sure you'd have to have billions,

but I'm guessing you'd start,
you know, in the tests I've done,

I haven't seen any.

But I guess you'd pay that overhead
for every lookup, right?

Like you're having to do that recheck
all the time because they're

lossy.

It's not just when there are collisions
that you pay that recheck.

So those numbers I was talking
about include the overhead of

rechecks.

Nikolay: Okay.

So what are the downsides?

Why shouldn't I

Michael: use- So many.

Nikolay: So many, okay.

Index only scan, as I remember,
right?

Yep.

Lack of

Michael: index only scan.

Because we don't have the value
in the index, it's literally

impossible to have an index only
scan.

Right.

Nikolay: And that's bad.

Yeah.
Right.

And that's bad.

Yeah.

Yeah, so.

But if I, in my case, if I have
large text value, and I consider

B-tree versus hash
expression.

Michael: You can't have an index
only scan there either.

Nikolay: Yeah, because I cannot
fetch the original value.

So it's not a reason that would
stop me from using hash index.

Okay.
Right.

Michael: In fact, if you consider
collisions for that.

Nikolay: Yeah, well, yeah, yeah.

Michael: You have to implement
it yourself.

Nikolay: That's true, yeah.

Michael: So I think the big one is
ordering.

Obviously not in the case you're
talking about where you're hashing,

but the big advantage B-trees have
is they order your data and

that supports so many different
things.

Nikolay: And greater than or less
than comparison, obviously.

Yeah,

Michael: range scans, yeah, it
feels like a limitless number

of things that that then enables.

Nikolay: Right, you can deal with
collation issues, it's so cool.

They can be corrupted because of
collation changes.

Michael: I mean that's a good point.

Maybe hash indexes are less likely
to get corrupted.

Nikolay: Right.

Okay.

Michael: Actually, what if the
hash function created a different

value instead?

Nikolay: I don't think it's a good
question.

I don't know how it's implemented
internally.

It should not be super difficult.

But obviously, how will it hash
non-latin characters and so

on?

Michael: Yeah, exactly.

Does it depend

Nikolay: on glibc version change,
for example?

It's an interesting question.

I never had this question because,
again, I'm not using them

in the wild often at all.

Michael: Another huge difference,
and I don't think this is a,

I don't think this is necessarily
based on it's, I don't think

this is like the index only scans
where it's literally impossible,

but you, in Postgres, we can't
use hash indexes to enforce uniqueness.

So for unique constraints, which
is a big deal.

Like, that's a really, like, for
example, we can't use them for

primary keys.

Or like, there's no point because
we've already got a B-tree

index on them.

Nikolay: Okay.

Good to know.

What else?

Multicolumn, right?

I remember now.

Yeah,

Michael: Exactly.

So, hash indexes don't support
multicolumn.

Nikolay: You can combine values,
so you can, I don't know?

Michael: But more to the point, one of
the nice things about multi-column

B-tree indexes is you've also got
the sorting sorted by the first

column and sorted by the second.

So we just don't have advantages
like that.

And it ties into the index-only
scans as well.

One of the best things about multi-column
indexes is we then get

potential for index-only scans
as well.

So these things, a lot of them
are interplay.

One difference that isn't necessarily
an issue, but I found interesting,

I thought you might as well, was
that the fill factor for hash

indexes is 75 by default, whereas
B-tree is 90.

Really interesting.

Nikolay: Some old decision, I guess.

Maybe not, but maybe, I don't know.

Yeah,
Michael: Interesting.

Well, I guess, like, in the blog
post, we looked at what does

it look like if you change the
fill factor to 25, 50 or 100.

And at 100 versus 75, you just
see, you see it not getting larger

as quickly, but when it does, it
bounces at a faster rate.

So as the index grows, it grows
less at first and then more all

at once.

So my guess is it's some trade-off
between general size and performance

and the impact of splits or the
cost we pay at splits.

And in fact, this is probably the
biggest one you'd want to consider

on the difference between doing
a B-tree index on a hash function

versus using a hash index.

And that's how insert performance would look in

terms of probably not average latency,
but maybe the standard

deviation of latency.

So with hash indexes, the way they're
structured, we have buckets

that values can go into, but beyond
a certain size, Postgres

works it all out, and it does all this
for you.

It decides, oh, we need more space.

It'd be best to have another bucket.

Let's split one of the existing buckets.

And then we pay that overhead during
the insert, right?

Like that, it can slow down your
insert.

So my guess is the standard deviation
for inserts would be perhaps

significantly higher than B-trees
but I haven't checked.

It's another thing I would like
to do.

Yeah.

So on high throughput, there is
a note in the internals document

about high throughput.

Let's say you've got a really high
throughput table, you might

actually be better off with your
B-tree index that you mentioned

than a hash one.

Nikolay: Okay.

And multi-column index, I think
we can, I've just checked by

the way, hash index applied to
whole table row.

It works.

Produces interferer number, it
works.

I think we might probably decide
to use it to, I remember I was

always using MD5 hash, converting
row first to text and then

MD5, but maybe hash index is a
better choice when we need to,

for example, compare the content
of two snapshots, row by row,

you know, like to find which rows
were changed, comparing them,

like joining by ID, by primary
key, for example, and then we

compare hashes of whole row, right?

In this case, hash index is working.

I wonder if, like, if we, okay,
multi-column index is not supported.

What if we just combine multiple
columns and produce hash out

of them and almost the same, right?

It's strange that it's not supported.

Michael: I mean, we can still only
use uniqueness lookups, right?

So I guess it's the case where
you want to see if two columns when

combined are equal to the record
you've stored, yeah?

But I'm not seeing the same benefits.

Nikolay: Okay, I will check a couple
of things more.

Any more limitations or that's
it?

Michael: I don't think so.

I might have missed one.

But those for me are substantial
limitations.

And I think if you're, like when
you're, imagine when you're

designing a system where you're
choosing an index type for a

new feature building, and it turns
out that right now, the hash

might just about come out on top.

Maybe you get slightly smaller
index, maybe your lookups are

slightly faster.

I'm not sure I'd still advise using
a hash index.

I think I might say, look, if your
requirements change in the

future, if you ever do need to
look up like a range of these

values, if there's some analytics
you might need to do on it

or something, you might have been
better off with a B-tree index.

You've got that extra flexibility.

You've also got, I think, probably
more effort being put into

optimizations on them going forwards.

If it's a tight call between them,
I might still encourage people

not to use them in general.

Nikolay: I never did it, I just
did it.

I used hash record function, applied
to the whole row in a table,

which has 3 columns.

And then I created an index on top
of this, that hash record.

So, for the first time, I created an index
not specifying columns, but

involving all columns in a table.

This is super strange.

It could be bit-free actually as well because the hash record produces

an int4, right?

So it applies to all columns.

So I said, create an index on T1
using a hash of hash record of

T1.

The whole record.

That's super strange.

I need to think
Michael: about it.

What use case are you imagining
for this?

Nikolay: I don't know.

I'm just exploring, you know, like,
curious.

I don't know, maybe like, again,
like to track which records

changed content or something.

I don't know.

I don't know.

It's just interesting that it works.

I thought we were required to specify
columns, right?

When we like this column, that
column, but here you just specify

the whole record and it works.

This shows the flexibility of Postgres
as usual, right?

But I created a multi-column index
hash, but it's a hash of hash.

So double hashing here.

Okay.

So I don't know.

I'm still thinking about use cases.

I still only, like, a realistic one
is only this, like, really large

text values.

That's it so far for

Michael: me.

Yeah, and when you only need unique
lookups of them.

Nikolay: Yeah, so yeah, interesting.

So I will keep it in mind.

Michael: Yeah, I love that we've
gone, during this episode we've

gone from I never have use cases
for hash indexes.

Yeah, exactly.

There we go.

Cool.

But I also think, you know, you
said you've seen them in the

wild.

I think that might be a result
of people overthinking it.

And I was there a couple of years
ago.

I was thinking, you know, there
might be some cases where that

the difference is meaningful and worthwhile,
but for the flexibility,

I think.

Anyway, I've said that a few times
now.

Cool.

Well, thank you, Nikolay. Thanks
for indulging me.

Hopefully a few people found that
interesting and useful, and

yeah, catch you next week.