Nikolay and Michael discuss Heap-Only Tuple (HOT) updates — what they are, the benefits, and things you can do to optimize for them.
- Heap-only tuple updates (docs)
- README file (source code)
- Heap Only Tuple chapter (Internals of PostgreSQL by Hironobu SUZUKI)
- How partial, covering, and multicolumn indexes may slow down UPDATEs (blog post by Nikolay)
- Why Uber switched from Postgres to MySQL (blog post)
- Fighting write amplification with HOT updates (Adyen blog post)
- HOT Updates vs Bottom-Up Index Deletion (video by Lukas Fittl from pganalyze)
- Reduced Bloat with Bottom-Up Deletion (blog post by Hamid Akthar from Percona)
- What is fillfactor and how does it affect performance? (blog post by Kaarel Moppel from Cybertec)
What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.
If you would like to share this episode, here's a good link (and thank you!)
Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard
- Jessie Draws for the amazing artwork
Creators & Guests
What is Postgres FM?
A weekly podcast about all things PostgreSQL
Michael: Hello and welcome to Postgres fm,
a weekly show about all things PostgreSQL.
I'm Michael, founder of PG Mustard, and this
is my Cohot Cola, founder of Postgres ai.
Hey Nicola, what are we talking about today?
Nikolay: Hot updates, Hot updates, or hot updates.
Let's discuss this
Michael: Heap only two pull topple.
We, are at that point again, heap only, tap updates.
Nikolay: let's start from Short story.
First time I was in the US it was 2008.
I came for, for the conference in Maryland and I was in New York.
First time I entered to Starbucks and they asked me what I want.
I say coffee, like cappuccinos.
I think they ask me, should it be hot or cold, and
I say, and they gimme, of course, cold because they
hear, you know, American pronunciation is hot, not hot.
Michael: So hot.
Nikolay: And if you say hot, they can hear
cold, actually, Cause it's Oh, oh . So.
Hot updates or hot updates.
Michael: Who serves cold cappuccinos.
Nikolay: Yeah, it's, it's, it's, You
should live here for a while to understand
Michael: Yeah, I wish I lived in a climate
where that was a reasonable thing to do.
So actually, this is a listener question and topic suggestion.
So thank you to Andy for this one.
The question was effectively when should I think about tuning fill factor?
What kind of what kind of workloads?
For tables, I guess, but he actually didn't specify.
But yeah, because he an asked about hot updates, I'm assuming tables.
And what kind of workloads would it be worth customizing for?
So, yeah, really good question.
And to answer it, we kind of really have to dive into what heap only
type updates are how they help the advantages, that kind of thing.
And this is one of the questions which are originally
closer to infrastructure level, like DBA DBRE.
But I see it clearly that developers should learn it as well and understand
it better because like, this is one of administrative topics, which.
Are very beneficial to learn for developers for.
people who write sql, not just maintain progress notes.
Because this depending on which indexes you define and
how you write your queries, you can benefit from hot
updates from having or, or you can avoid it, right?
So of course I wish we had all updates hot, but it's not possible, of course.
And of course fill factor by default is a hundred.
For tables, we should probably should start from basics, right?
What, what fill factories and how updates are
happening according to Postgres and VCC model.
Michael: Yeah, let's start with how updates happen without this optimization.
So before version 8.3.
Before your 2008 story, Actually it was around
about that same time that they, this got added.
Um, All updates.
Nikolay: you say?
The same version which which got XML data type and functions.
So I remember it
Michael: Oh wow.
Of course your one.
So yes, so same version.
And that means there were times before this optimization.
So, and of course we have regular updates or cold updates.
I've heard, I've seen them called at least once.
So by default Postgres, because due to nvcc
and I guess other design decisions will.
Create a new topple for each.
Like if you do an update on, on a row, you'll get a new couple.
And an you'll have an old one that gets marked eventually as dead
and cleaned up once, once no transactions can see it anymore.
Nikolay: And cleaned by vacuum.
It's cleaned later as synchronistic, so
independently by our activities, right,
Michael: Yeah, exactly.
So the main optimization here, and there are some caveats,
is that we can, if there's enough space on the page
Nikolay: Which is eight k kb bytes by default and usually it's all.
Michael: Yeah, total size of the page is, is eight
kilobytes and total amount left free is, is generally zero.
So Postgres would try and pack these pages as much as it can by default.
That's, that's the fill factor we were talking about briefly.
So by default we have a hundred percent or a hundred as the default factor.
So it won't leave space in them.
Nikolay: forensic says 90.
Michael: Or be treat.
So this optimization with some caveats
will try and update a row on the same page.
So the new, the new type ends up on the same page as the old one
and tis it up like without needing a vacuum, is my understanding.
So yeah, the main caveat is there needs to be enough space on
the page, and the other one is that it needs to be updating.
Columns that are not indexed in any way.
And here is very important.
Two, conditions should be met.
You're absolutely right.
Enough space in page.
And we, we are, by the way, we are skipping, discussing toast here.
We, excuse me, Like if, if, if we insert value which
exceeds page, it's shrinked and to tables are used,
and shrinked, like two gigabytes also compressed.
Like, but we whiskey it here.
But so, okay.
Of, of course the space should be there inside our page.
, a second reason, second condition says you, you said indexed
columns, but indexed should be defined by like explicitly.
What, what, what does it mean?
Indexed, Of course, if you create index on some
column, this means you, this column is indexed.
Okay, But what if you use column, column inside
wear clothes, meaning you create a partial.
Michael: Yeah, Partial or expression indexes.
Count two, I believe.
So if, if com is not used in any indexes inside like the main
body of definition, but or including covering indexes, right?
But it's used only inside wear clouds, it's also counted as like being used.
And if you update such com, you, you cannot have hot updates, unfortunately.
And the very popular example here is
having a colon cold, for example, modified.
Nikolay: and you just modify time and that's it.
And then you say, I want to index only part of my
table where modification happened only recently.
And that's it.
You, you put like, create some index, but blah, blah, blah.
Were ified it.
Is in this, this here only, for example, and that's it.
If you try to modify, modify it, add, call,
updated, you don't have hot updates, never.
So like, it's not possible or, I mean, you, you, any update.
We'll modify this column because modify that, usually
updated by some trigger or by application itself.
But any update is supposed to, to insert new value inside this
column, meaning you will always be touching a column, which is
participating in some index definition, so you will never have.
I have an article about it, so about
this because I, I've hit it in production.
I, I thought I'm optimizing things, but I lost hot updates completely.
And it was uh, it was not good in terms of performance.
Actually question why, Like, Okay, these two
conditions are made, but why do we want it?
Michael: Yeah, that's a
Nikolay: What's the, what's the benefit?
Michael: So my understanding is the, the, without this optimization
update, heavy workloads become very difficult on Postgres.
So it, we get what's famously called right amplification.
The Uber blog post is like the most extreme,
I think example I've seen of this in the wild.
But if we have a.
Actually, there's a really good ADM blog post on this more recently as well.
But they, the way they explained it I thought was really neat, which is
if we have a single row that has, let's say 10 indexed columns, doing a
single update to that row requires us to not just wrap to a new page in
the heap on the, in the table, but also to update 10 index pages too.
So for one update, we have 11 updates.
Even if that update.
Update an index column.
So even if those are effectively unnecessary updates, whereas
with this optimization, if we are updating a co, a single
column that's not involved in any of those 10 indexes,
we can get away with a single page right instead of 11.
And that's a huge benefit, obviously say
So yeah, Postgres, All POGS indexes are direct.
Unlike in some other database systems where sometimes there
is discussions should we have indirect indexes as well.
This, these discussions pop up theoretically and probably like we
should, for example, in some, some design could be only primary.
PRI indexes, which support primary keys are direct, meaning that
they point to heap and other indexes could point to primary keys.
And in this case we would not need to update all of
them, but, currently in PO all indexes, point to heap.
They say this page and some of that, and if, and index doesn't
have visibility information, this is the key also to understand it.
So visibility means which is this top debt or still live for our transaction.
And it means that index should uh, you
need to consult with he to understand it.
And, and this means that if you, if the sum topple is dead,
Indexes and, and new ah, also we forgot to say in, in vcc, in PO
basics of it any update means some tap you actually described it.
Table is deleted.
A new, new table is created.
A top is a raw version, physical, raw version.
And since Synex doesn't know, which is life.
It needs to have both old one and new one because old
one probably still needed for some transactions and.
If we have 10 indexes we need to create to, to tie, to change all of them.
And later auto, auto vacuum or regular manual
vacuum, we'll need to clean up all of them.
All that interest there.
And also page split can happen like very like affecting
performance even more if some index ari index already
don't have space in some page, although fill factor for.
Indexes between indexes is 90, so some space should be
there, but if not, then new page needs to be created.
Split happens, like it's, it's penalty for performance,
Michael: While we're on the advantages or the benefits of this, there's a
few others that I know we've covered in other episodes briefly, but as well
as not the update the indexes as well as less bloat in the indexes as, as
a result, there's also potential for less bloat in the table long term.
less right ahead log generation.
Normally I believe it would normally be faster.
The correlated data stays correlated.
So when we were talking about brain indexes, for
example, so there's a bunch of other Oh yeah.
Also I think reduced vacuum overhead because some of this is being tidied up
Nikolay: Right, we, we, lets don't touch additional in page vacuum,
which happens on the fly here, but like, it's too much details.
Usually DB is usually more, Let, let's,
let's focus on performance for developers.
So right heart of that usually are much, much better.
Meaning that you, your actual update is much lighter.
So it's, if you check, explain and analyze buffer.
, you will see that your update modified much fewer buffers, right?
Made the majority or written.
And, and this is great.
And of course here in this topic, we should not use only microanalysis.
We need to check many of uh, update cases so it makes sense to generate.
Benchmark and take your data, your, like simulate
to your workload and see how many updates are hot.
It's easy to you, you just need to inspect that.
It has both number of.
Taps updated a number of taps, updated in hot
manner, and updated includes hu of course.
So it's like you cannot summarize.
One includes the other.
And the ratio is easy to get and the closer to hundred you
are, the better performance will be because again uh, buffer.
Buffers rule works here as well, right?
The fewer buffers will be touched because fewer indexes need to be adjusted,
Michael: And that's a really good point It's possible that
you are already even without doing any tuning, you might
already be achieving close to a hundred percent hot updates.
So there might be, it might, there might be no need to change your settings.
Equally in that situation, you need to be very aware of that because
if you, if you do anything to prevent hot updates, like you mentioned
briefly earlier, if you add an index yeah, it'll add an, yeah.
Nikolay: yeah, it was a Ruby application, very interesting,
like very good startup acquired by a large company later in
San Francisco, and I was advising them and I said, Oh, here we.
We can reduce our index size and it's scope, just making it partial.
Let's do it.
We will put this column to wear clouds indexes much smaller, faster.
By the way, also worth pointing that if.
, we have a partial index if it's scope in
terms of how many rows is out of our change.
So, I mean, change is happening outside.
This index doesn't need to be touched, of course.
So if, if one of our 10 indexes is saying like, it's index,
don't leave this part of table, but we change another part
of table, of course this index doesn't need to be changed.
But in my case, I, speed it up.
My query, but then we deployed it and then I suddenly see that hot updates.
I was checking picture, start user tables, gathering
statistics, showing that we, we good, we're good.
We have so many hot updates.
And then suddenly, After release, it dropped like to zero basically.
And I, I, what's happening?
And then I realized, so I explained this
case in, in, um, my article actually.
This is quite Bright case, which says that.
Our changes should be tested in at, at various angles,
not only for our workload, but questions should be,
will other parts of our work workload be affected?
And I didn't see so far comprehensive testing,
like the testing, whole workload every time.
It's quite, quite big and expensive task.
I like still, there are some ideas and some, in some cases we achieved.
I mean, I've built some systems which are close or it's easier, like you would
press button and have this kind of testing, but it's not topic of course.
Michael: So in terms of what developers can do, I guess being aware
of what their current state is is a really good starting point.
So checking that view for highly updated tables and
just getting a sense of what's the current proportion
of updates that are hot is a, is a great starting point.
If it's zero you might have room for a lot of improvement.
If it's a hundred, you might need to be very careful, but anything
in between, there might be room, I guess, for a bit of optimization.
Nikolay: Right, right.
Having good monitoring, which absorbs all
tables and uh, psta user tables and absorbs.
It, it, it helps and, but also of course,
having a regular update doesn't hurt.
Originally, like it's not, it's okay to have it right, but if
you expect that this table right, performance matters a lot.
And also data volumes are huge and data grows.
So vacuum pressure on vacuum will be high.
A lot of work to, to delete this dead and both in he and all indexes, right?
Of course it makes sense to try to achieve
higher ratio for hot updates, hot updates.
I guess the bigger your workloads, the more this percent.
Improvement will help
Nikolay: Question to you actually as an expert in explaining plants
and, and and so on do you think having auto explain running with
buffers options enabled for slow queries can help to identify issues?
For example, we have update.
Which became, which was slow.
We see it on auto explain log, and we see a lot of buffers involved.
We know that our update just one row, but why so many buffers?
Meaning that maybe many indexes we are involved.
Of course, maybe this update includes select.
So, so that's why we try to find which we want update.
This is another reason for, But this would, no,
this would mean that buffers are, are in red.
I mean, Hit and, and red.
But if we see a lot of buffers directed and
written also might be related to hi bit updating.
Hit bits a different story.
But do you think it's a sign like we, we see big numbers,
so probably we should try to understand what's happening.
We've had update ratio.
Michael: Yeah, exactly.
I think that's the kind of thing where you can spot that
there is an issue, but you, you can't necessarily tell what
the issue is and it, this would be a very big sign of it.
Like even in select queries, we, we now to try and point out
when we think there's a disproportionate number of blocks, Even
shared blocks hit or red versus the amount of data being returned.
Cuz that's normally a sign of bloke, but
it can be a sign of other issues as well.
So it's, it's one of those ones where a high number
of blocks might mean you've got a low hot update
proportion, but it could be a sign of other things.
So, yeah definitely.
Yeah, it definitely feels like the kind of thing where you could
point out there's a problem and, and suggest some things to look into
Nikolay: All right.
So checking pta, user tables.
This is number one thing here.
And um, if we, for example, achieve high
number, we should like keep watching, right?
Monitoring again, isn't it?
In terms of what people can expect in terms of speedups
or in terms of like reduction in some of these numbers?
I've, I think the AGM blog post I read was the best,
Michael: like real case I saw.
So they, well, they had a serious amount of right ahead log
generation, but they reduced it by about about 20% or so.
That feels like an interesting, nu like interesting number to me.
So I don't think people are necessarily, I don't think people
should expect, you know, 10 x improvements in some of these things.
Nikolay: This is a very good point.
So it's not only about timing of your update, but also you
generate a lot of walls data and means that you put additional
pressure to your back up system and also to replication system.
So more, more bites should be trans transferred
to standby notes, both physical and logical.
So it's like a lot of work and having pogs 13 or later helps here
because in producer statements, And in explain analyze, you will see wall
metrics, how many bites and how many full page shorts happened and so on.
This is great, great point.
Also, actually, you mentioned this earlier, having pos 13 or 14
or maybe 15 helps as well because B3 will be more compact, right.
In this case.
How, how is it related to hot updates?
We have Even updates became not hot.
We, we have smaller penalty in terms of maintaining in Index State, right?
Michael: There's a, there's a good video by Lucas Fiddle
on this that came out around about that time that I'll
Nikolay: ah, you, it's about let's delete
from, it was originally per Kona block post.
I guess let's, right, let's, let's delete
from the end or something like this, right?
Michael: It was something about, I think the, the
video might have been titled Bottom Up Index D.
Nikolay: Right, because if you start from the
end we will have empty pages or something.
I may be wrong.
Let's not guess.
Let's just provide some link and, and, and read about it later again, right?
Actually, you, you mentioned Lucas um, fi, um,
YouTube channel, Pigeon generalized YouTube channel.
I, I think it's underestimated because.
I, I would warn our audience because it's too deep.
It's like, you, you, yeah.
You should go there if you already, not a new user of pogs, but once
you already know basics, it's super cool channel because it's very deep.
It's very interesting.
I, I learn a lot every time I open it, so it's good.
And short videos as well, like five minutes.
Reverting us to, to the very beginning.
I, I should point to the fact that if you want to learn about
s you previously I always said, said you cannot find it in
documentation because the convention didn't have anything about it.
Now it has since POS 15 or 1415.
But it has only a small.
It'll not be enough for you.
Like, I mean, even if you just starting, if you want to
understand how updates are working, I encourage everyone to
open source code again and find rhythm related to hot updates.
Much more interesting information there,
but of course, maybe too deep as well.
Michael: Yeah, very in depth.
I was gonna say, but there is, it's quite beginner friendly.
Like I, I looked at it earlier today and there's a glossary
even of, of kind of definitions even within the read.
Yeah, it's great.
So yes, I'll include links to both of those.
There's also one final kind of place that
people could look into if they want, if you.
Diagrams or you're quite a visual learner.
There's a good part of the guide by, we've mentioned it before,
by Hu Nobu Suzuki that exactly the internals book, it has the
online version of I can link to with diagrams of how this works.
And Ros book also has this topic card also internal worth mentioning.
Well, okay, good.
Finally like related to hot updates.
I have a feeling that a lot of things can be improved here.
For example, there was this discussion of the idea of partial hot updates.
So let's have some indexes are updated in regular weight, some in partial, if
it's possible, but as I know, it was not like, Prototype was not get finished.
Maybe in future we, we will see some improvements in this area.
I have strong feeling that can be much, much better in
the future, but in general, definitely if you are thinking
about this area, try to reach at least 13 or maybe 14.
In terms of pogs major version, you will feel much better, right?
Because in terms of visibility, in terms
of how between indexes, behave and so on.
Anything else related to heart?
Michael: I think it's quite important to recap.
So the, the two big.
Things needed to achieve a hot update are one enough space on
the page and two, no index, no index columns being updated.
And that kind of leads developers to, two things you can do about this.
One is make sure there's more space available on each page,
so that's where fill factor tuning can be looked into.
And then second one is make sure no, so
we've covered the kind of make sure your.
You don't have indexes on columns you want to update if you don't need them.
So any, if you've got any unused indexes,
it's a real benefit for cleaning those up.
Or if there's a trade off there to consider, if you've got a very right heavy
workload, maybe you're okay with some reads being a bit slower as a result.
So like there's that.
That natural trade off there, but the one we haven't
talked about much is that fill factor tuning.
And it's something I didn't know until today when I was
reading up on it, is that if you reduce the fill factor,
so, oh wait, if you have a partition table, it's quite nice.
You can experiment on fill factor of new partitions.
Not in production maybe, but The other point that I
learned today though, was that you can change the fill
factor on an existing table without rewriting it all.
It doesn't change your past data, but you can benefit
going forwards at least, which is something that's quite
interesting as a as something you might actually be able to do.
You don't have to do a full table rewrite or similar
to, to start getting the benefits going forwards.
So again, fuel factor is the threshold.
When defining when new page should be allocated and used
when new two tops are coming during inserts or updates.
And for table, it's handed by default.
For indexes, it's 90 by default, meaning
for in, for indexes, we are better prepared.
New pages use, I mean, I mean, to, to avoid them actually, right?
To, to make them less likely to happen.
But the question is, should we decrease fill
factor for tables from hundred to 90 or 80?
Maybe I saw 50.
Some people use 50.
Michael: Oh wow.
Answer is test always right.
, there's no magic answer.
And it'll depend on data and especially on
the workload and also on your activity of owa.
If you have not tuned owa, you will not benefit a lot.
So OWA will need to clean up very quickly.
Michael: That's such a great point.
And things that prevent cleanup are also problematic here.
For example, if you have long running transactions
that prevent things being marked as dead.
So there, there's a whole interplay in
terms of system level things to consider.
Um, So yeah, testing and testing I think on something
that's realistic as possible is quite important here.
So your pages need to be filled with data that's roughly realistic
in order to determine whether there'll be a space available.
Nikolay: Or if you have a, a long transaction on a physical standby with
hot feedback on, or if you have a logical standby, which is lagging a lot.
You will, your auto vacuum will not do your proper work.
And this is an issue.
And in this case, fill factor won't save you actually fill factor.
Even make, will make things worse because more pages fill factor.
If you reduce it you will need more space to store the same data.
If you said 50.
Nikolay: Twice more.
Well, not well.
Right, right, right.
So experiments like I, I saw a blog post from Cyber Tech, which.
For some synthetic benchmark workload showed that if you decrease fuel factor
from a hundred to 90 roughly, you, you have improved performance improvement
for updates also, like roughly 10%, roughly, but further, It's not.
So if you next 10% doesn't give, you, don't give you 10%
of performance improvement only like a couple of percent.
So maybe like the conclusion is maybe
you should have 94 tables as well, right?
Michael: Oh, you mean by default?
Nikolay: by defaulting your application.
But we cannot say it's, it's rule for everyone.
It should be test.
Michael: If you have a pen only, like if you have an
append only workload or if you have a pen only tables,
there's no point reducing it as far as I can tell.
So it's only, we're only worried about update heavy
Nikolay: If append only means no, ensures, no, no updates, no deletes,
Michael: like Yeah, exactly.
, like log, log, like table, which definitely should be partitioned.
And only one or two indexes a maximum should be used on it.
. Okay, Because inserts also unify indexes.
There are no hot, hot inserts, unfortunately.
Always modify all indexes, Unfortu.
So if you want very good performance for insert, you
need to have fewer indexes as few as possible, right?
Michael: Yeah, yeah, yeah.
Nikolay: Delete is the same.
Delete also mag all indexes.
There is no hard delete.
I'm just like raising silly questions.
Do we have hundred answers?
So delete will modify.
Both inserts and deletes.
Deletes, they will write to wall a lot because they need to,
implicitly, they will modify all indexes you have on table.
So hand only case, usually, like I see people prefer not
having indexes at all, but maybe it's not a good idea as well
because like who will take care of duplicates for example.
But maybe it's not, It doesn't matter a lot, but.
Michael: key feels important.
Michael: think we've covered that quite well.
I think in terms, I'm looking back over the question.
We've covered fill factor, hot updates, kind, workloads, performance,
when it, when it's worth customizing and the age old advice
Nikolay: how to control digital user tables.
Michael: Yeah, exactly.
How to monitor, How to
Nikolay: Don't forget about buffers.
Nikolay: Buffers buffers.
Viewers should see buffers.
Nicola has a t-shirt on for anybody listening with
explain, analyze, and then buffers in bold, which is, yeah.
Well, thanks everybody.
keep the suggestions coming.
We wouldn't, I don't think we'd have done this topic so
early if, if we hadn't had it requested, so that's great.
Thank you, Nicola.
Nikolay: Definitely I would like, I would be happy to hear about some real
life stories production stories related to field factor and hot updates.
It's, it's always interesting because we, we have some
materials, but most of them are talking about some synthetic
like experimental environment, not about real production.
And the production stories are always interesting to hear
because they tend to to be reproduced in other places.
Michael: In fact, Adian uh, shared that they changed
their fuel factor to 85%, which is super interesting.
Michael: Again, it'd be great to hear what numbers
other people have settled on based on their own testing.
Well, thanks again everybody and see you next
Nikolay: Oh, it was episode number 20, right?
Nikolay: Yeah, . Okay.
Almost half a year without interruptions.
Thank you everyone for being with us.
Next till next week.