Postgres FM

Nikolay and Michael discuss foreign keys in Postgres — what they are, their benefits, their overhead, some edge cases to be aware of, some improvements coming, and whether or not they generally recommend using them.
 
Here are some links to things they mentioned:

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Michael: Hello and welcome to PostgresFM,
a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard,
and this is my co-host

Nikolay, founder of Postgres.AI.

Hey Nikolay, what are we talking
about this week?

Nikolay: Hi Michael.

How was your vacation?

Michael: Oh yeah, really good.

I had a really nice week off pretty
much completely, which is

nice and rare as a founder.

I think you know that.

But I did enjoy listening to the
podcast as a listener for the

first time in a while.

So yeah, really enjoyed your episode
last week.

Thanks for doing that.

Nikolay: Good to hear.

But let's talk about the question: do we need foreign keys, or do we only need American keys?

Michael: Yeah, the word foreign
is interesting, isn't it?

Nikolay: Right.

Foreign tables, foreign keys.

Michael: Yeah.

Yeah.

Should we use them or like when
should we use them, perhaps,

if not always?

What are the pros?

What are the cons?

In what cases can they be a problem?

Or what cases can they really help?

That kind of thing.

Nikolay: Right.

Well, I think it's obvious that it's better to use them if you don't expect heavy loads.

But if you are preparing for heavy loads, up to what point can you use them, right?

And I think dropping them is easier,
maybe.

Maybe not, actually.

If you already designed your system, for example, to propagate deletes using the CASCADE option, then if you drop foreign keys, you lose some functionality, right?

Michael: Oh, I see what you mean.

So like, maybe we'll get to this
at the end, but maybe if you

start with them, and then if they
become a problem, or if you

start to get success and lots and
lots of scale, or lots and

lots of load, migrating to maintaining
that integrity of the

system, quality of data without
them, and processes around data

deletion, things like that.

Nikolay: Right, so they can give
you not only quality of data,

but also some functionality.

Cascade deletes.

And if you rely on them, dropping them to solve some performance problems in the future will be problematic.

I think I've thought about it many
times, but we have this also,

right?

We cannot drop them sometimes because
we will lose functionality

and this is not good.

But in general, there are people,
I know there are people who

think you should avoid them.

And you know, for example, I can
tell you, you should avoid sub-transactions.

This is my position and I can prove
why you should avoid sub-transactions

unless it's absolutely necessary.

Should we apply similar logic to
foreign keys after you saw some

examples of bad behavior or something?

Or it's different?

This is a question I have in my
head.

And honestly, this morning, thinking about this episode, I was thinking, oh, there are new options in Postgres 17 we should explore.

Why haven't we explored them yet?

And then I realized, okay, okay,
it's only beta 1 of Postgres

17.

It's still too early.

But I can't wait until we see new results, because we will talk about it later in this episode.

But Postgres 17 will bring some
new settings, which are interesting

in the context of some performance
issues you can have with foreign

keys being used.

But let's start with simple things
first.

Michael: Good idea.

Nikolay: We had an episode about constraints, and we mentioned there are six constraints, right?

Six constraints, and foreign keys are one of the types Postgres offers.

Usually I say foreign keys are good because I trust the database system much more than anything else when we talk about data quality, constraints, integrity, and so on; referential integrity is what foreign keys offer us.

You cannot maintain it in complex systems without foreign keys, because even if you implemented it in your application code, for example in Python or Ruby, later your system can become more and more complex.

And then someone brings other interfaces to the database, new application code in a different language or a different framework, for example, or someone works directly with the database somehow.

It's not uncommon to see multiple
types of code working with

the database, right?

In this case, you already have multiple implementations of, not foreign keys exactly, but referential integrity checks and enforcement logic.

It means that, for example, if you have Ruby code and Python code somehow working with the same database, they both need to have this logic, and it's hard to maintain this logic in both places.

Eventually you will see some cases
where this integrity is not

enforced.

So, while in the database foreign keys are implemented internally using triggers, triggers are good in the sense that they're inescapable.

Of course, you can say ALTER TABLE ... DISABLE TRIGGER ALL.

In this case, it will disable all triggers, including the implicit system triggers which support foreign keys.

This is when you can see them in backslash d for the table name.

If you run ALTER TABLE ... DISABLE TRIGGER ALL and then backslash d, you will suddenly see that, oh, this table has some triggers.

Usually they are hidden; backslash d doesn't show them.

But when they are disabled, they suddenly show up.

So it's a funny trick.
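To make the trick concrete, here's a minimal sketch (the parent/child tables and all names here are hypothetical, invented for illustration):

~~~
-- A hypothetical parent/child pair with a foreign key:
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (
  id int PRIMARY KEY,
  parent_id int REFERENCES parent (id)
);

\d child  -- the internal RI triggers are hidden here

ALTER TABLE child DISABLE TRIGGER ALL;  -- superuser needed for system triggers

\d child  -- now a "Disabled internal triggers" section shows up

ALTER TABLE child ENABLE TRIGGER ALL;   -- existing rows are NOT re-checked
~~~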

And you should not disable these foreign key triggers, of course, unless you know what you're doing, I mean during some massive operations and so on.

But in that case, it's on your shoulders to ensure the constraints are still followed, because when you enable the triggers back, Postgres won't check them for existing data.

So if they are enabled, they inevitably work.

Any write you perform is checked using those triggers, and if a write breaks these rules, it won't happen.

It will be rolled back.

Which is good.

So it's a more trustworthy approach: checks on the database side.

Make sense?

Michael: Yeah, so to be super explicit: if you try to insert a row into a table that has a referencing column, and you include an ID that isn't in the referenced table, then you'll get an error saying so.

That kind of thing.
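For example, with the hypothetical parent/child tables from above, the failure looks roughly like this:

~~~
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (10, 1);  -- OK: parent 1 exists
INSERT INTO child VALUES (11, 2);  -- fails, roughly:
-- ERROR:  insert or update on table "child" violates foreign key
--         constraint "child_parent_id_fkey"
-- DETAIL: Key (parent_id)=(2) is not present in table "parent".
~~~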

Nikolay: Right, right.

So this is a good thing, obviously.

Better quality of data, but of
course there are complexities

under heavy loads in large systems.

One of the complexities is when you define a foreign key on an existing large table, or you define a foreign key on a new table, but it points to a large table.

Yeah.

Of course, in this case-

Michael: not just large, but busy,
right?

Nikolay: Busy, okay.

Michael: Or active in any way,
yeah.

Nikolay: Well, there are two types of problems here.

One is related to busyness, the other is related to large volumes of data.

Anyway, we need to...

Postgres needs to acquire locks
on both sides,

Michael: which is challenging.

Both tables.

Nikolay: Right.

Different types of locks, but anyway.

And second, it needs to ensure
that existing data already complies

with our constraint.

Michael: Constraint, yeah.

Nikolay: Right.

And if you do it the straightforward way, in general, you will have issues.

A high risk of issues.

And the risk keeps growing: the more data you have and the higher the TPS, the higher the risks are.

So at some point, and it's better
to do it earlier, you need

to install foreign keys in a proper
way.

So you need...

Fortunately, Postgres supports...

We discussed this, but let's repeat.

Postgres supports a two-step approach for constraint creation here, similar to check constraints.

You can define a foreign key in the NOT VALID state, right?

It means the constraint will have the NOT VALID flag, but it's important to remember that this doesn't mean it isn't working.

It is immediately enforced for all new writes.

It's just not checked against existing data.

I remember I made a super big mistake when I designed some complex procedure related to int4 to int8 conversion, and I forgot that so-called invalid foreign keys are enforced for new writes, so I kept the old foreign keys at some point just for the sake of being able to roll back, I mean, to revert all operations.

I decided, oh, I need them, because if a decision is made to revert the changes, I will already have them.

I will just validate them.

But new writes were still being checked against them.

New writes were blocked, basically.

And this was a big mistake I learned from.

So it happened on production and
it was terrible.

It was my mistake.

So NOT VALID doesn't mean it doesn't work.

It's similar to a check constraint, and actually to indexes: if you mark an index as invalid (indisvalid set to false, which is not actually recommended, for various reasons), the index is still maintained.

Likewise, a NOT VALID foreign key is maintained for all new writes.

And then, as the second step, you say ALTER TABLE ... VALIDATE CONSTRAINT.

This is not blocking.

Well, it's blocking only briefly, right?

Even if you have a huge dataset to check, your DML queries are not blocked.

Or rather, they can be blocked only briefly, for a short period of time.
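A minimal sketch of the two-step approach, again using the hypothetical tables from earlier:

~~~
-- Step 1: enforced for new writes immediately; existing rows not checked.
ALTER TABLE child
  ADD CONSTRAINT child_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES parent (id)
  NOT VALID;

-- Step 2: scans existing data; normal DML keeps running,
-- since only brief / weaker locks are held.
ALTER TABLE child
  VALIDATE CONSTRAINT child_parent_id_fkey;
~~~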

Anyway, this is not a trivial operation,
but it's already well

described in various sources.

For example, as usual, I recommend the GitLab migration style guide.

It's called a migration style guide, but it's about database migrations, it's about DDL basically: how to deploy DDL under heavy load reliably.

And they have Ruby code, called migration_helpers.rb, which demonstrates how to do it reliably under any heavy load.

So yeah, it's one thing to remember: you cannot just create them easily.

But this is true for any DDL.

DDL is dangerous.

We discussed this multiple times.

There's also a headache associated with adding them to partitioned tables.
Right?

Yeah.

I always forget details.

Every time I deal with it, I need
to double check details.

It's always an issue.

Michael: And it changes, right?

The restrictions around partitioned tables change every version.

Exactly.

It improves with each version.

So things that I thought were not possible become possible.

So it's good that we check the documentation; the documentation is great, but it's difficult to remember the details from version to version.

Nikolay: Right, the changes are
in a good direction.

Michael: Exactly.

Nikolay: Which is good.

So yeah, if you're on an old version, it's one thing; on a newer version, it's a different thing.

But in general, on newer versions, a lot of things are possible.

So, yeah.

Michael: One last thing that you normally mention around creating these on large or busy tables is the idea of having timeouts and retries.

I don't think you mentioned that this time, but it's an important extra point.

And I'll link to your...

You did a good Twitter...

When you were doing a Twitter marathon,
you did a great post

on this.

I'll link that up as well.

Nikolay: Yeah, I'd already forgotten about this.

But yeah, right.

In general, if you need locks, you can take them explicitly with LOCK TABLE, but you need to do it carefully.

Basically, where a two-step approach, like NOT VALID and then VALIDATE, is implemented in Postgres, you generally don't need that.

That's the case for custom checks and for foreign keys.

But for some things, like adding a column, actually, or dropping a column, it's needed.

You need to deal with locks and do it with lock_timeout and retries, as usual.

But with foreign keys...

Michael: Why not do it with this one, though?

Like, I would...

I still think it's sensible.

I think, for example, you gave a good example: if autovacuum is running in the heavy mode, in

Nikolay: the anti-wraparound mode...

You are right.

You are right.

Yeah.

Yeah.

Michael: So if it doesn't yield its lock, or something else you don't know about isn't yielding a lock, it's so helpful then to have that timeout and a retry.

Nikolay: You know what?

In general, DDL is hard.

Like under heavy load in large
systems, it's really hard.

I wish it was much simpler in Postgres.

I think there is a huge room for
improvement step by step, and

I hope it will be improved.

But now we need to, it's like art.

You need to improve your tools
and libraries and so on.

You are right.

Even when we install a foreign key with the NOT VALID flag, we still need locks on both tables, right?

And these locks, if they cannot be obtained, will block us.

And while we're blocked, we start blocking everything else, including SELECTs, because it's DDL and the SELECTs end up waiting behind it.

So, yeah, it's kind of...

I think for SELECTs, in this case,
it's not that...

Maybe not.

Michael: Or inserts, for example.

Nikolay: Yeah, we need to check, we need to carefully check the types of locks that need to be acquired on both tables.

I only remember that they are different on the two sides.

But in general, this should be
carefully designed.

It's better to follow existing
experience.

For example, GitLab's; maybe some frameworks have already implemented it.

I don't know.

Django, I think, Ruby on Rails
in general, they don't offer good

automation here.

It's not enough.

So if you're preparing for heavy loads, it's definitely worth keeping an eye on this topic, and maybe checking it before it starts biting you, because sooner or later it will.

At first it bites and nobody notices, right?

Okay, some spike of latencies, a few seconds during deployment.

But then at some point, especially if longer transactions are allowed, it can bite you so heavily that you'll notice a few minutes of downtime, for example. Not good.

So it's better to double-check
carefully, step by step, and rely

on other people's experience.

But you are right, we need to deal with a low lock_timeout and retries here as well.
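A sketch of the lock_timeout plus retries idea (the timeout value is arbitrary):

~~~
SET lock_timeout = '100ms';

ALTER TABLE child
  ADD CONSTRAINT child_parent_id_fkey
  FOREIGN KEY (parent_id) REFERENCES parent (id)
  NOT VALID;
-- If this fails with "canceling statement due to lock timeout",
-- sleep briefly and retry in a loop until it succeeds.
~~~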

And I think you especially need it in the partitioned table case.

Michael: Why especially there?

Nikolay: Because, I don't remember the details, but I remember that when you create a foreign key on a partitioned table, more actions are needed.

First...

Michael: yeah.

Would you have to do it on each partition and then...

Nikolay: I think, at least until some version, you cannot create a NOT VALID foreign key on a partitioned table.

This is a problem.

Michael: Oh, interesting.

Nikolay: Right, right.

Yeah.

But again, I don't...

Yes, it differs for each version
and so on.

It should be carefully studied,
tested, and so on.

But with proper understanding that
locks are needed, it's solvable.

Right?

It's solvable.

So maintenance overhead, basically.

There is some maintenance.

Oh, let's mention one more thing.

Maybe people don't...

Not everyone realizes it.

We also didn't realize it until one of our clients hit it; it was a company called Miro, which is very popular now.

We helped them with some things, fighting bloat and so on.

And we proposed using pg_repack to fight bloat.

And at that time they had deferred constraints.

So foreign key constraint checks
were done at commit time, not

immediately.

And I remember that it caused some
issues with pg_repack.

Again, I don't remember all the
details.

There is a good blog post about
this.

I will send it to you.

But additional maintenance overhead,
which is caused by the presence

of foreign keys in this special
state, deferred constraints,

deferred foreign keys.

So definitely if you use foreign
keys, you need to pay attention

to different things when you do
some operations with your database.

This is true.

But I would say it's worth it,
right?

Because you have good referential
integrity and so on.

So I would still choose foreign
keys in spite of these problems,

right?

Yeah,

Michael: I tend to agree.

I think the vast majority of us
are working on systems that don't

have the characteristics where
foreign keys cause problems.

I know we had an episode only a couple of episodes ago where we had some possible exceptions to that.

Well, maybe we'll get to that in
a moment.

Nikolay: You're talking about the 100 terabyte episode, which was also episode number 100.

We have received good feedback about it; it's been considered the best episode we've had.

So, yeah.

Actually, someone said it was the best podcast episode they'd heard, period.

Yeah, we had great guests.

Again, thank you because you organized
all the invitations and

so on.

Michael: I was just pleased to
be there.

It was a good fun 1 to record as
well.

But yeah, not just the fact that
they're 100 terabytes, right?

The fact that they're heavily loaded
systems and they're constantly

busy and they have to worry about
a lot of these scaling limits

and I guess keep hitting different
cliffs or different

Nikolay: cliffs.

Michael: Maybe that's a bad word
but like limits of what you

can do or how many of a certain
thing you can exhaust before

falling back to a different method.

But 99.9% of us are not working
with those systems and don't

have to worry about those and don't
have to worry about them

initially as well.

So I think it was Sammy from Figma
who said as well, and I think

I agree with this, at the beginning
maybe you don't have to design

for this kind of thing and if you
have to solve this, it's a

good problem.

You're probably successful in other
ways.

If you've got to a heavily loaded
Postgres and you're not making

enough money to pay people to fix
this kind of problem, you've

probably done something wrong.

You need a business model that
works with that load, and all

of them did that successfully.

And I can't think of an example
where a company got too successful

with load, but not with finances.

So I think they can normally solve
these problems.

Nikolay: Well, I don't fully agree.

I think... yeah, let's talk about cliffs, but first let's talk about the performance overhead of foreign keys. pgbench supports a foreign keys option, which is not used by default.

And before this podcast, we were very curious; we didn't know.

A simple question: what's the difference between two pgbench runs, with and without this option?

Of course, this option has to be used at initialization time. We checked, and we saw a difference at a small scale, like 1 million rows in pgbench_accounts, scale 10.

We saw a difference of only 11%.

On my MacBook, it was a quick and
dirty experiment, all in memory

and so on.
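Roughly, the comparison was something like this (a sketch; the database name and client counts are made up, the flags are standard pgbench options):

~~~
pgbench -i -s 10 test                  # ~1M rows in pgbench_accounts
pgbench -c 8 -j 4 -T 60 test

pgbench -i -s 10 --foreign-keys test   # same, but with foreign keys
pgbench -c 8 -j 4 -T 60 test
~~~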

As expected; I expected something like 5 to 10%.

You said 10%, right?

Michael: I guessed.

It was a complete guess.

And I don't know why I guessed before even knowing: I'd forgotten the breakdown of what proportion was selects versus inserts, updates, and deletes. I definitely shouldn't have guessed before knowing that.

Nikolay: Why do you care about
selects, deletes, updates and

so on?

Michael: I thought, well, maybe this is wrong again, but I guessed that there would be a bigger impact on...

Right.

...Inserts.

Yeah, exactly.

Than on the select operations.

That was my guess.

Nikolay: Well, by the way, actually, maintenance... let's put this in the maintenance overhead basket.

Say you need to delete a referenced row, a row in a referenced table; sometimes we call it the parent or something.

Well, an index always exists on one side of a foreign key, because otherwise foreign key creation will complain about the lack of an index.

You need a unique index, or a primary key, for example, on that side.

The index backing the primary key, which is also a unique index, right?

Michael: You need a unique constraint.

Nikolay: Right, right.

If index is not created on the
other side, which by default is

so, you can...

Michael: Yeah, no, yeah.

Nikolay: Yeah.

And you delete this referenced row.

Postgres needs to find all dependent rows, either to say it's okay or maybe to propagate the delete, if you again use cascade deletes.

And if an index is not there, it will be a sequential scan of the whole table.

So the delete will be super slow.
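A sketch of how to see this, with the hypothetical tables again:

~~~
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM parent WHERE id = 1;
-- With no index on child.parent_id, the RI trigger seq-scans child;
-- look at the "Trigger for constraint ..." time at the end of the plan.

-- The usual fix:
CREATE INDEX CONCURRENTLY ON child (parent_id);
~~~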

And this is also a kind of maintenance overhead to me.

Sometimes people say, oh, we don't have such deletes, we use only soft deletes, right? Or something like, we don't care, we don't need those indexes to be present.

But if you have some risk, and in some systems these deletes happen very rarely, you still need to maintain these indexes, remember about them, create them, and so on.

So it's also a kind of overhead.

Michael: I see this quite often.

And I actually wrote a blog post about this, it was quite a while ago now.

But yeah, it's the time I see triggers
causing performance issues

the most often, because

Nikolay: it's on blast.

Michael: That's exactly it, that's how you spot them in the explain plans.

Yeah.

So I'll include that.

But my view is that it normally makes sense to index your referencing columns.

Also for select performance: I find they're often a good candidate for selects anyway.

Yeah.

Like there's loads of workloads
and patterns where it makes sense

to have them indexed.

Of course, if you're already adding
like a multi-column index

with them in the leading column,
you don't need an additional

index.

But what I mean is having some index on that column so that rows can be looked up.

It normally makes sense.

Nikolay: Yeah, our postgres-checkup tool also has that kind of report: foreign keys without supporting indexes.

But we usually say it's very situational.

And I see this as well; I see people say, well, we don't need that because our workload is different.

Michael: I remember a blog post, I think it was from Percona, arguing the opposite case, saying: please don't automatically index.

I'll include that as well.

Nikolay: I would say as well, I
would say don't automatically

index because every index has a
different kind of overhead.

It has a penalty, right?

Michael: Well we've discussed that
many times, haven't we?

Not just insert overhead, but also killing HOT updates, and some planning overhead.

Yeah, lots of,

Nikolay: lots of.

So many dangers around when you
need to grow and have good performance.

Right.

So back to these experiments with pgbench, with and without foreign keys: you say you expect problems only in write operations.

Interesting.

Michael: Or at least more overhead.

Nikolay: Let's remember that, because I have a case, you know, I have a case where it's very different.

Okay, in this case, with pgbench, we observed roughly 10%.

I'm quite sure it will be about the same at larger scale, on different machines, and so on, with pgbench, I think.

With this pgbench type of workload, roughly 10%.

Would you pay this price, knowing that everything is slowed down?

Write operations are slowed down by 10%.

Michael: For most systems I work with, it would be easy.

Yes, like, most systems aren't maxed out on CPU, so it's an easy decision.

And for anybody that's ever dealt with bad data... bad data can be so painful, as in bad quality data.

I mean, you have to look through a system and you're like, this doesn't make sense.

Like, these rows are referencing something, maybe a product, that just doesn't exist.

What plan are these customers on?

Or, you know, if you're doing analytics and you're trying to categorize things, and there's bad data in there, it just becomes such a nightmare at those times.

Nikolay: Or somebody just disabled foreign keys, did some writes, and then enabled them back.

Or sometimes, actually, cases without foreign keys at all.

Michael: But I also want... I'm talking about, yeah, I'm talking about cases without them.

But I mean, anybody that's ever dealt with that, and it's probably the slightly more seasoned folks that have had to deal with it once or twice in their career, it's just so painful that you then think: do you know what, I'll happily pay 10% going forwards to not have to deal with that again.

Nikolay: I remember cases when referential integrity was broken even with foreign keys, but that was a bug in Postgres, and it was very long ago.

Since then they became very reliable
and I would pay this price,

10%, to have these checks constantly
performed by Postgres to

ensure the data quality is higher
in this area.

And let's emphasize that this overhead
is not a cliff.

It doesn't matter how many transactions
per second you have,

how many rows you have.

This extra penalty is added to
all write operations all the time

because extra work needs to be
done.

For example, on insert: Postgres needs to make a check when you insert a value into a child table.

Let's call the tables parent and child for simplicity.

The meaning of the relationship can be different, but let's call them that.

You insert a record into the child table, and Postgres needs to ensure that the parent record exists, because you're referring to it.

So it basically performs an implicit select to check that the record exists, and this is extra work that always needs to be done.

And even if you have one transaction per second, a very tiny workload, the penalty will still be there.

We can actually check this with EXPLAIN (ANALYZE, BUFFERS) and see that more buffers need to be involved when you insert a row with a foreign key versus without one, right?

Michael: Maybe one more?

Nikolay: Well, it depends.

It depends how big the table is and how many hops the index scan

Michael: should show.

True, actually, definitely not one; at least two.

Nikolay: Again, it depends.

But maybe, I don't know, maybe it's an index-only scan.

I think there's the...

Michael: ...root page, there's like the top...

I don't think you'll ever get data in that root node, but I could be wrong.

Nikolay: Well, yeah, okay.

Anyway, we agree that this is not
a cliff.

It's constant work.

Michael: Yes, true, true, true.

Nikolay: Additional workload, which is like...

Additional weight you need to carry, performing this work all the time.

You have a contract to do this all the time.

It's like a tax, actually.

A foreign key tax.

Michael: That actually is a good
metaphor, I think.

Nikolay: Yeah, like 10% tax from
foreign keys.

So we have consensus here.

We are ready to pay those taxes
to live in the world with better

data quality.

But there are performance cliffs,
as you already mentioned, and

they are terrible sometimes.

So one of them was raised a year ago by Lukas Fittl, in pganalyze's series of small videos, which I like.

Michael: 5 Minutes of Postgres.

Nikolay: Right, right, right.

They are always based on some other folks' materials, and Lukas reflects on them very well in the videos.

This one was based on Christophe Pettus's blog post, where a simple workload was described.

Imagine Twitch: a new stream is starting, and we have a streams table, kind of a parent table, and we have some viewers table or something, and immediately a million or so viewers join the stream, and you need to insert them into this, let's say, child table. Parent and child.

And if you do it in separate inserts, separate transactions, then when Postgres performs an insert, not only does it need to check that the referenced row exists; for the duration of the writing transaction, it needs to lock that row.

Basically, with a FOR KEY SHARE row-level lock.

If it were an explicit lock, it would be FOR KEY SHARE.

This is to ensure that the row won't disappear during our transaction, right?

Michael: Yeah.

Nikolay: And imagine now like a
million inserts are happening

around the same time.

Of course, it's not possible.

It depends on how many resources
you have, but many of them will

collide and try to happen at the
same time.

And in this case, since multiple transactions need to hold a row-level lock on the same row in this parent table, Postgres starts using multixact IDs.

Right?

Multixact IDs.

So multixact IDs are created; multixact IDs are used when multiple transactions lock the same row.

And with this mechanism, it's very easy to hit performance cliffs when you grow, when multixacts are actively used.
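Roughly, the RI trigger's check on each insert into child behaves like this simplified sketch:

~~~
SELECT 1 FROM parent WHERE id = 1 FOR KEY SHARE;
-- When several concurrent transactions hold FOR KEY SHARE on the same
-- parent row, a single xid can't represent them all in xmax, so Postgres
-- creates a multixact ID listing all the lockers.
~~~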

Well, by the way, Lukas also mentions another problem with multixacts.

Many, many monitoring systems, and pganalyze is a good exception here, as well as our edition of pgwatch; they both care about this.

But many systems care only about regular transaction ID wraparound.

They forget that multixact ID wraparound can also happen.

It's just very rare.

I've never heard of this case happening so far, but it's still possible.

So you need to monitor, and ensure that autovacuum freezes rows properly and takes care of rows which were involved in this multixact mechanism.
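One hedged sketch of such monitoring, using the built-in mxid_age() function:

~~~
-- How far each database's oldest multixact ID lags behind the current one:
SELECT datname, mxid_age(datminmxid)
FROM pg_database
ORDER BY 2 DESC;
~~~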

So that's one problem.

But the other problem is this cliff.

So, in short, we have contention on the multixact mechanism.

Multixact has SLRUs, S-L-R-U, and the buffer areas are quite small.

And the number of buffers is not even a default; it's hard-coded until Postgres version 17.

And yeah, so it's like...

I saw this problem with SLRUs in different cases, in the sub-transactions case several years ago.

And I remember I touched on it a little bit at that time already.

Sub-transactions are also a performance cliff.

Everything is fine, your scale is growing, and then suddenly the system is basically down, because the overhead becomes enormous.

It usually happens when you reach the limits of these SLRU buffers.

We saw it with the subtrans SLRU, but this is similar to the multixact SLRUs.

Around that time, I learned that Andrei Borodin had already proposed improvements to the algorithm, and not only improvements, but also making this tunable.

And the good news is that this work is committed to Postgres 17.

So if you observe this performance cliff at some point, at some rate of inserts performance becomes terrible, like system-unresponsive downtime, right?

You might try tuning to push this cliff further away from your current situation.

But the only...

Michael: At what cost?

Like a little bit of extra memory?

Yeah,

Nikolay: The algorithm was improved: instead of a sequential O(N) algorithm, I think, or maybe even a quadratic one, there's now a better algorithm.

But also, you can adjust the configuration and have more buffers, for example.
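For example (the GUC names are from the Postgres 17 beta documentation; the values are arbitrary, and changing them requires a restart):

~~~
ALTER SYSTEM SET multixact_offset_buffers = 128;
ALTER SYSTEM SET multixact_member_buffers = 512;
-- Related new GUCs in 17 include subtransaction_buffers and
-- transaction_buffers; restart Postgres for these to take effect.
~~~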

I remember we tested some earlier versions of these patches; there are multiple patches in this work.

By the way, big congratulations to Andrei: more and more of the things he has been working on for many years are getting committed.

It's great.

But I remember there were some complexities in our particular case related to sub-transactions.

But I think we should test these things and study exactly this performance cliff Christophe describes in his blog post, right?

Michael: Yeah, well, so I think that's worth mentioning, because this is one solution, but Christophe also mentions a couple of alternative approaches to that schema design.

Nikolay: He mentions reliable approaches.

One of them is to just drop the foreign key.

In this case, no more performance cliff, because the multixact mechanism is not involved.

Absolutely true.

Michael: But there was also an alternative approach, like batching the updates, or there was another idea.

Nikolay: What updates?

Inserts.

This is very important...

We always say you need to perform deletes and updates in batches.

But we never said this about inserts.

Inserts are better done in a single transaction if you can, because inserts are usually not blocking.

Not in this case, though; here we need to deal with this multixact row-level lock.

But if you can do it in a single transaction, it's great; even if it's a million rows, it's okay.

Or in multiple bigger batches, like a thousand inserts each.
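A sketch of the batched version, with the hypothetical tables:

~~~
-- One transaction inserting many children locks the parent row
-- (FOR KEY SHARE) as a single locker:
BEGIN;
INSERT INTO child (id, parent_id)
SELECT g, 1 FROM generate_series(1, 1000) AS g;
COMMIT;
~~~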

But in a straightforward implementation of this, like, oh, viewers just join our channel, it will be separate inserts, because they happen in different sessions, right?

You need to know about this problem to design the system differently and start batching these inserts.

So it's a dangerous performance cliff.

But this performance cliff demonstrates
the problem for inserts.

I have another case, which is not
yet published.

I hope we will publish some blog
post about this.

It was not observed in production, but it was inspired by some problems we had with some customers, related to multixact ID contention.

So imagine we have a parent table with a couple of records, just IDs 1 and 2, and a child table with some inserts happening.

Not at a huge rate.

Not at a rate where we already
have this performance cliff.

And then we also want to have a lot of selects, selecting, for example, the latest child inserted into the child table.

Just with a join, right?

So we join: we select parent ID 1, and we select the latest child inserted recently.

We have a timestamp, for example, to do this, or we just order by primary key, if the child table has an integer primary key.

And now, we think, well, that's it.

So say we insert 10 rows per second, for example.

And we select as much as we can; imagine we have a lot of viewers trying to read the latest record inserted into child.

And now we start updating the parent row with ID 1.

This is the cliff.

Performance becomes terrible for everyone, including selects.

This is the key to this demonstration.

It can be like 100 times worse.

100 times higher latency, 100 times lower

Michael: TPS.

It really surprised me when you
told me this, when you showed

me this.

Nikolay: Well, we need to research all the details, but I suspect this is related to the fact that, in this case, when you have updates of the parent and inserts into the child, they also...

Well, first of all, all those inserts, if they happen around the same time, also get multixacts involved.

But updates also need a row-level lock, right?

Actually, a multixact ID gets put into the xmax hidden column, a system column, right?

So the xmax column holds the ID of the transaction after which the row version becomes dead.

If that transaction succeeds, okay, this tuple is dead.

If the transaction is cancelled, it's as if xmax were not set, right?

A live tuple.

But in this situation, Postgres puts in not regular transaction IDs, but multixact IDs.

And it produces some volume of dead tuples all the time, with multixact IDs inside xmax.

And I suspect that when selects are trying to find live tuples for the parent, it's really expensive, because with the multixact mechanism we need to check which transactions were cancelled.

It's very expensive.

So selects need to scan a lot of dead tuples, performing expensive verification, and they degrade a lot.
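A hedged way to peek at this (pgrowlocks is a contrib extension):

~~~
SELECT xmin, xmax, id FROM parent WHERE id = 1;
-- If several transactions hold locks on the row, xmax contains a
-- multixact ID rather than a plain transaction ID; pgrowlocks decodes it:
CREATE EXTENSION pgrowlocks;
SELECT * FROM pgrowlocks('parent');
~~~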

So this workload, I think, anyone
can experience.

And I remember I designed a few systems where, for the sake of fast counts, when inserting into a child table, for example comments or views or something, I decided to update a counter in the parent table, to have denormalized data and avoid a slow count.

So I just insert and update in the same transaction.

And in other transactions, the same insert and update: plus 1, plus 1.
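The pattern looks roughly like this (hypothetical posts/comments tables):

~~~
BEGIN;
INSERT INTO comments (post_id, body) VALUES (1, '...');
UPDATE posts SET comments_count = comments_count + 1 WHERE id = 1;
COMMIT;
-- Many sessions doing this concurrently all contend on the same posts row.
~~~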

Of course, this already is not good, because these updates have contention.

You're updating the same row; you cannot update it twice at the same time, so it becomes sequential. There will be contention on heavyweight locks.

You can improve this, but anyway, even if you perform batched updates in separate transactions, you have high risks of multixact IDs being involved, and then you can already have lightweight lock contention on the multixact mechanism.

Michael: A favorite.

Nikolay: Yeah, LWLock contention on multixact offset.

And yeah, so this is not good.

And I imagine this can happen with
very good chances.

And it also can be seen as a performance
cliff.

So suddenly, it was fine, but then
it quickly becomes terrible.

Yeah, so what to do?

I don't know.

I think we need to study, research
all those cliffs, document

them and understand how to predict
them maybe.

At what rates everything is fine.

Because I don't see them under
very heavy loads, I don't see

them.

But it doesn't mean they're not
there.

Michael: Or they're not coming.

Nikolay: Or they're not coming.

At some point, they might bite
you back.

Michael: That's interesting.

Nikolay: For those who are interested, let's put in links.

I have a series of benchmarks performed with our PostgreSQL bot, with some visualization.

Of course, it maybe requires explanation, but if you know how pgbench is organized, everything is there, and it's easy to reproduce.

Michael: If you're talking about the graph I remember seeing, it doesn't take that much explanation, because you have four fairly large bars on the bar chart, all showing lots of transactions per second, and then one that is so tiny you almost have to zoom in to see it.

And it's quite easy to say, oh, that one didn't do so well.

Nikolay: Yeah, let's explain a little bit.

The tiny one is the cliff being demonstrated.

The four others: one was, let's just remove updates.

Another: let's keep updates, but update ID 2, not ID 1.

Another: let's remove inserts.

And the last one: let's remove the foreign keys.

Yes.

Which is fair, right?

In this case, just to show that the foreign key is the key evil here.

Yeah.

And all bars are TPS for selects.

Maybe latency.

Actually, it's more correct to
show latency, not TPS.

But okay, TPS.

Somehow people tend to like TPS
numbers more, but latency is

what matters here for sure.

Michael: Then we'd see one really large one and the others small.

It's not as dramatic a graph.

Nikolay: Yeah, maybe.

Yeah, here smaller is worse; smaller is not better.

Whereas latency has the inverted logic: smaller is better.

Michael: Anyway, it's still great.

Nikolay: So yeah, SELECTs can be
affected badly.

Michael: Has this changed your
opinion on should you use foreign

keys or not really?

Nikolay: Yeah, this is the first question I asked myself and my team.

I said, are we observing a case similar to sub-transactions, where I basically started telling everyone: if you want to grow, try to avoid them?

At least know all the problems they have.

By the way, for sub-transactions,
there is a new setting also

interesting to double check in
Postgres 17.

But the answer is still not 100% clear.

I'm not 100% sure about the answer here, because foreign keys are a good thing in general.

So I think in the sub-transactions case, we saw how you can hit the problems very fast, very easily, when you grow.

With foreign keys, again, I'm not sure.

Actually, the customer I mentioned, they removed some sub-transactions.

And the problem was gone.

So I think sometimes these problems come together.

For example, we know that, as we discussed, foreign keys take a SELECT ... FOR KEY SHARE lock.

And this causes the multixact mechanism to get involved.

But sometimes you work with SELECT FOR UPDATE.

With SELECT FOR UPDATE, you don't expect the multixact mechanism to be involved.

But if sub-transactions are there, SELECT FOR UPDATE might lead to the multixact mechanism, because every nesting level of your nested transaction, or sub-transaction, is considered a separate transaction when Postgres writes the xmax value.

And if you lock a row with SELECT FOR UPDATE, then define a savepoint, and then perform an update, this is similar to SELECT FOR SHARE, although it's SELECT FOR UPDATE.

Because basically multiple transactions are trying to lock the same row, so a multixact is needed here.
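A minimal sketch of that corner case, with the hypothetical parent table:

~~~
BEGIN;
SELECT * FROM parent WHERE id = 1 FOR UPDATE;
SAVEPOINT s1;  -- the next statement runs as a new subtransaction
UPDATE parent SET id = id WHERE id = 1;
-- Two (sub)transaction IDs now hold locks on the same row, so Postgres
-- stores them in a multixact, just as with FOR KEY SHARE.
COMMIT;
~~~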

And there is a blog post, a very good blog post, about this performance cliff.

It's the blog post which said: sub-transactions are cursed, just remove them.

And I became a follower of this approach.

So in that case, with that customer,
we had 2 hypotheses.

Either sub-transactions removal
will help or it won't help just

because they had this case we demonstrated
in this experiment.

Fortunately, it was related to
sub-transactions, but who knows,

maybe in some case we will see
that no sub-transactions, but

still performance cliff like that.

Does it make sense?

Michael: Yeah, for sure.

And I think I was just looking
up the blog post because it was

bugging me.

Are you talking about the one from Nelson Elhage?

Nikolay: I think so.

Yeah, yeah.

I included it in my four cases with sub-transactions.

Michael: We did an episode on sub-transactions
and I found that

in the show notes.

I'll double-check it and I'll link it up if that's the

Nikolay: right one.

It's a very good blog post, very precise.

It doesn't explore this situation with SELECT FOR UPDATE and foreign keys very far.

I think it doesn't even mention foreign keys.

Yeah, because they had a different problem, but it was definitely a good starting point for me, some time ago, to start understanding that multixact IDs can also be dangerous.

It's also a cliff, for various reasons.

So I think we need to postpone the exact answer to this.

And maybe...

Michael: I had one more thing I wanted to add.

There are definitely people I know who have been using Postgres for 5 or 10 years, and they're riding this wave: they adopted Postgres when it didn't have all the features they thought they were going to need.

But as they've needed them, Postgres has been adding them.

As they've scaled, their business has been able to stay on Postgres, because it got replication features, or it got parallel query features; they've been able to really benefit from Postgres improving as they needed improvements.

So it feels like this could be another one of those things; Postgres is going to get better in this area.

There's enough noise around these things.

You mentioned some; even if these get reverted and don't make it into 17, something will probably make it into 18 or 19 or 20.

And there's enough people at scale
pushing Postgres to these

limits that a lot of us that aren't
there yet can probably get

away with adding foreign keys,
using them even in cases where

maybe it's not recommended at scale,
and maybe Postgres will

be improved or will have configurable
settings by the time we

do need them.

Nikolay: Yeah, yeah.

Makes sense.

Maybe.

Yeah, we need to explore those
settings and test various edge

cases.

I would say these performance-critical cliffs should be called edge cases.

Michael: Yeah, or corner case is
easier.

Nikolay: Corner case; let's say that what I just described, sub-transactions plus SELECT FOR UPDATE, is a corner case.

Michael: Two edges where they meet.

Nikolay: Exactly, exactly.

This is a perfect example of a corner case.

But the others are edge cases.

We just need to explore the territory
and find all the edges

and think if it's possible to move
them to have more room when

needed.

Or just, yeah, at some point probably
foreign keys should be

just dropped in some cases.

Sometimes, maybe.

Michael: Good.

Well, we spoke to some people who
don't use them.

So that's really, yeah.

Nikolay: But in general, I keep
using them.

I keep using them everywhere.

Yeah, me too.

And I keep suggesting people use them.

Michael: Great.

Nikolay: Pay your tax.

Michael: And society will be better.

Nikolay: But don't end up in jail because of these cliffs.

Michael: Interesting. Yeah, thank you so much, Nikolay, as always, and catch you next week.