1
00:00:00,060 --> 00:00:02,560
Michael: Hello and welcome to PostgresFM,
a weekly show about

2
00:00:02,560 --> 00:00:03,460
all things PostgreSQL.

3
00:00:03,680 --> 00:00:06,340
I am Michael, founder of pgMustard,
and this is my co-host

4
00:00:06,340 --> 00:00:08,000
Nikolay, founder of Postgres.AI.

5
00:00:08,000 --> 00:00:10,220
Hey Nikolay, what are we talking
about this week?

6
00:00:10,800 --> 00:00:11,600
Nikolay: Hi Michael.

7
00:00:12,180 --> 00:00:13,880
How was your vacation?

8
00:00:15,180 --> 00:00:16,660
Michael: Oh yeah, really good.

9
00:00:16,720 --> 00:00:20,020
I had a really nice week off pretty
much completely, which is

10
00:00:20,020 --> 00:00:21,740
nice and rare as a founder.

11
00:00:21,740 --> 00:00:22,800
I think you know that.

12
00:00:22,800 --> 00:00:26,580
But I did enjoy listening to the
podcast as a listener for the

13
00:00:26,580 --> 00:00:27,720
first time in a while.

14
00:00:27,720 --> 00:00:29,760
So yeah, really enjoyed your episode
last week.

15
00:00:29,760 --> 00:00:30,820
Thanks for doing that.

16
00:00:31,380 --> 00:00:32,080
Nikolay: Good to hear.

17
00:00:32,080 --> 00:00:37,360
But let's talk about the question,
do we need foreign keys or

18
00:00:37,360 --> 00:00:39,220
we only need American keys?

19
00:00:41,320 --> 00:00:43,660
Michael: Yeah, the word foreign
is interesting, isn't it?

20
00:00:43,660 --> 00:00:44,060
Nikolay: Right.

21
00:00:44,060 --> 00:00:45,700
Foreign tables, foreign keys.

22
00:00:46,420 --> 00:00:46,920
Michael: Yeah.

23
00:00:47,940 --> 00:00:48,140
Yeah.

24
00:00:48,140 --> 00:00:51,800
Should we use them or like when
should we use them, perhaps,

25
00:00:52,040 --> 00:00:53,020
if not always?

26
00:00:53,420 --> 00:00:54,240
What are the pros?

27
00:00:54,240 --> 00:00:55,180
What are the cons?

28
00:00:55,460 --> 00:00:57,380
In what cases can they be a problem?

29
00:00:57,380 --> 00:00:59,160
Or what cases can they really help?

30
00:00:59,160 --> 00:01:00,080
That kind of thing.

31
00:01:00,480 --> 00:01:00,980
Nikolay: Right.

32
00:01:01,560 --> 00:01:05,320
Well, I think it's obvious that
it's better to use them if you

33
00:01:05,320 --> 00:01:10,060
don't expect heavy loads, but if
you are preparing for heavy

34
00:01:10,060 --> 00:01:15,000
loads, until what point you can
use them, right?

35
00:01:15,660 --> 00:01:18,680
And I think dropping them is easier,
maybe.

36
00:01:18,740 --> 00:01:19,780
Maybe not, actually.

37
00:01:19,780 --> 00:01:23,900
If you already designed your system
to, for example, to delete,

38
00:01:24,280 --> 00:01:28,860
to propagate deletes using cascade
option, if you drop foreign

39
00:01:28,860 --> 00:01:31,240
keys, you lose some functionality,
right?

40
00:01:31,500 --> 00:01:32,660
Michael: Oh, I see what you mean.

41
00:01:32,660 --> 00:01:35,380
So like, maybe we'll get to this
at the end, but maybe if you

42
00:01:35,380 --> 00:01:39,620
start with them, and then if they
become a problem, or if you

43
00:01:39,620 --> 00:01:43,860
start to get success and lots and
lots of scale, or lots and

44
00:01:43,860 --> 00:01:48,040
lots of load, migrating to maintaining
that integrity of the

45
00:01:48,040 --> 00:01:52,080
system, quality of data without
them, and processes around data

46
00:01:52,080 --> 00:01:53,660
deletion, things like that.

47
00:01:53,680 --> 00:01:57,380
Nikolay: Right, so they can give
you not only quality of data,

48
00:01:57,380 --> 00:01:58,520
but also some functionality.

49
00:01:59,340 --> 00:02:00,260
Cascade deletes.

50
00:02:00,560 --> 00:02:04,280
And if you rely on them, dropping
to solve some performance problems

51
00:02:04,280 --> 00:02:06,000
in the future will be problematic.

52
00:02:06,580 --> 00:02:10,240
I think I've thought about it many
times, but we have this also,

53
00:02:10,240 --> 00:02:10,460
right?

54
00:02:10,460 --> 00:02:13,760
We cannot drop them sometimes because
we will lose functionality

55
00:02:14,180 --> 00:02:15,560
and this is not good.

56
00:02:16,160 --> 00:02:19,840
But in general, there are people,
I know there are people who

57
00:02:19,840 --> 00:02:21,420
think you should avoid them.

58
00:02:21,780 --> 00:02:25,020
And you know, for example, I can
tell you, you should avoid sub-transactions.

59
00:02:25,400 --> 00:02:28,760
This is my position and I can prove
why you should avoid sub-transactions

60
00:02:28,940 --> 00:02:30,800
unless it's absolutely necessary.

61
00:02:31,980 --> 00:02:38,640
Should we apply similar logic to
foreign keys after you saw some

62
00:02:38,640 --> 00:02:40,740
examples of bad behavior or something?

63
00:02:40,840 --> 00:02:41,960
Or it's different?

64
00:02:41,960 --> 00:02:44,440
This is a question I have in my
head.

65
00:02:46,060 --> 00:02:50,500
And honestly, in this morning,
thinking about this episode, I

66
00:02:50,500 --> 00:02:54,140
was thinking, oh, there are new
options in Postgres 17 we should

67
00:02:54,140 --> 00:02:54,640
explore.

68
00:02:54,640 --> 00:02:55,460
Why we didn't explore?

69
00:02:55,460 --> 00:02:59,440
And then I realized, okay, okay,
it's only beta 1 of Postgres

70
00:02:59,440 --> 00:02:59,820
17.

71
00:02:59,820 --> 00:03:01,020
It's still too early.

72
00:03:01,120 --> 00:03:05,680
But I can't wait when we will see
new results, because we will

73
00:03:05,680 --> 00:03:08,300
talk about it later in this podcast,
in this episode.

74
00:03:08,680 --> 00:03:14,140
But Postgres 17 will bring some
new settings, which are interesting

75
00:03:14,140 --> 00:03:17,840
in the context of some performance
issues you can have with foreign

76
00:03:17,840 --> 00:03:19,060
keys being used.

77
00:03:19,540 --> 00:03:21,920
But let's start with simple things
first.

78
00:03:22,580 --> 00:03:23,260
Michael: Good idea.

79
00:03:24,380 --> 00:03:29,400
Nikolay: We had an episode about
constraints, and we mentioned

80
00:03:29,540 --> 00:03:31,120
there are 6 constraints, right?

81
00:03:31,120 --> 00:03:35,580
6 constraints, and foreign keys
is 1 of the types Postgres offers.

82
00:03:37,240 --> 00:03:42,040
Usually I say foreign keys are
good because I trust database

83
00:03:42,040 --> 00:03:46,440
system much better than anything
else when we talk about data

84
00:03:46,440 --> 00:03:51,340
quality and constraints and integrity
and so on, and referential

85
00:03:51,600 --> 00:03:54,560
integrity, what foreign keys offer
us.

86
00:03:54,560 --> 00:03:59,340
You cannot maintain it in complex
systems without foreign keys,

87
00:03:59,340 --> 00:04:03,260
because even if you implemented
it on your application code,

88
00:04:03,260 --> 00:04:08,600
for example, Python or Ruby, later
your system can be becoming

89
00:04:08,600 --> 00:04:09,640
more and more complex.

90
00:04:10,900 --> 00:04:17,260
And if someone brings other interfaces
to database, new application

91
00:04:18,060 --> 00:04:21,280
code, like in different language,
different frameworks, for example,

92
00:04:21,280 --> 00:04:25,020
or someone is working directly
with database somehow.

93
00:04:25,040 --> 00:04:29,960
It's not uncommon to see multiple
types of code working with

94
00:04:29,960 --> 00:04:31,220
the database, right?

95
00:04:31,340 --> 00:04:36,820
In this case, you have already
multiple implementations of foreign

96
00:04:36,820 --> 00:04:41,460
key, or not foreign key, referential
integrity checks and enforcement

97
00:04:42,340 --> 00:04:42,840
logic.

98
00:04:43,520 --> 00:04:48,420
It means that, For example, imagine
you have Ruby code and you

99
00:04:48,420 --> 00:04:51,800
have Python code somehow working
with the same database.

100
00:04:52,200 --> 00:04:55,340
They both need to have this logic,
and it's hard to maintain.

101
00:04:55,340 --> 00:04:56,020
this logic.

102
00:04:56,910 --> 00:05:03,260
Eventually you will see some cases
where this integrity is not

103
00:05:03,260 --> 00:05:03,760
enforced.

104
00:05:04,440 --> 00:05:08,460
So, while in database foreign keys
are implemented internally

105
00:05:08,540 --> 00:05:13,940
using triggers, triggers is a good
thing in terms of like, it's

106
00:05:13,940 --> 00:05:14,440
inevitable.

107
00:05:15,040 --> 00:05:19,140
Of course, you can say ALTER TABLE
DISABLE TRIGGERS ALL.

108
00:05:19,400 --> 00:05:24,740
In this case, it will disable all
triggers, including these implicit

109
00:05:24,960 --> 00:05:29,840
system triggers, which are supporting
foreign keys.

110
00:05:30,060 --> 00:05:33,500
This is when you can see them in
backslash d table name.

111
00:05:33,600 --> 00:05:38,560
If you ALTER TABLE, DISABLE TRIGGER
all, and then backslash d,

112
00:05:38,560 --> 00:05:42,180
you will suddenly see that, oh,
this table has some triggers.

113
00:05:42,520 --> 00:05:47,740
Because usually they are hidden,
backslash D doesn't show them.

114
00:05:47,740 --> 00:05:51,060
But when they are disabled, suddenly
they show up.

115
00:05:51,820 --> 00:05:53,440
So it's a funny trick.

116
00:05:56,600 --> 00:06:02,100
And these foreign key, these triggers,
unless you disable them,

117
00:06:02,100 --> 00:06:05,160
and you should not disable them
of course, unless you know what

118
00:06:05,160 --> 00:06:05,820
you do.

119
00:06:06,580 --> 00:06:10,520
I mean, during some massive operations
and so on, but in this

120
00:06:10,520 --> 00:06:14,980
case, it's on your shoulders to
ensure that they are followed,

121
00:06:15,360 --> 00:06:19,920
Because when you enable them back,
Postgres won't check them

122
00:06:19,920 --> 00:06:21,340
for existing data.

123
00:06:22,360 --> 00:06:25,700
So if they are enabled, they are
inevitable to work.

124
00:06:26,000 --> 00:06:32,280
So any write you perform is checked
using those triggers, and

125
00:06:32,280 --> 00:06:36,000
if write breaks these rules, it
won't happen.

126
00:06:36,000 --> 00:06:37,400
It will be rolled back.

127
00:06:38,080 --> 00:06:38,900
Which is good.

128
00:06:39,060 --> 00:06:44,040
So it's like more trustworthy approach,
checks on database side.

129
00:06:46,100 --> 00:06:46,760
Make sense?

130
00:06:47,360 --> 00:06:52,060
Michael: Yeah, so I mean to be
super explicit, if you try and

131
00:06:52,060 --> 00:06:56,960
insert a row in the table that
has a referencing column, and

132
00:06:56,960 --> 00:07:01,580
you include an ID that isn't in
the reference table, then you'll

133
00:07:01,580 --> 00:07:03,360
get an error saying so.

134
00:07:04,060 --> 00:07:05,220
That kind of thing.

135
00:07:06,220 --> 00:07:07,080
Nikolay: Right, right.

136
00:07:07,540 --> 00:07:09,440
So this is a good thing, obviously.

137
00:07:09,960 --> 00:07:14,080
Better quality of data, but of
course there are complexities

138
00:07:14,900 --> 00:07:16,740
under heavy loads in large systems.

139
00:07:17,860 --> 00:07:21,040
1 of the complexities is when you
define a foreign key for existing

140
00:07:21,100 --> 00:07:26,880
large tables, or you define foreign
key for a new table, but

141
00:07:26,880 --> 00:07:28,600
it points to a large table.

142
00:07:29,340 --> 00:07:29,840
Yeah.

143
00:07:30,200 --> 00:07:31,220
Of course, in this case-

144
00:07:31,220 --> 00:07:33,400
Michael: not just large, but busy,
right?

145
00:07:33,820 --> 00:07:34,760
Nikolay: Busy, okay.

146
00:07:34,760 --> 00:07:36,420
Michael: Or active in any way,
yeah.

147
00:07:36,580 --> 00:07:38,560
Nikolay: Well, there are 2 types
of problems here.

148
00:07:38,560 --> 00:07:43,040
One is related to busyness, another
is related to large volumes

149
00:07:43,040 --> 00:07:43,660
of data.

150
00:07:43,660 --> 00:07:44,840
Anyway, we need to...

151
00:07:45,900 --> 00:07:48,580
Postgres needs to acquire locks
on both sides,

152
00:07:49,460 --> 00:07:49,960
Michael: which is challenging.

153
00:07:49,960 --> 00:07:50,780
Both tables.

154
00:07:51,260 --> 00:07:51,760
Nikolay: Right.

155
00:07:52,360 --> 00:07:54,100
Different types of logs, but anyway.

156
00:07:54,520 --> 00:08:00,640
And second, it needs to ensure
that existing data already complies

157
00:08:00,820 --> 00:08:02,860
with our constraint.

158
00:08:02,860 --> 00:08:04,040
Michael: Constraint, yeah.

159
00:08:04,080 --> 00:08:04,580
Nikolay: Right.

160
00:08:05,320 --> 00:08:11,680
And if you do it straight, like
in a regular way, in general,

161
00:08:11,820 --> 00:08:12,880
you will have issues.

162
00:08:13,360 --> 00:08:14,480
High risk of issues.

163
00:08:14,480 --> 00:08:16,820
And the risk is becoming higher
and higher.

164
00:08:17,300 --> 00:08:22,540
The more you have data, the higher
are TPS, the higher risks

165
00:08:22,540 --> 00:08:23,040
are.

166
00:08:23,180 --> 00:08:27,460
So at some point, and it's better
to do it earlier, you need

167
00:08:27,460 --> 00:08:30,860
to install foreign keys in a proper
way.

168
00:08:31,080 --> 00:08:32,060
So you need...

169
00:08:33,420 --> 00:08:34,920
Fortunately, Postgres supports...

170
00:08:34,920 --> 00:08:36,800
We discussed this, but let's repeat.

171
00:08:37,360 --> 00:08:42,240
Postgres supports a two-step approach
for constraint creation

172
00:08:42,280 --> 00:08:44,440
here, similar to check constraint.

173
00:08:44,620 --> 00:08:50,040
You can define foreign keys as
not valid state, flag, rights.

174
00:08:50,340 --> 00:08:55,860
It means that they will have flag
not valid, but important to

175
00:08:55,860 --> 00:08:59,060
remember, it doesn't mean they
are not working.

176
00:08:59,060 --> 00:09:01,920
They are immediately working for
all new rights.

177
00:09:02,220 --> 00:09:05,580
They are just not checked for existing
data.

178
00:09:06,180 --> 00:09:10,800
I remember I had a super big mistake
when I designed some complex

179
00:09:10,800 --> 00:09:15,160
procedure related to int4
to int8 conversion, and I

180
00:09:15,160 --> 00:09:20,320
forgot that so-called invalid foreign
keys are working for new

181
00:09:20,320 --> 00:09:25,080
rights, so I kept old foreign keys
at some point just for the

182
00:09:25,080 --> 00:09:30,640
sake of being able to roll back,
I mean to revert all operations.

183
00:09:31,000 --> 00:09:35,280
I decided, oh, I need them because
if a decision will be made

184
00:09:35,280 --> 00:09:38,000
to revert changes, I will already
have them.

185
00:09:38,000 --> 00:09:39,440
I will just validate them.

186
00:09:40,160 --> 00:09:42,940
But new rights were not followed.

187
00:09:44,380 --> 00:09:46,140
New rights were blocked, basically.

188
00:09:46,560 --> 00:09:49,180
And this was a big mistake I learned.

189
00:09:49,920 --> 00:09:52,740
So it happened on production and
it was terrible.

190
00:09:53,300 --> 00:09:54,380
It was my mistake.

191
00:09:54,520 --> 00:09:57,680
So not valid doesn't mean it doesn't
work.

192
00:09:57,680 --> 00:10:01,980
Similar to check constraint, and
actually indexes, if you say

193
00:10:02,080 --> 00:10:06,600
it is valid, false, which is not
actually recommended due to

194
00:10:06,600 --> 00:10:07,480
different reasons.

195
00:10:07,900 --> 00:10:10,060
It also means index actually is
maintained.

196
00:10:10,200 --> 00:10:12,940
So foreign key is maintained for
all new rights.

197
00:10:13,500 --> 00:10:16,840
And then second step, you say alter
table validate.

198
00:10:17,620 --> 00:10:18,720
This is not blocking.

199
00:10:20,320 --> 00:10:21,880
It's blocking briefly, right?

200
00:10:22,420 --> 00:10:30,140
But if you have huge dataset to
check, okay, your DML queries

201
00:10:30,140 --> 00:10:31,080
are not blocked.

202
00:10:31,980 --> 00:10:36,700
Again, they are blocked very quickly,
like briefly, for a short

203
00:10:36,700 --> 00:10:37,660
period of time.

204
00:10:38,140 --> 00:10:43,660
Anyway, this is not a trivial operation,
but it's already well

205
00:10:43,660 --> 00:10:45,160
described in various sources.

206
00:10:45,160 --> 00:10:50,580
For example, as usual, I recommend
GitLab migration style guide.

207
00:10:50,580 --> 00:10:53,580
It's called migration style guide,
but it's about database migrations,

208
00:10:53,600 --> 00:10:56,840
it's about DDL basically, how to
deploy DDL under heavy load

209
00:10:56,840 --> 00:10:57,340
reliably.

210
00:10:58,080 --> 00:11:03,540
So, and they have Ruby code, which
is called Migration Helpers

211
00:11:03,640 --> 00:11:09,240
RB, which demonstrates how to do
it reliably under any heavy

212
00:11:09,240 --> 00:11:09,740
load.

213
00:11:11,200 --> 00:11:15,480
So yeah, it's 1 thing to remember,
you cannot just create them

214
00:11:15,480 --> 00:11:15,980
easily.

215
00:11:17,040 --> 00:11:19,020
But this is true for any DDL.

216
00:11:19,600 --> 00:11:20,660
DDL is dangerous.

217
00:11:21,140 --> 00:11:22,940
We discussed this multiple times.

218
00:11:23,200 --> 00:11:28,060
There's also a headache associated
with installing them to partition

219
00:11:28,080 --> 00:11:28,580
tables.

220
00:11:29,920 --> 00:11:30,420
Right?

221
00:11:30,980 --> 00:11:31,480
Yeah.

222
00:11:31,920 --> 00:11:33,300
I always forget details.

223
00:11:33,340 --> 00:11:36,360
Every time I deal with it, I need
to double check details.

224
00:11:36,560 --> 00:11:39,800
It's always an issue.

225
00:11:39,800 --> 00:11:41,120
Michael: And it changes, right?

226
00:11:41,120 --> 00:11:44,560
The restrictions around partition
tables change every version.

227
00:11:45,020 --> 00:11:45,520
Exactly.

228
00:11:45,700 --> 00:11:47,080
It improves each version.

229
00:11:47,080 --> 00:11:50,940
So things that I thought were not
possible become possible.

230
00:11:51,340 --> 00:11:54,780
So it's good that we check documentation,
documentation is great,

231
00:11:54,960 --> 00:11:58,000
but it's difficult to remember
the details version to version.

232
00:11:58,260 --> 00:11:59,980
Nikolay: Right, the changes are
in a good direction.

233
00:12:00,620 --> 00:12:01,060
Michael: Exactly.

234
00:12:01,060 --> 00:12:01,560
Nikolay: Which is good.

235
00:12:01,560 --> 00:12:05,980
So yeah, if you're an old version,
it's one thing, another version,

236
00:12:05,980 --> 00:12:08,000
like newer version, different thing.

237
00:12:08,000 --> 00:12:12,540
But in general, on newer version,
a lot of things are possible.

238
00:12:12,720 --> 00:12:14,740
So, so yeah.

239
00:12:15,060 --> 00:12:18,760
Michael: One last thing that you
normally mention around creating

240
00:12:18,900 --> 00:12:25,460
these on large tables with, or
like busy or large tables, is

241
00:12:25,840 --> 00:12:28,720
the idea of having timeouts and
retries.

242
00:12:29,540 --> 00:12:33,100
So I think that, I don't think
you mentioned that this time,

243
00:12:33,100 --> 00:12:35,080
but it's an important extra point.

244
00:12:35,080 --> 00:12:36,440
And I'll link to your...

245
00:12:36,540 --> 00:12:38,240
You did a good Twitter...

246
00:12:38,440 --> 00:12:40,920
When you were doing a Twitter marathon,
you did a great post

247
00:12:40,920 --> 00:12:41,640
on this.

248
00:12:41,760 --> 00:12:43,220
I'll link that up as well.

249
00:12:43,260 --> 00:12:44,680
Nikolay: Yeah, I already forgot
about this.

250
00:12:44,680 --> 00:12:45,560
But Yeah, right.

251
00:12:45,560 --> 00:12:49,380
In general, if you need logs, you
can do it explicitly with log

252
00:12:49,380 --> 00:12:51,260
table, but you need to do it with...

253
00:12:51,660 --> 00:12:56,660
So basically, if a two-step approach,
like invalid and then validate,

254
00:12:57,180 --> 00:13:00,160
is implemented in Postgres, generally
you don't need it.

255
00:13:00,720 --> 00:13:03,900
For custom checks and for foreign
keys.

256
00:13:04,080 --> 00:13:09,480
But for some things like adding
some column actually, or dropping

257
00:13:09,480 --> 00:13:10,500
column, it's needed.

258
00:13:10,560 --> 00:13:15,400
You need to deal with locks and
do it with lock timeout and retries

259
00:13:15,800 --> 00:13:16,460
as usual.

260
00:13:16,460 --> 00:13:17,540
But with foreign keys...

261
00:13:17,540 --> 00:13:19,420
Michael: Why not do it with this
1 though?

262
00:13:19,540 --> 00:13:20,380
Like, I would...

263
00:13:20,380 --> 00:13:21,760
I still think it's sensible.

264
00:13:21,760 --> 00:13:25,380
I think, for example, you gave
a good example of if autovacuum

265
00:13:25,440 --> 00:13:28,580
is running in the heavy mode, in

266
00:13:28,580 --> 00:13:29,140
Nikolay: the transaction...

267
00:13:29,140 --> 00:13:30,080
Michael: You are right.

268
00:13:30,560 --> 00:13:31,280
Nikolay: You are right.

269
00:13:31,280 --> 00:13:31,820
You are right.

270
00:13:31,820 --> 00:13:32,220
Yeah.

271
00:13:32,220 --> 00:13:32,720
Yeah.

272
00:13:33,100 --> 00:13:35,600
Michael: So if it doesn't yield
its luck or something else isn't

273
00:13:35,600 --> 00:13:38,940
yielding a luck that you don't
know about, it's so helpful then

274
00:13:38,940 --> 00:13:41,260
to have that timeout on a retry.

275
00:13:41,940 --> 00:13:42,660
Nikolay: You know what?

276
00:13:42,660 --> 00:13:44,520
In general, DDL is hard.

277
00:13:45,280 --> 00:13:49,940
Like under heavy load in large
systems, it's really hard.

278
00:13:50,000 --> 00:13:53,320
I wish it was much simpler in Postgres.

279
00:13:53,360 --> 00:13:56,660
I think there is a huge room for
improvement step by step, and

280
00:13:56,660 --> 00:13:58,480
I hope it will be improved.

281
00:14:00,140 --> 00:14:03,660
But now we need to, it's like art.

282
00:14:04,540 --> 00:14:11,680
You need to improve your tools
and libraries and so on.

283
00:14:11,680 --> 00:14:12,360
You are right.

284
00:14:12,360 --> 00:14:18,400
Even when we install a foreign
key with not valid flag, we still

285
00:14:18,400 --> 00:14:22,780
need logs on both tables, right?

286
00:14:22,860 --> 00:14:26,500
And these logs, if they cannot
be obtained, they will block us.

287
00:14:26,500 --> 00:14:30,700
And if they block us, we start
blocking everything else, including

288
00:14:30,700 --> 00:14:33,900
SELECTs, because it's DDL and SELECTs
is waiting.

289
00:14:34,820 --> 00:14:37,560
So, yeah, it's kind of...

290
00:14:37,820 --> 00:14:40,580
I think for SELECTs, in this case,
it's not that...

291
00:14:41,520 --> 00:14:42,260
Maybe not.

292
00:14:42,260 --> 00:14:43,680
Michael: Or inserts, for example.

293
00:14:43,940 --> 00:14:48,000
Nikolay: Yeah, we need to check,
we need to carefully check types

294
00:14:48,000 --> 00:14:51,140
of locks that need to be installed
on both cases.

295
00:14:51,220 --> 00:14:53,740
I only remember they are different
on both sides.

296
00:14:54,560 --> 00:14:57,680
But in general, this should be
carefully designed.

297
00:14:58,180 --> 00:15:01,580
It's better to follow existing
experience.

298
00:15:01,580 --> 00:15:05,460
For example, GitLabs, maybe some
frameworks already implemented.

299
00:15:05,740 --> 00:15:06,300
I don't know.

300
00:15:06,300 --> 00:15:12,180
Django, I think, Ruby on Rails
in general, they don't offer good

301
00:15:12,180 --> 00:15:13,220
automation here.

302
00:15:13,440 --> 00:15:14,360
It's not enough.

303
00:15:15,040 --> 00:15:19,580
So if you're preparing for heavy
loads, it's definitely worth

304
00:15:20,140 --> 00:15:24,800
keeping an eye on this topic and
maybe to check it earlier than

305
00:15:24,800 --> 00:15:28,580
it starts biting you, because sooner
or later it will.

306
00:15:30,300 --> 00:15:35,320
So first it bites, nobody notices,
right?

307
00:15:35,580 --> 00:15:39,640
Okay, some spike of latencies,
a few seconds during deployment.

308
00:15:40,380 --> 00:15:44,580
But then at some point, especially
if longer transactions are

309
00:15:44,580 --> 00:15:48,860
allowed, at some point it can bite
you so heavily, so you will

310
00:15:48,860 --> 00:15:51,760
notice downtime a few minutes,
for example, not good.

311
00:15:51,760 --> 00:15:54,820
So it's better to double-check
carefully, step by step, and rely

312
00:15:54,820 --> 00:15:56,820
on other people's experience.

313
00:15:57,740 --> 00:16:01,840
But you are right, we need to deal
with low log_timeout and retries

314
00:16:02,240 --> 00:16:03,380
here as well.

315
00:16:03,780 --> 00:16:06,980
And I think especially for partition
table case you need it.

316
00:16:07,940 --> 00:16:09,560
Michael: Why especially there?

317
00:16:10,080 --> 00:16:13,500
Nikolay: Because I think, I don't
remember details, but I remember

318
00:16:13,660 --> 00:16:19,860
when you create foreign key on
partition table, you need more

319
00:16:19,860 --> 00:16:21,300
actions to be done.

320
00:16:21,420 --> 00:16:21,920
First,

321
00:16:22,540 --> 00:16:22,800
Michael: yeah.

322
00:16:22,800 --> 00:16:25,420
Would you have to do it on each
partition and then...

323
00:16:25,760 --> 00:16:26,260
I

324
00:16:26,600 --> 00:16:31,620
Nikolay: think at least until some
version, you cannot create

325
00:16:31,640 --> 00:16:34,780
a not valid foreign key on partition
table.

326
00:16:34,860 --> 00:16:35,740
This is a problem.

327
00:16:35,740 --> 00:16:36,540
Michael: Oh, interesting.

328
00:16:36,980 --> 00:16:37,920
Nikolay: Right, right.

329
00:16:38,300 --> 00:16:38,800
Yeah.

330
00:16:38,940 --> 00:16:40,700
But again, I don't...

331
00:16:41,260 --> 00:16:44,200
Yes, it differs for each version
and so on.

332
00:16:44,340 --> 00:16:48,540
It should be carefully studied,
tested, and so on.

333
00:16:48,540 --> 00:16:53,060
But with proper understanding that
locks are needed, it's solvable.

334
00:16:53,480 --> 00:16:53,980
Right?

335
00:16:54,140 --> 00:16:54,780
It's solvable.

336
00:16:55,680 --> 00:16:57,320
So maintenance overhead, basically.

337
00:16:58,140 --> 00:16:59,100
There is some maintenance.

338
00:16:59,100 --> 00:17:00,920
Oh, let's mention 1 more thing.

339
00:17:01,500 --> 00:17:02,980
Maybe people don't...

340
00:17:03,520 --> 00:17:04,920
Not everyone realizes it.

341
00:17:04,920 --> 00:17:08,360
We also didn't realize until our
clients at some point, it was

342
00:17:08,360 --> 00:17:11,180
a company called Miro, which is
very popular now.

343
00:17:11,600 --> 00:17:14,760
We helped them with some things
to fight bloat and so on.

344
00:17:14,760 --> 00:17:18,480
And we offered to use pg_repack
to fight bloat.

345
00:17:18,960 --> 00:17:22,160
And they had at that time deferred
constraints.

346
00:17:23,160 --> 00:17:27,540
So foreign key constraint checks
were done at commit time, not

347
00:17:27,540 --> 00:17:28,040
immediately.

348
00:17:29,280 --> 00:17:32,380
And I remember that it caused some
issues with pg_repack.

349
00:17:32,380 --> 00:17:34,080
Again, I don't remember all the
details.

350
00:17:34,080 --> 00:17:36,060
There is a good blog post about
this.

351
00:17:36,060 --> 00:17:37,760
I will send it to you.

352
00:17:38,420 --> 00:17:43,260
But additional maintenance overhead,
which is caused by the presence

353
00:17:43,260 --> 00:17:46,980
of foreign keys in this special
state, deferred constraints,

354
00:17:47,080 --> 00:17:48,500
deferred foreign keys.

355
00:17:49,940 --> 00:17:54,640
So definitely if you use foreign
keys, you need to pay attention

356
00:17:54,640 --> 00:17:57,860
to different things when you do
some operations with your database.

357
00:17:58,360 --> 00:17:59,400
This is true.

358
00:18:00,040 --> 00:18:01,480
But I would say it's worth it,
right?

359
00:18:01,480 --> 00:18:04,940
Because you have good referential
integrity and so on.

360
00:18:05,240 --> 00:18:10,180
So I would still choose foreign
keys in spite of these problems,

361
00:18:10,860 --> 00:18:11,360
right?

362
00:18:11,840 --> 00:18:12,340
Yeah,

363
00:18:12,880 --> 00:18:14,080
Michael: I tend to agree.

364
00:18:14,140 --> 00:18:19,640
I think the vast majority of us
are working on systems that don't

365
00:18:19,640 --> 00:18:22,620
have the characteristics where
foreign keys cause problems.

366
00:18:23,000 --> 00:18:26,600
I know we had an episode only a
couple ago where we had some

367
00:18:26,600 --> 00:18:28,020
possible exceptions to that.

368
00:18:28,020 --> 00:18:29,980
Well, maybe we'll get to that in
a moment.

369
00:18:30,180 --> 00:18:33,580
Nikolay: You talk about 100 terabyte
episode, which was also

370
00:18:33,580 --> 00:18:34,760
episode number 100.

371
00:18:34,760 --> 00:18:38,180
We have received good feedback
about this, considered as the

372
00:18:38,180 --> 00:18:39,720
best episode we had.

373
00:18:40,140 --> 00:18:41,040
So, yeah.

374
00:18:41,040 --> 00:18:44,840
Actually, someone told this was
the best podcast episode.

375
00:18:45,400 --> 00:18:46,180
What's so like?

376
00:18:46,180 --> 00:18:47,740
Yeah, We had great guests.

377
00:18:47,980 --> 00:18:51,400
Again, thank you because you organized
all the invitations and

378
00:18:51,400 --> 00:18:52,060
so on.

379
00:18:52,660 --> 00:18:54,020
Michael: I was just pleased to
be there.

380
00:18:54,020 --> 00:18:56,400
It was a good fun 1 to record as
well.

381
00:18:56,400 --> 00:18:59,600
But yeah, not just the fact that
they're 100 terabytes, right?

382
00:18:59,600 --> 00:19:03,420
The fact that they're heavily loaded
systems and they're constantly

383
00:19:03,640 --> 00:19:08,300
busy and they have to worry about
a lot of these scaling limits

384
00:19:08,360 --> 00:19:12,080
and I guess keep hitting different
cliffs or different

385
00:19:13,080 --> 00:19:13,580
Nikolay: cliffs.

386
00:19:14,220 --> 00:19:19,040
Michael: Maybe that's a bad word
but like limits of what you

387
00:19:19,040 --> 00:19:23,360
can do or how many of a certain
thing you can exhaust before

388
00:19:24,140 --> 00:19:26,020
falling back to a different method.

389
00:19:26,180 --> 00:19:31,160
But 99.9% of us are not working
with those systems and don't

390
00:19:31,160 --> 00:19:33,520
have to worry about those and don't
have to worry about them

391
00:19:33,520 --> 00:19:34,440
initially as well.

392
00:19:34,440 --> 00:19:39,620
So I think it was Sammy from Figma
who said as well, and I think

393
00:19:39,620 --> 00:19:43,700
I agree with this, at the beginning
maybe you don't have to design

394
00:19:43,700 --> 00:19:46,680
for this kind of thing and if you
have to solve this, it's a

395
00:19:46,680 --> 00:19:47,240
good problem.

396
00:19:47,240 --> 00:19:48,900
You're probably successful in other
ways.

397
00:19:48,900 --> 00:19:53,300
If you've got to a heavily loaded
Postgres and you're not making

398
00:19:53,300 --> 00:19:57,100
enough money to pay people to fix
this kind of problem, you've

399
00:19:57,100 --> 00:19:58,540
probably done something wrong.

400
00:19:59,380 --> 00:20:02,800
You need a business model that
works with that load, and all

401
00:20:02,800 --> 00:20:04,020
of them did that successfully.

402
00:20:04,300 --> 00:20:08,260
And I can't think of an example
where a company got too successful

403
00:20:09,020 --> 00:20:11,760
with load, but not with finances.

404
00:20:12,120 --> 00:20:15,200
So I think they can normally solve
these problems.

405
00:20:16,120 --> 00:20:18,220
Nikolay: Well, I not fully agree.

406
00:20:18,220 --> 00:20:23,540
I think, yeah, let's talk about
cleaves, but first let's talk

407
00:20:23,540 --> 00:20:30,660
about performance overhead, foreign
keys, and PgBench supports

408
00:20:30,680 --> 00:20:34,040
foreign keys option, which is not
used by default.

409
00:20:35,020 --> 00:20:38,980
And before this podcast, we were
very curious, we didn't like...

410
00:20:39,280 --> 00:20:42,440
Simple question, is it better,
what's the difference between

411
00:20:42,440 --> 00:20:45,400
2 PgBench runs with and without
this option?

412
00:20:45,600 --> 00:20:49,040
Of course, this option should be
used during initialization time,

413
00:20:49,540 --> 00:20:53,080
but we checked and we saw difference
on a small scale, like 1

414
00:20:53,080 --> 00:20:56,180
million rows in PgBench accounts,
scale 10.

415
00:20:57,340 --> 00:20:59,320
We saw difference only 11%.

416
00:21:01,240 --> 00:21:05,820
On my MacBook, it was a quick and
dirty experiment, all in memory

417
00:21:05,820 --> 00:21:06,660
and so on.

418
00:21:07,540 --> 00:21:08,260
As expected.

419
00:21:08,300 --> 00:21:11,420
I expected also like 5 to 10% or
so.

420
00:21:11,920 --> 00:21:13,120
You said 10%, right?

421
00:21:13,280 --> 00:21:13,980
Michael: I guessed.

422
00:21:14,180 --> 00:21:15,780
It was a complete guess.

423
00:21:15,900 --> 00:21:18,960
And also, I don't know why I guessed
before even knowing.

424
00:21:19,020 --> 00:21:23,320
I'd forgotten the breakdown of
what proportion was selects versus

425
00:21:23,320 --> 00:21:26,040
inserts updates and deletes I definitely
shouldn't have guessed

426
00:21:26,040 --> 00:21:26,920
before knowing that.

427
00:21:26,920 --> 00:21:29,920
Nikolay: Why do you care about
selects, deletes, updates and

428
00:21:29,920 --> 00:21:30,580
so on?

429
00:21:31,240 --> 00:21:35,320
Michael: I thought I guessed that
well maybe this is wrong again

430
00:21:35,320 --> 00:21:38,240
I guessed that there would be a
bigger impact on...

431
00:21:39,060 --> 00:21:39,440
Right.

432
00:21:39,440 --> 00:21:39,940
...Inserts.

433
00:21:40,460 --> 00:21:41,260
Yeah, exactly.

434
00:21:41,540 --> 00:21:44,000
Than, so, than the select operations.

435
00:21:44,200 --> 00:21:45,300
That was my guess.

436
00:21:46,100 --> 00:21:50,680
Nikolay: Well, by the way, actually,
maintenance, Let's put it

437
00:21:50,680 --> 00:21:53,580
to the maintenance overhead basket.

438
00:21:54,780 --> 00:22:02,520
When you need to delete a reference
set row, a row in a referenced

439
00:22:02,840 --> 00:22:05,780
table, sometimes we call it parent
or something.

440
00:22:06,820 --> 00:22:11,260
If an index, well, index always
exists on 1 side of foreign key

441
00:22:11,260 --> 00:22:15,540
because otherwise foreign key creation
will complain lack of

442
00:22:15,540 --> 00:22:16,040
index.

443
00:22:16,240 --> 00:22:20,280
You need like unique index on 1,
or primary key, for example.

444
00:22:20,440 --> 00:22:21,240
Primary index.

445
00:22:21,760 --> 00:22:24,280
Index for primary key, which is
also unique index, right?

446
00:22:24,280 --> 00:22:25,660
Michael: You need a unique constraint.

447
00:22:26,100 --> 00:22:27,040
Nikolay: Right, right.

448
00:22:27,180 --> 00:22:32,060
If index is not created on the
other side, which by default is

449
00:22:32,060 --> 00:22:33,480
so, you can...

450
00:22:33,480 --> 00:22:34,520
Michael: Yeah, no, yeah.

451
00:22:34,540 --> 00:22:34,940
Nikolay: Yeah.

452
00:22:34,940 --> 00:22:36,660
And you delete this reference row.

453
00:22:36,660 --> 00:22:42,440
Postgres needs to find all dependent
rows to either say it's

454
00:22:42,440 --> 00:22:46,720
okay, or maybe to propagate delete
if you again use cascade deletes.

455
00:22:47,280 --> 00:22:50,140
And if an index is not there, it will
be a sequential scan of the

456
00:22:50,140 --> 00:22:50,940
whole table.

457
00:22:51,820 --> 00:22:53,900
So delete will be super slow.

458
00:22:54,280 --> 00:22:58,120
And this is also kind of maintenance
overhead to me because if

459
00:22:58,120 --> 00:22:59,440
you expect...

460
00:22:59,540 --> 00:23:03,480
Sometimes people say, oh, we don't
have such deletes, we use

461
00:23:03,480 --> 00:23:04,980
only soft deletes, right?

462
00:23:05,560 --> 00:23:08,140
So, or something, like, we don't
care.

463
00:23:08,140 --> 00:23:10,080
We don't need those indexes to
be present.

464
00:23:10,080 --> 00:23:15,360
But if you have some risks, sometimes,
like, not often, sometimes,

465
00:23:15,600 --> 00:23:18,500
these deletes are happening very
rarely in some systems.

466
00:23:19,140 --> 00:23:22,620
You need to maintain these indexes
and need to remember about

467
00:23:22,660 --> 00:23:24,880
them and need to create them and
so on.

468
00:23:25,120 --> 00:23:28,020
So it's also a kind of overhead.

469
00:23:28,780 --> 00:23:30,180
Michael: I see this quite often.

470
00:23:30,180 --> 00:23:32,880
And I actually didn't write a blog
post about this, it was quite

471
00:23:32,880 --> 00:23:33,980
a while ago now.

472
00:23:36,260 --> 00:23:38,440
But yeah, it's the time I see triggers
causing performance issues

473
00:23:38,440 --> 00:23:39,960
the most often, because

474
00:23:39,960 --> 00:23:40,760
Nikolay: it's on blast.

475
00:23:40,760 --> 00:23:43,740
Michael: That's exactly that's
how you spot them in the explain

476
00:23:43,740 --> 00:23:44,120
plans.

477
00:23:44,120 --> 00:23:44,620
Yeah.

478
00:23:44,640 --> 00:23:46,560
So I'll include I'll include that.

479
00:23:46,560 --> 00:23:52,460
But my view is normally it makes
sense to index your referencing

480
00:23:52,740 --> 00:23:53,240
columns.

481
00:23:54,160 --> 00:23:56,720
Also for select perform, like often
people, I find they're a

482
00:23:56,720 --> 00:23:59,020
good candidate for selects anyway.

483
00:24:01,160 --> 00:24:01,460
Yeah.

484
00:24:01,460 --> 00:24:04,860
Like there's loads of workloads
and patterns where it makes sense

485
00:24:04,860 --> 00:24:05,740
to have them indexed.

486
00:24:05,740 --> 00:24:08,300
Of course, if you're already adding
like a multi-column index

487
00:24:08,300 --> 00:24:11,260
with them in the leading column,
you don't need an additional

488
00:24:11,320 --> 00:24:11,820
index.

489
00:24:12,100 --> 00:24:16,560
But what I mean is like an index
on that so that they can be

490
00:24:16,560 --> 00:24:17,320
looked up.

491
00:24:17,320 --> 00:24:18,540
It is normally.

492
00:24:18,740 --> 00:24:22,640
Nikolay: Yeah, our Postgres checkup
tool also has such kind of

493
00:24:22,640 --> 00:24:28,580
report, non-indexed, like foreign
keys without supporting indexes.

494
00:24:29,440 --> 00:24:33,000
But we usually say it's very special.

495
00:24:33,520 --> 00:24:36,220
And some people, I see this as
well.

496
00:24:36,220 --> 00:24:39,960
I see that people say, well, we
don't need that because our work

497
00:24:39,960 --> 00:24:40,780
is different.

498
00:24:40,920 --> 00:24:43,640
Michael: I remember a blog post,
I think it was from Pocona,

499
00:24:44,380 --> 00:24:47,860
arguing the opposite case, saying,
please don't automatically

500
00:24:48,120 --> 00:24:48,620
index.

501
00:24:48,680 --> 00:24:49,940
I'll include that as well.

502
00:24:49,940 --> 00:24:52,620
Nikolay: I would say as well, I
would say don't automatically

503
00:24:52,680 --> 00:24:57,100
index because every index has a
different kind of overhead.

504
00:24:57,260 --> 00:24:58,780
It has a penalty, right?

505
00:24:59,280 --> 00:25:01,540
Michael: Well we've discussed that
many times, haven't we?

506
00:25:02,200 --> 00:25:07,200
Not just insert overhead, but also
killing hot updates in some

507
00:25:07,200 --> 00:25:07,700
planning.

508
00:25:08,140 --> 00:25:09,180
Yeah, lots of,

509
00:25:09,180 --> 00:25:09,980
Nikolay: lots of.

510
00:25:10,640 --> 00:25:17,000
So many dangers around when you
need to grow and have good performance.

511
00:25:17,860 --> 00:25:18,360
Right.

512
00:25:18,520 --> 00:25:23,340
So back to these experiments with
PgBench, with and without foreign

513
00:25:23,340 --> 00:25:27,980
keys, You say you expect problems
only in write operations.

514
00:25:28,320 --> 00:25:28,820
Interesting.

515
00:25:30,060 --> 00:25:31,580
Michael: Or at least more overhead.

516
00:25:32,220 --> 00:25:35,460
Nikolay: Let's remember that because
I have a case, you know

517
00:25:35,460 --> 00:25:37,860
I have a case where it's very different.

518
00:25:38,360 --> 00:25:43,040
Okay, in this case, PgBench,
we observe roughly 10%.

519
00:25:43,860 --> 00:25:47,680
I'm quite sure it will be the same
at larger scale, on different

520
00:25:47,680 --> 00:25:50,580
machines, and so on, with PgBench,
I think.

521
00:25:50,580 --> 00:25:53,220
PgBench is my PgBench type of workload.

522
00:25:54,520 --> 00:25:55,440
Roughly 10%.

523
00:25:55,800 --> 00:25:59,840
Would you pay this price, knowing
that everything is slowed down?

524
00:25:59,920 --> 00:26:02,380
Right, operations are slowed down
by 10%.

525
00:26:03,180 --> 00:26:05,840
Michael: Most systems I work with
that have been easy.

526
00:26:05,840 --> 00:26:10,080
Yes, like, most systems aren't
maxed out on CPU aren't maxed

527
00:26:10,080 --> 00:26:12,640
like it's, it's an easy decision.

528
00:26:13,080 --> 00:26:16,860
And for the for the anybody that's
ever dealt with bad data,

529
00:26:17,040 --> 00:26:20,640
Like bad data can be so painful,
as in bad quality data.

530
00:26:20,640 --> 00:26:23,720
I mean, you have to look through
a system and you're like, this

531
00:26:23,720 --> 00:26:24,640
doesn't make sense.

532
00:26:24,640 --> 00:26:28,980
Like these are referencing something
or maybe like a product

533
00:26:28,980 --> 00:26:30,220
that just doesn't exist.

534
00:26:30,220 --> 00:26:32,820
What plan are these customers on?

535
00:26:35,220 --> 00:26:38,140
Or, you know, if you're doing analytics
and you're trying to

536
00:26:38,300 --> 00:26:41,920
categorize things, and there's
bad data in there, it just becomes

537
00:26:41,920 --> 00:26:43,940
such a nightmare at those times.

538
00:26:44,280 --> 00:26:48,060
Nikolay: Or somebody just disabled
foreign keys, did some writes

539
00:26:48,060 --> 00:26:49,620
and then enabled them back.

540
00:26:50,980 --> 00:26:55,180
Or some back, sometimes, or human
cases without foreign keys

541
00:26:55,180 --> 00:26:55,580
actually.

542
00:26:55,580 --> 00:26:59,280
Michael: But I also want- I'm talking
about, yeah, I'm talking

543
00:26:59,280 --> 00:27:02,900
about cases without, but I mean,
anybody that's ever dealt with

544
00:27:02,900 --> 00:27:05,540
that, and it's probably kind of
slightly more seasoned folks

545
00:27:05,540 --> 00:27:07,980
that have had to deal with it once
or twice in their career,

546
00:27:08,160 --> 00:27:11,520
it's just so painful that you then
think, do you know what, I

547
00:27:11,520 --> 00:27:14,340
don't want to, I'll happily pay
10% going forwards to not have

548
00:27:14,340 --> 00:27:15,540
to deal with that again.

549
00:27:15,620 --> 00:27:18,560
Nikolay: I remember cases when
the financial integrity was broken,

550
00:27:18,700 --> 00:27:21,360
even with foreign keys, but it
was a bug in Postgres that was

551
00:27:21,360 --> 00:27:22,320
very long ago.

552
00:27:22,780 --> 00:27:27,260
Since then they became very reliable
and I would pay this price,

553
00:27:27,260 --> 00:27:34,940
10%, to have these checks constantly
performed by Postgres to

554
00:27:34,940 --> 00:27:38,460
ensure the data quality is higher
in this area.

555
00:27:40,120 --> 00:27:45,420
And let's emphasize that this overhead
is not a cliff.

556
00:27:46,720 --> 00:27:51,320
It doesn't matter how many transactions
per second you have,

557
00:27:51,740 --> 00:27:53,240
how many rows you have.

558
00:27:53,260 --> 00:27:56,320
This extra penalty is added to
all write operations all the time

559
00:27:56,320 --> 00:27:58,280
because extra work needs to be
done.

560
00:27:58,320 --> 00:28:01,880
For example, you insert, Postgres
needs to make sure that you

561
00:28:01,880 --> 00:28:03,480
insert a value to child table.

562
00:28:03,480 --> 00:28:05,240
Let's call it parent-child for
simplicity.

563
00:28:05,280 --> 00:28:09,440
It's not necessarily meaning of
the relationship can be different,

564
00:28:09,440 --> 00:28:10,140
but let's call it.

565
00:28:10,140 --> 00:28:15,920
You insert a record to a child
table, Postgres needs to ensure

566
00:28:15,920 --> 00:28:18,840
that parent record exists because
you're referring to it.

567
00:28:19,060 --> 00:28:22,600
So it performs basically implicit
select to check that the record

568
00:28:22,600 --> 00:28:26,080
exists, and this is extra work
that always needs to be done.

569
00:28:26,340 --> 00:28:30,700
And even if you have 1 transaction
per second, very tiny workload,

570
00:28:31,240 --> 00:28:33,200
still penalty will be there.

571
00:28:34,300 --> 00:28:37,660
We can check this actually with
EXPLAIN (ANALYZE, BUFFERS) as

572
00:28:37,660 --> 00:28:42,420
well and see that more buffers
need to be involved when you insert

573
00:28:43,700 --> 00:28:47,460
a row with foreign key versus without
foreign key, right?

574
00:28:47,980 --> 00:28:49,040
Michael: Maybe 1 more?

575
00:28:49,820 --> 00:28:50,580
Nikolay: Well, depends.

576
00:28:51,260 --> 00:28:55,820
Depends how big the table is and
how many hops index scan

577
00:28:56,120 --> 00:28:56,960
Michael: should show.

578
00:28:57,120 --> 00:29:00,320
True, actually, definitely not
1, at least 2.

579
00:29:00,580 --> 00:29:01,560
Nikolay: Again, it depends.

580
00:29:03,460 --> 00:29:04,990
But maybe, I don't know, maybe
it's index only scan.

581
00:29:04,990 --> 00:29:06,040
I think there's the

582
00:29:06,040 --> 00:29:07,840
Michael: page, there's like the
top...

583
00:29:08,000 --> 00:29:11,000
I don't think you'll ever get data
in that root node, but I could

584
00:29:11,000 --> 00:29:11,380
be wrong.

585
00:29:11,380 --> 00:29:12,420
Nikolay: Well, yeah, okay.

586
00:29:12,560 --> 00:29:15,460
Anyway, we agree that this is not
a cliff.

587
00:29:15,460 --> 00:29:16,020
It's constant work.

588
00:29:16,020 --> 00:29:17,240
Michael: Yes, true, true, true.

589
00:29:17,400 --> 00:29:19,700
Nikolay: Additional workload, which
is like...

590
00:29:20,460 --> 00:29:23,540
Additional weight you need to carry
with performing this work

591
00:29:23,560 --> 00:29:24,340
all the time.

592
00:29:24,780 --> 00:29:27,860
You have a contract to do this
all the time.

593
00:29:28,260 --> 00:29:29,940
It's like a tax, actually.

594
00:29:31,560 --> 00:29:32,680
Foreign key taxes.

595
00:29:34,060 --> 00:29:36,480
Michael: That actually is a good
metaphor, I think.

596
00:29:36,480 --> 00:29:39,640
Nikolay: Yeah, like 10% tax from
foreign keys.

597
00:29:40,640 --> 00:29:43,360
So we have consensus here.

598
00:29:43,360 --> 00:29:47,840
We are ready to pay those taxes
to live in the world with better

599
00:29:47,840 --> 00:29:48,620
data quality.

600
00:29:48,900 --> 00:29:52,040
But there are performance cliffs,
as you already mentioned, and

601
00:29:52,040 --> 00:29:53,740
they are terrible sometimes.

602
00:29:54,520 --> 00:30:01,200
So one of them was raised a year
ago by Lukas Fittl, at pganalyze

603
00:30:02,600 --> 00:30:05,320
series of small videos which I
like.

604
00:30:05,320 --> 00:30:06,040
Michael: 5 Minutes of Postgres.

605
00:30:06,040 --> 00:30:07,340
Michael: 5 Minutes of Postgres.

606
00:30:07,340 --> 00:30:08,540
Nikolay: Right, right, right.

607
00:30:08,800 --> 00:30:13,420
They are always based on some other
focus materials, and Lukas

608
00:30:13,420 --> 00:30:15,600
reflects very well in the video.

609
00:30:15,980 --> 00:30:21,840
So it was based on Christophe Pettus
blog post, where a simple

610
00:30:22,060 --> 00:30:24,020
workload was described.

611
00:30:25,080 --> 00:30:31,020
Imagine a Twitch stream and a new
stream is starting, and we

612
00:30:31,020 --> 00:30:36,140
have a streams table, kind of a
parent table, and we have some

613
00:30:36,140 --> 00:30:39,880
viewers or something table, and
immediately like 1,000,000 or so

614
00:30:39,920 --> 00:30:44,440
viewers join the stream and you
need to insert them to this,

615
00:30:44,760 --> 00:30:47,060
let's say, child table, parent
and child.

616
00:30:47,800 --> 00:30:51,360
And if you do it in separate inserts,
separate transactions,

617
00:30:52,540 --> 00:30:57,180
when Postgres performs insert,
not only it needs to check that

618
00:30:57,180 --> 00:31:01,280
we've selected that row exists,
during the duration of insert

619
00:31:01,280 --> 00:31:06,380
this writing transaction, we need
to lock that row with access

620
00:31:06,380 --> 00:31:07,180
share lock.

621
00:31:08,940 --> 00:31:11,620
Basically, for key share lock.

622
00:31:12,500 --> 00:31:18,220
If it was explicit lock, it would
be for key share lock.

623
00:31:18,580 --> 00:31:23,160
So to ensure that this row won't
disappear during our transaction,

624
00:31:23,200 --> 00:31:23,600
right?

625
00:31:23,600 --> 00:31:24,100
Michael: Yeah.

626
00:31:24,960 --> 00:31:28,760
Nikolay: And imagine now like a
million inserts are happening

627
00:31:29,440 --> 00:31:30,300
around the same time.

628
00:31:30,300 --> 00:31:31,700
Of course, it's not possible.

629
00:31:31,780 --> 00:31:35,640
It depends on how many resources
you have, but many of them will

630
00:31:35,640 --> 00:31:38,300
collide and try to happen at the
same time.

631
00:31:39,020 --> 00:31:42,620
And in this case, since multiple
transactions need to perform

632
00:31:42,620 --> 00:31:47,860
a row-level lock on this parent
table, Postgres starts using multiexact

633
00:31:48,160 --> 00:31:48,660
IDs.

634
00:31:49,220 --> 00:31:49,720
Right?

635
00:31:50,280 --> 00:31:51,200
Multi-exact IDs.

636
00:31:51,200 --> 00:31:52,320
I'm like, oh.

637
00:31:53,100 --> 00:31:59,020
So multi-exact IDs are created
and multi-exact IDs are used when

638
00:31:59,020 --> 00:32:01,420
multiple transactions lock the
same row.

639
00:32:03,580 --> 00:32:09,780
And this mechanism is very easy
to achieve some performance cliffs

640
00:32:09,780 --> 00:32:15,040
when you grow, when multiexacts
are actively used.

641
00:32:15,660 --> 00:32:20,200
Well, by the way, Lulas also mentions
another problem with multiexacts.

642
00:32:21,260 --> 00:32:26,720
Many, many monitoring systems,
and PgAnalysis is a good exclusion

643
00:32:26,720 --> 00:32:30,400
here, as well as our version of
PgWatch, PgWatch PostgreSQL Edition,

644
00:32:30,580 --> 00:32:33,060
They both care about this.

645
00:32:33,340 --> 00:32:37,840
But many systems care about only
regular transaction ID wraparound.

646
00:32:38,720 --> 00:32:43,160
They forget that multi-exact ID
wraparound also might happen.

647
00:32:43,380 --> 00:32:45,680
It's just very rare.

648
00:32:45,720 --> 00:32:50,520
I never heard about this case so
far, but it's still possible.

649
00:32:50,940 --> 00:32:57,480
So you need to monitor and ensure
that autovacuum freezes rows

650
00:32:57,480 --> 00:33:02,520
properly and takes care of rows
which were involved in this multiexact

651
00:33:02,780 --> 00:33:03,280
mechanism.

652
00:33:04,280 --> 00:33:05,080
So 1 problem.

653
00:33:05,080 --> 00:33:06,980
But another problem, this cliff.

654
00:33:07,540 --> 00:33:13,340
So in source, we have contention
on multi-exact mechanism.

655
00:33:14,160 --> 00:33:19,900
Multi-exact has SLRU, S-L-R-U,
and buffers are quite small.

656
00:33:20,280 --> 00:33:22,320
The number of buffers, like there
are default.

657
00:33:22,640 --> 00:33:26,620
So it's not default, it's hard-coded
until Postgres version 17.

658
00:33:27,900 --> 00:33:30,780
And yeah, so it's like...

659
00:33:31,520 --> 00:33:35,120
I saw this problem with SLRUs in
different cases, in sub-transactions

660
00:33:35,440 --> 00:33:38,960
case many years ago, not many,
several years ago.

661
00:33:39,960 --> 00:33:44,280
And I remember I touched it a little
bit at that time already.

662
00:33:44,680 --> 00:33:47,060
And sub-transactions is also a
performance cliff.

663
00:33:47,300 --> 00:33:50,060
Everything is right, your scale
is right, and then suddenly the

664
00:33:50,060 --> 00:33:55,120
system is down, basically, because
overhead becomes enormous.

665
00:33:55,600 --> 00:34:00,560
It usually happens when you achieve
the limits of these SLRU

666
00:34:00,600 --> 00:34:01,100
buffers.

667
00:34:02,380 --> 00:34:09,280
We needed it for sub-trans SLRU,
but this is similar to multi-exact

668
00:34:09,380 --> 00:34:09,880
SLRU.

669
00:34:10,580 --> 00:34:14,000
Around the time I learned that
Andrei Borodin already proposed

670
00:34:14,120 --> 00:34:18,620
improvements in algorithm, not
only improvements, but also make

671
00:34:19,000 --> 00:34:19,820
this tunable.

672
00:34:20,640 --> 00:34:23,700
And good news, this work is committed
to Postgres 17.

673
00:34:26,040 --> 00:34:29,160
So if we observe this performance
cliff at some point, at some

674
00:34:29,160 --> 00:34:32,820
rates of inserts, you observe performance
becomes terrible, like

675
00:34:33,840 --> 00:34:36,540
system unresponsive downtime, right?

676
00:34:36,940 --> 00:34:42,380
You might try to tune to postpone
this cliff from your current

677
00:34:42,380 --> 00:34:42,880
situation.

678
00:34:44,480 --> 00:34:45,480
But the only-

679
00:34:45,480 --> 00:34:46,280
Michael: At what cost?

680
00:34:46,280 --> 00:34:47,980
Like a little bit of extra memory?

681
00:34:49,490 --> 00:34:49,990
Yeah,

682
00:34:51,500 --> 00:34:56,140
Nikolay: algorithm was improved
instead of sequential capital

683
00:34:56,140 --> 00:34:58,760
O of N, I think.

684
00:34:59,760 --> 00:35:02,180
Or maybe even quadratical algorithm.

685
00:35:02,640 --> 00:35:05,240
It was improved, better algorithm.

686
00:35:05,800 --> 00:35:10,200
But also you can adjust configuration
and have more buffers,

687
00:35:10,200 --> 00:35:10,880
for example.

688
00:35:11,200 --> 00:35:16,360
I remember we tested some earlier
versions of these patches.

689
00:35:17,780 --> 00:35:20,140
There are multiple patches in this
work.

690
00:35:20,780 --> 00:35:23,180
By the way, big congratulations
to Andrei.

691
00:35:23,500 --> 00:35:28,980
More and more works he was working
on, like many years, they

692
00:35:28,980 --> 00:35:30,580
are committed.

693
00:35:30,580 --> 00:35:31,300
It's great.

694
00:35:31,720 --> 00:35:35,380
But I remember there are some complexities
in our particular

695
00:35:35,380 --> 00:35:36,600
case related to sub-transactions.

696
00:35:37,200 --> 00:35:41,900
But I think we should test these
things and study this exactly,

697
00:35:41,940 --> 00:35:45,860
this very performance cliff Christoph
describes in his blog post,

698
00:35:45,860 --> 00:35:46,360
right?

699
00:35:47,100 --> 00:35:50,640
Michael: Yeah, well, so I think
that's worth mentioning because

700
00:35:51,580 --> 00:35:57,280
this is 1 solution, but Christoph
also mentions a couple of alternative

701
00:35:57,440 --> 00:35:59,910
approaches to that schema design.

702
00:35:59,910 --> 00:36:01,540
Nikolay: He mentions reliable approaches.

703
00:36:01,640 --> 00:36:03,820
1 of them is just drop foreign
key.

704
00:36:03,820 --> 00:36:06,800
In this case, no more performance
cliff, because multi-exact

705
00:36:07,340 --> 00:36:08,680
mechanism is not involved.

706
00:36:09,400 --> 00:36:10,500
Absolutely true.

707
00:36:12,040 --> 00:36:15,260
Michael: But also alternative approach,
like batching the updates,

708
00:36:15,580 --> 00:36:17,960
or there was another idea.

709
00:36:17,980 --> 00:36:18,520
Nikolay: What updates?

710
00:36:18,520 --> 00:36:19,020
Inserts.

711
00:36:19,020 --> 00:36:20,060
It's a very important...

712
00:36:21,600 --> 00:36:25,580
We always say you need to perform
deletes and updates in batches.

713
00:36:26,140 --> 00:36:29,360
But we never said this about inserts.

714
00:36:29,960 --> 00:36:33,200
Inserts is better to do in a single
transaction if you can.

715
00:36:34,140 --> 00:36:37,500
Because inserts are usually not
blocking.

716
00:36:37,760 --> 00:36:38,800
Not in this case.

717
00:36:38,800 --> 00:36:44,680
Here we need to deal with this
multi-exact row level lock.

718
00:36:45,060 --> 00:36:48,040
But if you can do it in single
transaction, it's great, even

719
00:36:48,040 --> 00:36:50,420
if it's a million rows, it's okay.

720
00:36:51,040 --> 00:36:56,780
Or multiple bigger batches, like
a thousand inserts.

721
00:36:59,340 --> 00:37:01,880
But straightforward implementation
of this approach, like, oh,

722
00:37:01,880 --> 00:37:03,400
viewers, just join our channel.

723
00:37:03,400 --> 00:37:06,860
It will be separate inserts because
there happens different sessions,

724
00:37:07,580 --> 00:37:08,080
right?

725
00:37:09,060 --> 00:37:13,040
You need to know about this problem
to design system differently

726
00:37:13,180 --> 00:37:15,980
and start batching these inserts.

727
00:37:16,440 --> 00:37:18,660
So it's a dangerous performance
cliff.

728
00:37:19,260 --> 00:37:22,220
But this performance cliff demonstrates
the problem for inserts.

729
00:37:22,840 --> 00:37:26,120
I have another case, which is not
yet published.

730
00:37:26,120 --> 00:37:28,760
I hope we will publish some blog
post about this.

731
00:37:30,040 --> 00:37:33,160
It was not observed in production,
but it was inspired by some

732
00:37:33,160 --> 00:37:38,400
problems we had with some customers
related to multi-exact IDs

733
00:37:38,400 --> 00:37:38,900
contention.

734
00:37:40,240 --> 00:37:44,820
So imagine we have parent and a
couple of records, just 1 and

735
00:37:44,820 --> 00:37:52,440
2, ID1, ID2, and child table with
some inserts happening.

736
00:37:53,360 --> 00:37:55,460
Not at a huge rate.

737
00:37:55,760 --> 00:37:59,560
Not at a rate where we already
have this performance cliff.

738
00:38:00,600 --> 00:38:05,420
And then also we want to have a
lot of selects, selecting, for

739
00:38:05,420 --> 00:38:10,220
example, the latest child inserted
to the child table.

740
00:38:10,640 --> 00:38:11,760
Just with join, right?

741
00:38:11,760 --> 00:38:16,560
So we join, we select parent ID
1, and we select what was the

742
00:38:16,560 --> 00:38:19,120
latest child inserted recently.

743
00:38:19,340 --> 00:38:22,240
We have timestamp, for example,
to do this, or we just order

744
00:38:22,240 --> 00:38:27,040
by primary key, if it's just an
integer primary key for the child

745
00:38:27,040 --> 00:38:27,540
table.

746
00:38:28,220 --> 00:38:30,940
And now, we think, Well, that's
it.

747
00:38:30,940 --> 00:38:37,000
So we inserted, like, say, we insert
10 per second, for example.

748
00:38:37,120 --> 00:38:38,740
And we select as much as we can.

749
00:38:38,740 --> 00:38:43,680
Like, imagine we have a lot of
viewers trying to read the latest

750
00:38:44,240 --> 00:38:45,780
inserted record to child.

751
00:38:46,200 --> 00:38:50,720
And now we start updating the parent
id equals 1.

752
00:38:51,500 --> 00:38:52,540
This is the cliff.

753
00:38:53,740 --> 00:38:57,420
We perform terrible for everyone,
including selects.

754
00:38:57,820 --> 00:39:00,200
This is the key for this demonstration.

755
00:39:01,160 --> 00:39:03,480
It can be like 100 times worse.

756
00:39:04,660 --> 00:39:08,160
100 times bigger latency, 100 times
lower

757
00:39:09,000 --> 00:39:09,500
Michael: TPS.

758
00:39:10,360 --> 00:39:13,700
It really surprised me when you
told me this, when you showed

759
00:39:13,700 --> 00:39:14,400
me this.

760
00:39:14,540 --> 00:39:18,980
Nikolay: Well, We need to research
all the details, but I suspect

761
00:39:19,020 --> 00:39:22,640
this is related to the fact that
in this case, when you have

762
00:39:22,640 --> 00:39:27,460
inserts and updates of parent inserts
to child, they also...

763
00:39:27,580 --> 00:39:31,000
Well, first of all, all those inserts,
if they happen around

764
00:39:31,000 --> 00:39:33,080
the same time, it's also multixact
involved.

765
00:39:33,080 --> 00:39:37,060
But updates also need row-level
lock, right?

766
00:39:37,360 --> 00:39:38,160
And they also...

767
00:39:39,140 --> 00:39:44,640
Actually, multixact ID is being
put to XMax hidden column,

768
00:39:44,760 --> 00:39:46,040
system column, right?

769
00:39:46,100 --> 00:39:52,860
So XMax column defines the transaction
ID when the row becomes

770
00:39:53,200 --> 00:39:53,700
dead.

771
00:39:54,320 --> 00:39:57,260
And if the transaction succeeds,
okay, this tuple is dead.

772
00:39:57,520 --> 00:40:01,020
If the transaction is cancelled,
nobody knows.

773
00:40:01,020 --> 00:40:05,780
It's like it would be XMAX being
now, right?

774
00:40:06,380 --> 00:40:07,260
Live tuple.

775
00:40:08,160 --> 00:40:12,840
But in this situation, Postgres
puts not regular transaction

776
00:40:12,840 --> 00:40:14,560
IDs, but multixact IDs.

777
00:40:15,360 --> 00:40:20,340
And produces some volume of dead
tuples all the time with multixact

778
00:40:20,440 --> 00:40:21,840
IDs inside XMAX.

779
00:40:22,920 --> 00:40:27,760
And I suspect that when selects
are trying to find live tuples

780
00:40:27,880 --> 00:40:32,880
for parent, it's really expensive
because multixact mechanism,

781
00:40:32,960 --> 00:40:37,760
we need to check which transactions
were canceled.

782
00:40:38,740 --> 00:40:39,840
It's very expensive.

783
00:40:40,840 --> 00:40:47,480
So, selects need to scan a lot
of dead tuples, performing expensive

784
00:40:47,560 --> 00:40:51,060
verification, and they degrade
a lot.

785
00:40:51,820 --> 00:40:56,660
So this workload, I think, anyone
can experience.

786
00:40:57,660 --> 00:41:02,280
And I remember I designed a few
systems where, for the sake to

787
00:41:02,560 --> 00:41:09,520
have fast counts, when inserting
to child table, for example,

788
00:41:09,520 --> 00:41:14,380
comments or something, or views
or something, I don't know, like

789
00:41:14,380 --> 00:41:18,260
something, I decided to update
some counter in the parent table,

790
00:41:18,640 --> 00:41:21,800
to have denormalized data, to avoid
slow count.

791
00:41:22,460 --> 00:41:25,320
So I just insert an update in the
same transaction.

792
00:41:25,320 --> 00:41:28,260
In different transactions, I also
insert an update, plus 1, plus

793
00:41:28,260 --> 00:41:28,700
1.

794
00:41:28,700 --> 00:41:32,800
Of course, this already is not
good, because these updates have

795
00:41:32,800 --> 00:41:33,560
some contention.

796
00:41:33,700 --> 00:41:40,080
You're updating the same row, you
cannot update 2 times, it will

797
00:41:40,080 --> 00:41:42,540
be sequential, like Postgres will.

798
00:41:42,840 --> 00:41:47,360
It will be contentional heavy locks.

799
00:41:47,620 --> 00:41:51,940
You can improve this, but anyway,
even if you perform batched

800
00:41:51,940 --> 00:41:55,120
updates, for example, in separate
transactions, you have high

801
00:41:55,120 --> 00:42:00,260
risks of multi-exact IDs being
involved, and then you already

802
00:42:00,360 --> 00:42:04,080
can have a lightweight lock contention
on multixact mechanism.

803
00:42:04,860 --> 00:42:05,640
Michael: A favorite.

804
00:42:06,160 --> 00:42:08,140
Nikolay: Yeah, overlock multixact
offset.

805
00:42:09,400 --> 00:42:12,680
And yeah, so this is not good.

806
00:42:13,620 --> 00:42:20,580
And I imagine this can happen with
very good chances.

807
00:42:20,860 --> 00:42:23,940
And it also can be seen as a performance
cliff.

808
00:42:23,940 --> 00:42:29,940
So suddenly, it was fine, but then
it quickly becomes terrible.

809
00:42:31,520 --> 00:42:33,780
Yeah, so what to do?

810
00:42:33,780 --> 00:42:34,700
I don't know.

811
00:42:34,740 --> 00:42:38,160
I think we need to study, research
all those cliffs, document

812
00:42:38,240 --> 00:42:41,440
them and understand how to predict
them maybe.

813
00:42:42,160 --> 00:42:43,940
At what rates everything is fine.

814
00:42:43,940 --> 00:42:47,360
Because I don't see them under
very heavy loads, I don't see

815
00:42:47,360 --> 00:42:47,860
them.

816
00:42:48,240 --> 00:42:50,780
But it doesn't mean they're not
there.

817
00:42:51,180 --> 00:42:52,320
Michael: Or they're not coming.

818
00:42:53,480 --> 00:42:54,640
Nikolay: Or they're not coming.

819
00:42:54,680 --> 00:42:57,220
At some point, they might bite
you back.

820
00:42:58,180 --> 00:42:58,980
Michael: That's interesting.

821
00:43:00,040 --> 00:43:03,540
Nikolay: For those who are interested,
let's put links.

822
00:43:03,900 --> 00:43:07,480
I have a series of benchmarks performed
with our PostgreSQL bot

823
00:43:07,860 --> 00:43:08,820
with some visualization.

824
00:43:09,240 --> 00:43:13,620
Of course, it requires explanation
maybe, but if you know how

825
00:43:13,620 --> 00:43:17,260
PgBench is organized, everything
is there, including all, it's

826
00:43:17,480 --> 00:43:18,380
easy to reproduce.

827
00:43:19,380 --> 00:43:21,800
Michael: If you're talking about
the graph that I remember seeing,

828
00:43:22,080 --> 00:43:26,080
it doesn't take that much for explanation,
because you have 4

829
00:43:26,100 --> 00:43:30,700
fairly large bars on the bar chart,
all showing lots of transactions

830
00:43:30,820 --> 00:43:35,340
per second, and then 1 that is
so tiny, you have to almost like

831
00:43:35,340 --> 00:43:36,420
zoom in to see it.

832
00:43:36,420 --> 00:43:40,360
And it's quite easy to say, oh,
that 1 didn't do so well.

833
00:43:40,360 --> 00:43:42,720
Nikolay: Yeah, let's explain a
little bit.

834
00:43:42,720 --> 00:43:45,160
The tiny one is this cliff demonstrated.

835
00:43:46,560 --> 00:43:47,280
3 others.

836
00:43:47,320 --> 00:43:49,940
One was, let's just remove updates.

837
00:43:51,960 --> 00:43:56,980
Another one, let's keep updates,
but update ID 2, not 1.

838
00:43:58,680 --> 00:44:00,360
Another one, let's remove inserts.

839
00:44:02,860 --> 00:44:05,300
And the last one, let's remove foreign
keys.

840
00:44:05,500 --> 00:44:06,000
Yes.

841
00:44:06,180 --> 00:44:07,240
Which is fair, right?

842
00:44:07,240 --> 00:44:10,680
In this case, just to show that
foreign keys is the key of evil

843
00:44:10,680 --> 00:44:11,180
here.

844
00:44:11,400 --> 00:44:11,900
Yeah.

845
00:44:12,440 --> 00:44:15,460
And all bars are TPS for selects.

846
00:44:16,340 --> 00:44:17,140
Maybe latency.

847
00:44:17,380 --> 00:44:20,460
Actually, it's more correct to
show latency, not TPS.

848
00:44:20,460 --> 00:44:21,520
But okay, TPS.

849
00:44:21,580 --> 00:44:26,120
Somehow people tend to like TPS
numbers more, but latency is

850
00:44:26,120 --> 00:44:28,260
what matters here for sure.

851
00:44:30,300 --> 00:44:35,040
Michael: Then we'd see 1 really
large 1 and 3 small ones.

852
00:44:35,440 --> 00:44:37,680
It's not as dramatic a graph.

853
00:44:38,740 --> 00:44:39,520
Nikolay: Yeah, maybe.

854
00:44:39,520 --> 00:44:42,100
Yeah, smaller is worse.

855
00:44:42,800 --> 00:44:44,340
Smaller is not better, yeah.

856
00:44:44,340 --> 00:44:48,580
Because latency is inverted logic.

857
00:44:49,740 --> 00:44:50,800
Smaller is better.

858
00:44:50,800 --> 00:44:53,000
Michael: Anyway, it's still great.

859
00:44:53,360 --> 00:44:57,420
Nikolay: So yeah, SELECTs can be
affected badly.

860
00:44:58,380 --> 00:45:01,400
Michael: Has this changed your
opinion on should you use foreign

861
00:45:01,400 --> 00:45:03,580
keys or not really?

862
00:45:03,700 --> 00:45:07,260
Nikolay: Yeah, this is the first
question I asked to myself and

863
00:45:07,260 --> 00:45:08,000
my team.

864
00:45:09,400 --> 00:45:13,120
I said, like, are we observing
a similar case to sub-transactions

865
00:45:13,740 --> 00:45:20,820
where We, like, I basically started
telling everyone, like, if

866
00:45:20,820 --> 00:45:23,800
you want to grow, try to avoid
them.

867
00:45:24,380 --> 00:45:26,640
At least know all the problems
they have.

868
00:45:26,960 --> 00:45:30,880
By the way, for sub-transactions,
there is a new setting also

869
00:45:31,020 --> 00:45:33,220
interesting to double check in
Postgres 17.

870
00:45:34,000 --> 00:45:38,860
But like, answer is still not 100%.

871
00:45:39,160 --> 00:45:44,620
I'm not 100% sure about the answer
here because like foreign

872
00:45:44,620 --> 00:45:46,420
keys is a good thing in general.

873
00:45:46,920 --> 00:45:53,000
So I think in sub-transactions
case, we saw how to achieve the

874
00:45:53,000 --> 00:45:56,980
problems very fast, like very easily
when you grow.

875
00:45:57,440 --> 00:46:01,400
With foreign keys, again, I'm not
sure.

876
00:46:01,820 --> 00:46:04,620
Actually, the customer I mentioned,
they removed some transactions.

877
00:46:05,140 --> 00:46:06,140
And the problem has gone.

878
00:46:06,140 --> 00:46:09,060
So I think sometimes these problems
come together.

879
00:46:09,060 --> 00:46:12,780
For example, we know that, as we
discussed, foreign keys, they

880
00:46:13,100 --> 00:46:15,300
put a SelectForShare lock.

881
00:46:16,220 --> 00:46:19,280
And this causes multiexact mechanism
being involved.

882
00:46:19,280 --> 00:46:23,760
But sometimes if you work with
SelectForUpdate, you work with

883
00:46:23,760 --> 00:46:24,260
SelectForUpdate.

884
00:46:24,480 --> 00:46:27,040
You don't expect multiexact mechanism
to be involved.

885
00:46:27,040 --> 00:46:31,860
But if sub-transactions are there,
SelectForUpdate might lead

886
00:46:31,860 --> 00:46:38,040
to multiexact mechanism because
every nesting level of your

887
00:46:38,040 --> 00:46:40,440
nested transaction or sub-transaction
is considered a separate

888
00:46:40,440 --> 00:46:44,440
transaction when Postgres puts
xmax value.

889
00:46:45,400 --> 00:46:51,240
And if you lock row SelectForUpdate
and then define save point

890
00:46:51,300 --> 00:46:56,620
and then perform update, this is
similar to select for share.

891
00:46:57,280 --> 00:46:58,680
Although it's select for update.

892
00:47:00,700 --> 00:47:05,200
Because basically multiple transactions
are trying to lock the

893
00:47:05,200 --> 00:47:07,480
same row, so multiexact is needed
here.

894
00:47:07,900 --> 00:47:11,820
And there is a blog post, I don't
remember, a very good blog

895
00:47:11,820 --> 00:47:13,240
post about Performance Cliff.

896
00:47:14,100 --> 00:47:17,860
This is a very blog post which
said, sub-transactions are cursed,

897
00:47:17,860 --> 00:47:18,980
just remove them.

898
00:47:19,540 --> 00:47:22,700
And I became a follower of this
approach.

899
00:47:23,440 --> 00:47:27,280
So in that case, with that customer,
we had 2 hypotheses.

900
00:47:28,080 --> 00:47:32,860
Either sub-transactions removal
will help or it won't help just

901
00:47:32,860 --> 00:47:37,520
because they had this case we demonstrated
in this experiment.

902
00:47:37,900 --> 00:47:41,040
Fortunately, it was related to
sub-transactions, but who knows,

903
00:47:41,040 --> 00:47:45,300
maybe in some case we will see
that no sub-transactions, but

904
00:47:45,300 --> 00:47:48,000
still performance cliff like that.

905
00:47:48,860 --> 00:47:49,800
Does it make sense?

906
00:47:51,300 --> 00:47:52,060
Michael: Yeah, for sure.

907
00:47:52,060 --> 00:47:54,860
And I think I was just looking
up the blog post because it was

908
00:47:54,860 --> 00:47:55,820
bugging me.

909
00:47:55,840 --> 00:47:58,500
Are you talking about the 1 from
Nelson L.

910
00:47:58,500 --> 00:47:59,000
Hage?

911
00:47:59,920 --> 00:48:00,540
Nikolay: I think so.

912
00:48:00,540 --> 00:48:01,140
Yeah, yeah.

913
00:48:01,280 --> 00:48:04,140
I included it to my 4 cases with
sub-transactions.

914
00:48:05,140 --> 00:48:07,260
Michael: We did an episode on sub-transactions
and I found that

915
00:48:07,260 --> 00:48:07,900
in the show notes.

916
00:48:07,900 --> 00:48:10,080
I'll double-check it and I'll link
it up if that's the

917
00:48:10,080 --> 00:48:10,120
Nikolay: right 1.

918
00:48:10,120 --> 00:48:12,840
It's very good, very good blog
post, very precise.

919
00:48:13,680 --> 00:48:18,480
It doesn't explore too far this
situation with Select for Update

920
00:48:18,520 --> 00:48:19,560
and foreign keys.

921
00:48:19,900 --> 00:48:22,220
I think it even doesn't mention
foreign keys.

922
00:48:22,640 --> 00:48:26,120
Yeah, because they had different
problem, but for sure it was

923
00:48:26,120 --> 00:48:30,040
a good starting point for me some
time ago to start understanding

924
00:48:30,080 --> 00:48:33,240
that multi-exact IDs also might
be dangerous.

925
00:48:34,340 --> 00:48:38,140
It's also a cliff due to various
reasons.

926
00:48:38,180 --> 00:48:43,800
So I think we need to postpone
exact answer to this.

927
00:48:44,840 --> 00:48:45,520
And maybe...

928
00:48:45,620 --> 00:48:47,940
Michael: Matthew 14 I had 1 more
thing I wanted to add.

929
00:48:48,920 --> 00:48:53,120
I think there's like, there's definitely
people I know who, who

930
00:48:53,120 --> 00:48:56,760
have been using Postgres for 5
or 10 years and they're riding

931
00:48:56,820 --> 00:49:01,160
this wave of, they adopted Postgres
when it didn't have all the

932
00:49:01,160 --> 00:49:02,980
features they thought they were
going to need.

933
00:49:02,980 --> 00:49:07,400
But as they've needed them, Postgres
has been adding them.

934
00:49:08,760 --> 00:49:11,920
As they've scaled, their business
has been able to stay on Postgres

935
00:49:11,920 --> 00:49:16,640
because it got replication features
or it got parallel query

936
00:49:16,640 --> 00:49:20,080
features or some things that they
started to be able to really

937
00:49:20,080 --> 00:49:23,320
benefit from Postgres improve as
they needed improvements.

938
00:49:23,600 --> 00:49:27,280
So it feels like this could be
another 1 of those things that

939
00:49:27,660 --> 00:49:29,700
Postgres is going to get better
in this area.

940
00:49:30,060 --> 00:49:32,040
There's enough noise around these
things.

941
00:49:32,040 --> 00:49:35,360
You mentioned some, even if these
don't, even if these get reverted

942
00:49:35,380 --> 00:49:38,860
in 17 and don't make it in 17,
something will probably make it

943
00:49:38,860 --> 00:49:40,760
into 18 or 19 or 20.

944
00:49:41,100 --> 00:49:44,440
And there's enough people at scale
pushing Postgres to these

945
00:49:44,440 --> 00:49:48,640
limits that a lot of us that aren't
there yet can probably get

946
00:49:48,640 --> 00:49:52,900
away with adding foreign keys,
using them even in cases where

947
00:49:52,900 --> 00:49:56,280
maybe it's not recommended at scale,
and maybe Postgres will

948
00:49:56,280 --> 00:49:59,900
be improved or will have configurable
settings by the time we

949
00:49:59,900 --> 00:50:00,880
do need them.

950
00:50:01,760 --> 00:50:03,140
Nikolay: Yeah, yeah.

951
00:50:04,680 --> 00:50:05,440
Makes sense.

952
00:50:05,600 --> 00:50:06,100
Maybe.

953
00:50:06,660 --> 00:50:10,960
Yeah, we need to explore those
settings and test various edge

954
00:50:10,960 --> 00:50:11,460
cases.

955
00:50:11,540 --> 00:50:14,220
I would say these performance-critical
cliffheads should be called

956
00:50:14,220 --> 00:50:14,980
edge case.

957
00:50:15,920 --> 00:50:17,520
Michael: Yeah, or corner case is
easier.

958
00:50:17,520 --> 00:50:20,340
Nikolay: Corner case, let's say
this, what I just described,

959
00:50:20,500 --> 00:50:24,280
sub-transactions plus select for
update, this is corner cases.

960
00:50:24,280 --> 00:50:25,680
Michael: 2 edges where they meet.

961
00:50:25,680 --> 00:50:26,640
Nikolay: Exactly, exactly.

962
00:50:26,640 --> 00:50:30,040
This is a perfect example of corner
case.

963
00:50:30,660 --> 00:50:32,220
But others are edge cases.

964
00:50:32,220 --> 00:50:35,240
We just need to explore the territory
and find all the edges

965
00:50:35,280 --> 00:50:40,160
and think if it's possible to move
them to have more room when

966
00:50:40,160 --> 00:50:40,660
needed.

967
00:50:41,400 --> 00:50:44,660
Or just, yeah, at some point probably
foreign keys should be

968
00:50:44,660 --> 00:50:46,320
just dropped in some cases.

969
00:50:47,500 --> 00:50:49,080
Sometimes, maybe.

970
00:50:49,940 --> 00:50:50,140
Michael: Good.

971
00:50:50,140 --> 00:50:52,120
Well, we spoke to some people who
don't use them.

972
00:50:52,120 --> 00:50:53,420
So that's really, yeah.

973
00:50:53,420 --> 00:50:55,460
Nikolay: But in general, I keep
using them.

974
00:50:55,460 --> 00:50:56,833
I keep using them everywhere.

975
00:50:56,833 --> 00:50:57,890
Yeah, me too.

976
00:50:57,890 --> 00:51:01,040
And I keep suggesting use them.

977
00:51:02,080 --> 00:51:02,580
Michael: Great.

978
00:51:02,860 --> 00:51:03,900
Nikolay: Pay your tax.

979
00:51:06,300 --> 00:51:07,960
Michael: And society will be better.

980
00:51:08,700 --> 00:51:11,320
Nikolay: But don't get into jail
with these cliffs.

981
00:51:11,320 --> 00:51:11,820
Oh

982
00:51:13,700 --> 00:51:17,240
Michael: Interesting, yeah, thank
you so much Nikolay as always

983
00:51:17,420 --> 00:51:19,040
and catch you next week.