1
00:00:00,060 --> 00:00:01,900
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:02,220 --> 00:00:05,920
I'm Nik, Postgres.AI, and Michael,
pgMustard.

3
00:00:06,100 --> 00:00:06,860
Hi, Michael.

4
00:00:07,240 --> 00:00:08,300
Michael: Hello, Nik.

5
00:00:08,800 --> 00:00:15,780
Nikolay: So today we talk about locks, heavy
locks, because in Postgres

6
00:00:16,880 --> 00:00:21,440
there is also a concept of lightweight
locks in some other systems

7
00:00:21,440 --> 00:00:22,200
called latches.

8
00:00:23,110 --> 00:00:28,860
And if we say lock without heavy,
it means heavy by default.

9
00:00:30,100 --> 00:00:30,600
Right?

10
00:00:31,460 --> 00:00:32,340
Michael: I think so, yeah.

11
00:00:32,340 --> 00:00:36,660
I think this would be a very tough
discussion or a very shallow

12
00:00:36,660 --> 00:00:39,800
discussion if we were trying to
cover everything that uses the

13
00:00:39,800 --> 00:00:40,940
word lock in Postgres.

14
00:00:40,940 --> 00:00:43,840
But yeah, LW locks are the kind
of latches, the lightweight locks,

15
00:00:43,840 --> 00:00:44,340
right?

16
00:00:44,440 --> 00:00:46,960
Nikolay: So database objects and
row level locks.

17
00:00:46,960 --> 00:00:49,940
This is what we are going to focus
on today.

18
00:00:50,380 --> 00:00:50,880
Perfect.

19
00:00:51,100 --> 00:00:54,400
Yeah, more like closer to application
development, actually,

20
00:00:55,080 --> 00:00:57,540
rather than to internals.

21
00:00:58,680 --> 00:01:02,960
And this is one of the very popular
topics we discuss when we talk

22
00:01:02,960 --> 00:01:07,440
with clients and solve problems
because many incidents are caused

23
00:01:07,440 --> 00:01:11,660
by lack of understanding when people
don't realize how locking

24
00:01:12,040 --> 00:01:18,180
works, don't go into details there
and they have storms of so-called

25
00:01:18,340 --> 00:01:20,040
lock contention, right?

26
00:01:20,280 --> 00:01:26,280
Spikes of active sessions, waiting
on lock acquisition, and sometimes

27
00:01:26,280 --> 00:01:28,380
database can be down, even.

28
00:01:28,780 --> 00:01:30,700
This is the problem we can...

29
00:01:30,700 --> 00:01:32,840
Michael: Yeah, well, it came up
in our recent episode, didn't

30
00:01:32,840 --> 00:01:33,080
it?

31
00:01:33,080 --> 00:01:35,580
On the 10 dangerous issues.

32
00:01:35,820 --> 00:01:36,680
Nikolay: Oh yeah, exactly.

33
00:01:36,780 --> 00:01:39,780
So it's one of the 10 dangers we
talked about.

34
00:01:40,240 --> 00:01:42,000
And where to start?

35
00:01:42,260 --> 00:01:43,140
Where to start?

36
00:01:43,140 --> 00:01:52,440
I will start with simple, fundamental
idea, which should be written

37
00:01:53,120 --> 00:01:56,440
everywhere in bold you know.

38
00:01:57,180 --> 00:02:00,240
Lock cannot be released until the
very end of transaction.

39
00:02:01,600 --> 00:02:05,400
Heavy lock can be released only
at commit or rollback time, not

40
00:02:05,400 --> 00:02:05,900
earlier.

41
00:02:06,340 --> 00:02:08,080
Michael: Yeah, that's, I like that.

42
00:02:08,320 --> 00:02:09,780
Nikolay: Yeah, it's a good place
to start.

43
00:02:09,780 --> 00:02:15,300
Any lock, regardless of any heavy
lock, again, don't touch lightweight

44
00:02:15,300 --> 00:02:15,800
locks.

45
00:02:15,920 --> 00:02:19,780
Any heavy lock can be released only
when transaction ends.

46
00:02:20,660 --> 00:02:21,600
This should be...

47
00:02:22,360 --> 00:02:25,520
Michael: Once you understand that,
the other issues start to

48
00:02:25,520 --> 00:02:26,020
become

49
00:02:26,400 --> 00:02:27,080
Nikolay: more complex.

50
00:02:27,440 --> 00:02:27,940
Yeah, yeah, yeah.

51
00:02:27,940 --> 00:02:32,920
And that's why we don't want, additionally,
don't want long-running

52
00:02:33,040 --> 00:02:33,540
transactions.

53
00:02:35,660 --> 00:02:38,940
And this is regardless of transaction
isolation level, right?

54
00:02:38,940 --> 00:02:43,440
So it doesn't matter if you acquired
a lock to be held until

55
00:02:43,440 --> 00:02:46,360
the very end, until commit or rollback.

56
00:02:47,220 --> 00:02:53,140
So yeah, even, even very gentle
lock, like access share lock can

57
00:02:53,140 --> 00:02:53,900
be harmful.

58
00:02:55,240 --> 00:02:55,560
Right.

59
00:02:55,560 --> 00:03:00,380
So you should be, you should don't,
don't hold locks.

60
00:03:00,380 --> 00:03:02,220
You don't need to do long.

61
00:03:02,780 --> 00:03:07,120
It's like simple advice, but this
is like understanding.

62
00:03:07,700 --> 00:03:13,520
It sounds super obvious, super
obvious, but I wonder how come

63
00:03:13,520 --> 00:03:19,940
these super obvious things become
obvious only became in my particular

64
00:03:19,940 --> 00:03:22,620
career, they became obvious only
in the middle of it.

65
00:03:22,720 --> 00:03:27,440
Like maybe first time I heard it
from somebody else was maybe

66
00:03:27,440 --> 00:03:29,940
like 10 years ago or so, not 20.

67
00:03:30,480 --> 00:03:33,440
It should be 20 years ago when
I started working with Postgres

68
00:03:33,440 --> 00:03:35,440
and databases like deeper.

69
00:03:36,040 --> 00:03:40,680
It's super obvious, but somehow
you spend like 10 years of your

70
00:03:40,680 --> 00:03:44,320
career thinking you're expert,
database expert, not realizing

71
00:03:44,380 --> 00:03:45,140
this obvious thing.

72
00:03:45,140 --> 00:03:48,840
Maybe feeling it some partially,
but not realizing it.

73
00:03:48,840 --> 00:03:53,300
Maybe it's not delivered properly
in documentation and educational

74
00:03:53,440 --> 00:03:54,720
courses or something.

75
00:03:55,080 --> 00:03:56,140
It should be delivered.

76
00:03:56,720 --> 00:04:00,240
Michael: I think the documentation
does lay it out quite nicely,

77
00:04:00,240 --> 00:04:02,300
but it doesn't give you practical
tips.

78
00:04:02,720 --> 00:04:04,300
Nikolay: As usual, we have no documentation.

79
00:04:04,760 --> 00:04:06,580
Lack of practical tips, yes.

80
00:04:06,580 --> 00:04:08,800
But this is super important practical
tip.

81
00:04:09,940 --> 00:04:12,540
Don't expect you will release it
earlier.

82
00:04:12,540 --> 00:04:17,920
It's super basic, super obvious
to, I think, most of our listeners

83
00:04:17,960 --> 00:04:18,720
right now.

84
00:04:18,940 --> 00:04:22,660
I'm pretty sure like 90% know this,
but I wanted to spend some

85
00:04:22,660 --> 00:04:27,040
time here because I know new people
need it.

86
00:04:27,660 --> 00:04:30,780
Michael: Yeah, well, I mean, it's
also kind of maybe another

87
00:04:30,780 --> 00:04:34,960
case of that, you know, the midwit
meme with the kind of the

88
00:04:34,960 --> 00:04:36,540
beginner has a certain opinion.

89
00:04:36,960 --> 00:04:40,040
As you get more experience in something,
you start to think you

90
00:04:40,040 --> 00:04:41,760
need to do all these different
things.

91
00:04:41,840 --> 00:04:44,760
And then as you get more expert,
maybe in a lot of cases, you

92
00:04:44,760 --> 00:04:46,320
go back to that early opinion.

93
00:04:46,720 --> 00:04:50,600
Nikolay: You start appreciating
basics more and more because

94
00:04:50,600 --> 00:04:54,380
you also think about other people
and you want them to be on

95
00:04:54,380 --> 00:04:58,240
the same page with you if you work
in teams, for example, right?

96
00:04:58,680 --> 00:05:01,980
Michael: Yeah, you value different
things perhaps like reliability

97
00:05:02,200 --> 00:05:05,060
and not having weird edge case
bugs and things.

98
00:05:05,060 --> 00:05:09,680
So yeah, actually in terms of where
to start, we also have, we

99
00:05:09,680 --> 00:05:13,460
had a listener request for this
and they asked, and I think they

100
00:05:13,460 --> 00:05:16,200
phrased it in an interesting way
because I think it's not quite

101
00:05:16,200 --> 00:05:18,240
how I'd like to attack the problem.

102
00:05:18,240 --> 00:05:18,520
Yeah.

103
00:05:18,520 --> 00:05:21,600
They've said, could you explain
various kinds of locks Postgres

104
00:05:21,600 --> 00:05:23,820
has and what to do to avoid them?

105
00:05:23,860 --> 00:05:25,520
You cannot avoid locks.

106
00:05:25,960 --> 00:05:30,260
Yeah, but then what so what is
the question?

107
00:05:30,260 --> 00:05:35,040
I think it's more around understanding
when they're necessary

108
00:05:35,660 --> 00:05:39,060
and what to do to reduce the severity.

109
00:05:39,240 --> 00:05:43,780
So how heavy a lock or what else
is allowed concurrently at the

110
00:05:43,780 --> 00:05:44,620
same time.

111
00:05:45,020 --> 00:05:45,290
Yeah.

112
00:05:45,290 --> 00:05:47,980
So and tips and tricks around that
basically.

113
00:05:48,600 --> 00:05:51,980
Nikolay: Right, so we cannot avoid
locks.

114
00:05:51,980 --> 00:05:55,180
Locks are needed to work in multi-user
environment.

115
00:05:55,660 --> 00:06:02,260
They are mechanism to handle concurrency,
like how many users

116
00:06:02,260 --> 00:06:07,660
can work with the same thing at
the same time somehow and not

117
00:06:07,660 --> 00:06:12,180
interfere with each other to avoid
inconsistency or too many

118
00:06:12,180 --> 00:06:13,300
errors or something.

119
00:06:13,840 --> 00:06:15,180
And you cannot avoid locks.

120
00:06:15,180 --> 00:06:17,860
Even if you read something, you're
already locking something.

121
00:06:19,300 --> 00:06:20,840
And what we want to avoid...

122
00:06:20,860 --> 00:06:23,680
So I will tell you what you wanted,
right?

123
00:06:25,640 --> 00:06:27,040
So right question is...

124
00:06:27,040 --> 00:06:28,440
I hate this term, honestly.

125
00:06:28,620 --> 00:06:32,660
I hate this phrase, but this is
exactly when it's needed.

126
00:06:33,340 --> 00:06:38,060
The right question is how to avoid
waiting on lock acquisition.

127
00:06:39,800 --> 00:06:41,780
This is what we want to reduce.

128
00:06:42,340 --> 00:06:45,160
Again, completely, you cannot get
rid of it completely, but you

129
00:06:45,160 --> 00:06:49,180
want to reduce it as much as possible
because this is when database

130
00:06:49,200 --> 00:06:52,940
cannot do work and you just have
wasted time.

131
00:06:53,360 --> 00:06:56,360
And eventually, again, like I just
said, if you have a lot of

132
00:06:56,360 --> 00:07:01,360
active sessions waiting, it can
even cause downtime, cause outage,

133
00:07:01,500 --> 00:07:02,000
right?

134
00:07:02,460 --> 00:07:07,740
So we want to avoid too much of
waiting and too much again, remembering

135
00:07:07,820 --> 00:07:11,820
our very first episode, what is
too much in OLTP context?

136
00:07:12,720 --> 00:07:16,080
100 milliseconds is already quite
slow SQL query.

137
00:07:16,300 --> 00:07:21,340
If we wait 100 milliseconds, 100,
200 milliseconds, people already

138
00:07:21,340 --> 00:07:25,220
will notice because human perception
is 200 milliseconds roughly,

139
00:07:25,520 --> 00:07:28,840
plus minus, like depending on various
people.

140
00:07:30,060 --> 00:07:31,780
I'm like we are not cats.

141
00:07:32,860 --> 00:07:34,700
Cats have much better reaction.

142
00:07:35,380 --> 00:07:37,620
Imagine if everyone will be a cat.

143
00:07:38,160 --> 00:07:38,760
I'm glad we're

144
00:07:38,760 --> 00:07:40,220
Michael: not making apps for cats.

145
00:07:40,320 --> 00:07:42,580
Nikolay: Yeah, we would need to
optimize much harder.

146
00:07:46,300 --> 00:07:51,060
Because they would recognize slowness
much earlier.

147
00:07:51,900 --> 00:07:53,900
So higher standards.

148
00:07:54,520 --> 00:07:59,600
Anyway, this defines how long we
can allow ourselves to wait,

149
00:07:59,600 --> 00:08:02,800
and also we need to include the
actual work time, right?

150
00:08:02,800 --> 00:08:08,460
So ideally we should wait very
like, okay, close to 0, or not

151
00:08:08,460 --> 00:08:09,400
wait at all.

152
00:08:10,360 --> 00:08:15,260
If we do things properly, we can
almost avoid waiting time.

153
00:08:15,840 --> 00:08:16,860
So this is it.

154
00:08:16,860 --> 00:08:20,820
This is the right question, how
to avoid waiting on lock acquisition.

155
00:08:21,820 --> 00:08:22,320
Agreed?

156
00:08:23,260 --> 00:08:24,860
Michael: Yes, yes, I like that
a lot.

157
00:08:26,140 --> 00:08:31,900
And it's not just about how to
avoid waiting, it's also techniques

158
00:08:32,420 --> 00:08:35,360
to bail out.

159
00:08:35,640 --> 00:08:40,640
So I think maybe it's the same
thing, but I think lock timeouts,

160
00:08:40,640 --> 00:08:45,600
for example, is not just about
shortening the wait time, right?

161
00:08:46,400 --> 00:08:50,220
It's about bailing out and saying,
actually, we don't need to

162
00:08:50,220 --> 00:08:52,620
do this right now, let's wait for
a better time.

163
00:08:52,640 --> 00:08:53,260
Nikolay: Great point.

164
00:08:53,260 --> 00:08:56,500
So you're talking about the same
thing, but from different end.

165
00:08:56,760 --> 00:09:05,040
It's like, we want to avoid, I
mean, we, like causing some queries,

166
00:09:05,140 --> 00:09:09,620
transactions, sending them to database,
we want to avoid waiting.

167
00:09:09,620 --> 00:09:12,520
For example, if we say for select
for bed skip locked, we won't

168
00:09:12,520 --> 00:09:19,100
be waiting or no wait to have an
error sooner.

169
00:09:19,600 --> 00:09:27,280
But if we do something, it's good
to think about others and don't

170
00:09:27,280 --> 00:09:29,600
let them wait on us too much.

171
00:09:29,820 --> 00:09:31,660
That's why I walk them out, right?

172
00:09:32,120 --> 00:09:34,780
So it's a two-sided problem.

173
00:09:35,600 --> 00:09:37,940
We don't wait and others don't
wait too much.

174
00:09:38,260 --> 00:09:38,860
I like this.

175
00:09:38,860 --> 00:09:39,360
Perfect.

176
00:09:39,640 --> 00:09:40,880
This completes the picture.

177
00:09:41,600 --> 00:09:41,680
Michael: Yeah.

178
00:09:41,680 --> 00:09:42,180
Nice.

179
00:09:42,980 --> 00:09:46,460
So in terms of the first part of
the question, though, kind of

180
00:09:46,460 --> 00:09:47,580
various kinds of locks.

181
00:09:47,580 --> 00:09:49,940
Should we start on the table level
ones?

182
00:09:50,380 --> 00:09:51,040
Do you reckon?

183
00:09:51,040 --> 00:09:54,900
Nikolay: Right, but before we go
there, honestly, like, I don't

184
00:09:55,640 --> 00:09:57,440
keep this in my mind all the time.

185
00:09:57,440 --> 00:09:57,880
Michael: Yeah, good.

186
00:09:57,880 --> 00:10:01,020
Nikolay: Yeah, this baggage, like,
I just want, like, this is

187
00:10:01,020 --> 00:10:03,400
simple, simple way to think about
it.

188
00:10:03,580 --> 00:10:05,940
There are exclusive locks and share
locks.

189
00:10:06,760 --> 00:10:10,700
There are row-level locks and table-level
locks, database object-level

190
00:10:10,840 --> 00:10:11,340
locks.

191
00:10:11,680 --> 00:10:13,400
This is enough for understanding.

192
00:10:13,520 --> 00:10:18,300
Then you can use documentation
as reference, or there was a great

193
00:10:18,480 --> 00:10:22,540
Citus blog post, several actually
blog posts, but I especially

194
00:10:22,580 --> 00:10:27,680
like the old 1 by Marco Slot, Postgres
rocks except when it blocks,

195
00:10:28,180 --> 00:10:29,420
understanding locks.

196
00:10:29,540 --> 00:10:36,040
I like it because it translates
the table of conflicts the documentation

197
00:10:36,100 --> 00:10:39,780
has, it translates it to some more
practical form.

198
00:10:41,880 --> 00:10:45,680
What kind of operation can be blocked
by what kind?

199
00:10:45,920 --> 00:10:50,280
So instead of, it shifts language
from locks to operations we

200
00:10:50,280 --> 00:10:52,940
have, like selects, alter table,
and so on.

201
00:10:53,120 --> 00:10:58,640
So I don't keep this reference
style in my head almost never.

202
00:10:58,840 --> 00:11:02,160
I know something experience gives
me, But understanding this,

203
00:11:02,160 --> 00:11:05,580
there can be share lock, exclusive
lock, row level, table level,

204
00:11:05,580 --> 00:11:08,680
it's already good enough, you know?

205
00:11:08,680 --> 00:11:11,580
Michael: Yeah, because it makes
you think, before I do this,

206
00:11:11,580 --> 00:11:12,900
what kind of lock?

207
00:11:12,900 --> 00:11:15,560
And then you can look that up,
you can look up for the specific

208
00:11:15,780 --> 00:11:19,460
command or the specific thing we
want to be able to do, what

209
00:11:19,460 --> 00:11:21,000
does the documentation say?

210
00:11:21,140 --> 00:11:24,140
Or, and this is maybe a good time
to bring up, there's a new-ish

211
00:11:24,140 --> 00:11:28,140
tool, I think it came out maybe
a year or 2 ago, from Hussein

212
00:11:28,140 --> 00:11:31,180
Nasser, a friend of the show, called...
is it pglocks.org?

213
00:11:32,380 --> 00:11:32,860
Nikolay: Yes.

214
00:11:32,860 --> 00:11:36,500
It's a very good attempt to translate
what documentation has

215
00:11:36,500 --> 00:11:41,960
to some better form to consume
with easier I think actually I

216
00:11:41,960 --> 00:11:45,020
think actually documentation could
could just have it as is it

217
00:11:45,020 --> 00:11:47,240
would be great Maybe

218
00:11:47,380 --> 00:11:48,360
Michael: that would be awesome.

219
00:11:48,520 --> 00:11:51,860
Yeah, but in in addition I actually
thought it was a relatively

220
00:11:51,960 --> 00:11:53,760
good, like, it's very browsable.

221
00:11:54,100 --> 00:11:56,280
And I think it's quite a good educational
tool.

222
00:11:56,280 --> 00:11:59,680
Like, oh, I didn't realize this
could be done at the same time

223
00:11:59,680 --> 00:12:00,360
as this.

224
00:12:00,720 --> 00:12:05,420
Or this couldn't be this would
take this kind of level of lock

225
00:12:05,500 --> 00:12:09,220
so it was quite a nice way of browsing
as well I thought for

226
00:12:09,220 --> 00:12:13,260
me personally from some commands
I hadn't considered like I hadn't

227
00:12:13,260 --> 00:12:16,500
really considered for example the
locks that for example that

228
00:12:16,500 --> 00:12:19,280
like vacuum I had thought about
but analyze I hadn't thought

229
00:12:19,280 --> 00:12:22,660
about the kind of analyze from
a locking perspective.

230
00:12:22,720 --> 00:12:25,800
So it's really interesting seeing
that in some of the lists and

231
00:12:25,800 --> 00:12:29,960
thinking, oh, that means you couldn't,
you know, run 2 at the

232
00:12:29,960 --> 00:12:33,060
same time, or some, there were
some other interesting things

233
00:12:33,100 --> 00:12:35,040
and I had never considered before.

234
00:12:35,800 --> 00:12:40,740
Nikolay: Yeah yeah yeah so you
analyze like everything is locking

235
00:12:40,940 --> 00:12:48,020
everything yeah even selects are
locking and not only like selects

236
00:12:48,580 --> 00:12:53,580
if you have planning time by default
you have in queries it locks

237
00:12:53,580 --> 00:12:56,620
all all indexes as well with share
lock, right?

238
00:12:57,040 --> 00:12:57,780
Share lock.

239
00:12:57,780 --> 00:12:59,100
And we talked about it.

240
00:12:59,140 --> 00:13:04,000
This can be bottleneck if you need
to run a lot of these queries

241
00:13:04,000 --> 00:13:07,160
per second, like thousands of them
per second, it can be bottleneck

242
00:13:07,200 --> 00:13:10,220
that you need to lock a lot of
queries.

243
00:13:10,520 --> 00:13:15,060
There we, this topic has a bridge
to lightweight lock discussion,

244
00:13:15,060 --> 00:13:16,740
but it starts with heavy locks.

245
00:13:17,020 --> 00:13:20,740
So the need to have many heavy
locks can lead to lightweight

246
00:13:20,740 --> 00:13:21,440
lock contention.

247
00:13:23,260 --> 00:13:29,380
So yeah, and this means that understanding
what's locking, what's

248
00:13:29,380 --> 00:13:31,580
blocking others is great.

249
00:13:32,160 --> 00:13:38,640
But again, at a very basic level,
if you change something, you

250
00:13:38,640 --> 00:13:40,920
need to have exclusive lock.

251
00:13:40,920 --> 00:13:45,280
For example, if you update some
rows, you need an exclusive lock

252
00:13:45,280 --> 00:13:46,200
on these rows.

253
00:13:47,600 --> 00:13:50,880
And this means it will conflict
with others who want to change

254
00:13:50,880 --> 00:13:51,380
it.

255
00:13:51,780 --> 00:13:56,180
Michael: Yeah, so exclusive is
you can only have 1 at a time.

256
00:13:56,200 --> 00:13:56,700
Exactly.

257
00:13:56,880 --> 00:13:59,140
Share means you can have multiple
at the same time.

258
00:13:59,140 --> 00:14:03,220
Nikolay: If you read rows, Well,
for rows we don't have shared

259
00:14:03,220 --> 00:14:08,480
locks, but with table level, if
you change table schema, again,

260
00:14:08,480 --> 00:14:11,740
it's table level exclusive lock.

261
00:14:11,840 --> 00:14:15,680
If you just deal with table, including
selects, you need shared

262
00:14:15,680 --> 00:14:16,980
lock on that table, right?

263
00:14:17,080 --> 00:14:18,020
At table level.

264
00:14:18,740 --> 00:14:23,240
So of course, database level, object
level locks and row level

265
00:14:23,240 --> 00:14:26,260
locks, they are kind of different
beasts, right?

266
00:14:27,440 --> 00:14:29,840
Different beasts, because their
implementation is different and

267
00:14:29,840 --> 00:14:30,340
so on.

268
00:14:30,340 --> 00:14:31,320
Behavior is different.

269
00:14:31,420 --> 00:14:34,100
But still, the rule can be applied
directly.

270
00:14:34,400 --> 00:14:39,640
If we have multiple reading operations
that don't conflict, and

271
00:14:39,640 --> 00:14:42,840
they, by default, they are not
blocked by others.

272
00:14:43,020 --> 00:14:44,700
But they can be blocked indirectly.

273
00:14:46,520 --> 00:14:50,740
And this is my favorite example,
when we have long-lasting transaction

274
00:14:50,800 --> 00:14:54,060
which just read something from
a table, maybe 0 rows, no rows

275
00:14:54,060 --> 00:14:58,060
read, like just select from table,
limit 0, right?

276
00:14:58,320 --> 00:15:01,080
But it already acquired a share
lock on this table.

277
00:15:01,080 --> 00:15:05,860
Then some other table comes, it
waits on us.

278
00:15:06,140 --> 00:15:10,080
And then we have a lot of selects
after it coming to this table

279
00:15:10,080 --> 00:15:14,700
and they cannot read because they
are waiting on this man in

280
00:15:14,700 --> 00:15:16,500
the middle, transaction in the
middle.

281
00:15:17,540 --> 00:15:21,140
So yeah, we have a chain or queue
of locking issues and this

282
00:15:21,140 --> 00:15:25,520
is a nasty situation which is inevitable
if you don't have lock

283
00:15:25,520 --> 00:15:26,600
timeout and retries.

284
00:15:27,400 --> 00:15:27,900
Right?

285
00:15:29,340 --> 00:15:33,480
And this feels like we have a select
which is blocking other

286
00:15:33,480 --> 00:15:33,980
selects.

287
00:15:34,340 --> 00:15:35,140
How come?

288
00:15:35,860 --> 00:15:38,300
Okay, because there is an alter
table in the middle.

289
00:15:38,360 --> 00:15:39,120
That's why.

290
00:15:39,600 --> 00:15:40,100
Okay.

291
00:15:41,320 --> 00:15:44,240
Michael: But yeah, I like that
description because it hammers

292
00:15:44,240 --> 00:15:46,100
home a few things we've talked
about in the past.

293
00:15:46,100 --> 00:15:50,740
Like, this is a reason not to have
excessive long running transactions

294
00:15:51,040 --> 00:15:56,980
because it can be that 1 blocking
or it can get in the way of

295
00:15:56,980 --> 00:16:00,920
a misguided migration or table
or something.

296
00:16:01,800 --> 00:16:02,940
But yeah, then there's...

297
00:16:03,080 --> 00:16:04,140
Nikolay: But it's not enough.

298
00:16:04,900 --> 00:16:07,540
Michael: Yes, but on its own it's
not enough, because we also

299
00:16:07,540 --> 00:16:10,420
need the lock timeout and retries
for the migrations.

300
00:16:11,200 --> 00:16:13,340
And maybe 1 on their own would
actually be fine.

301
00:16:13,340 --> 00:16:15,760
If you only implemented 1 or the
other for a while, you might

302
00:16:15,760 --> 00:16:16,420
be okay.

303
00:16:16,800 --> 00:16:17,560
But then

304
00:16:17,560 --> 00:16:18,900
Nikolay: the vacuum comes,

305
00:16:19,220 --> 00:16:19,720
Michael: right?

306
00:16:19,840 --> 00:16:20,000
And

307
00:16:20,000 --> 00:16:23,740
Nikolay: the vacuum always is blocking,
but it has a detection

308
00:16:23,800 --> 00:16:24,300
mechanism.

309
00:16:26,080 --> 00:16:30,140
It's blocking always, but if it's
not running in transaction

310
00:16:30,140 --> 00:16:33,440
ID wraparound prevention mode,
it will kill itself.

311
00:16:34,660 --> 00:16:38,520
But if it is running in transaction
ID wraparound it will not

312
00:16:38,520 --> 00:16:42,700
kill itself and this is terrible
because like you don't have

313
00:16:42,700 --> 00:16:45,480
long running transactions but you
still have this problem I think

314
00:16:46,260 --> 00:16:50,260
having low lock_timeout and retries
for DDL is inevitable.

315
00:16:51,260 --> 00:16:54,520
It's really needed for everyone.

316
00:16:56,680 --> 00:17:00,060
Michael: It's a good point though,
is vacuum the only thing that

317
00:17:00,060 --> 00:17:05,720
will take a lock and kill itself
if something important comes

318
00:17:05,720 --> 00:17:06,140
along?

319
00:17:06,140 --> 00:17:07,080
Nikolay: I don't know.

320
00:17:07,660 --> 00:17:09,060
I don't remember either.

321
00:17:09,240 --> 00:17:10,240
Yeah, maybe, yeah.

322
00:17:10,240 --> 00:17:10,680
Michael: Yeah.

323
00:17:10,680 --> 00:17:14,100
I mean, it makes sense because
it's kind of like background work

324
00:17:14,100 --> 00:17:15,940
that can be done at a later point.

325
00:17:16,260 --> 00:17:18,840
There isn't much else that fits
that bill.

326
00:17:19,540 --> 00:17:20,780
But yeah, cool.

327
00:17:22,040 --> 00:17:22,480
Nikolay: Good.

328
00:17:22,480 --> 00:17:28,120
So what else I wanted to say, it's
it feels like monitoring systems

329
00:17:28,840 --> 00:17:31,800
don't care about real problems.

330
00:17:32,120 --> 00:17:35,640
They show you this is number of
share locks, this is number of

331
00:17:35,640 --> 00:17:37,060
exclusive locks, bye bye.

332
00:17:37,060 --> 00:17:38,940
Datadog does it, for example.

333
00:17:38,940 --> 00:17:40,240
It's super popular, right?

334
00:17:40,680 --> 00:17:44,100
And people come to us saying, oh,
we have issues with locks, and

335
00:17:44,100 --> 00:17:49,780
they show us like big spike of
locks, like access share locks.

336
00:17:50,420 --> 00:17:55,580
Okay, you had some intensive reading
happening, so why should

337
00:17:55,580 --> 00:17:57,320
we care about this at all, right?

338
00:17:59,340 --> 00:18:06,080
And so such thing as good lock dashboard
doesn't exist yet.

339
00:18:06,880 --> 00:18:08,300
I haven't seen it yet.

340
00:18:08,360 --> 00:18:14,420
There are good attempts, but the
problem also lies in the fact

341
00:18:14,440 --> 00:18:17,780
that it's really hard to understand
who is waiting on what.

342
00:18:18,040 --> 00:18:22,900
So it's really easy to understand,
but to reconstruct the whole

343
00:18:23,320 --> 00:18:30,560
chain of or tree of blocking events
like processes, backends,

344
00:18:30,600 --> 00:18:31,100
right?

345
00:18:31,380 --> 00:18:36,220
1 backend can be waiting on other,
that 1 can be waiting on another,

346
00:18:36,220 --> 00:18:36,600
and so on.

347
00:18:36,600 --> 00:18:37,660
It can be a tree.

348
00:18:38,140 --> 00:18:41,640
And actually, it can be a forest
of trees, because there might

349
00:18:41,640 --> 00:18:44,620
be several roots and you have multiple
trees.

350
00:18:45,360 --> 00:18:49,860
And in this case, what helps is
a function called...

351
00:18:50,280 --> 00:18:52,500
Well, let's step back.

352
00:18:53,400 --> 00:19:00,360
So, I recommend to everyone to
enable log_lock_waits, because by

353
00:19:00,360 --> 00:19:02,420
default it's not enabled, it's
off.

354
00:19:02,960 --> 00:19:04,700
Everyone should have it on.

355
00:19:05,220 --> 00:19:11,340
In this case, you will see every
time some session waits for

356
00:19:11,520 --> 00:19:16,580
1 second, not being able to acquire
a lock.

357
00:19:16,760 --> 00:19:18,980
This situation will be logged.

358
00:19:19,640 --> 00:19:20,820
Actually, not 1 second.

359
00:19:21,680 --> 00:19:24,260
More precisely, deadlock_timeout,
right?

360
00:19:24,380 --> 00:19:27,180
Because after deadlock_timeout,
some checks are happening.

361
00:19:27,180 --> 00:19:29,800
This is exactly this logging can
happen in Postgres.

362
00:19:30,480 --> 00:19:33,280
Michael: Yet another parameter
that's being used for 2 things.

363
00:19:33,840 --> 00:19:36,980
Nikolay: Yeah, well it's indirect,
it's not straightforward,

364
00:19:37,260 --> 00:19:37,720
and

365
00:19:37,720 --> 00:19:38,200
it is

366
00:19:38,200 --> 00:19:39,260
what it is, right.

367
00:19:39,600 --> 00:19:42,880
But, and also some people change
it, sometimes it's 2 seconds

368
00:19:42,880 --> 00:19:43,820
or 5 seconds.

369
00:19:44,040 --> 00:19:50,080
Some people try to postpone that
lock detection, thinking maybe

370
00:19:50,080 --> 00:19:52,040
it will be resolved.

371
00:19:52,040 --> 00:19:52,920
Michael: Resolve itself.

372
00:19:53,420 --> 00:19:54,520
Nikolay: Not resolve itself.

373
00:19:55,800 --> 00:19:58,240
Michael: Well, but the problem
is you might want different settings.

374
00:19:58,260 --> 00:20:01,020
Nikolay: I would tune it in a different
direction, resolve earlier

375
00:20:01,320 --> 00:20:01,820
maybe.

376
00:20:02,300 --> 00:20:04,240
It's an interesting topic, right?

377
00:20:04,240 --> 00:20:07,360
So maybe we should talk about deadlocks
in a few minutes.

378
00:20:07,800 --> 00:20:11,660
So yeah, I wanted to like, speaking
of observability, how to

379
00:20:11,660 --> 00:20:15,040
deal with locks, it's really not
easy because you enable this,

380
00:20:15,040 --> 00:20:16,580
It's already something, it's great.

381
00:20:17,200 --> 00:20:22,580
You see process ID of victim, let's
use this word, why not?

382
00:20:23,420 --> 00:20:24,120
Michael: And- Waiter.

383
00:20:24,520 --> 00:20:25,940
Nikolay: Yeah, offenders.

384
00:20:27,500 --> 00:20:29,560
Sometimes maybe multiple, right?

385
00:20:30,060 --> 00:20:35,500
And then you see, since it's your
session who is victim, well,

386
00:20:35,500 --> 00:20:40,280
not your session, but it's the
session for which this analysis

387
00:20:40,280 --> 00:20:45,540
was performed, so we see the text
of the query for the session

388
00:20:45,540 --> 00:20:49,660
who is waiting, but we don't see
details for transactions.

389
00:20:49,920 --> 00:20:52,220
There might be actually an idle
transaction.

390
00:20:52,720 --> 00:20:55,900
There might be no query, although
in pg_stat_activity we would

391
00:20:55,900 --> 00:20:58,760
see the last query executed in
that transaction.

392
00:20:59,240 --> 00:21:03,480
It would probably help, but sometimes
it's active state, state

393
00:21:03,480 --> 00:21:07,540
equals active in participant activity,
and if we were able to

394
00:21:07,540 --> 00:21:09,640
see the query, it would help.

395
00:21:09,720 --> 00:21:11,660
Unfortunately, in logging, you
cannot.

396
00:21:12,040 --> 00:21:17,120
This is, this is like a little
bit annoying because if you need

397
00:21:17,120 --> 00:21:21,720
to do post-mortem analysis like
root cause analysis for something

398
00:21:21,860 --> 00:21:28,260
in recent past you see only 1 side
query but you don't see another

399
00:21:28,260 --> 00:21:32,000
side and it takes time to understand
like actually it's impossible

400
00:21:32,040 --> 00:21:34,460
sometimes to understand what caused
it.

401
00:21:35,060 --> 00:21:41,400
So, and I remember we even implemented
our own mechanism to log

402
00:21:41,400 --> 00:21:43,280
additional information in such
cases.

403
00:21:43,580 --> 00:21:48,400
So using just raise notice in PL/pgSQL,
but it's not fun, honestly,

404
00:21:48,480 --> 00:21:49,420
to do this.

405
00:21:50,320 --> 00:21:53,300
And in some cases, it's reasonable
to do this if you, for example,

406
00:21:53,300 --> 00:21:58,520
have very strong tooling for log
analysis, like enterprise level.

407
00:21:58,520 --> 00:22:03,960
In this case, I would do more logging
using PL/pgSQL, for example,

408
00:22:03,960 --> 00:22:09,100
and let the tool visualize the
number of errors we have and provide

409
00:22:09,100 --> 00:22:11,620
details and maybe notifications
and so on.

410
00:22:11,820 --> 00:22:16,960
So in this case, the question is
how to implement good monitoring

411
00:22:16,960 --> 00:22:17,620
for this.

412
00:22:18,420 --> 00:22:21,720
And good monitoring for analysis,
we have a query, right?

413
00:22:22,060 --> 00:22:28,380
Among our how-tos I wrote, there
is a huge query which had big

414
00:22:28,380 --> 00:22:28,880
evolution.

415
00:22:30,040 --> 00:22:32,820
I took it from some people, they
took it from other people.

416
00:22:33,320 --> 00:22:35,020
So some evolution happened there.

417
00:22:35,020 --> 00:22:36,240
And it's great.

418
00:22:36,620 --> 00:22:41,320
I think 100 plus lines of query,
which provides all the details.

419
00:22:41,320 --> 00:22:45,960
And you see the whole forest of
trees, including all states,

420
00:22:47,020 --> 00:22:50,340
weight events, sometimes it matters,
and also queries.

421
00:22:51,220 --> 00:22:55,520
The only 1 problem with this, because
it requires pg blocking

422
00:22:55,600 --> 00:22:56,540
pids call.

423
00:22:57,340 --> 00:23:02,380
This function gives you the whole
list of process IDs which is

424
00:23:02,380 --> 00:23:03,940
blocking this process ID.

425
00:23:05,740 --> 00:23:09,640
And the problem with it, it's quite
like sometimes expensive.

426
00:23:09,720 --> 00:23:10,740
It's not for free.

427
00:23:10,980 --> 00:23:15,620
So limit yourself with some statement
timeout, not to cause observer

428
00:23:15,620 --> 00:23:21,100
effect because when there is a
huge acute spike or not acute

429
00:23:21,100 --> 00:23:26,700
just spike of some like storm of
lock acquisition weight events

430
00:23:27,800 --> 00:23:32,860
in this case this function can
cause additional trouble sometimes.

431
00:23:33,480 --> 00:23:35,900
Michael: So is the, is, does that
mean that the times it would

432
00:23:35,900 --> 00:23:39,320
be most useful is the most likely
for it to time out?

433
00:23:40,240 --> 00:23:41,380
Nikolay: Well, yeah, yeah.

434
00:23:41,380 --> 00:23:45,520
In general, in most cases, I, I
warned about this, but it's not,

435
00:23:45,560 --> 00:23:47,540
It doesn't mean that it always
happens.

436
00:23:47,540 --> 00:23:49,700
It happens rarely, this thing.

437
00:23:49,900 --> 00:23:53,140
But it's quite wise to limit yourself,
I don't know, like half

438
00:23:53,140 --> 00:23:54,880
a second, maybe up to 1 second.

439
00:23:55,240 --> 00:23:56,120
Michael: Okay, yeah, that's

440
00:23:56,120 --> 00:23:56,280
Nikolay: quite.

441
00:23:56,280 --> 00:24:00,040
And not to call it too frequently,
like not every 100 milliseconds,

442
00:24:00,060 --> 00:24:03,480
like I do sometimes with some queries,
not with this.

443
00:24:03,480 --> 00:24:03,980
Yeah.

444
00:24:04,060 --> 00:24:08,300
During observing something, like
manually, like I just see like

445
00:24:08,300 --> 00:24:10,820
almost animated state of some results.

446
00:24:11,400 --> 00:24:15,360
But in this case, it's better to
do it less frequently with statement

447
00:24:15,360 --> 00:24:15,860
timeout.

448
00:24:15,940 --> 00:24:19,040
But once you do this, you bring
this to monitoring.

449
00:24:19,200 --> 00:24:19,940
It's beautiful.

450
00:24:21,020 --> 00:24:21,700
It's great.

451
00:24:23,080 --> 00:24:23,580
Michael: Yeah.

452
00:24:23,940 --> 00:24:27,860
With monitoring, it feels like
sometimes we want to catch every

453
00:24:28,200 --> 00:24:31,240
example of an issue for some types
of issue.

454
00:24:31,240 --> 00:24:35,380
But sometimes it's OK if we're
just sampling and if we catch

455
00:24:35,380 --> 00:24:38,000
that sometimes it's an issue, that's
a sign that there's a problem

456
00:24:38,000 --> 00:24:41,420
in our code or a problem in our
team and that we don't have the

457
00:24:41,420 --> 00:24:43,560
education around these types of
problems.

458
00:24:44,120 --> 00:24:47,900
And I understand that this might
be 1 of those ones where you

459
00:24:47,900 --> 00:24:51,100
do want to catch as many of them
as you can, but I think it's

460
00:24:51,100 --> 00:24:55,260
more important that we realize
that we even have any migrations

461
00:24:55,340 --> 00:25:00,420
that don't have this, or if we
are doing updates in an order

462
00:25:00,420 --> 00:25:03,780
or transactions in an order that
could end up with deadlocks

463
00:25:03,820 --> 00:25:06,000
or complex locking trees.

464
00:25:06,740 --> 00:25:07,060
I don't

465
00:25:07,060 --> 00:25:10,120
Nikolay: know, I would prefer to
have exact information here

466
00:25:10,120 --> 00:25:13,300
in the void sample, but I live
in a world where max connections

467
00:25:13,320 --> 00:25:14,660
doesn't equal 10,000.

468
00:25:15,240 --> 00:25:19,460
If you're on RDS you can have like
multiple thousand max connections,

469
00:25:19,460 --> 00:25:20,340
multiple thousand.

470
00:25:20,460 --> 00:25:25,120
In this case, you have a storm
of active sessions, many thousands

471
00:25:25,120 --> 00:25:26,120
of active sessions.

472
00:25:26,120 --> 00:25:30,980
And then they like to, of course,
in this case, it's a different

473
00:25:30,980 --> 00:25:31,420
situation.

474
00:25:31,420 --> 00:25:34,080
But for me, it's not a healthy
situation.

475
00:25:34,160 --> 00:25:39,060
I would prefer to have like we
have number of cores multiplied

476
00:25:39,400 --> 00:25:40,960
maximum by 10, maximum.

477
00:25:41,320 --> 00:25:43,840
This should be your absolute maximum
for max connections.

478
00:25:43,840 --> 00:25:48,540
In this case you cannot have too
many records to analyze, right?

479
00:25:48,820 --> 00:25:52,260
And in this case it's good to have
exact information because

480
00:25:52,300 --> 00:25:56,280
if you start sampling you might
miss something because 1 line

481
00:25:56,280 --> 00:25:58,260
can matter a lot in this analysis,
you know?

482
00:25:58,260 --> 00:25:58,760
Yeah.

483
00:25:58,860 --> 00:25:59,360
Yeah.

484
00:25:59,820 --> 00:26:04,660
So of course You have like, I don't
know, 96 cores, your max

485
00:26:04,660 --> 00:26:05,940
connections should be 400.

486
00:26:07,240 --> 00:26:10,540
In the worst case, we will have
400 backends to analyze.

487
00:26:12,260 --> 00:26:16,900
It's a lot, but It will be only
during incident.

488
00:26:17,220 --> 00:26:18,940
Hopefully it's not every day.

489
00:26:19,860 --> 00:26:22,540
If you have incident every day,
you know what to do.

490
00:26:24,640 --> 00:26:26,680
Ask people who can help, right?

491
00:26:27,140 --> 00:26:33,560
Anyway, 5,000, I agree, like 5,
000 of same thing, actually.

492
00:26:33,680 --> 00:26:35,500
They all are waiting on the same
thing.

493
00:26:35,860 --> 00:26:37,780
Why should we log them all?

494
00:26:38,140 --> 00:26:40,820
Or why should we bring them all
to monitoring?

495
00:26:41,180 --> 00:26:42,600
Of course, it doesn't make sense.

496
00:26:43,360 --> 00:26:46,440
But if it's just a few hundreds,
I don't know.

497
00:26:46,720 --> 00:26:47,940
Should be fine.

498
00:26:47,980 --> 00:26:48,840
Should be fine.

499
00:26:49,460 --> 00:26:54,300
Again, under normal circumstances
with 96 cores, you should have

500
00:26:54,380 --> 00:26:57,420
up to, I'd say, 50 active sessions.

501
00:26:58,140 --> 00:27:02,760
And most of them are not waiting, so these analysis won't bring  

502
00:27:02,760 --> 00:27:03,260
anything.  

503
00:27:04,280 --> 00:27:09,300
And if your system is healthy, in most cases, this query will  

504
00:27:09,300 --> 00:27:10,900
produce empty results.  

505
00:27:11,600 --> 00:27:12,100
Empty.  

506
00:27:12,560 --> 00:27:13,300
That's great.  

507
00:27:13,440 --> 00:27:18,360
So normal state is 0 rows in the result set.  

508
00:27:19,140 --> 00:27:22,800
Michael: Well, I guess normal state could still be, like, normal  

509
00:27:22,800 --> 00:27:26,360
state is still up to a few rows, but the main point is that they  

510
00:27:26,360 --> 00:27:27,860
shouldn't have been waiting for too long.  

511
00:27:27,860 --> 00:27:28,720
Is that fair?  

512
00:27:29,160 --> 00:27:29,800
Nikolay: Yeah, yeah, yeah.  

513
00:27:29,800 --> 00:27:33,880
Well, you can wait a little bit, but sometimes you have spikes,  

514
00:27:33,900 --> 00:27:39,820
some small spikes, but in healthy systems, we just see that 000,  

515
00:27:40,080 --> 00:27:42,460
even if the system is very heavily loaded.  

516
00:27:42,740 --> 00:27:45,420
And then some small spike and then 000, like this.  

517
00:27:45,420 --> 00:27:45,920
Yeah.  

518
00:27:46,420 --> 00:27:52,160
So, yeah, also wanted to mention that, again, connection to indirect  

519
00:27:52,160 --> 00:27:53,740
connection to lightweight locks.  

520
00:27:54,020 --> 00:27:58,620
So wait event when some backend is waiting on another to acquire  

521
00:27:58,620 --> 00:28:04,400
a lock, heavy lock, you will see wait event equals lock, right?  

522
00:28:04,920 --> 00:28:06,560
Or wait event type lock.  

523
00:28:07,360 --> 00:28:09,740
So this means we are waiting.  

524
00:28:10,440 --> 00:28:13,220
And again, lock means heavy lock.  

525
00:28:15,480 --> 00:28:16,160
What else?  

526
00:28:17,780 --> 00:28:20,460
Michael: Well, is it a good time to go to, I mean, you mentioned  

527
00:28:20,540 --> 00:28:24,480
that these lock trees could get complex, but they could also  

528
00:28:24,480 --> 00:28:26,020
be loops, right?  

529
00:28:26,200 --> 00:28:28,320
In the case of deadlocks, is that is?  

530
00:28:28,320 --> 00:28:29,320
Nikolay: Oh, that's interesting.  

531
00:28:29,640 --> 00:28:29,900
Yeah.  

532
00:28:29,900 --> 00:28:30,160
Yeah.  

533
00:28:30,160 --> 00:28:34,700
So it's like the only solution to deadlock somebody should die.  

534
00:28:36,600 --> 00:28:37,400
Michael: Well, yeah.  

535
00:28:38,320 --> 00:28:41,340
But now I'm talking more about prevention, right?  

536
00:28:41,440 --> 00:28:41,680
Oh, yeah.  

537
00:28:41,680 --> 00:28:46,420
I know it can get complex, but I feel like most cases of deadlocks  

538
00:28:46,480 --> 00:28:52,600
I've heard about have been what I would class as kind of poor  

539
00:28:52,600 --> 00:28:54,320
design on the application side.  

540
00:28:54,320 --> 00:28:54,820
Nikolay: Exactly.  

541
00:28:57,440 --> 00:29:01,080
Michael: So very avoidable with the right consideration around  

542
00:29:01,080 --> 00:29:05,040
locking and around ordering of what you do in which order.  

543
00:29:05,460 --> 00:29:10,840
Nikolay: Yeah, actually, let me correct myself a few phrases  

544
00:29:10,840 --> 00:29:11,200
ago.  

545
00:29:11,200 --> 00:29:14,500
I said wait event, it should be wait event type equals lock  

546
00:29:14,500 --> 00:29:18,020
and there are several additional wait events in this category  

547
00:29:18,120 --> 00:29:22,580
or under this type, which can help understand at which level,  

548
00:29:22,580 --> 00:29:28,300
for example, we have a lock we are trying to acquire.  

549
00:29:29,240 --> 00:29:34,500
Is it relation or row-level tuple or something else, like maybe  

550
00:29:34,500 --> 00:29:36,000
advisory locks also.  

551
00:29:36,280 --> 00:29:36,780
Yeah.

552
00:29:36,780 --> 00:29:37,700
You remember this?

553
00:29:37,700 --> 00:29:39,800
Like user-defined mechanism.

554
00:29:40,760 --> 00:29:41,500
So yeah.

555
00:29:41,920 --> 00:29:47,180
And speaking of deadlocks, it's always
Here exactly we can apply

556
00:29:47,180 --> 00:29:49,440
this anti-methodology by Brendan
Gregg.

557
00:29:49,840 --> 00:29:51,720
Blame somebody else.

558
00:29:53,620 --> 00:29:57,320
It's always a problem with, unfortunately,
it's always a problem

559
00:29:57,320 --> 00:30:00,260
with applications, with workloads,
maybe users.

560
00:30:00,320 --> 00:30:02,060
So it's not a database problem.

561
00:30:03,280 --> 00:30:09,520
So the sequence of events is not
good, and it should be changed.

562
00:30:09,520 --> 00:30:13,860
It should be redesigned so people
don't conflict in this way.

563
00:30:14,180 --> 00:30:17,860
And in many cases, I remember dealing
with deadlocks was really

564
00:30:17,860 --> 00:30:22,200
challenging because if you if it's
not you who wrote this code

565
00:30:22,200 --> 00:30:26,320
of And you need to explain they
need to change in many cases.

566
00:30:26,320 --> 00:30:30,040
It's easier like to just to accept
the loss You know if it's

567
00:30:30,040 --> 00:30:32,840
just a couple of deadlocks per
day, it's okay.

568
00:30:32,840 --> 00:30:36,300
If you have a heavily loaded system,
it's not that bad.

569
00:30:37,820 --> 00:30:42,280
Michael: Well, and then often the
solution is to make sure that

570
00:30:42,280 --> 00:30:44,880
transactions can at least be retried,
right?

571
00:30:45,020 --> 00:30:46,720
Nikolay: Because they- Definitely
so, 100%.

572
00:30:47,700 --> 00:30:51,660
Of course, there are cases where
having deadlock is a serious

573
00:30:51,660 --> 00:30:55,940
problem, because if money is involved,
then it's a problem, because

574
00:30:56,400 --> 00:30:58,080
you can have some issues.

575
00:30:58,080 --> 00:30:59,280
And definitely, application...

576
00:31:01,120 --> 00:31:02,260
It's a right point.

577
00:31:02,260 --> 00:31:08,470
Maybe instead of trying to completely
avoid and prevent deadlocks,

578
00:31:08,470 --> 00:31:13,480
maybe it's better to design and,
okay, we have a couple of deadlocks

579
00:31:13,580 --> 00:31:14,080
among...

580
00:31:15,800 --> 00:31:19,020
We have a billion transactions
per day with 2 deadlocks, but

581
00:31:19,020 --> 00:31:26,020
we have retry logic, so nothing
is lost and no users are complaining.

582
00:31:27,840 --> 00:31:28,600
That's it.

583
00:31:28,940 --> 00:31:29,780
It's smart.

584
00:31:30,860 --> 00:31:32,580
So, yeah, I agree with this.

585
00:31:34,460 --> 00:31:34,960
Michael: Nice.

586
00:31:35,020 --> 00:31:37,060
Do you want to touch on advisory
locks quickly?

587
00:31:37,360 --> 00:31:38,140
Or not really?

588
00:31:38,260 --> 00:31:38,980
Nikolay: Oh, yes.

589
00:31:39,480 --> 00:31:40,680
We had a case recently.

590
00:31:40,760 --> 00:31:42,180
It was an interesting case.

591
00:31:42,800 --> 00:31:48,280
Obviously engineers were very smart
engineers and they also read

592
00:31:48,280 --> 00:31:50,240
documentation a lot and so on.

593
00:31:50,640 --> 00:31:51,140
Michael: Nice.

594
00:31:51,280 --> 00:31:55,360
Nikolay: And that like quite advanced
code is written and so

595
00:31:55,360 --> 00:31:58,580
on, but they had issues with heavy
lock contention.

596
00:31:59,640 --> 00:32:05,020
And turned out it was storms of
advisory lock acquisition wait

597
00:32:05,020 --> 00:32:11,600
events and we just talked through
this and my advice was quite

598
00:32:11,600 --> 00:32:12,100
simple.

599
00:32:12,740 --> 00:32:17,060
Let's get rid of advisory locks
because we have data in table,

600
00:32:17,860 --> 00:32:21,840
this data is organized as rows,
we can lock rows.

601
00:32:23,420 --> 00:32:27,220
And we talked about reasons why
we should lock, and it was like,

602
00:32:27,700 --> 00:32:31,920
there are reasons originally, but
when we just think, could we

603
00:32:31,920 --> 00:32:34,380
switch to regular row level locks?

604
00:32:34,760 --> 00:32:36,600
And the question was, yes.

605
00:32:36,600 --> 00:32:39,220
There's no big reason not to use
it.

606
00:32:39,380 --> 00:32:44,280
And once we switch, everything
becomes quite clear how to resolve

607
00:32:44,860 --> 00:32:45,900
contention completely.

608
00:32:45,960 --> 00:32:48,780
You just start, SELECT FOR UPDATE.

609
00:32:49,980 --> 00:32:53,480
Before you update or delete row,
you just SELECT FOR UPDATE and

610
00:32:53,480 --> 00:32:55,300
then SKIP LOCKED or NOWAIT.

611
00:32:55,760 --> 00:33:01,620
Plus retry, so if you want to fail
immediately and then retry,

612
00:33:01,620 --> 00:33:02,300
it's NOWAIT.

613
00:33:02,300 --> 00:33:07,580
If you want to, depending on application,
depending on logic

614
00:33:07,580 --> 00:33:11,940
and data and so on, sometimes you
can just take next batch of

615
00:33:11,940 --> 00:33:13,240
rows to process.

616
00:33:13,340 --> 00:33:14,740
In this case, SKIP LOCKED.

617
00:33:15,120 --> 00:33:18,720
If you must work on the same batch,
okay, come later.

618
00:33:19,400 --> 00:33:23,160
In this case, NOWAIT and just
retry.

619
00:33:23,680 --> 00:33:25,580
And that's it, it's an easy solution.

620
00:33:26,060 --> 00:33:30,540
Well, advisory locks can still be
helpful in some cases, but working

621
00:33:30,540 --> 00:33:34,640
with data in general, I think it's
some mechanism which feels

622
00:33:34,640 --> 00:33:39,480
like kind of extra and not needed.
Right, if you just need to process

623
00:33:39,480 --> 00:33:44,540
some data in many parallel
sessions, just work with row

624
00:33:44,540 --> 00:33:45,300
level locks.

625
00:33:45,860 --> 00:33:49,540
Michael: Yeah, I feel like it's
a sledgehammer and sometimes

626
00:33:50,140 --> 00:33:53,860
sometimes you need a sledgehammer,
right? Yeah, if you think about

627
00:33:53,860 --> 00:33:57,660
it, we have in the UK, we
keep our garden tools in a shed.

628
00:33:57,660 --> 00:34:01,400
Normally, not many people have a
sledgehammer in their shed, like

629
00:34:01,400 --> 00:34:03,780
you don't need it that often,
right?

630
00:34:04,080 --> 00:34:04,900
Nikolay: I agree.

631
00:34:07,300 --> 00:34:11,380
Michael: Yeah, well anyway, but the point is, every now and

632
00:34:11,380 --> 00:34:14,040
again if you're doing like a remodeling
and you want to get rid

633
00:34:14,040 --> 00:34:18,280
of a wall, maybe you need to bring
in the big guns and actually

634
00:34:18,280 --> 00:34:19,580
do something more heavy.

635
00:34:19,760 --> 00:34:23,300
Nikolay: I use this word, I apply
this word, maybe you remember,

636
00:34:23,360 --> 00:34:25,220
to materialized views in the past.

637
00:34:25,760 --> 00:34:30,300
This definitely feels like, just
you solve your problem with

638
00:34:30,300 --> 00:34:34,780
performance but it's so like massive
it always needs to be refreshed

639
00:34:34,780 --> 00:34:39,060
fully. So same feeling, like it's
a tool, it's reasonable

640
00:34:39,060 --> 00:34:44,620
in some cases, but if you overuse
it, you cannot do precise work

641
00:34:44,620 --> 00:34:49,080
like you lose some precision, right?
So maybe, maybe.

642
00:34:49,080 --> 00:34:52,960
Michael: Well, and going back to the original question

643
00:34:52,960 --> 00:34:55,840
is what can we do to avoid these
locks? And we were saying well

644
00:34:55,840 --> 00:34:59,180
actually what can we do to avoid
long waits? And what can we do

645
00:34:59,180 --> 00:35:05,080
to make locks lighter in general
or take less time or retry when

646
00:35:05,080 --> 00:35:05,580
needed.

647
00:35:06,260 --> 00:35:07,740
This is a perfect example.

648
00:35:08,120 --> 00:35:12,340
Is there a way of taking less extreme
locks or releasing them

649
00:35:12,340 --> 00:35:12,840
quicker?

650
00:35:12,900 --> 00:35:17,800
Or, you know, so it feels like
another example of how to minimize

651
00:35:17,860 --> 00:35:21,360
locking or at least shorten its impact.

652
00:35:22,900 --> 00:35:26,760
Nikolay: Yeah, so yeah short transactions don't hold locks too

653
00:35:26,760 --> 00:35:27,260
long.

654
00:35:28,040 --> 00:35:34,940
If you wait give up, give up sooner, don't wait too much because

655
00:35:35,320 --> 00:35:37,060
others can wait behind you.

656
00:35:38,000 --> 00:35:38,680
What else?

657
00:35:40,080 --> 00:35:42,420
Use skip locked or no wait.

658
00:35:42,520 --> 00:35:44,120
Select for update, skip locked, no wait.

659
00:35:44,120 --> 00:35:45,360
This is major mechanism.

660
00:35:45,700 --> 00:35:47,820
Finally MySQL even has it, right?

661
00:35:48,180 --> 00:35:51,280
So this is great to have it and use it

662
00:35:51,280 --> 00:35:54,020
Michael: and I guess the more obvious

663
00:35:54,240 --> 00:36:00,680
Nikolay: with only 1 comment for select for update and select

664
00:36:00,680 --> 00:36:08,040
for share select for share can lead to multixact SLRU issues.

665
00:36:08,420 --> 00:36:14,380
I barely remember already, didn't touch it for a while, but there

666
00:36:14,380 --> 00:36:17,780
is an interesting effect with select for update.

667
00:36:19,140 --> 00:36:22,620
If you have foreign keys, select for update, and you use sub-transactions,

668
00:36:22,920 --> 00:36:28,760
you can have issues with multixact IDs somehow indirectly and

669
00:36:29,280 --> 00:36:29,780
unexpectedly.

670
00:36:30,400 --> 00:36:36,920
So select for Update can feel like differently if you have sub-transactions.

671
00:36:37,200 --> 00:36:41,360
So, again, like I'm a big fan of sub-transactions, so to speak.

672
00:36:42,740 --> 00:36:46,260
I would not use Select or Update and sub-transactions at the

673
00:36:46,260 --> 00:36:47,000
same time.

674
00:36:47,220 --> 00:36:49,700
This can be a problem, as I remember.

675
00:36:50,500 --> 00:36:52,620
But in general, select for update is great.

676
00:36:52,800 --> 00:36:53,500
In general.

677
00:36:54,620 --> 00:36:58,040
Michael: And up to, I mean, you're talking mostly about very

678
00:36:58,040 --> 00:36:59,320
heavily loaded projects.

679
00:37:00,060 --> 00:37:04,760
And this problem of locking is a problem even, in my opinion,

680
00:37:04,760 --> 00:37:06,220
is a problem way before that.

681
00:37:06,220 --> 00:37:08,800
It can be a problem in much, much, much smaller projects.

682
00:37:08,800 --> 00:37:09,480
Nikolay: I agree.

683
00:37:09,780 --> 00:37:15,060
It can be, even in a small project, it can be very annoying to

684
00:37:15,060 --> 00:37:17,060
have contention on heavy locks.

685
00:37:17,500 --> 00:37:17,720
Right.

686
00:37:17,720 --> 00:37:19,820
Michael: Well and can cause downtime still.

687
00:37:19,940 --> 00:37:20,380
Nikolay: Mm-hmm.

688
00:37:20,380 --> 00:37:20,860
Yeah.

689
00:37:20,860 --> 00:37:21,360
100%.

690
00:37:21,860 --> 00:37:22,360
Michael: Yeah.

691
00:37:23,440 --> 00:37:25,120
So yeah I think that's great.

692
00:37:25,520 --> 00:37:27,840
Anything else you wanted to make sure we covered?

693
00:37:28,200 --> 00:37:29,820
Nikolay: No, just that's it.

694
00:37:29,820 --> 00:37:34,360
Like no long transactions, no long waits and you'll be fine.

695
00:37:35,140 --> 00:37:35,640
Michael: Yeah.

696
00:37:36,300 --> 00:37:39,780
Actually I had 1, I had 1 more but I think it's almost at the

697
00:37:39,780 --> 00:37:41,180
risk of being too obvious.

698
00:37:43,040 --> 00:37:46,420
I think we had a whole episode actually on 0 downtime migrations,

699
00:37:46,640 --> 00:37:51,620
but there are schema changes you can make in different ways to

700
00:37:51,620 --> 00:37:54,260
avoid the heaviest of lock types.

701
00:37:54,520 --> 00:37:57,880
As simple as, you know, the create  
index concurrently type things,  

702
00:37:57,880 --> 00:38:00,680
or we had a whole episode on  
pg_squeeze and things.  

703
00:38:01,400 --> 00:38:07,460
I feel like that's another topic  
of avoiding a certain type of  

704
00:38:07,460 --> 00:38:09,600
lock in favor of different types  
of lock.  

705
00:38:09,640 --> 00:38:19,160
Nikolay: You can avoid lock contention  
issues doing DDL but you  

706
00:38:19,160 --> 00:38:22,100
need to sacrifice transactional  
capability.  

707
00:38:23,300 --> 00:38:23,860
Michael: Yes, yes.  

708
00:38:24,720 --> 00:38:26,360
Nikolay: This is the key, unfortunately.  

709
00:38:26,960 --> 00:38:35,980
So you cannot have transactional  
and like atomic steps and 0  

710
00:38:35,980 --> 00:38:38,760
risk of having a lock contention  
unfortunately at the same time  

711
00:38:39,440 --> 00:38:42,680
so yeah and creating this concurrently  
is a great example here  

712
00:38:42,720 --> 00:38:46,500
or attaching detaching partitions  
concurrently and so on yeah  

713
00:38:47,080 --> 00:38:47,940
good okay